JP2023004271A - Meta-automated machine learning with improved multi-armed bandit algorithm for selecting and tuning machine learning algorithms

Info

Publication number: JP2023004271A
Application number: JP2021105867A
Authority: JP
Inventors: Mischa Schmidt; Julia Gastinger
Original assignee: NEC Laboratories Europe GmbH
Current assignee: NEC Laboratories Europe GmbH
Priority date: 2021-06-25
Filing date: 2021-06-25
Publication date: 2023-01-17

Abstract

To provide a method for automatically selecting a machine learning algorithm and tuning the hyperparameters of the machine learning algorithm. SOLUTION: The method involves receiving a dataset and a machine learning task from a user. Executions of multiple instantiations of different automated machine learning frameworks for the machine learning task are each controlled as a separate arm, taking into account available computational resources and a time budget, so that multiple machine learning models are trained during execution by the separate arms and performance scores of the trained models are calculated. One or more of the trained models are selected for the machine learning task based on the performance scores. SELECTED DRAWING: Figure 1

Description

[0001] The present invention relates to machine learning (ML), and in particular to methods and systems for meta-automated ML that use multi-armed bandit algorithms for selecting and tuning ML algorithms.

Background

[0002] Several high-level decisions must be made when applying ML. For example, the learning algorithm, or base learner, has to be selected from a large number of available learning algorithms. Each learning algorithm comes with its own set of hyperparameters, which can be optimized to maximize the algorithm's performance with respect to an application-specific error criterion for a given dataset. In addition, different feature preprocessing algorithms and feature selection techniques, each with its own set of hyperparameters, can be combined into an ML pipeline to improve the performance of the base learner. Accordingly, different hyperparameters have to be tuned, and different data preprocessing and feature engineering techniques may be applied. Automated machine learning (AutoML) investigates how to automate the selection of base learners and preprocessors and the tuning of the associated hyperparameters.

[0003] First, AutoML is motivated by the goal of enabling non-experts to make use of ML. Second, AutoML is motivated by making the process of applying ML more efficient, e.g., by using automation to reduce the workload of expert data scientists. Third, AutoML is expected to provide a principled approach for applying base learners to ML problems (see, e.g., Mischa Schmidt et al., "On the Performance of Differential Evolution for Hyperparameter Tuning," arXiv:1904.06960v1 (April 15, 2019), incorporated herein by reference). Anh Truong et al., "Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools," arXiv:1908.05557v2 (September 3, 2019), incorporated herein by reference, describes the potential of AutoML to increase the productivity of data scientists, ML engineers and ML researchers by means of several different tools and platforms that seek to automate, and thereby reduce, the repetitive tasks in the ML pipeline.

[0004] Several open source software frameworks exist for automating conventional ML, as listed, for example, in Schmidt et al., "On the Performance of Differential Evolution for Hyperparameter Tuning," arXiv:1904.06960v1 (April 15, 2019), incorporated herein by reference. The related scientific work typically documents feasibility and ML performance on a range of well-known test datasets, e.g., those provided by the OpenML community. Frameworks such as these typically try to find, for the user's ML task, the most appropriate ML algorithm together with its best-performing hyperparameter settings. This is called algorithm selection and hyperparameter tuning. In addition, the frameworks train the selected and parameterized algorithm on the data of the ML task. Note that the above literature does not describe how a non-expert user could easily invoke AutoML; instead, it requires programming proficiency.

[0005] Exclusively for deep learning, the subject of neural architecture search (NAS) has been addressed in several publications. The frameworks described in these publications focus on devising neural network architectures by parameterizing deep learning libraries such as keras (keras.io) or tensorflow.

[0006] Recently, Micah J. Smith et al., "The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development," arXiv:1905.08942v3 (November 12, 2019), incorporated herein by reference, described an AutoML framework called AutoBazaar, which builds on the concept of ML pipeline templates by leveraging a variety of existing ML and data manipulation libraries. The pipeline template is AutoBazaar's means of abstraction for algorithm selection and hyperparameter tuning. To that end, AutoBazaar describes an approach (Algorithm 2) that iterates over the successive steps of selecting an algorithm (in practice, selecting a pipeline from among many possible candidate pipeline variants), tuning the matching hyperparameters (of the various algorithms/steps in the selected pipeline), and training the pipeline (including the ML algorithms inside it).

[0007] US Patent Application Publication No. 2016/0132787 describes a cloud AutoML concept as follows. A user defines data runs, or tasks, and enters them into a database. One of potentially many worker nodes (in a cloud setting) uses a selection strategy to identify a so-called "hyperpartition" whose hyperparameters are to be tuned. During tuning, models are already trained and tested on the given dataset in order to compute a performance score based on the performance function of the user-specified task (e.g., the well-known mean squared error (MSE) criterion). The selection strategy can either be uniform at random, or it can apply to the performance scores achieved so far either a standard multi-armed bandit algorithm (called UCB1) or one of two multi-armed bandit variants that can handle drifting rewards, denoted "BestK" and "BestKVelocity". A hyperpartition is defined as a choice of categorical hyperparameters, for example which algorithm to run. To tune the hyperparameters, the commonly known Bayesian optimization via Gaussian processes is applied. By applying the selection strategy and then Bayesian optimization, the worker identifies which training job (ML algorithm/pipeline) should be applied to the dataset next and enters the corresponding training job specification into a central database. When a worker node is available (i.e., idle), it checks the central database, works on one of the potentially many training jobs found there, and marks that job when it starts so that other workers do not work on the same job. When the ML training is complete, the worker stores the performance and the associated model, and then either checks for the next training job or tries to create a new training job for one of the specified data runs.
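For orientation, the standard UCB1 rule named above can be sketched in a few lines of Python. This is a textbook formulation, not code from the cited publication; the function name `ucb1_select` and the two list arguments are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math


def ucb1_select(counts, mean_rewards):
    """Return the index of the arm chosen by the textbook UCB1 rule.

    counts[i] is the number of pulls of arm i so far and mean_rewards[i]
    is its average observed reward (hypothetical inputs, rewards in [0, 1]).
    """
    # Every arm is pulled once before the confidence bound is applied.
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm
    total_pulls = sum(counts)
    # Mean reward plus an exploration bonus that shrinks with more pulls.
    scores = [
        mean + math.sqrt(2.0 * math.log(total_pulls) / pulls)
        for mean, pulls in zip(mean_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal pull counts the rule prefers the arm with the higher mean; an arm that has been pulled rarely receives a large exploration bonus and is tried again even if its mean is slightly lower.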

[0008] There are also AutoML as a service (AMLaaS) offerings, such as GOOGLE Cloud AutoML, which focuses on deep learning techniques, MICROSOFT AzureML, which leverages AZURE ML algorithms, SALESFORCE TransmogrifAI and UBER Ludwig. However, the inner workings of these offerings are not disclosed, so it is not known how they scale their AMLaaS operations to the demands of millions of users. Nor is it known or disclosed how these offerings can leverage and coordinate existing ML or deep learning algorithms.

Overview

[0009] In one embodiment, the present invention provides a method for automatically selecting a machine learning algorithm and tuning the hyperparameters of the machine learning algorithm. A dataset and a machine learning task are received from a user. Executions of multiple instantiations of different automated machine learning frameworks for the machine learning task are each controlled as a separate arm, taking into account available computational resources and a time budget, so that multiple machine learning models are trained during execution by the separate arms and performance scores of the trained models are calculated. One or more of the trained models are selected for the machine learning task based on the performance scores.

[0010] Embodiments of the present invention are described in more detail below on the basis of the exemplary figures. The invention is not limited to the exemplary embodiments. All features described and/or illustrated herein can be used alone or combined in different combinations in embodiments of the invention. The features and advantages of various embodiments of the present invention will become apparent from reading the following detailed description with reference to the accompanying drawings, which illustrate the following.

[0011] Figure 1 schematically illustrates one embodiment of a meta-AutoML system and method according to the present invention, referred to herein as Hierarchical Automated Machine LEarning with Time-awareness (HAMLET) because of its hierarchical decision making and its ability to account for the passage of time.

[0012] Figure 2 illustrates Docker options.

[0013] Figure 3 is an exemplary graph showing how a learning curve (LC) is extrapolated based on observed learning curve values, according to one embodiment of the present invention.

[0014] Figures 4a, 4b and 4c show the first experiment as boxplots of the ranks of HAMLET Variant 1 for budgets of 15 minutes (Figure 4a), 30 minutes (Figure 4b) and 1 hour (Figure 4c), respectively, and show that the results do not change qualitatively for smaller budgets.

[0015] Figures 5a, 5b, 5c and 5d show the first experiment as boxplots of the ranks of HAMLET Variant 3 for budgets of 10 minutes (Figure 5a), 15 minutes (Figure 5b), 30 minutes (Figure 5c) and 1 hour (Figure 5d), respectively, and show that the results do not change qualitatively for smaller budgets (10-minute budget).

[0016] Figures 6a, 6b and 6c show the first experiment as boxplots of the ranks of the inter-policy comparison for budgets of 15 minutes (Figure 6a), 30 minutes (Figure 6b) and 1 hour (Figure 6c), respectively, and show that the policies are statistically indistinguishable at B = 900 s.

[0017] Figures 7a, 7b and 7c show the second experiment as boxplots of the ranks of HAMLET Variant 1 for budgets of 2 hours (Figure 7a), 3 hours (Figure 7b) and 12 hours (Figure 7c), respectively.

[0018] Figures 8a, 8b, 8c and 8d show the second experiment as boxplots of the ranks of HAMLET Variant 3 for budgets of 1 hour (Figure 8a), 2 hours (Figure 8b), 3 hours (Figure 8c) and 12 hours (Figure 8d), respectively.

[0019] Figures 9a, 9b, 9c, 9d, 9e, 9f, 9g and 9h show the second experiment as boxplots of the ranks of the inter-policy comparison for budgets of 15 minutes (Figure 9a), 30 minutes (Figure 9b), 45 minutes (Figure 9c), 1 hour (Figure 9d), 2 hours (Figure 9e), 3 hours (Figure 9f), 6 hours (Figure 9g) and 12 hours (Figure 9h), respectively.

[0020] Figure 10 shows the 95% confidence intervals of the average ranks of the different policies, aggregated over all datasets and budgets.

Detailed description

[0021] Embodiments of the present invention provide meta-AutoML systems and methods that leverage existing frameworks and libraries for algorithm selection and hyperparameter tuning, as well as for ensembling. These methods and systems use a modified multi-armed bandit algorithm, enhanced by learning curve extrapolation, to predict the performance of each arm. The system is designed for convenient usability. Embodiments of the present invention are advantageously designed to integrate NAS frameworks in a way that is conceptually similar to the integration of AutoML frameworks for conventional ML algorithms.

[0022] Multi-armed bandit algorithms such as UCB1 can be used to learn to choose optimally among a number of alternatives, but these algorithms generally assume stationary reward distributions (in essence, the expected reward that the multi-armed bandit algorithm receives for a given arm should not change over time). In the AutoML setting, however, rewards are non-stationary, because arms such as the hyperpartitions of US Patent Application Publication No. 2016/0132787 or the AutoML frameworks of embodiments of the present invention improve the more computation time they are granted (i.e., the more often the arm is pulled). US Patent Application Publication No. 2016/0132787 addresses this with the variants Best-K (which reflects only a subset of the rewards received, with the size of the subset set by the configuration parameter K) and Best-K-Velocity (which reflects the differences among the subset of the K best rewards). However, these multi-armed bandit algorithms only look at the past and therefore do not adequately represent the problem.
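One possible reading of the two reward summaries described above is sketched below in Python. The exact definitions used in the cited publication are not reproduced in this text, so both functions are assumptions: `best_k` averages the K best rewards of an arm, and `best_k_velocity` averages the gaps between consecutive entries among those K best rewards.

```python
def best_k(rewards, k):
    """Average of the k best rewards seen for one arm (assumed reading)."""
    top = sorted(rewards, reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


def best_k_velocity(rewards, k):
    """Average gap between consecutive entries of the k best rewards.

    Assumed reading: a large value suggests that the arm's best scores
    are still spread out, i.e. the arm has recently been improving.
    """
    top = sorted(rewards, reverse=True)[:k]
    if len(top) < 2:
        return 0.0
    gaps = [better - worse for better, worse in zip(top, top[1:])]
    return sum(gaps) / len(gaps)
```

Both summaries are computed only from rewards that have already been observed, which is the limitation the paragraph above points out.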

[0023] With respect to providing AMLaaS and coordinating workloads, the known approaches, with the exception of the one described in US Patent Application Publication No. 2016/0132787, do not address how to scale in order to offer AMLaaS, for example using cloud computing. US Patent Application Publication No. 2016/0132787 uses a large-scale distributed architecture that is compatible with cloud services. There, workers pull work from a central database, where work means either running one ML job (training a parameterized ML algorithm) or fetching a data run in order to decide on a hyperpartition and a hyperparameterization and enter them into the database for other workers to fetch and work on. The approach of US Patent Application Publication No. 2016/0132787 requires that jobs or data runs be assigned priorities and that workers then select central database entries, for example by taking these priorities into account. The inventors have recognized that this is a disadvantage, because the approach requires user interaction and knowledge for assigning the priorities. Furthermore, the approach of US Patent Application Publication No. 2016/0132787 has certain properties and implications that are unfavorable for AutoML problems. The following applies to all learning strategies of US Patent Application Publication No. 2016/0132787 (i.e., not to the uniform random strategy, which is itself suboptimal because it cannot exploit observations made during training). The approach picks one hyperpartition for a data run using a selection strategy, specifically a multi-armed bandit algorithm. The multi-armed bandit algorithms described in US Patent Application Publication No. 2016/0132787, such as UCB1, Best-K or Best-K-Velocity, choose based on observed experience (the performance scores of already trained models of the possible different hyperpartitions). When the multi-armed bandit algorithm observes new experience in the database (i.e., a worker has finished a job and stored the model together with its performance in the database), it updates its statistics. As US Patent Application Publication No. 2016/0132787 specifies the algorithms, an algorithm can observe new experience only once a worker has finished training a model for the data run in question. After a hyperpartition has been picked, a tuning strategy is applied. The tuning strategy is Bayesian optimization, which updates its probabilistic models with the experience already observed for the relevant hyperpartition. These models are then used to identify the most promising hyperparameter configuration, for example using a well-defined expected improvement criterion. Bayesian optimization takes place only after the hyperpartition has been picked, is separate from the multi-armed bandit framework, and cannot be used to predict the performance of individual arms in advance. Conventionally, there is a single most promising hyperparameter configuration, whose execution the worker then requests via the central database. In summary, the mechanism of US Patent Application Publication No. 2016/0132787 does not explain how multiple jobs for a single data run are requested and executed (when the models need to be updated).

[0024] Embodiments of the present invention can leverage different frameworks. AutoBazaar, auto-sklearn, the framework of US Patent Application Publication No. 2016/0132787 and the like have to tune the parameters of ML algorithms and do not leverage different AutoML frameworks. In other words, the known approaches consider only different ML algorithms and their hyperparameters, as opposed to considering different AutoML frameworks for performing algorithm selection and hyperparameter tuning. The inventors have discovered that this can lead to disadvantages for the following reasons. AutoML is still an active field of research in which new ideas and open source frameworks are published frequently, and there is no clear state-of-the-art framework (see Anh Truong et al., "Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools," arXiv:1908.05557v2 (September 3, 2019)). In addition, for different types of datasets and problems, different algorithms for hyperparameter tuning and algorithm selection may lead to the best results (see, e.g., Mischa Schmidt et al., "On the Performance of Differential Evolution for Hyperparameter Tuning," arXiv:1904.06960v1 (April 15, 2019)). Furthermore, running Gaussian processes to tune the hyperparameters of neural networks, for example, is still an active area of research. When new algorithms are found or new frameworks are published, they cannot easily be incorporated into the previous approaches described above. A previous approach would have to solve the problems of the particular framework setting itself and implement that setting to fit the needs of its own framework; for example, it would have to work out how to make Bayesian optimization work efficiently for NAS. This means that new research findings cannot be incorporated easily, which delays improvements.

[0025] In addition, it is not known how approaches such as the one in US Patent Application Publication No. 2016/0132787 could incorporate AMLaaS frameworks such as GOOGLE AutoML. This can be a disadvantage because incorporating GOOGLE AutoML, for example, could lead to better results in terms of scores than open source frameworks.

[0026] Furthermore, the previous approaches described above require programming skills to work with the AutoML frameworks, and no easy-to-use application programming interface (API) exists. This reduces their usability for non-expert users, who make up an important target audience of AutoML.

[0027] Embodiments of the present invention overcome the challenges and problems of the previous approaches described above.

[0028] First, regarding the challenge of non-stationary rewards, embodiments of the present invention provide an extrapolating multi-armed bandit algorithm, which looks into the future by fitting a learning curve function to an arm's past rewards and extrapolating that function to the end of the remaining time budget. Embodiments of the present invention therefore provide improved insight into how to allocate the time budget among the alternatives as time progresses. Examining the learning curve as it develops over the runtime is also highly beneficial in another respect when compared with the "function call" based evaluation of US Patent Application Publication No. 2016/0132787 (and other work), in which the mechanism updates all statistics based on the model performances recorded in the central database, and in which these performance statistics are updated based on model performance after the trained model has been evaluated. This implies that one model training (or several trainings, if trainings run in parallel) has to finish before the statistics can be updated. The reason is that the performance statistics are based on units of "function evaluations" (where the function refers to training a parameterized algorithm on the dataset and recording its performance). This is because the multi-armed bandit algorithm and the Bayesian optimization of US Patent Application Publication No. 2016/0132787 operate on past samples (i.e., they can update their predictions and arrive at better ones only after seeing new data, namely the performance of a newly trained model). In embodiments of the present invention, exploiting learning curve extrapolation over a period of time allows the learning curves of the bandit's arms to be updated, so that sampling can take place every x seconds and the predictions can be updated. As long as no new best model is reported together with its score, the arm's performance simply stays constant while time progresses (and the budget shrinks). This also means that embodiments of the present invention can stop the execution of an arm at any time and hand execution to another arm if that arm is predicted, based on its learning curve, to perform better. In contrast, the approach of US Patent Application Publication No. 2016/0132787 has to wait until a model has finished running before it sees a change in an arm's evaluation and can change the choice of the multi-armed bandit algorithm (or of the Bayesian optimization), which can take a long time.
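The idea of choosing an arm by its extrapolated learning curve, rather than by its past rewards alone, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the curve family score(t) = a + b / t, the least-squares fit, and the names `fit_inverse_curve`, `extrapolate` and `pick_arm` are not taken from the patent text, which does not fix a curve family in this passage.

```python
def fit_inverse_curve(times, scores):
    """Least-squares fit of score(t) = a + b / t; returns (a, b).

    times are elapsed seconds (> 0) at which the arm's best score so far
    was sampled. The curve family is an illustrative assumption.
    """
    xs = [1.0 / t for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # A single sample time: fall back to a flat curve.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def extrapolate(times, scores, horizon):
    """Predicted score of one arm at the end of the time budget."""
    a, b = fit_inverse_curve(times, scores)
    return a + b / horizon


def pick_arm(histories, horizon):
    """Pick the arm with the highest extrapolated score.

    histories maps a (hypothetical) arm name to (times, scores).
    """
    return max(histories, key=lambda arm: extrapolate(*histories[arm], horizon))
```

For example, an arm whose best score has been flat at 0.80 and an arm that has climbed from 0.15 to 0.75 would be ranked in that order by a rule that only looks at past scores, whereas the extrapolation above prefers the second arm when the budget still runs for a long time. Because the histories are plain (time, score) samples, they can be refreshed every few seconds without waiting for a model training to finish.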

[0029] Second, regarding the challenge of providing AMLaaS and coordinating workloads, embodiments of the present invention are designed, as described below, to scale from a single computer to a cloud setting, and therefore overcome the challenge of scaling to operation in an AMLaaS setting. The difference between the approach of US Patent Application Publication No. 2016/0132787 and embodiments of the present invention in coordinating AutoML workloads is evident. In the approach of US Patent Application Publication No. 2016/0132787, workers pull work from a central database, where work means either running one machine learning job (training a parameterized machine learning algorithm) or fetching a data run in order to decide on a hyperpartition and a hyperparameterization and enter them into the database for other workers to fetch and work on. In contrast, embodiments of the present invention use a dispatcher-master-worker concept, in which a master is assigned an ML task by the dispatcher, the master assigns time budgets to the workers with which the dispatcher cooperates, and many workers can run in parallel as needed. This is beneficial for controlling resources per AutoML task (called a "data run" in US Patent Application Publication No. 2016/0132787). In the approach of US Patent Application Publication No. 2016/0132787, jobs or data runs have to be assigned priorities, and workers then have to select central database entries, for example by taking these priorities into account.

[0030] Third, with respect to leveraging different frameworks, and in contrast to the previous approaches discussed above, embodiments of the present invention can leverage different AutoML frameworks (each of which runs its own internal algorithm selection and parameter tuning logic), and these frameworks can be selected by an improved multi-armed bandit algorithm. Although conceptually simpler, the approach according to embodiments of the present invention is desirable because it overcomes the disadvantages discussed above. Additionally, embodiments of the present invention can incorporate AMLaaS frameworks such as GOOGLE AutoML, which can lead to better scores.

[0031] Fourth, with respect to ease of use for non-expert users, embodiments of the present invention overcome the challenges discussed above by providing an easy-to-use API. Embodiments of the present invention thus make ML easier to access.

[0032] In one embodiment, the present invention provides a method for automatically selecting a machine learning algorithm and tuning hyperparameters of the machine learning algorithm. A dataset and a machine learning task are received from a user. Executions of a plurality of instantiations of different automated machine learning frameworks on the machine learning task are each controlled as a separate arm in consideration of available computational resources and a time budget, whereby, during the executions by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed. One or more of the plurality of trained models are selected for the machine learning task based on the performance scores.

[0033] In one embodiment, the performance scores are extrapolated to the remainder of the time budget based on the performance achieved by each of the arms during a time interval of the execution, the time interval being a portion of the time budget.

[0034] In one embodiment, the method further comprises allocating computational resources to the arms for the remainder of the time budget based on the extrapolated performance scores.

[0035] In one embodiment, the performance scores are extrapolated by fitting a learning curve function to the past rewards of each of the arms and extrapolating the past rewards to the end of the remainder of the time budget.
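The fitting and extrapolation step of paragraph [0035] can be sketched as follows. This is a minimal illustration only: the text does not specify a learning curve family, so the saturating power law reward(t) = a - b * t^(-c) used here, the candidate exponents, and the function names `fit_learning_curve` and `extrapolate` are all assumptions made for the example.

```python
def fit_learning_curve(times, rewards, exponents=(0.25, 0.5, 1.0, 2.0)):
    """Fit reward(t) ~ a - b * t**(-c) to past rewards by least squares.

    Assumed curve family (not taken from the source). For each candidate
    exponent c the model is linear in (a, b), so a and b follow from
    ordinary least squares on x = t**(-c). All times must be positive.
    """
    n = len(times)
    best = None
    for c in exponents:
        xs = [t ** (-c) for t in times]
        mean_x = sum(xs) / n
        mean_y = sum(rewards) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0.0:
            continue  # all time stamps identical: slope is undefined
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rewards))
        slope = sxy / sxx  # slope equals -b
        a = mean_y - slope * mean_x
        sse = sum((a + slope * x - y) ** 2 for x, y in zip(xs, rewards))
        if best is None or sse < best[0]:
            best = (sse, a, -slope, c)
    if best is None:
        raise ValueError("need at least two distinct, positive time stamps")
    _, a, b, c = best
    return a, b, c


def extrapolate(params, t_end):
    """Predict the reward at time t_end (e.g., the end of the time budget)."""
    a, b, c = params
    return a - b * t_end ** (-c)
```

A caller would pass the time stamps and rewards observed so far for one arm and evaluate `extrapolate` at the end of the remaining budget. A production system would likely use a richer curve family or an ensemble of families.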

[0036] In one embodiment, the method further comprises freezing execution of at least one of the arms based on the extrapolated performance scores.

[0037] In one embodiment, the method further comprises resuming execution of the at least one of the arms from the point at which the freezing occurred.

[0038] In one embodiment, at least some of the arms are executed by time-division multiplexing using a selection mechanism in order to allocate computational resources to the arms during the time budget.

[0039] In one embodiment, at least some of the arms are executed in parallel.

[0040] In one embodiment, the method further comprises building an ensemble from the plurality of trained models.

[0041] In one embodiment, each of the arms is run as a microservice component of a cloud computing system architecture in a Docker container having a respective container image of the automated machine learning frameworks.

[0042] In one embodiment, the Docker containers are housed within a larger Docker container, and the larger Docker container also houses a separate Docker container for a component that controls the execution of the arms.

[0043] In one embodiment, the method further comprises constructing a learning curve for each of the arms during a time interval of the execution within the time budget, extrapolating the performance score of each of the arms to the remainder of the time budget, and freezing or disabling execution of at least some of the arms based on the extrapolated performance scores.

[0044] In one embodiment, the learning curves are constructed based on the maximum performance score achieved by each of the arms during the time interval.
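As an illustration of paragraph [0044], the following minimal sketch builds a learning curve from the maximum score an arm has achieved so far, sampled at fixed times. The function name `best_so_far_curve`, the `(time, score)` report format, and the convention that higher scores are better are assumptions for the example.

```python
def best_so_far_curve(reports, sample_times):
    """Return the best score reported up to each sample time.

    reports: list of (time, score) pairs for models found by one arm,
        in any order (hypothetical format; higher score is better).
    sample_times: ascending list of times at which to sample the curve.
    The curve stays flat between reports and is None before the first one.
    """
    reports = sorted(reports)
    curve = []
    best = None
    i = 0
    for t in sample_times:
        # Consume every report that arrived up to and including time t.
        while i < len(reports) and reports[i][0] <= t:
            score = reports[i][1]
            if best is None or score > best:
                best = score
            i += 1
        curve.append(best)
    return curve
```

Because only the running maximum is kept, a model that scores worse than an earlier one does not lower the curve.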

[0045] In another embodiment, the present invention provides a microservice component encapsulated in a Docker container of a cloud computing system architecture comprising one or more processors which, alone or in combination, are configured to provide for execution of a method comprising: controlling execution of a plurality of instantiations of different automated machine learning frameworks on a machine learning task, each as a separate arm, in consideration of available computational resources and a time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed.

[0046] In a further embodiment, the present invention provides a tangible, non-transitory computer-readable medium having instructions thereon which, upon being executed by one or more processors, alone or in combination, provide for execution of a method according to an embodiment of the present invention.

[0047] Embodiments of the present invention provide a meta-AutoML system and method, referred to herein as HAMLET. HAMLET leverages existing AutoML frameworks (which in turn leverage existing ML libraries) to solve a user's task. HAMLET supports parallel execution of different users' tasks and supports deployment in different settings. HAMLET also comes with an easy-to-use API to support use by non-expert users. In addition, HAMLET can manage a given time budget limit (and hardware limits) by using a special multi-armed bandit algorithm.

[0048] HAMLET can be provided as an AutoML platform with the following properties in order to find the best feasible model for a given ML task. That is, HAMLET:
• automates algorithm selection and hyperparameter tuning for a very wide range of algorithms and for different types of ML tasks (by integrating different frameworks);
• can be used by multiple (possibly non-expert) users with different hardware settings;
• includes time budget management; and
• makes ML more accessible through an easy-to-use API.

[0049] Table 1 below shows the terms and definitions used herein to describe HAMLET.

[0050] FIG. 1 depicts the HAMLET system architecture. In one advantageous embodiment, the interfaces HAMLET User-Dispatcher, Dispatcher-HAMLET MasterBandit, and HAMLET MasterBandit-HAMLET Arm are implemented as HTTP interfaces hosted in a standard web framework, such as Flask for Python. The interfaces to the depicted data storage components (database tables or separate databases) are based on standard techniques, for example Structured Query Language (SQL). It is possible to merge or separate the data stores, or to replace the depicted databases with a file-system-based approach such as the Hadoop Distributed File System (HDFS). These implementation choices affect the technical realization of the interfaces C1 to C5, E1 to E5 and F1 to F3.

[0051] The dispatcher component is the point of contact for HAMLET users to request HAMLET services. Accordingly, interface A of the dispatcher can be used by a user to:
• upload a dataset and define a dataset description;
• submit a machine learning task description (e.g., performance function, budget and, optionally, which AutoML frameworks and configurations to use);
• request the start of a configured task;
• receive information about the progress of a specified machine learning task, e.g., performance scores and elapsed budget;
• optionally, receive a notification when training has finished;
• receive an indication of which model(s) or ensemble(s) performed best;
• optionally, receive the trained model, e.g., as a serialized binary object in a standardized format such as a pickled Python object; and
• receive a reference to the model within HAMLET, such as a unique identifier of the model in the HAMLET database.

[0052] The dispatcher registers the task with a MasterBandit and invokes this MasterBandit to start training. To do so, the dispatcher passes the relevant configuration parameters (e.g., dataset reference, task description, arm configuration) via interface B.

[0053] A HAMLET arm is an instantiation of an AutoML framework or algorithm for hyperparameter tuning and algorithm selection. It can be, for example, an instantiation of a specific framework, or of a framework with a specific user-defined configuration. There are usually multiple HAMLET arms. The HAMLET MasterBandit controls the execution of the HAMLET arms based on different decision rules (different embodiments). The HAMLET MasterBandit also manages the budget remaining for the task and checks whether the requirements (in terms of the score to be reached) are met. Interface D carries the interaction between the MasterBandit and the arms, including, among other things:
• reporting performance scores and trained models;
• starting, stopping, freezing, and continuing execution of an arm (and of the AutoML process within it); and
• sharing configuration information.

[0054] The components are detailed below.

Dispatcher component

[0055] The dispatcher receives a machine learning task that references a dataset, a description of the dataset, a loss function, a stopping criterion, and the type of machine learning problem to which the task relates (regression, classification, or clustering). The user also specifies to the dispatcher which variables of the dataset are input variables and which are output variables. Optionally, HAMLET can automatically infer which type of data format each column contains by applying programmatic heuristics based on best practices that will be apparent to those skilled in the art.

[0056] In one embodiment, the type of machine learning problem can be inferred directly from the dataset. If all target variables indicated in the dataset description are categorical, the problem is considered a classification problem. If, on the other hand, the target variables contain floating-point numbers, the problem is a regression problem. If the dataset contains no target variable at all, the problem is a clustering problem.
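The inference rule of paragraph [0056] can be sketched as below. This is a simplified illustration: the function name `infer_task_type` and the input format are hypothetical, and treating every non-float value as categorical is an assumption made to keep the example short.

```python
def infer_task_type(target_columns):
    """Infer the ML problem type from the target columns of a dataset.

    target_columns: dict mapping target column name -> list of values
        (hypothetical format). An empty dict means no target variable.
    Assumption: any value that is not a Python float counts as categorical.
    """
    if not target_columns:
        return "clustering"  # no target variable in the dataset
    has_float = any(
        isinstance(value, float)
        for values in target_columns.values()
        for value in values
    )
    if has_float:
        return "regression"  # targets contain floating-point numbers
    return "classification"  # all targets are categorical
```

For example, `infer_task_type({"species": ["setosa", "virginica"]})` yields a classification task, while an empty dictionary yields clustering.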

[0057] Upon receiving a dataset, the HAMLET dispatcher stores the dataset in a dataset store (e.g., a database table), assigns the dataset a unique dataset identifier, which is returned to the user, and associates the dataset with the user's identifier. Upon receiving a dataset description, the HAMLET dispatcher stores the dataset description in a description store (e.g., a database table), assigns a unique dataset description identifier to the dataset description, and associates the dataset description with the dataset identifier. Upon receiving a task specification, the HAMLET dispatcher stores the task specification in a task store (e.g., a database table) and assigns a unique task identifier to the task specification.
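The bookkeeping in paragraph [0057] can be sketched with an in-memory SQLite database. The table layout, column names, function names, and the use of random UUIDs as unique identifiers are all assumptions for illustration; the text only requires that each stored item receives a unique identifier and the stated associations.

```python
import sqlite3
import uuid


def open_store():
    """Create an in-memory database with one table per data store."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE datasets (id TEXT PRIMARY KEY, user_id TEXT, content BLOB);
        CREATE TABLE descriptions (id TEXT PRIMARY KEY, dataset_id TEXT, body TEXT);
        CREATE TABLE tasks (id TEXT PRIMARY KEY, spec TEXT);
        """
    )
    return db


def _new_id():
    # Assumption: a random UUID serves as the unique identifier.
    return uuid.uuid4().hex


def store_dataset(db, user_id, content):
    """Store a dataset, associate it with the user, return its identifier."""
    dataset_id = _new_id()
    db.execute("INSERT INTO datasets VALUES (?, ?, ?)", (dataset_id, user_id, content))
    return dataset_id


def store_description(db, dataset_id, body):
    """Store a dataset description and associate it with the dataset."""
    description_id = _new_id()
    db.execute("INSERT INTO descriptions VALUES (?, ?, ?)", (description_id, dataset_id, body))
    return description_id


def store_task(db, spec):
    """Store a task specification and return its unique task identifier."""
    task_id = _new_id()
    db.execute("INSERT INTO tasks VALUES (?, ?)", (task_id, spec))
    return task_id
```

The returned dataset identifier is what the dispatcher would hand back to the user for use in later requests.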

[0058] Different embodiments provide the following features.
• In one particular embodiment, interface A, by which the user specifies a task (uploads a dataset) and specifies a description of the dataset, is a representational state transfer (REST) based interface. It conveys task descriptions and dataset descriptions in a standard format such as Extensible Markup Language (XML) or JavaScript Object Notation (JSON), and allows dataset upload through standard file upload mechanisms.
• In one particular related embodiment, the dataset is a tabular dataset stored in the comma-separated values (csv) file format. The dataset is uploaded via interface A using standard file upload mechanisms known from Internet services.
• In one advantageous embodiment, HAMLET suggests to the user which of the supported arms should be considered for solving the task. In addition, the user can specify to HAMLET a concrete parameterization of the components within an arm (e.g., the algorithms to run, or specific hyperparameter ranges).
• In one advantageous embodiment, when a new user task is submitted, HAMLET can decide to disable arms that do not fit the task, or recommend arms that are particularly promising. To do so, HAMLET uses features that describe the task's dataset, e.g., its size, the data types present in the dataset, and whether features are missing. The type of machine learning task (e.g., regression versus clustering) is also relevant to the decision about which arms to apply. Techniques known from the state of the art in "meta-learning" can be applied to identify which arms are worth suggesting.
• In one advantageous embodiment, the user must provide authentication credentials before being able to use HAMLET.
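A JSON task description of the kind conveyed over the REST-based interface A could look like the sketch below. The source does not define a schema, so every field name (`dataset_id`, `task_type`, `loss`, `time_budget_s`, `arms`) and the helper `build_task_description` are hypothetical.

```python
import json

# Problem types named in the description of the dispatcher.
ALLOWED_TASK_TYPES = {"regression", "classification", "clustering"}


def build_task_description(dataset_id, task_type, loss, time_budget_s, arms=None):
    """Serialize a task description to JSON (hypothetical schema).

    arms: optional list of per-arm settings, e.g. a framework name plus
        user-chosen hyperparameter ranges.
    """
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    description = {
        "dataset_id": dataset_id,
        "task_type": task_type,
        "loss": loss,
        "time_budget_s": time_budget_s,
    }
    if arms is not None:
        description["arms"] = arms
    return json.dumps(description, sort_keys=True)
```

A client would send the resulting string as the body of an HTTP request to the dispatcher and upload the csv file separately through a standard file upload.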
Deployment

[0059] The following describes how the HAMLET design enables scaling from a single computer to a distributed cloud setting with a very large number of parallel users and tasks.

[0060] Because solving a task can take a considerable amount of time, and because many parallel tasks may be requested by different users, it must be possible to scale the system capacity. Advantageously, HAMLET is designed to be compatible with standard mechanisms for scaling up web services in cloud service settings, and can be deployed to support many concurrent user requests.

[0061] In one preferred embodiment of HAMLET, the MasterBandit is encapsulated as a microservice component, for example in a Docker image. In this preferred embodiment, a cloud orchestrator component routes requests from the dispatcher to the MasterBandit. For example, Kubernetes can be used to manage the overall cloud system resources and, within those resources, many instances of the MasterBandit container. Similarly, one advantageous embodiment implements the HAMLET arm components as microservices in the form of Docker containers, each arm being a separate container. The HAMLET MasterBandit microservice can then be implemented in one of two ways.
1. If the microservice's own container environment provides a Docker server instance, the arms can be run inside the microservice's own container. In this option, the virtualized cloud resources allocated to the MasterBandit container are shared between the MasterBandit's main loop (see below) and the tuning computations of the different arms (see below).
2. Alternatively, if the cloud environment in which HAMLET is deployed provides a Docker service, the MasterBandit can request that arm containers be started, frozen, stopped and terminated as required by its main loop. In this way, the command set of interface D for controlling arm execution can be replaced with Docker container execution control commands, which simplifies interface D.
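One way to picture the second option is a mapping from the arm control actions of interface D onto Docker CLI verbs. The mapping below is an assumption for illustration, not taken from the source: it uses the standard `docker start`, `docker pause`, `docker unpause` and `docker stop` subcommands, and the action names and the `docker_command` helper are hypothetical. The sketch only builds the command line and does not execute it.

```python
# Assumed mapping from interface D actions to Docker CLI subcommands.
DOCKER_VERBS = {
    "start": "start",       # start an existing arm container
    "freeze": "pause",      # suspend all processes in the container
    "continue": "unpause",  # resume a frozen container where it stopped
    "stop": "stop",         # stop the container
}


def docker_command(action, container):
    """Build the Docker CLI argument list for an arm control action."""
    try:
        verb = DOCKER_VERBS[action]
    except KeyError:
        raise ValueError(f"unknown arm action: {action!r}") from None
    return ["docker", verb, container]
```

The returned list could be passed to `subprocess.run` on a host where the Docker CLI is available. An implementation could equally call the Docker Engine API directly.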

[0062] FIG. 2 depicts, by way of example, two general options for deploying HAMLET. The labels 1 and N indicate the usual cardinality of the relationships between the components. That is, there are multiple MasterBandits MB associated with a single dispatcher D, and multiple arms A (depending on the task configuration) associated with a single MasterBandit MB. Requests between the components can run through a Docker orchestration component such as Kubernetes, or can use direct references to the instantiated Docker containers provided by a Docker service component. A single system can host multiple dispatchers, instantiated for user requests by the Kubernetes orchestrator, for example for load balancing purposes, leveraging standard mechanisms defined in the prior art.
• In option 1, each of the microservice components is encapsulated in an individual Docker container, but all of these Docker containers are housed within one larger "outer" Docker container. This can be a suitable deployment for a single PC or a small server.
• In option 2, the dispatcher, MasterBandit, and arm components are housed in individual Docker containers.

[0063] In another embodiment, all microservice components (dispatcher, MasterBandit, arms) can be encapsulated in operating system (OS) processes, or even in computing threads, without relying on, for example, Docker as a virtualization technique with its associated standard mechanisms for execution runtime control. In this embodiment, freezing arm execution can be implemented via interface D as described above, or, alternatively, standard operating system process control commands are used to freeze and/or resume arm execution.
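On a POSIX system, freezing and resuming an arm that runs as an OS process can be sketched with the standard job-control signals SIGSTOP and SIGCONT. This is one possible realization under that assumption; the helper names are hypothetical and the "arm" in the demo is just a sleeping Python process.

```python
import signal
import subprocess
import sys


def demo_arm_argv():
    """Command line of a stand-in arm process (it only sleeps)."""
    return [sys.executable, "-c", "import time; time.sleep(30)"]


def start_arm(argv):
    """Start an arm as a child OS process."""
    return subprocess.Popen(argv)


def freeze_arm(proc):
    """Suspend the arm; its memory and progress are kept by the OS."""
    proc.send_signal(signal.SIGSTOP)


def resume_arm(proc):
    """Continue a frozen arm exactly where it was suspended."""
    proc.send_signal(signal.SIGCONT)


def stop_arm(proc):
    """Terminate the arm and wait for it to exit."""
    proc.send_signal(signal.SIGCONT)  # a stopped process cannot handle SIGTERM
    proc.terminate()
    proc.wait()
```

Because a suspended process keeps its state, resuming it loses no computation. SIGSTOP and SIGCONT are not available on Windows, so this sketch is POSIX-only.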

[0064] One particularly preferred embodiment relies on a combination of both kinds of encapsulation. That is, the dispatcher microservice component is encapsulated as a Docker image. Another Docker image bundles the MasterBandit with the required framework libraries, so that the arms are implemented as processes inside the same Docker container. In this way, the arm execution control commands of interface D (start, stop, freeze and resume) are implemented by standard OS process control mechanisms, which simplifies the implementation of interface D. In the preferred deployment, the dispatcher can then request the execution of a MasterBandit in response to a user request, via a Docker orchestrator such as Kubernetes.

[0065] In a preferred deployment, the dispatcher microservice component can (in response to a user's request) request the instantiation (and management) of a MasterBandit container from a Docker server or from an orchestrator such as Kubernetes.

[0066] In a preferred deployment, the MasterBandit microservice component can request the instantiation (and management) of an appropriate number of arm containers from the Docker server in order to solve the user's HAMLET task.

[0067] In one embodiment, different arm container images exist for different frameworks and are instantiated based on the HAMLET task specification. In this embodiment, the MasterBandit must indicate the exact arm container type in its request to the Docker server. The MasterBandit can also pass framework-specific configuration information relating to the HAMLET task (e.g., the hyperparameter ranges to use or the algorithms to consider).

[0068] In another embodiment, a generic arm container is configured with all of the frameworks supported by HAMLET and is available to be started on the Docker server. In this variant, the MasterBandit simply requests the Docker server to start the desired number of arm containers and passes to each container the configuration needed to individualize the arm's behavior (for example, to behave as an auto-sklearn arm), in addition to the framework-specific configuration information relating to the HAMLET task for that arm (e.g., the hyperparameter ranges to use or the algorithms to consider).

MasterBandit and arm components

[0069] Different existing frameworks for AutoML (e.g., auto-sklearn or PMF), NAS (e.g., auto-keras), or dedicated machine learning tuning algorithms (e.g., the differential evolution based approach described in Mischa Schmidt et al., "On the Performance of Differential Evolution for Hyperparameter Tuning", arXiv:1904.06960v1 (April 15, 2019), or a Bayesian optimization approach for tuning hyperpartitions) are integrated into the HAMLET MasterBandit as choices, or arms. The MasterBandit can select from these choices or arms in order to solve a particular user task. The different choices may or may not be applicable to any given user task. A HAMLET arm can also integrate a remote AutoML framework, such as Google Cloud AutoML, as a tuner via the remote framework's client library (if one is provided). Depending on the functionality of the remote framework, some of the execution embodiments below may not be feasible. For example, the GOOGLE AutoML framework does not support freezing the execution of training.

[0070] For each task, a single MasterBandit component interacts with one or more arms, each of which is derived from the AutoML and NAS libraries described above, or from cloud AMLaaS services such as GOOGLE AutoML, by integrating the respective client library or a specific customized version of these external libraries. While running, an arm executes its library on the user's task. During execution, these libraries train many machine learning models and record the associated scores. The performance scores of the trained models, and the models themselves, are stored in the databases of FIG. 1.

[0071] During the execution of an arm, newly found models are continuously reported and stored as soon as they are found, together with the score reached and the training time (the time needed to find the model). The HAMLET MasterBandit can request the scores and training times for each arm in every iteration.

[0072] The logic that controls the execution of solving a specified task resides in the HAMLET MasterBandit component of FIG. 1. The MasterBandit decides on the resources to be used by the arms, i.e., which arm to run in which time interval. The MasterBandit can pause and resume the running of an arm.

[0073] In one advantageous embodiment, the MasterBandit runs the arms in parallel for a configured or user-specified time budget. This setting is useful when large computational resources are available for solving the user's task. In this setting, the MasterBandit does not need to select among the different arms for execution. In this embodiment, the MasterBandit can monitor the execution (the training of models by all arms), either in a main loop or at specified time intervals, until the task's associated stopping criterion is met. In one advantageous embodiment, the MasterBandit can apply the learning curve extrapolation (LCE) described below in order to present diagnostic information (e.g., the predicted performance of the different arms) to the user.

[0074] In another embodiment, the MasterBandit component multiplexes the different arms over time on the available computational resources by means of a suitable selection mechanism. To this end, the MasterBandit determines the order in which the HAMLET arms are allowed to run. In this setting, it is advantageous for the MasterBandit to apply machine learning in order to improve the quality of its decisions. To do so, the MasterBandit runs in a main loop until the task's associated stopping criterion is met. In one variation, the MasterBandit allows a subset of the arms to be executed in parallel.

[0075]ＬＣＥと表示される特に有益な一実施形態では、ＨＡＭＬＥＴＭａｓｔｅｒＢａｎｄｉｔは、多腕バンディットアルゴリズムの新規で発明的な改善されたものを使用し、このアルゴリズムでは、ユーザのタスクを解決中にすべての腕に対して、タスクの残っている時間予算の最後に期待パフォーマンスを外挿する。この外挿は、タスクの中での腕の個々の実行時間にわたって達成された腕の最良パフォーマンスに基づいている。ＨＡＭＬＥＴは、最も高い外挿パフォーマンスを持つものを選択する。この実施形態では、ある期間にわたって達成された異なる腕の最高パフォーマンスだけが外挿において考慮される。 [0075] In one particularly beneficial embodiment, denoted LCE, the HAMLET MasterBandit uses a novel and inventive improved multi-armed bandit algorithm that, while the user's task is being solved, extrapolates for every arm the performance expected at the end of the task's remaining time budget. The extrapolation is based on the best performance each arm has achieved over its individual execution time within the task. HAMLET selects the arm with the highest extrapolated performance. In this embodiment, only the best performance each arm has achieved over time is considered in the extrapolation.

[0076]ＬＣＥ実施形態の一バリアントでは（以下のアルゴリズム１参照）、ＨＡＭＬＥＴが１つの腕を実行することを決定すると、ＨＡＭＬＥＴは時間間隔を割り当て（この間隔は構成パラメータとすることができる）、その時間間隔で腕を実行し、たとえば、腕マイクロサービス（たとえばインターフェースＤを介する）、腕コンテナをカプセル化する仮想化環境、又はたとえばプロセス制御によって得られる、配置に関係する機構を介して、腕の実行を凍結する。ＭａｓｔｅｒＢａｎｄｉｔは、訓練されたモデルのパフォーマンススコアを検査し、学習アルゴリズムを更新して（たとえば、多腕バンディットの統計情報及び関連する外挿曲線）、次ループ繰り返しのセレクションステップを知らせる。後の繰り返しで同じ腕が再び選ばれた場合、腕の実行は再開し、前に腕が凍結されたところですぐに開始することができ、したがって計算時間が失われない。この手法は特に有益である。その理由は、ＭａｓｔｅｒＢａｎｄｉｔが腕実行を柔軟に割り当て及び再割り当てすることができる一方で、１つの腕がそのパフォーマンススコアをまだ増加させていないが、腕を実行している間に時間が進むと、対応する学習曲線が更新されて、コンピュータリソースが再配分されるべきかどうかを考察できるからである。 [0076] In one variant of the LCE embodiment (see Algorithm 1 below), when HAMLET decides to run an arm, it assigns a time interval (which can be a configuration parameter), runs the arm for that interval, and then freezes the arm's execution through a deployment-related mechanism, for example the arm microservice (e.g., via interface D), the virtualization environment encapsulating the arm's container, or process control. The MasterBandit then checks the performance scores of the trained models and updates its learning algorithm (e.g., the multi-armed bandit statistics and the associated extrapolated curves) to inform the selection step of the next loop iteration. If the same arm is chosen again in a later iteration, its execution resumes exactly where it was frozen, so no computation time is lost. This approach is particularly beneficial because the MasterBandit can flexibly assign and reassign arm executions: even if an arm has not yet improved its performance score, time has advanced while the arm was running, so the corresponding learning curve is updated and the MasterBandit can reconsider whether computing resources should be reallocated.

[0077]特定的な一実施形態では、よく知られているＵＣＢ１アルゴリズムなどの多腕バンディットアルゴリズム（たとえば、ＭｉｃａｈＪ．Ｓｍｉｔｈ他の「ＴｈｅＭａｃｈｉｎｅＬｅａｒｎｉｎｇＢａｚａａｒ：ＨａｒｎｅｓｓｉｎｇｔｈｅＭＬＥｃｏｓｙｓｔｅｍｆｏｒＥｆｆｅｃｔｉｖｅＳｙｓｔｅｍＤｅｖｅｌｏｐｍｅｎｔ」、ａｒＸｉｖ：１９０５．０８９４２ｖ３（２０１９年１１月１２日）に参照されているＢＴＢライブラリによって提供されている）は、選択された腕によって達成されたパフォーマンスに基づいて、どの腕が実行され学習すべきかを選択するのに使用することができる。別の実施形態では、多腕バンディットアルゴリズムのＢｅｓｔ－Ｋ又はＢｅｓｔ－Ｋ－ＶｅｌｏｃｉｔｙバリアントをＭａｓｔｅｒＢａｎｄｉｔ（たとえば、ＭｉｃａｈＪ．Ｓｍｉｔｈ他の「ＴｈｅＭａｃｈｉｎｅＬｅａｒｎｉｎｇＢａｚａａｒ：ＨａｒｎｅｓｓｉｎｇｔｈｅＭＬＥｃｏｓｙｓｔｅｍｆｏｒＥｆｆｅｃｔｉｖｅＳｙｓｔｅｍＤｅｖｅｌｏｐｍｅｎｔ」、ａｒＸｉｖ：１９０５．０８９４２ｖ３（２０１９年１１月１２日）に参照されているＢＴＢライブラリによって提供されている）に使用することができる。 [0077] In one particular embodiment, a multi-armed bandit algorithm such as the well-known UCB1 algorithm (e.g., as provided by the BTB library referenced in Micah J. Smith et al., "The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development", arXiv:1905.08942v3 (Nov. 12, 2019)) can be used to select which arm to run and to learn from the performance achieved by the selected arm. In another embodiment, the Best-K or Best-K-Velocity variant of the multi-armed bandit algorithm (e.g., as provided by the same BTB library) can be used for the MasterBandit.
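For orientation, the UCB1 option mentioned in [0077] can be sketched with the textbook UCB1 selection rule. This is not the BTB library's API: the function name `ucb1_select` and its list-based inputs are assumptions made for illustration, and scores are assumed to be scaled to [0, 1].

```python
import math


def ucb1_select(counts, mean_scores):
    """Return the index of the arm with the highest UCB1 value.

    counts[i] is how often arm i has been run so far and mean_scores[i]
    is its average observed performance score (assumed to lie in [0, 1]).
    """
    # Every arm is run once before the confidence bound is applied.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Mean reward plus an exploration bonus that shrinks with more runs.
    ucb = [
        mean + math.sqrt(2.0 * math.log(total) / n)
        for mean, n in zip(mean_scores, counts)
    ]
    return max(range(len(ucb)), key=ucb.__getitem__)
```

Because the bonus depends only on past run counts and past mean scores, this rule looks backward; the LCE embodiment differs in that it ranks arms by a forward extrapolation of their learning curves.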

[0078]別の実施形態では、ＨＡＭＬＥＴが１つの腕を実行すると決定した場合、ＨＡＭＬＥＴは、腕が事前設定された数のモデル（たとえば、１つ）の訓練を完了するのを待つ。ＨＡＭＬＥＴは、訓練されたモデルのパフォーマンススコアを検査し、学習アルゴリズムを更新して（たとえば、多腕バンディットアルゴリズムのＵＣＢ１統計情報又は関連する外挿曲線）、次ループ繰り返しのセレクションステップを知らせる。この手法では、異なる腕によって（実際上、たとえば実測時間又はＣＰＵ時間の代わりに、時間の単位としてモデル訓練を再解釈して）訓練されたモデルの数について、腕の学習曲線が最良パフォーマンスとして報告されている場合に、ＬＣＥを使う。 [0078] In another embodiment, when HAMLET decides to run an arm, it waits until the arm has finished training a preset number of models (e.g., one). HAMLET then checks the performance scores of the trained models and updates its learning algorithm (e.g., the UCB1 statistics of the multi-armed bandit algorithm or the associated extrapolated curves) to inform the selection step of the next loop iteration. In this approach, LCE is applied with each arm's learning curve reported as best performance over the number of models the arm has trained (in effect reinterpreting model trainings, rather than, e.g., wall-clock or CPU time, as the unit of time).

[0079]一実施形態では、ＭａｓｔｅｒＢａｎｄｉｔは、学習曲線を適合させ外挿するために逆正接関数を用いる。 [0079] In one embodiment, MasterBandit uses an arctangent function to fit and extrapolate the learning curve.
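A minimal Python sketch of an arctangent learning-curve fit is given here for illustration. The text only states that an arctangent function is used; the three-parameter form `a * atan(b * t) + c`, the grid search over `b`, and the names `fit_arctan` and `extrapolate` are assumptions, not the patented implementation.

```python
import math


def fit_arctan(times, scores, b_grid=None):
    """Least-squares fit of score(t) = a * atan(b * t) + c.

    For each candidate b, the best a and c follow in closed form from
    ordinary linear regression of the scores on atan(b * t).
    Returns the tuple (a, b, c) with the smallest squared error.
    """
    if b_grid is None:
        # Hypothetical search grid for the time-scale parameter b.
        b_grid = [10 ** (k / 4.0) for k in range(-16, 9)]
    n = len(times)
    mean_y = sum(scores) / n
    best = None
    for b in b_grid:
        u = [math.atan(b * t) for t in times]
        mean_u = sum(u) / n
        var_u = sum((ui - mean_u) ** 2 for ui in u)
        if var_u == 0.0:
            continue  # all time stamps identical, slope undefined
        cov = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, scores))
        a = cov / var_u
        c = mean_y - a * mean_u
        sse = sum((a * ui + c - yi) ** 2 for ui, yi in zip(u, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("need at least two distinct time stamps")
    return best[1], best[2], best[3]


def extrapolate(params, t):
    """Evaluate the fitted curve at a (possibly future) time t."""
    a, b, c = params
    return a * math.atan(b * t) + c
```

The arctangent saturates for large `t`, which matches the typical shape of a best-so-far score curve that flattens as training time grows.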

[0080]一実施形態では、ＭａｓｔｅｒＢａｎｄｉｔは、学習曲線を適合させ外挿するためにニューラルネットワークを用いる。特定の好適な一実施形態では、学習曲線外挿のためのニューラルネットワークは、機械学習問題及びデータセットからの例示的な学習曲線によって事前訓練されている。 [0080] In one embodiment, MasterBandit uses a neural network to fit and extrapolate the learning curve. In one particular preferred embodiment, the neural network for learning curve extrapolation is pre-trained with exemplary learning curves from machine learning problems and datasets.

[0081]特定の一実施形態では、ＭａｓｔｅｒＢａｎｄｉｔは、腕のそれぞれの学習曲線を構成するために、腕の記録されたパフォーマンススコアの凸閉包、すなわち腕の報告された最良スコアのみを考慮する。有益な一実施形態では、ＭａｓｔｅｒＢａｎｄｉｔは、腕の記録スコア間の時間間隔を、腕の記録スコア間のタイムスタンプに対して人工のパフォーマンススコアサンプルを生成することによって埋める。 [0081] In one particular embodiment, MasterBandit considers only the convex hull of an arm's recorded performance scores, ie, the arm's best reported score, to construct the learning curve for each arm. In one beneficial embodiment, MasterBandit fills the time intervals between recorded arm scores by generating artificial performance score samples for the timestamps between recorded arm scores.
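The curve construction in [0081] can be sketched in Python as follows, reading the "convex hull" as the running maximum of the recorded scores, which is how the paragraph itself glosses it. The function name `best_so_far_curve`, the `(timestamp, score)` record format, and the fixed fill `step` are assumptions for illustration.

```python
def best_so_far_curve(records, step):
    """Build a learning curve from (timestamp, score) records of one arm.

    Only the best score reported so far is kept at each record, and the
    gap up to the next record is filled with artificial samples every
    `step` time units that repeat the current best score.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    records = sorted(records)
    curve = []
    best = float("-inf")
    for idx, (t, score) in enumerate(records):
        best = max(best, score)
        curve.append((t, best))
        if idx + 1 < len(records):
            next_t = records[idx + 1][0]
            fill_t = t + step
            while fill_t < next_t:
                curve.append((fill_t, best))  # artificial sample
                fill_t += step
    return curve
```

Filling the gaps gives the curve-fitting step evenly spread samples, so a long period without improvement weighs on the fit instead of being invisible to it.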

[0082]ＭａｓｔｅｒＢａｎｄｉｔの主ループに対する１つの特定の手法が、以下の各アルゴリズムに記述されている。
アルゴリズム１：
・・説明を簡単にするためにモデル持続性及び統計情報持続性をスキップする
・・ＭａｓｔｅｒＢａｎｄｉｔは、ユーザによって指定されたタスクの制約が満たされるまでループする。
・・この制約は時間予算とすることもできる。最初の繰り返しで、すべての腕が、ＭａｓｔｅｒＢａｎｄｉｔにパフォーマンススコアの第１のセットを与えるために一度試みられる。
・・第１の繰り返しの後、ＭａｓｔｅｒＢａｎｄｉｔは、
・・・腕のスコアを使用して腕の学習曲線を更新する
・・・腕の学習曲線を、残っているすべての予算が考慮中の単体の腕に専用であったという仮定のもとまで外挿する。
・・・Ｍａｓｔｅｒは、学習曲線の中から最も好適な腕を選び、この腕を構成時間間隔に対し実行する。
・・・・確率的探査を、アルゴリズム１．２のεパラメータを調整することによって使用することができ、
・・・残っている予算から時間間隔を差し引き、制約が満たされているかどうか検査する。
・・満たされていなければ、ループを繰り返し、さもなくば訓練を停止する。
・ＷｈｉｌｅＣｏｎｓｔｒａｉｎｔＭｅｔ＝Ｆａｌｓｅ：
・・Ｉｔｅｒａｔｉｏｎｋ＝１：各腕が、所定の予算に対して一度走る
・・Ｆｏｒｅａｃｈｉｔｅｒａｔｉｏｎｋ＞１：
・・・Ｆｏｒｅａｃｈａｒｍａ：
・・・・ＵｐｄａｔｅＬＣ＿ａ（ｔ）＝［ｘ＿ａ＝（ｔ１，．．．，ｔｎｏｗ），ｙ＿ａ＝（ｓｃｏｒｅ＿ｔ１，．．．，ｓｃｏｒｅ＿ｔｎｏｗ）］
・・・・Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＿ａ＝ＵｐｄａｔｅＥｘｔｒａｐｏｌａｔｅｄＬｅａｒｎｉｎｇＣｕｒｖｅ（ＬＣ＿ａ（ｔ），Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ）
・・・Ｎｅｘｔ＿ａｒｍｓ，Ｎｅｘｔ＿Ｂｕｄｇｅｔ＝ＭａｓｔｅｒＣｈｏｏｓｅＡｒｍ（Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣｆｗｏｒｅａｃｈａｒｍａ，Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ，Ｄｅｓｉｒｅｄ＿Ｓｃｏｒｅ）
・・・ＰａｕｓｅａｌｌａｒｍｓｎｏｔｉｎＮｅｘｔ＿ａｒｍｓｆｏｒＮｅｘｔ＿Ｂｕｄｇｅｔｓｅｃｏｎｄｓ
・・・ＲｅｓｕｍｅａｌｌａｒｍｓｉｎＮｅｘｔ＿ａｒｍｓｆｏｒＮｅｘｔ＿Ｂｕｄｇｅｔｓｅｃｏｎｄｓ
・・・ＣｏｎｓｔｒａｉｎｔＭｅｔ＝ＣｈｅｃｋＣｏｎｓｔｒａｉｎｔ（Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ，Ｄｅｓｉｒｅｄ＿Ｓｃｏｒｅ）
・・・・ただし、ｔ：時間、
・・・ｓｃｏｒｅ：最良のモデルで達成されたスコア
・・・ＬＣ：Ｌｅａｒｎｉｎｇ＿Ｃｕｒｖｅ
・・・ｋ：繰り返し
・・・ａ：腕
・・・ＬＣ＿ａ（ｔ）：時間ｔｎｏｗに至るまでの時間ｔに依存する、腕ａの学習曲線
・・・ｘ＿ａ：時間（関連するスコアの訓練時間、腕ａに対しｒｅｓｐ．ｓｋｏｒｅが見つかるまでずっと腕が走ってきた秒）
・・・ｙ＿ａ：スコア（腕ａによって見つけられた、単調に増加するスコア）
・・・Ｎｅｘｔ＿ａｒｍｓ：ＭａｓｔｅｒＢａｎｄｉｔによって選ばれた場合に次の繰り返しで走る腕のリスト
・・Ｎｅｘｔ＿Ｂｕｄｇｅｔ：ＭａｓｔｅｒＢａｎｄｉｔによって選ばれた、次の繰り返しの予算、秒
アルゴリズム１．１
・ＵｐｄａｔｅＥｘｔｒａｐｏｌａｔｅｄＬｅａｒｎｉｎｇＣｕｒｖｅ（ＬＣ＿ａ（ｔ），Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ）：
・・・腕ａの学習曲線が外挿される：Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣ
・・・Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣが、最大使用可能予算（Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ）に至るまでの時間にわたり将来のスコアを推定する。
・・・曲線が、たとえば、ｉｌｏｇ関数又は逆正接関数を用いて、又はたとえばＳＶＭを用いて曲線を適合させることを意味する標準回帰アルゴリズムによってある期間（ｔ）にわたってこれまでに（ｙ）観測されたスコアに基づき、標準回帰アルゴリズムを用いて、適合される
・・Ｘ＝ＬＣ＿ａ（ｔ）．ｘ＿ａ
・・ｙ＝ＬＣ＿ａ（ｔ）．ｙ＿ａ
・・Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＝ｃｕｒｖｅ＿ｆｉｔ（Ｘ，ｙ）
・・ＲｅｔｕｒｎＥｘｔｒａｐｏｌａｔｅｄ＿ＬＣ

アルゴリズム１．２
・ＭａｓｔｅｒＣｈｏｏｓｅＡｒｍ（Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＿ａｆｗｏｒｅａｃｈａｒｍａ，Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ，Ｄｅｓｉｒｅｄ＿Ｓｃｏｒｅ）：
・・・Ｍａｓｔｅｒは、目的に基づいて腕を選ぶ：Ｍａｓｔｅｒは、ユーザ定義の所望の精度第一を得ることを期待される腕を選ぶことができ（ｏｐ＝１）、又は残っている時間で最高スコアを得ることが期待される腕を選ぶことができる（ｏｐ＝２）。
・・・以下は、ＭａｓｔｅｒＢａｎｄｉｔバージョンを示し、非常に限定されたリソースの故に、繰り返しごとに腕を１つしか走らせてはならない（ＭａｓｔｅｒＢａｎｄｉｔバージョンは、繰り返しごとに多数の腕を並列に走らせるように適合させることができる）。
・・・εパラメータによって、確率的探査行動が制御される。
・・Ｉｆｏｐ＝１
・・・確率ε１では：
・・・・Ｎｅｘｔ＿ａｒｍ＝ａｒｇｍｉｎ＿ａ（Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＿ａ．ｘｗｈｅｒｅＥｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＿ａ．ｙ＞Ｄｅｓｉｒｅｄ＿Ｓｃｏｒｅ）
・・・確率（１－ε１）では：
・・・・Ｎｅｘｔ＿ａｒｍ＝ｒａｎｄｏｍ
・・Ｉｆｏｐ＝２
・・・確率ε１では：
・・・・Ｎｅｘｔ＿ａｒｍ＝ａｒｇｍａｘ＿ａ（Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＿ａ．ｙｗｈｅｒｅＥｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＿ａ．ｘ＝ｔ＿ｍａｘ＿ａｖａｉｌａｂｌｅ）
・・・確率（１－ε１）では：
・・・・Ｎｅｘｔ＿ａｒｍ＝ｒａｎｄｏｍ
・・・・次の繰り返しのための予算を、各腕に対し残っている予算、所望の精度、及び外挿されたＬＣに基づいて割り当てる：
・・・Ｎｅｘｔ＿Ｂｕｄｇｅｔ＝ＡｓｓｉｇｎＢｕｄｇｅｔＡｒｍ（Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ，Ｄｅｓｉｒｅｄ＿Ｓｃｏｒｅ，Ｅｘｔｒａｐｏｌａｔｅｄ＿ＬＣ＿ａ）
・・・ＲｅｔｕｒｎＮｅｘｔ＿ａｒｍ，Ｎｅｘｔ＿Ｂｕｄｇｅｔ

・ただし、ｘ：時間（関連するスコアの訓練時間、腕ａに対しｒｅｓｐ．ｓｋｏｒｅが見つかるまでずっと腕が走ってきた秒）
・・・・ｙ：スコア（腕ａによって見つけられた、単調に増加するスコア）

アルゴリズム１．３
・ＣｈｅｃｋＣｏｎｓｔｒａｉｎｔ（Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ，Ｄｅｓｉｒｅｄ＿Ｓｃｏｒｅ）：
・・・全予算が使用されているかどうか、及び所望の精度がすでに達成されているかどうかを検査する
・・Ｃｏｎｓｔｒａｉｎｔ＿ｍｅｔ＝Ｆａｌｓｅ
・・ｉｆＢｕｄｇｅｔ＿ｕｓｅｄ＞＝Ｂｕｄｇｅｔ＿ｒｅｍａｉｎｉｎｇ：
・・・Ｃｏｎｓｔｒａｉｎｔ＿ｍｅｔ＝Ｔｒｕｅ
・・ｉｆＨｉｇｈｅｓｔ＿Ｓｃｏｒｅ＿Ｒｅａｃｈｅｄ＿ａｂｏｖｅ＿ａｌｌ＿ａｒｍｓ＞＝Ｄｅｓｉｒｅｄ＿Ｓｃｏｒｅ：
・・・Ｃｏｎｓｔｒａｉｎｔ＿ｍｅｔ＝Ｔｒｕｅ
・・ＲｅｔｕｒｎＣｏｎｔｒａｉｎｔ＿ｍｅｔ [0082] One particular approach to the MasterBandit's main loop is described in each of the following algorithms.
Algorithm 1:
・・Model persistence and statistics persistence are omitted for simplicity.
・・The MasterBandit loops until the task constraints specified by the user are met.
・・The constraint can also be a time budget. In the first iteration, every arm is tried once to give the MasterBandit a first set of performance scores.
・・After the first iteration, the MasterBandit:
・・・updates each arm's learning curve using the arm's scores;
・・・extrapolates each arm's learning curve under the assumption that all of the remaining budget is dedicated to the single arm under consideration;
・・・chooses the most promising arm based on the extrapolated learning curves and runs that arm for the configured time interval;
・・・・stochastic exploration can be used by adjusting the ε parameter of Algorithm 1.2;
・・・subtracts the time interval from the remaining budget and checks whether the constraints are met.
・・If they are not met, the loop repeats; otherwise training stops.
・While ConstraintMet = False:
・・Iteration k = 1: each arm runs once for a given budget
・・For each iteration k > 1:
・・・For each arm a:
・・・・Update LC_a(t) = [x_a = (t1, ..., tnow), y_a = (score_t1, ..., score_tnow)]
・・・・Extrapolated_LC_a = UpdateExtrapolatedLearningCurve(LC_a(t), Budget_remaining)
・・・Next_arms, Next_Budget = MasterChooseArm(Extrapolated_LC_a for each arm a, Budget_remaining, Desired_Score)
・・・Pause all arms not in Next_arms for Next_Budget seconds
・・・Resume all arms in Next_arms for Next_Budget seconds
・・・ConstraintMet = CheckConstraint(Budget_remaining, Desired_Score)
・where
・・・t: time
・・・score: score achieved by the best model
・・・LC: Learning_Curve
・・・k: iteration
・・・a: arm
・・・LC_a(t): learning curve of arm a as a function of time t, up to time tnow
・・・x_a: time (training time of the associated score, resp. the seconds arm a had been running when the score was found)
・・・y_a: score (monotonically increasing scores found by arm a)
・・・Next_arms: list of arms to run in the next iteration, as chosen by the MasterBandit
・・・Next_Budget: budget for the next iteration in seconds, as chosen by the MasterBandit
Algorithm 1.1
・UpdateExtrapolatedLearningCurve(LC_a(t), Budget_remaining):
・・・The learning curve of arm a is extrapolated: Extrapolated_LC
・・・Extrapolated_LC estimates future scores over time up to the maximum available budget (Budget_remaining).
・・・The curve is fitted with a standard regression algorithm to the scores (y) observed so far over time (t), for example by fitting an ilog function or an arctangent function, or by using, e.g., an SVM.
・・X = LC_a(t).x_a
・・y = LC_a(t).y_a
・・Extrapolated_LC = curve_fit(X, y)
・・Return Extrapolated_LC

Algorithm 1.2
・MasterChooseArm(Extrapolated_LC_a for each arm a, Budget_remaining, Desired_Score):
・・・The Master chooses an arm according to the objective: either the arm expected to reach the user-defined desired accuracy first (op = 1), or the arm expected to reach the highest score in the remaining time (op = 2).
・・・The version of the MasterBandit shown here runs only one arm per iteration because resources are very limited (it can be adapted to run several arms in parallel per iteration).
・・・The ε parameter controls the stochastic exploration behavior.
・・If op = 1
・・・with probability ε1:
・・・・Next_arm = argmin_a(Extrapolated_LC_a.x where Extrapolated_LC_a.y > Desired_Score)
・・・with probability (1 - ε1):
・・・・Next_arm = random
・・If op = 2
・・・with probability ε1:
・・・・Next_arm = argmax_a(Extrapolated_LC_a.y where Extrapolated_LC_a.x = t_max_available)
・・・with probability (1 - ε1):
・・・・Next_arm = random
・・Assign the budget for the next iteration based on the remaining budget, the desired accuracy, and the extrapolated LC of each arm:
・・Next_Budget = AssignBudgetArm(Budget_remaining, Desired_Score, Extrapolated_LC_a)
・・Return Next_arm, Next_Budget

・where
・・・x: time (training time of the associated score, resp. the seconds arm a had been running when the score was found)
・・・y: score (monotonically increasing scores found by arm a)
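The selection step of Algorithm 1.2 can be sketched in Python as below. The sketch mirrors the listing as written, where the greedy choice is taken with probability ε and a random arm otherwise (many epsilon-greedy formulations use the reverse convention). The curve representation as a time-sorted list of `(time, extrapolated_score)` pairs, the function name, and the fallback to op = 2 when no arm is expected to reach the desired score are assumptions; AssignBudgetArm is left out because the listing does not specify it.

```python
import random


def master_choose_arm(curves, desired_score, op, epsilon, rng=random):
    """Pick the next arm from extrapolated learning curves.

    curves maps an arm name to a time-sorted list of
    (time, extrapolated_score) pairs ending at the maximum available time.
    """
    arms = sorted(curves)
    # As in the listing: greedy with probability epsilon, random otherwise.
    if rng.random() >= epsilon:
        return rng.choice(arms)
    if op == 1:
        # Earliest time at which each arm is expected to beat the target.
        reach = {}
        for arm in arms:
            for t, score in curves[arm]:
                if score > desired_score:
                    reach[arm] = t
                    break
        if reach:
            return min(reach, key=reach.get)
        # Assumption: if no arm is expected to reach the target,
        # fall through to the op = 2 criterion.
    # op = 2: highest extrapolated score at the end of the budget.
    return max(arms, key=lambda arm: curves[arm][-1][1])
```

With `epsilon=1.0` the choice is fully greedy and deterministic, which is convenient for checking the two objectives in isolation.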

Algorithm 1.3
・CheckConstraint(Budget_remaining, Desired_Score):
・・・Checks whether the full budget has been used and whether the desired accuracy has already been achieved
・・Constraint_met = False
・・if Budget_used >= Budget_remaining:
・・・Constraint_met = True
・・if Highest_Score_Reached_above_all_arms >= Desired_Score:
・・・Constraint_met = True
・・Return Constraint_met
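To show how Algorithms 1 and 1.3 fit together, the following Python sketch simulates the main loop with one arm running per iteration and no random exploration. Everything named here is hypothetical: arms are modelled as callables that return their best score after a given total runtime, `capped_linear` is a deliberately simple stand-in for the learning curve extrapolation, and the stopping check compares the used budget against the total budget, which is one reading of the listing.

```python
def check_constraint(budget_used, budget_total, best_score, desired_score):
    """Stop when the budget is spent or the desired score is reached."""
    return budget_used >= budget_total or best_score >= desired_score


def capped_linear(curve, horizon):
    """Hypothetical stand-in extrapolator: a straight line through the last
    two curve points (or the origin and the only point), capped at 1.0."""
    t1, s1 = curve[-1]
    t0, s0 = curve[-2] if len(curve) > 1 else (0.0, 0.0)
    slope = (s1 - s0) / (t1 - t0)
    return min(1.0, s1 + slope * (horizon - t1))


def run_master_bandit(arms, budget_total, interval, desired_score,
                      extrapolate=capped_linear):
    """arms maps a name to a callable(total_runtime) -> best score so far."""
    runtime = {name: 0.0 for name in arms}
    curves = {name: [] for name in arms}
    state = {"used": 0.0, "best": float("-inf")}

    def run(name):
        # "Resume" the arm for one interval, then record its best score.
        runtime[name] += interval
        state["used"] += interval
        score = arms[name](runtime[name])
        previous = curves[name][-1][1] if curves[name] else score
        curves[name].append((runtime[name], max(previous, score)))
        state["best"] = max(state["best"], score)

    def done():
        return check_constraint(state["used"], budget_total,
                                state["best"], desired_score)

    # Iteration k = 1: every arm runs once.
    for name in arms:
        if done():
            break
        run(name)

    # Iterations k > 1: extrapolate each curve as if the whole remaining
    # budget went to that arm, then run the most promising arm.
    while not done():
        remaining = budget_total - state["used"]
        predicted = {
            name: extrapolate(curves[name], runtime[name] + remaining)
            for name in arms
        }
        run(max(predicted, key=predicted.get))
    return state["best"], runtime
```

In a toy run with one arm that plateaus at 0.5 and one that keeps improving, the loop shifts the budget to the improving arm once the plateau becomes visible in the first arm's curve, even though the plateaued arm had the better score after the first iteration.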

[0083]有益な一実施形態では、ＨＡＭＬＥＴは、特定のデータセットの一部分（たとえば、パフォーマンススコアを計算するために一般に使用されるテストセット）についてのモデルの実測実行時間を、モデルのパフォーマンススコアに加えて追跡し記憶する。こうすると、有益なことに後でユーザに、最高動作モデルだけでなく、実行するのに最速のものを提示することも可能になる。また、こうすることで、以下に標示されるように、アンサンブル構築について知らせることもできる。 [0083] In one beneficial embodiment, HAMLET converts a model's observed execution time for a portion of a particular data set (e.g., a test set commonly used to compute a performance score) into a model's performance score. In addition, track and store. This advantageously allows the user later to be presented with not only the best operating model, but also the fastest one to execute. It can also inform about ensemble construction, as indicated below.

[0084]ユーザのタスクに対して稼働しているとき、ＨＡＭＬＥＴＭａｓｔｅｒＢａｎｄｉｔは、訓練されたモデル、及びアンサンブル（下記参照）を記憶し、図１に標示された対応するデータベースでパフォーマンス統計情報を達成する。別法として、腕は、実行中に、訓練されたモデル及び関連付けられたパフォーマンス統計情報をデータベース自体に、記憶をＭａｓｔｅｒＢａｎｄｉｔに残す代わりに記憶する。 [0084] When running against a user's task, HAMLET MasterBandit stores trained models and ensembles (see below) and achieves performance statistics in the corresponding database labeled in FIG. . Alternatively, the arm stores the trained model and associated performance statistics in the database itself during execution instead of leaving the storage in the MasterBandit.

[0085]ユーザ要求によって、ＨＡＭＬＥＴディスパッチャは、これらの記憶されたモデル及び統計情報にアクセスを行う。有益な一バリアントでは、ＨＡＭＬＥＴは、特定の割合又は特定の数のトップ動作モデルだけを保持するように構成されることがある。この構成は、タスク定義と共にユーザによって提供することができ、又は、ＨＡＭＬＥＴ構成パラメータとすることができる。 [0085] Upon user request, the HAMLET dispatcher provides access to these stored models and statistics. In one useful variant, HAMLET may be configured to retain only a certain percentage or number of top behavioral models. This configuration can be provided by the user with the task definition, or it can be a HAMLET configuration parameter.

[0086]一実施形態では、ＭａｓｔｅｒＢａｎｄｉｔはディスパッチャを起動してユーザに、たとえばユーザに関連付けられた、又はユーザのタスクに関連付けられた（次にタスク定義の一部として提供された）Ｅメールアドレスを介して、ユーザのタスクが終了したことを通知することができる。 [0086] In one embodiment, MasterBandit launches a dispatcher to send the user, for example, an email address associated with the user or associated with the user's task (which was then provided as part of the task definition). , the user can be notified that the task has been completed.

[0087]一実施形態では、ＭａｓｔｅｒＢａｎｄｉｔは、ヒューマンシステムアドミニストレータによって許容コンピューティングリソース利用に対し構成される。この場合、ＭａｓｔｅｒＢａｎｄｉｔは適正なモード（すべての腕を並列に走らせる、又は時間で多重化する）を選択することができる。 [0087] In one embodiment, the MasterBandit is configured for allowable computing resource utilization by the human system administrator. In this case, the MasterBandit can select the appropriate mode (running all arms in parallel or multiplexing in time).

[0088]別の有益な一実施形態では、ＭａｓｔｅｒＢａｎｄｉｔには、許容コンピューティングリソース利用について、配置オーケストレーション技術によって知らせること（たとえば、ＭａｓｔｅｒＢａｎｄｉｔがオーケストレータを照会する、又はオーケストレータがＭａｓｔｅｒＢａｎｄｉｔに通知する）、又は適切なクラウドシステムロード情報サービスによって知らせることができる。この場合、ＭａｓｔｅｒＢａｎｄｉｔは、すべての腕を並列に走らせるモードと、時間で多重化するモードとを切り替えることができる。 [0088] In another beneficial embodiment, the MasterBandit is informed of the allowable computing resource utilization by means of deployment orchestration technology (eg, the MasterBandit queries the Orchestrator, or the Orchestrator notifies the MasterBandit). , or by an appropriate cloud system load information service. In this case, the MasterBandit can switch between running all arms in parallel and multiplexing in time.

[0089]有益な一実施形態では、ＨＡＭＬＥＴは、メタＡｕｔｏＭＬクラウドベースのシステムとして提供することができる。
アンサンブリングを持つＨＡＭＬＥＴ [0089] In one beneficial embodiment, HAMLET may be provided as a meta-AutoML cloud-based system.
HAMLET with Ensembling

[0090]有益な一実施形態では、ＨＡＭＬＥＴは、タスクの訓練されたモデルの全部又はサブセットに基づいてアンサンブルを構築することも提供する。このため、異なるチューナ（及びフレームワーク）から生成されたモデルは、アンサンブルの形に組み合わせることができる。具体的には、ＨＡＭＬＥＴは、事前設定された数又は比率のトップパフォーミングモデルを選択することができる。別のバリアントでは、ＨＡＭＬＥＴは、モデルを実行するのに最速のものを選択することができる。なお別の実施形態では、ＨＡＭＬＥＴは、アンサンブルを成すアルゴリズムを多様化するために、機械学習アルゴリズムのタイプに基づいてモデルを選択することができる。アンサンブリングをスケジュールするための３つの異なる実施形態が提示される。すなわち、
・・１つの実施形態では、ＭａｓｔｅｒＢａｎｄｉｔは、タスクの停止基準が満たされた後にモデルアンサンブリング計算を実行する。
・・別の実施形態では、主ループ中にＭａｓｔｅｒＢａｎｄｉｔは、アンサンブリング計算を指定された間隔で、たとえば５回繰り返しごとに、又は２０秒ごとに実行する。
・・第３の実施形態では、主ループ中にＨＡＭＬＥＴは、モデルアンサンブリング計算をチューナの特別な形として扱う。すなわち、モデルアンサンブリング計算は、上述した結論及び暗示から選択すべきＭａｓｔｅｒＢａｎｄｉｔの腕のうちの１つとして扱われる。 [0090] In one beneficial embodiment, HAMLET also provides for building ensembles from all or a subset of the models trained for a task. Models produced by different tuners (and frameworks) can thus be combined into an ensemble. Specifically, HAMLET can select a preset number or proportion of the top-performing models. In another variant, HAMLET can select the models that are fastest to execute. In yet another embodiment, HAMLET can select models by type of machine learning algorithm in order to diversify the algorithms that make up the ensemble. Three different embodiments are presented for scheduling the ensembling:
・・In one embodiment, the MasterBandit performs the model ensembling computation after the task's stopping criteria are met.
・・In another embodiment, the MasterBandit performs the ensembling computation at specified intervals during the main loop, e.g., every 5 iterations or every 20 seconds.
・・In a third embodiment, HAMLET treats the model ensembling computation during the main loop as a special form of tuner, i.e., as one of the arms the MasterBandit selects from, with the consequences and implications described above.
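The three ways of picking ensemble members described in [0090] can be sketched in Python as follows. The `TrainedModel` record and the function names are hypothetical; the sketch assumes that each stored model carries a performance score (higher is better), a measured execution time, and the type of its learning algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainedModel:
    name: str
    algorithm: str    # type of learning algorithm, e.g. "random_forest"
    score: float      # performance score, higher is better
    runtime_s: float  # measured execution time in seconds


def top_performing(models, k):
    """The k models with the highest performance score."""
    return sorted(models, key=lambda m: m.score, reverse=True)[:k]


def fastest(models, k):
    """The k models with the shortest measured execution time."""
    return sorted(models, key=lambda m: m.runtime_s)[:k]


def best_per_algorithm(models):
    """The best-scoring model of each algorithm type, for diversity."""
    best = {}
    for m in models:
        if m.algorithm not in best or m.score > best[m.algorithm].score:
            best[m.algorithm] = m
    return list(best.values())
```

A selected subset would then be passed to whichever ensembling computation is configured.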

[0091]特定の一実施形態では、ＨＡＭＬＥＴは、アンサンブリング計算のためにＭｅｔａＢａｇｓ概念を用いる（参照により本明細書に組み込まれる、ＪｉｈｅｄＫｈｉａｒｉ他の「ＭｅｔａＢａｇｓ：ＢａｇｇｅｄＭｅｔａ－ＤｅｃｉｓｉｏｎＴｒｅｅｓｆｏｒＲｅｇｒｅｓｓｉｏｎ」参照）。このため、ＨＡＭＬＥＴは、ＭｅｔａＢａｇｓ固有のメタ特徴を計算するために、タスクのデータセットの異なるブートストラップサンプルを生成する。別の実施形態では、ＨＡＭＬＥＴは、アンサンブリングの手段としてモデル平均化（全構成体モデルの予測を平均化する）を用いる。
大規模並行処理、実測時間予算へのアクセスを持つＨＡＭＬＥＴ [0091] In one particular embodiment, HAMLET uses the MetaBags concept for the ensembling computation (see Jihed Khiari et al., "MetaBags: Bagged Meta-Decision Trees for Regression," incorporated herein by reference). To this end, HAMLET generates different bootstrap samples of the task's dataset in order to compute the MetaBags-specific meta-features. In another embodiment, HAMLET uses model averaging (averaging the predictions of all constituent models) as the means of ensembling.
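Model averaging, the simpler of the two ensembling options in [0091], can be sketched in a few lines of Python. The function name and the list-of-lists input format are assumptions, and the sketch covers numeric (regression-style) predictions only.

```python
def average_predictions(per_model_predictions):
    """Average the predictions of all constituent models, sample by sample.

    per_model_predictions holds one list of numeric predictions per model,
    all made for the same samples in the same order.
    """
    if not per_model_predictions:
        raise ValueError("at least one model is required")
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("all models must predict the same samples")
    n_models = len(per_model_predictions)
    return [sum(column) / n_models for column in zip(*per_model_predictions)]
```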
HAMLET with Massive Parallelism and a Wall-Clock Time Budget

[0092]別の実施形態では、ＨＡＭＬＥＴは、総実測時間予算だけが指定されているユーザのタスクを解決するために使用可能な豊富なコンピューティングリソースと共に、クラウドセッティングに配置される。このセッティングでは、ＭａｓｔｅｒＢａｎｄｉｔは、所与の予算について上述したように、指定されたタスクで使用されるべきそのすべての腕を並列に走らせることができる。しかし、腕のフレームワークは、たとえばたくさんのＣＰＵに対して十分にスケール変更するために、特有の難題に直面することがある。したがって、ＨＡＭＬＥＴＭａｓｔｅｒＢａｎｄｉｔは、
・・それぞれの腕の構成を、たとえば、腕で考慮すべきアルゴリズムの数を限定することによって、及び／又はアームのフレームワークで考慮すべき特定的なハイパーパラメータの値の範囲を低減することによって、それぞれの腕が限られた探索空間に焦点を合わせるように修正すること、並びに
・・修正された腕の制限を、アルゴリズムを補填するように構成されている相補形の腕を加えることによって、及び／又は修正された腕の構成適用範囲から修正により除去されたハイパーパラメータ範囲を加えることによって補償すること
を選ぶことができる。
特定の使用事例のＨＡＭＬＥＴ [0092] In another embodiment, HAMLET is deployed in a cloud setting with abundant computing resources available to solve a user's task for which only a total wall-clock time budget is specified. In this setting, the MasterBandit can run all arms to be used for the specified task in parallel for the given budget, as described above. However, the arms' frameworks may face inherent difficulties, for example in scaling well to many CPUs. The HAMLET MasterBandit can therefore choose to:
・・modify the configuration of individual arms so that each arm focuses on a limited search space, e.g., by limiting the number of algorithms the arm considers and/or by reducing the value ranges of particular hyperparameters the arm's framework considers, and
・・compensate for the restrictions of the modified arms by adding complementary arms configured to cover the algorithms and/or hyperparameter ranges that the modification removed from the modified arms' configurations.
HAMLET for specific use cases

[0093]ＨＡＭＬＥＴの特定の一実施形態は、スマートシティのために有利に使用することができる。歩行者交通を監視するカメラのセットは、ＮＥＣＣＯＲＰによるＦｉｅｌｄＡｎａｌｙｓｔなどのデモグラフィック検出エンジンにデータ入力する。ＦｉｅｌｄＡｎａｌｙｓｔは、歩行者が監視領域を通り過ぎるときに、ある期間にわたって歩行者のデモグラフィック統計情報（性別、年齢層）の匿名分類を生成する。このデータは、気象データ、及び監視領域の近辺の予定イベントについての入手可能データと一緒に、将来の歩行者交通デモグラフィックを予測することを学習するためのＨＡＭＬＥＴへの入力として役立つ。タスクを解決すると、ＨＡＭＬＥＴは、今後の歩行者デモグラフィックをある一定の精度で予測できるモデルのセットを生み出す。このような予測は次に、動的交通制御決定の基礎を成すことができ（たとえば、予測の場合には交通量を下げるために）、又は、店舗がたとえばその提供品を今後の歩行者の群衆に準備するためのマーケティングを知らせることができる。 [0093] One particular embodiment of HAMLET can be used advantageously for smart cities. A set of cameras monitoring pedestrian traffic feeds a demographics detection engine such as FieldAnalyst by NEC Corp. FieldAnalyst produces anonymous classifications of pedestrian demographics (gender, age group) over time as pedestrians pass through the monitored area. Together with weather data and available data about scheduled events in the vicinity of the monitored area, this data serves as input to HAMLET for learning to predict future pedestrian traffic demographics. Upon solving the task, HAMLET produces a set of models that can predict upcoming pedestrian demographics with a certain accuracy. Such predictions can then form the basis of dynamic traffic control decisions (e.g., to reduce traffic in response to a prediction) or inform marketing, e.g., so that shops can tailor their offerings to the upcoming pedestrian crowd.

[0094]スマートシティのための別のＨＡＭＬＥＴの一実施形態では、道路交通を監視するカメラのセットは、たとえば、自動車、大型自動車（トラック、バス等）及び自転車の数をある時間間隔で検出できるフロー分析エンジンにデータ入力する。この交通量フローデモグラフィックデータは、気象データ、及び市中の予定イベントについての他のデータと一緒に、将来の交通量フローデモグラフィック、たとえば、トラックと自動車と自転車のたとえば構成比、を予測することを学習するためのＨＡＭＬＥＴへの入力として役立つ。タスクを解決すると、ＨＡＭＬＥＴは、今後の交通量フローデモグラフィックをある一定の精度で予測できるモデルのセットを生み出す。このような予測は次に、たとえば、交通を管理するために追加の交通警察官を派遣すること、又は、たとえばある一定の構成比をなす運転者の経路再選択の要求を、ナビゲーションシステムプロバイダとのインターフェースを介して送り出すことを決定するための動的交通制御決定の基礎を成すことができる（たとえば、予測の場合には交通量を下げるために）。 [0094] In another HAMLET embodiment for smart cities, a set of cameras monitoring road traffic can detect, for example, the number of cars, large vehicles (trucks, buses, etc.) and bicycles at certain time intervals. Populate the flow analytics engine. This traffic flow demographic data, along with weather data and other data about scheduled events in the city, predicts future traffic flow demographics, e.g., the mix of trucks, cars, and bicycles. serves as input to HAMLET for learning Upon solving the task, HAMLET produces a set of models that can predict future traffic flow demographics with some accuracy. Such predictions can then, for example, communicate with the navigation system provider to dispatch additional traffic police officers to manage traffic or, for example, to request rerouting of a certain percentage of drivers. can form the basis of dynamic traffic control decisions (eg, in the case of forecasting, to reduce traffic).

[0095]ＨＡＭＬＥＴの一実施形態は、エネルギー最適化のためのスマートビルの予測制御に有利に使用することができる。スマートビルは、モノのインターネット（ＩｏＴ）インフラストラクチャプラットフォームを介して、たとえばＦＩＷＡＲＥを介してアクセスできる、室温を測定するセンサを装備する。別法として、又は加えて、ビルの水耕暖房システム動作状態（オン、オフ、温度）は、たとえばそのビル管理システムからアクセス可能である。このセンサデータは、関連する気象データと一緒にＨＡＭＬＥＴに供給されて、高精度予測機械学習モデルをたとえば数時間又は数日さえも前もって特定することができる（特に、たとえばインターネットからの気象予報サービスを使用する場合に）。有益な一バリアントでは、ＨＡＭＬＥＴはさらに、エネルギー計器読み取り値に適用して、暖房システムの動作セッティング及び気象影響が暖房システムのエネルギー使用にどれだけ関係するかを予測できる機械学習モデルを特定することができる。これらの予測モデル（たとえば、室温予測及びエネルギー消費）は次に、遺伝的アルゴリズム、差分進化、又は粒子群最適化などの最適化アルゴリズムによって使用されて、どの暖房システムセッティングのもとでビルがビル固有の目標室温範囲に適合するか、又は違反するか、またどのようにして暖房システムエネルギー利用を最適化できるかを評価することができる。 [0095] An embodiment of HAMLET can be used advantageously for predictive control of smart buildings for energy optimization. The smart building is equipped with sensors that measure room temperature and are accessible via an Internet of Things (IoT) infrastructure platform, for example FIWARE. Alternatively or additionally, the operating state of the building's hydronic heating system (on, off, temperature) is accessible, for example, from its building management system. This sensor data can be fed into HAMLET together with relevant weather data to identify highly accurate machine learning models that predict, e.g., hours or even days ahead (in particular when a weather forecast service, e.g., from the Internet, is used). In one beneficial variant, HAMLET can additionally be applied to energy meter readings to identify machine learning models that predict how the heating system's operating settings and weather influences relate to its energy use. These predictive models (e.g., for room temperature and energy consumption) can then be used by an optimization algorithm such as a genetic algorithm, differential evolution, or particle swarm optimization to evaluate under which heating system settings the building meets or violates its building-specific target room temperature range, and how the heating system's energy use can be optimized.

[0096]ＨＡＭＬＥＴの一実施形態は、患者退院を予測するために病院で有利に使用することができる。病院管理では、患者がいつ退院しそうであるかを知ることが有利になり得る。この知ることは、患者の健康データ、生理学的測定値（たとえば、心拍数及び血圧）及び一般的な患者情報（たとえば、年齢、性別）を従来技術から知られている方法により数値で符号化し、この数値データを患者が入院している日数と一緒にＨＡＭＬＥＴに供給することによって、達成することができる。ＨＡＭＬＥＴは、新たな患者及びすでに受け入れられている患者について、それぞれの患者がどれくらいの期間入院することになるか予測できる予測機械学習モデルを効率的に生み出す。この退院情報は次に、たとえば、病院のリソース計画に使用することができる。 [0096] An embodiment of HAMLET can be advantageously used in hospitals to predict patient discharge. In hospital administration, it can be advantageous to know when a patient is likely to be discharged. This knowledge numerically encodes patient health data, physiological measurements (e.g. heart rate and blood pressure) and general patient information (e.g. age, gender) by methods known from the prior art, This can be accomplished by supplying HAMLET with this numerical data along with the number of days the patient has been hospitalized. HAMLET effectively creates a predictive machine learning model that can predict how long each patient will be hospitalized for new and already admitted patients. This discharge information can then be used, for example, for hospital resource planning.

[0097]ＨＡＭＬＥＴの一実施形態は、量的取引で有利に使用することができる。投資家の取引決定を知らせることは、証券の基本データ及びその取引価格の時系列をＨＡＭＬＥＴに供給して、ａ）買うべき、又は売るべき証券を分類するモデル、及び／又はｂ）将来の証券価格（たとえば、予測が実行されたときから１週間の株の終値）を予測するモデルを特定することによって、達成することができる。 [0097] An embodiment of HAMLET can be used to advantage in quantitative trading. Informing an investor's trading decisions feeds HAMLET with basic data on a security and a time series of its trading prices to create a) a model to classify which securities to buy or sell, and/or b) future securities. This can be accomplished by specifying a model that predicts the price (eg, the stock's closing price for the week from when the prediction was made).

[0098]ＨＡＭＬＥＴの一実施形態は、電子健康領域で有利に使用することができる。医療データが与えられ、このデータは、そのデータに基づいて病気の分類を学習するためのＨＡＭＬＥＴへの入力として役立つことができる。一例は次のようなものである。すなわち、糖尿病の要因を分析するために病院で収集されたデータ（たとえば、年齢、入院期間、薬剤、他の病気等）。タスクは、糖尿病患者ｐの、緊急糖尿病症例のために再入院しなければならないリスクを、与えられたデータに基づいて分類することである。このタスクを解決すると、ＨＡＭＬＥＴは、患者の緊急サブミッタンスリスクをある一定の精度で分類できるモデルのセットを生み出す。このような分類は次に、医師をその意志決定の際に支援するために、また、特定の治療のためのヒント及び標識を得るために用いることができる。この分類は、他の病気の事例に適用することもできる。 [0098] An embodiment of HAMLET can be used advantageously in the e-health domain. Given medical data, this data can serve as input to HAMLET for learning to classify diseases. One example is data collected at hospitals to analyze diabetes factors (e.g., age, length of stay, medications, other illnesses). The task is to classify, based on the given data, the risk that a diabetic patient p has to be readmitted for an emergency diabetes case. Upon solving this task, HAMLET produces a set of models that can classify a patient's emergency readmission risk with a certain accuracy. Such classifications can then be used to support physicians in their decision-making and to provide hints and indications for specific treatments. The same approach can also be applied to other diseases.

[0099]ＨＡＭＬＥＴの一実施形態は、空気質予測で有利に使用することができる。道路交通及び気象、並びに空気質（たとえば、ＳＯ_２又は微粒子物質のレベル）をある時間間隔で監視するセンサのセット。このセンサのセットは、将来の空気質、たとえばＳＯ_２又は微粒子物質のレベルを予測することを学習するためのＨＡＭＬＥＴへの入力として役立つ。タスクを解決すると、ＨＡＭＬＥＴは、今後の空気質値をある一定の精度で予測できるモデルのセットを生み出す。このような予測は次に、動的交通制御決定のための（たとえば、予測の場合では交通量を下げるための）、又は公共輸送についての決定のための基礎を成すことができる。このような予測の別の用途は、対象のいくつかの領域がひどく汚染されると予測される時刻を一般大衆に知らせる、エンドユーザアプリケーションを構築することである。 [0099] An embodiment of HAMLET can be used advantageously in air quality prediction. A set of sensors monitors road traffic, weather, and air quality (e.g., SO2 or particulate matter levels) at certain time intervals. The data from this set of sensors serves as input to HAMLET for learning to predict future air quality, e.g., SO2 or particulate matter levels. Upon solving the task, HAMLET produces a set of models that can predict upcoming air quality values with a certain accuracy. Such predictions can then form the basis for dynamic traffic control decisions (e.g., to reduce traffic in response to a prediction) or for decisions about public transport. Another use of such predictions is to build end-user applications that inform the general public of the times at which certain areas of interest are predicted to be heavily polluted.

[0100]一般的及び単独の使用事例において、本発明の諸実施形態は、以下の改善／利点をもたらす。
１）メタＡｕｔｏＭＬ方法の設計であり、新たなＡｕｔｏＭＬフレームワーク（たとえば、ＮＡＳ用、又は従来のＭＬ用）をこれらが従来技術で開発されるときに腕として、時間多重化による効率的な計算リソース利用のためにフレームワークのパフォーマンスを学習及び外挿することによって異なる適用可能ＡｕｔｏＭＬフレームワーク（腕）の中から選ぶべき、新規で発明的な改善された多腕バンディットアルゴリズムを含めて、容易に統合することができ、この多腕バンディットアルゴリズムが、過去のパフォーマンスの統計情報についての計算に基づく多腕バンディットアルゴリズムの欠点を回避する、設計。
２）単一のＰＣから、異なるハードウェアセッティングによる大規模並列クラウド配置までスケール変更可能であるシステムの設計であり、さらに、柔軟なシステムアーキテクチャを使用することによって、走っているフレームワーク（腕）と並列に協働すること、又はフレームワークの実行を時間内に多重化することができ、このシステムアーキテクチャが、たとえばバンディットを、腕マイクロサービスコンポーネントの配置を（予算知識によって）管理するマイクロサービスコンポーネントとしてカプセル化する、設計。
３）コンピューティングリソースの割り当てを乏しい計算リソースのシナリオでフレームワーク実行の時間予算内で改善する、ＬＥＣによる強化多腕バンディットアルゴリズムの提供。より良いスコアを、限定されたリソースが与えられたときに見つけることができ、又は所望のスコアをより速く見つけることができる。
４）ディープラーニング並びに従来のＭＬのサポート。また、異なるタイプの学習問題（回帰、分類、クラスタ化）をサポートする。より多くのアルゴリズムを使用できることは、より良いスコアにつながる。
５）多くの異なるフレームワークにわたってモデルのビルトインアンサンブリングを実現する。このことは、フレームワーク全体からのモデルが使用されるときにより良いスコアにつながる（組み込まれる良好なモデルが多いほど、より良いスコアにつながる）。 [0100] In general and single use cases, embodiments of the present invention provide the following improvements/benefits.
1) The design of a meta-AutoML method into which new AutoML frameworks (e.g., for NAS or for conventional ML) can easily be integrated as arms as they are developed in the state of the art, including a novel and inventive improved multi-armed bandit algorithm that chooses among the applicable AutoML frameworks (arms) by learning and extrapolating their performance, for efficient use of computational resources through time multiplexing. This algorithm avoids the shortcomings of multi-armed bandit algorithms that rely on statistics computed over past performance.
2) The design of a system that scales from a single PC to massively parallel cloud deployments with different hardware settings, and that can either run the frameworks (arms) in parallel or multiplex their execution in time, by using a flexible system architecture that, for example, encapsulates the bandit as a microservice component which manages the deployment of the arm microservice components (with knowledge of the budget).
3) The provision of a multi-armed bandit algorithm enhanced with LCE, which improves the allocation of computing resources within the time budget for framework execution in scenarios with scarce computational resources. Better scores can be found with limited resources, or a desired score can be found faster.
4) Support for deep learning as well as conventional ML, and for different types of learning problems (regression, classification, clustering). Being able to use more algorithms leads to better scores.
5) Built-in ensembling of models across many different frameworks. Using models from all frameworks leads to better scores (the more good models are included, the better the score).

[0101] In one embodiment, a method for automatically selecting, tuning, and training a machine learning algorithm for a user-specified machine learning task includes the following steps:
1) Receiving a dataset from a user.
2) Receiving a specific machine learning task from the user.
3) Controlling the execution of multiple instantiations of AutoML frameworks and/or algorithms (arms) according to the available computational resources and time budget, using learning and extrapolation of performance over the remaining task time budget, with the goal of finding a trained model for the specific task.
4) Collecting the multiple trained models for the task for later retrieval by the user and for finding the highest-scoring model for the specific task.

[0102] Compared with a user invoking training for a task with the open-source frameworks that HAMLET itself integrates as arms, and compared with not using a multi-armed bandit algorithm for resource management, embodiments of the present invention provide several improvements. In the former case, the user may have to store and compare the resulting models themselves. This entails the following drawbacks:
• The decision as to which frameworks can be used for a given dataset has to be made by the user (the meta-AutoML benefit of this step is lost).
• Parallel deployment of the frameworks requires more computational power than integration with the HAMLET multi-armed bandit algorithm, which focuses deployment on promising frameworks (identified by extrapolation).
• Lower scores for the same amount of resources, and/or more time and/or hardware resources for the same score.
• The user can choose only one framework and can invoke training (via the framework API) with only one framework for a given problem. As a result, the best model (highest score) may not be found or used.

[0103] Embodiments of HAMLET are described below, together with experimental results demonstrating the performance improvements and advantages obtained by these embodiments. For example, automated algorithm selection and hyperparameter tuning facilitate the application of machine learning. HAMLET outperforms conventional multi-armed bandit strategies, which look at the history of observed rewards to identify the most promising arms for optimizing the expected total reward in the long run. The inventors recognized that, when limited time budgets and computational resources are considered, these conventional strategies apply a backward view of rewards. This backward view is inappropriate because the bandit does not look into the future to anticipate the highest final reward at the end of the specified time budget. In contrast, HAMLET extends the bandit approach with learning curve extrapolation and computation-time awareness for selecting among a set of machine learning algorithms. In experiments with 99 recorded hyperparameter tuning traces from prior work, all HAMLET variants considered performed at least as well as other bandit-based algorithm selection strategies. The best-performing HAMLET variant combines learning curve extrapolation with the well-known upper confidence bound exploration bonus. Overall, this variant achieves a lower mean rank than all non-HAMLET policies, with statistical significance at the 95% level.

[0104] HAMLET enables selection of the base learner to be applied to a dataset. In one embodiment, an iterative approach models the selection of the base learner and the optimization of the base learner's hyperparameters as a hierarchical problem. A multi-armed bandit focuses on selecting the base learner, and a dedicated component (also called a tuner) is responsible for tuning the hyperparameters of its respective base learner. This approach is easily extended with further base learners by integrating them as additional arms. HAMLET applies to practical settings in which AutoML faces resource limitations, such as the computing power and the time available for solving a machine learning problem. The following discussion addresses the extreme case of a single CPU available for solving a machine learning task within a strict wall-clock time budget. In this setting, conventional multi-armed bandit approaches are not optimal, because they require observing a complete function evaluation, i.e., training a parameterized base learner on the dataset, before the statistics of the associated arm can be updated. In addition, most multi-armed bandit algorithms assume a stationary reward distribution, an assumption that does not hold because the tuning algorithm increases the achieved performance over time. Finally, in a typical AutoML setting, the aim is not to maximize the average total reward over several repeated trials but to achieve the maximum possible performance. HAMLET improves the multi-armed bandit approach by explicitly accounting for time, and by learning the learning curves of the different arms and extrapolating them to the end of the wall-clock time budget while taking into account the computation already spent on each arm. Combining learning curve extrapolation with accounting for computation time improves the performance of multi-armed bandits on the algorithm selection problem.

[0105] The experimental evaluation discussed below uses 99 sets of traces for six different base learners. The evaluation shows that even a simple approach to learning curve fitting yields gains in the regime of tight time budgets. Overall, the best-performing HAMLET variant achieved better performance than all non-HAMLET bandits used in the experiments, with 95% confidence.

[0106] The basic form of the multi-armed bandit problem can be described as follows. An agent is repeatedly faced with a choice among I different actions. After each choice, the agent receives a numerical reward drawn from a stationary probability distribution that depends on the selected action. The objective is to maximize the expected total reward over some period of time or number of time steps. Through repeated action selection, the agent can maximize its winnings by concentrating on the best arms. If estimates of the action values are maintained, then at any time step there is at least one action whose estimated value is greatest: the greedy action(s). Selecting one of the greedy actions is called exploitation. Selecting a non-greedy action is called exploration, because it makes it possible to improve the estimate of that action's value. Exploration is needed because there is always uncertainty about the accuracy of the action-value estimates. The greedy actions are those that look best at present, but some of the other actions may actually be better.

[0107] A simple exploration technique is to act greedily most of the time but, with a small probability ε, to select instead at random from among all actions with equal probability, independently of the action-value estimates. This method is called "ε-greedy". One advantage of ε-greedy is that, as the number of time steps grows, the bandit samples every action an unbounded number of times, so its action-value estimates converge to the true values. In general, ε-greedy action selection forces non-greedy actions to be tried, but indiscriminately. Another technique, termed decaying ε, initializes ε high and reduces ε (and thus the rate of exploration) over time.
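
The ε-greedy rule described above can be sketched as follows. This is a minimal illustration only: the function names `epsilon_greedy` and `update_mean` are hypothetical, and the incremental sample mean is merely one common way of maintaining action-value estimates, assumed here for concreteness.

```python
import random


def epsilon_greedy(value_estimates, epsilon, rng):
    """Return an arm index: uniform random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: every arm is equally likely, regardless of its estimate.
        return rng.randrange(len(value_estimates))
    # Exploit: the arm with the highest current value estimate.
    return max(range(len(value_estimates)), key=value_estimates.__getitem__)


def update_mean(estimates, counts, arm, reward):
    """Update the running sample mean of the rewards seen on one arm (assumed estimator)."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

With `epsilon` set to 0 the rule is purely greedy; with `epsilon` set to 1 it ignores the estimates entirely.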

[0108] Another effective way of selecting among the possible actions is the upper confidence bound (UCB) method. This method selects actions according to their potential for being optimal. UCB does so by taking into account both the respective value estimates and the uncertainties of those estimates, so that actions with low value estimates, or actions that have already been selected frequently, are selected with decreasing frequency over time. The UCB bandit selects actions based on upper bounds on what can reasonably be assumed to be the highest possible true value of each action. Each time an action is selected, the perceived uncertainty of its action-value estimate should decrease. Conversely, each time another action is selected, the perceived uncertainty increases. One difficulty of UCB bandits lies in handling non-stationary problems.
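
As an illustration of the UCB idea, the following sketch uses the widely known UCB1 form, in which the bonus for an arm is the square root of `c * ln(total pulls) / pulls of that arm`. The function name `ucb1_select` and the default constant `c = 2.0` are assumptions for this sketch and are not taken from the description.

```python
import math


def ucb1_select(estimates, counts, c=2.0):
    """Pick the arm that maximizes estimate + sqrt(c * ln(total) / count)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # pull every arm once before using the bound
    total = sum(counts)
    scores = [
        q + math.sqrt(c * math.log(total) / n)  # the bonus shrinks as n grows
        for q, n in zip(estimates, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus of an arm shrinks when that arm is pulled and grows slowly while other arms are pulled, which matches the behavior of the perceived uncertainty described above.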

[0109] The multi-armed bandit problem presented and addressed by embodiments of the present invention differs from the original problem in that the rewards are not stationary. In algorithm selection, the reward should increase as more time is spent on an arm, while the rate of improvement is unknown. The objective is to find the best single reward, not to maximize the total reward.

[0110] Auto-Tuned Models (ATM) is a distributed, collaborative, scalable AutoML system that incorporates algorithm selection and hyperparameter tuning. ATM approaches AutoML by iterating two steps: hyperpartition selection followed by hyperparameter tuning. A hyperpartition comprises one specific base learner together with its categorical hyperparameters. ATM models hyperpartition selection as a multi-armed bandit problem. ATM supports three bandit algorithms: a standard UCB-based algorithm (called UCB1) and two variants designed to handle the drifting rewards encountered in the AutoML setting. The variants designed for drifting rewards compute the value estimates used for selecting actions from either the velocity or the average of the best K rewards observed so far (denoted BestK-Velocity and BestK-Rewards, respectively). Once a hyperpartition has been chosen, the remaining unspecified parameters can be selected from a vector space using, for example, Bayesian optimization. The Machine Learning Bazaar framework for developing automated machine learning software systems extends the work on ATM and incorporates the same bandit structure. HAMLET differs from both in several ways. First, HAMLET does not choose among hyperpartitions but only among base learners, and therefore does not select categorical hyperparameters. Second, HAMLET uses a novel bandit algorithm that fits a simple model of the learning curve to the observed rewards and selects actions based on an extrapolation of that learning curve, in order to find the highest possible reward within a given time budget. Third, ATM and Machine Learning Bazaar update action-value statistics based on completed function evaluations, i.e., the test performance of a base learner after training on the dataset. In contrast, one embodiment of HAMLET updates the training statistics at configurable time intervals. Even if the tuner of a base learner does not manage to find a better model within the most recent time interval, one embodiment of HAMLET tracks the progress (or lack thereof) of the tuner's learning curve, so that the allocation of computational resources can be switched based on extrapolation of the learning curve.

[0111] Hyperband is a bandit-based early-stopping method for sampled parameterizations of base learners. It incorporates a bandit that accounts for the fact that arms in hyperparameter optimization can improve when given more training time. Hyperband builds on the concept of successive halving: it runs a set of parameterized base learners for a specific budget, evaluates their performance, and stops the worse-performing half of the set. Given a larger set of possible parameterizations of the base learners, Hyperband stops parameterizations that do not appear promising and successively allocates more computational resources to the remaining promising ones. HAMLET differs in that it allocates computational resources based on predicted performance rather than on past performance. In one embodiment of HAMLET, the bandit is used to decide which algorithm to run, rather than which hyperparameter setting to run. The approach to allocating the budget also differs. Hyperband applies the concept of a geometric search to allocate increasing portions of the overall budget to a decreasing number of base learner parameterizations. In contrast, one embodiment of HAMLET continues the chosen tuner for a configured time interval. After this interval, the tuner reports any update to the best model found, and HAMLET updates the learning curve of the respective tuner.
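
The successive-halving idea on which Hyperband builds can be illustrated with the following sketch. It is not Hyperband itself: the function name `successive_halving`, the `evaluate(config, budget)` callback, and the doubling of the budget per round are assumptions chosen to keep the example short.

```python
def successive_halving(configs, evaluate, initial_budget):
    """Repeatedly score the surviving configurations and keep the better half."""
    survivors = list(configs)
    budget = initial_budget
    while len(survivors) > 1:
        # Score every survivor with the current per-configuration budget.
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget), reverse=True)
        # Stop the worse-performing half; at least one configuration survives.
        survivors = ranked[: max(1, len(ranked) // 2)]
        budget *= 2  # assumed schedule: survivors get twice the budget next round
    return survivors[0]
```

Note that every decision here is based on scores already observed, which is the backward-looking behavior that HAMLET replaces with predicted performance.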

[0112] The term "learning curve" is used to describe (1) the performance of an iterative machine learning algorithm as a function of its training time or number of iterations, and (2) the performance of a machine learning algorithm as a function of the size of the dataset available for training. For the AutoML challenge addressed by one embodiment of HAMLET, the focus is on extrapolating the former type of learning curve.

[0113] Hyperparameter optimization of deep neural networks (DNNs) can be addressed with a probabilistic model that extrapolates performance from the first part of the learning curve of a hyperparameter configuration. To this end, a set of parametric functions can be fitted for each hyperparameter configuration and combined into a single model by a weighted linear combination. Using a Markov chain Monte Carlo method yields probabilistic extrapolations of the learning curve. These probabilistic extrapolations are then used for automatic early termination of runs with unpromising hyperparameter settings. It is also possible to extrapolate learning curves across hyperparameter settings and DNN architectures. Bayesian neural networks (BNNs) combined with known parametric functions can be relied on to find samples that are promising candidates to apply to Hyperband. Predicting the model parameters of the parametric functions, as well as their mixing weights, with the BNN makes it possible to transfer knowledge of previously observed learning curves. However, this implies that prior learning curve information is needed to pre-train the BNN in order to obtain good and fast performance.

[0114] The method known as Freeze-Thaw optimization is a Gaussian process-based Bayesian optimization technique for hyperparameter search. It includes a learning curve model based on a parametric exponential decay model. A positive definite covariance kernel is used to model the iterative optimization curves. The Freeze-Thaw method maintains a set of partially completed models that are not being actively trained, and uses the learning curve model to decide, in each iteration, which of them to "thaw", i.e., to continue training.

[0115] A regression-based extrapolation model can be used to extrapolate learning curves in order to speed up hyperparameter optimization. The technique is based on using the trajectories of previous builds to make predictions for a new build, where a "build" refers to a training run with a specific base learner parameterization. Data from previous builds can thus be transformed, with an added noise term, to match the current build and to extrapolate its performance. This extrapolation can serve to identify and stop hyperparameter configurations early.

[0116] One embodiment of HAMLET uses a less sophisticated learning curve function in order to demonstrate the general nature of the benefit gained by moving from backward-looking to forward-looking multi-armed bandits for algorithm selection. One embodiment of HAMLET also presents a general approach that is not limited to a specific type of base learner, such as DNNs. In demonstrating the effectiveness of an embodiment of HAMLET, no previous learning curves were relied upon. Although transferring information from previous learning curves is clearly beneficial, this demonstrates that HAMLET improves algorithm selection performance because of simple learning curve extrapolation, and not because of the transfer and reuse of prior information. In one embodiment, HAMLET uses information only from the current AutoML problem. Also, one embodiment of HAMLET extrapolates the learning curve of the performance of a base learner's tuner, rather than that of individual hyperparameter configurations.

[0117] Table 2 below presents a list of the symbols and notation used below to further describe possible embodiments of HAMLET.

[0118] According to embodiments of HAMLET, the AutoML algorithm selection problem is modeled as a multi-armed bandit problem in which each arm represents a hyperparameter tuner responsible for one specific base learner. In each iteration, HAMLET chooses which action to take, i.e., which arm to pull, based on an extrapolation of the arms' learning curves, as described below and outlined in Algorithms 2 and 3. After deciding on the action, the bandit continues the execution of the corresponding hyperparameter tuner for a preset time interval Δt. When this interval has elapsed, the execution of the arm is paused, but it can be resumed from that point in a later iteration without loss of information. During the execution of the arm within the time interval, the bandit receives all monotonically increasing accuracy values reached within the interval, together with information on when those accuracies were reached (i.e., the learning curve updated so far). If the tuner found no new monotonically increasing accuracy value within the interval, this information is taken into account as well. The bandit then fits a parametric curve to the learning curve and reduces the remaining budget by Δt. HAMLET then proceeds to the next iteration. When HAMLET faces a new AutoML problem, it preferably tries the arms in round-robin fashion until it has collected enough values to model a learning curve for each arm.
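
The iteration described above (round-robin warm-up, choice of an arm by extrapolated accuracy, execution for Δt, reduction of the remaining budget) can be sketched as follows. This is an illustrative simplification and not Algorithm 2 or 3: `run_bandit`, `linear_predict`, and the representation of each arm as a function from training time to best accuracy are hypothetical stand-ins for resumable tuners, and the linear extrapolation merely takes the place of the parametric curve fit.

```python
def linear_predict(curve, spent, remaining):
    """Placeholder extrapolation: extend the slope of the last two points."""
    if len(curve) < 2:
        return curve[-1][1] if curve else 0.0
    (t0, a0), (t1, a1) = curve[-2], curve[-1]
    return min(1.0, a1 + (a1 - a0) / (t1 - t0) * remaining)


def run_bandit(arms, predict, budget, delta_t, warmup_rounds=2):
    """arms[i](t) returns the best accuracy arm i has found after t seconds."""
    spent = [0.0] * len(arms)        # training time given to each arm so far
    curves = [[] for _ in arms]      # per-arm list of (time, best accuracy)
    remaining, step = budget, 0
    while remaining >= delta_t:
        if step < warmup_rounds * len(arms):
            arm = step % len(arms)   # round-robin until each arm has data
        else:
            preds = [predict(curves[i], spent[i], remaining) for i in range(len(arms))]
            arm = max(range(len(arms)), key=preds.__getitem__)
        spent[arm] += delta_t        # resume the paused tuner for delta_t
        curves[arm].append((spent[arm], arms[arm](spent[arm])))
        remaining -= delta_t         # the interval is charged to the budget
        step += 1
    best = max(range(len(arms)), key=lambda i: curves[i][-1][1] if curves[i] else -1.0)
    return best, spent
```

For example, with one arm that stays at 0.5 and another that improves by 0.1 per second up to 0.9, a budget of 10 and Δt of 1, this sketch spends the time after the warm-up on the improving arm even though its observed accuracy is initially lower.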

[0119] Learning curve extrapolation: According to embodiments of HAMLET, the learning curve is modeled as the accuracy found by the tuner ($y$) over the training time in seconds ($x$). Here, the training time includes the time spent executing the tuner to identify parameterizations of the base learner and the time spent training the parameterized base learner on the dataset. In HAMLET, the learning curve is defined as a monotonically increasing function given by the maximum accuracy found over the training time. An exemplary graph illustrating learning curve extrapolation is shown in FIG. 3. In this example, training of the arm has run for 500 seconds, and this current maximum training time of the arm is marked by a vertical line. The accuracy values found up to this point are used to fit a curve in order to predict future accuracy values over time. Of the accuracy scores found by the tuner, only monotonically increasing values are used for the learning curve; other scores are ignored. For comparison, the actual future learning curve is also shown. It is the ground truth for the extrapolated learning curve but is unknown to the HAMLET bandit. One embodiment of HAMLET aims to make optimal use of the specified time budget $B$ and to find the base learner that achieves the highest accuracy by devoting most of the computational resources to that base learner's tuner. HAMLET therefore tries to predict the maximum achievable accuracy of each tuner. It does so by extrapolating each tuner's learning curve under the assumption that the entire remaining budget $B_{rem}$ is spent on that tuner. Given that each tuner $i$ has already received a training time of $t_x^i$, HAMLET uses the tuner's learning curve to predict the accuracy at $x = t_x^i + B_{rem}$, which is indicated by a marker in FIG. 3.
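
The rule that only monotonically increasing accuracy values enter the learning curve, and the point at which the curve is extrapolated, can be illustrated as follows. The function names `learning_curve` and `extrapolation_point` are hypothetical, and the sketch assumes that "monotonically increasing" means strictly better than the best value seen so far.

```python
def learning_curve(observations):
    """Keep only the (seconds, accuracy) points that improve on the best so far.

    observations must be ordered by elapsed training time.
    """
    curve, best = [], float("-inf")
    for seconds, accuracy in observations:
        if accuracy > best:          # other scores are ignored
            best = accuracy
            curve.append((seconds, accuracy))
    return curve


def extrapolation_point(time_spent, remaining_budget):
    """x = t_x^i + B_rem: the time at which the arm's accuracy is predicted."""
    return time_spent + remaining_budget
```

For instance, scores of 0.60, 0.55, 0.70, 0.70, and 0.65 reduce to the two curve points 0.60 and 0.70.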

[0120] To investigate whether learning curve extrapolation is a meaningful concept for the algorithm selection problem in a constrained computational setting, a straightforward parametric function is used to model the accuracy of a tuner observed over time. In the problem setting according to embodiments of HAMLET, the learning curve is known to be (1) monotonically increasing, (2) a saturating function, and (3) bounded to values $y \in [0, 1]$. Because its shape is similar and it can satisfy conditions (1) to (3), according to one embodiment an arctangent function with four parameters $(a, b, c, d)$, which translate, stretch, and compress the function, is chosen for this first set of investigations:
$$y = a \cdot \arctan(b(x + c)) + d \qquad (1)$$
Here, a curve fitting function of scikit-learn is used to fit the parameters of the desired curve.
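
Evaluating equation (1) at the extrapolation point can be sketched as follows. The fitting step itself is omitted and the parameters are assumed to be already fitted (for instance with a least-squares routine such as SciPy's `curve_fit`, which is an assumption of this sketch). The parameter values in the usage note are invented for illustration, and clipping the prediction to [0, 1] is likewise an assumption motivated by condition (3).

```python
import math


def arctan_curve(x, a, b, c, d):
    """Equation (1): y = a * arctan(b * (x + c)) + d."""
    return a * math.atan(b * (x + c)) + d


def predict_final_accuracy(params, time_spent, remaining_budget):
    """Extrapolate a fitted curve to x = time_spent + remaining_budget."""
    y = arctan_curve(time_spent + remaining_budget, *params)
    return min(1.0, max(0.0, y))     # assumed clipping to the valid accuracy range
```

With the hypothetical parameters `(0.5, 0.01, 0.0, 0.2)`, the curve starts at 0.2 for `x = 0`, increases monotonically, and saturates below `0.5 * pi / 2 + 0.2`, so it satisfies conditions (1) to (3).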

[0121] HAMLET variants: HAMLET faces the same exploration/exploitation dilemma as other multi-armed bandit strategies. Three variants of how HAMLET chooses the arm to run in the next iteration are described below. In each, the decision is based on the values of a vector $r$ that contains, for each arm $i$, the accuracy $r^i$ predicted by extrapolating the learning curve to time $x = t_x^i + B_{rem}$. The three variants are outlined in Algorithm 3.

[0122] HAMLET Variant 1, double ε-greedy learning curve extrapolation with fixed $\varepsilon_1$ and $\varepsilon_2$: In this approach, HAMLET acts in an ε-greedy manner based on the extrapolation of the learning curves. After preliminary experiments showed that a subset of the tuners often performed much better than the rest, the standard ε-greedy bandit was modified as follows. With probability $\varepsilon_2$, HAMLET chooses an action at random. With probability $\varepsilon_1$, HAMLET chooses the tuner with the second-highest predicted accuracy. With probability $1 - (\varepsilon_1 + \varepsilon_2)$, HAMLET takes the greedy action, i.e., $\arg\max(r)$.
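
The three-way choice of Variant 1 can be sketched as follows. The function name `double_epsilon_greedy` is hypothetical, and drawing a single uniform number to partition the three cases is an implementation choice of this sketch rather than something stated in the description.

```python
import random


def double_epsilon_greedy(predictions, eps1, eps2, rng):
    """Choose an arm from the vector r of extrapolated accuracies."""
    order = sorted(range(len(predictions)), key=predictions.__getitem__, reverse=True)
    u = rng.random()
    if u < eps2:
        return rng.randrange(len(predictions))  # probability eps2: random action
    if u < eps2 + eps1 and len(order) > 1:
        return order[1]                         # probability eps1: second-highest prediction
    return order[0]                             # otherwise: greedy action, argmax(r)
```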

[0123] HAMLET Variant 2, ε-greedy learning curve extrapolation with decay: In this approach, HAMLET acts in an ε-greedy manner based on the extrapolation of the learning curves. This variant starts with $\varepsilon(t) = 1$ and reduces it to $\varepsilon(B) = 0$ in iterative steps (the step-size expression is not reproduced in this text), where the notation $\varepsilon(t)$ indicates the time dependence of the probabilistic exploration parameter. In each iteration, HAMLET chooses an action at random with the current probability $\varepsilon(t)$. With probability $1 - \varepsilon(t)$, HAMLET takes the greedy action.
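
A decaying ε-greedy choice of this kind can be sketched as follows. Because the step size is not available in this text, the sketch assumes a linear schedule that falls from 1 at the start to 0 when the budget $B$ is used up. That schedule, and the names `linear_epsilon` and `decaying_epsilon_greedy`, are assumptions for illustration only.

```python
import random


def linear_epsilon(elapsed, budget):
    """Assumed schedule: epsilon falls linearly from 1 at t = 0 to 0 at t = B."""
    return max(0.0, 1.0 - elapsed / budget)


def decaying_epsilon_greedy(predictions, elapsed, budget, rng):
    """Random arm with probability epsilon(t), else the greedy arm argmax(r)."""
    if rng.random() < linear_epsilon(elapsed, budget):
        return rng.randrange(len(predictions))
    return max(range(len(predictions)), key=predictions.__getitem__)
```

Under this assumed schedule the bandit explores almost exclusively at the start and acts purely greedily once the budget is nearly exhausted.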

[0124] HAMLET Variant 3, learning curve extrapolation with exploration bonus: This variant adds a scaled UCB-based exploration bonus to the learning curve prediction of each arm and computes a per-arm score [score formula not reproduced], where n is the total number of iterations, n^i is the number of times arm i has been pulled, and ρ is the scaling factor of the exploration bonus. In each iteration, HAMLET selects the arm with the highest score.
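The following Python sketch illustrates a score of this kind. Because the score formula itself is not reproduced in the text, the sketch assumes the standard UCB1 bonus sqrt(2 ln n / n^i), scaled by ρ and added to the predicted accuracy r^i; that form, the function names, and the rule that arms which have never been pulled are tried first are all assumptions.

```python
import math


def ucb_scores(r, pulls, rho=0.05):
    """Score each arm as r_i + rho * sqrt(2 * ln(n) / n_i) (assumed UCB1 form).

    r     : predicted accuracies from learning curve extrapolation
    pulls : number of times each arm has been pulled so far (n^i)
    rho   : scaling factor of the exploration bonus
    """
    n = sum(pulls)  # total number of iterations, assuming one pull per iteration
    scores = []
    for r_i, n_i in zip(r, pulls):
        if n_i == 0:
            scores.append(math.inf)  # assumption: unexplored arms go first
        else:
            scores.append(r_i + rho * math.sqrt(2.0 * math.log(n) / n_i))
    return scores


def select_arm_ucb(r, pulls, rho=0.05):
    """Return the index of the arm with the highest score."""
    scores = ucb_scores(r, pulls, rho)
    return max(range(len(scores)), key=lambda i: scores[i])
```

A small ρ keeps the bonus from overriding the learning curve prediction, so a rarely pulled arm is revisited only when its predicted accuracy is close to the best one.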

[0125] Traces of experiments in which hyperparameter tuning was performed on six base learners by an evolutionary strategy (see Mischa Schmidt et al., "On the Performance of Differential Evolution for Hyperparameter Tuning", arXiv:1904.06960v1 (April 15, 2019)) are used and referred to in the following discussion. Replaying different algorithm selection policies on the recorded experiment traces makes it possible to evaluate different bandit policies against ground truth. These traces come from classification datasets. Equation (1) above can be adapted for other datasets, such as regression datasets.

[0126] Computational resources and setup: Each tuner (and base learner) was executed in a single docker container with access to only a single CPU. Parallel execution of different experiments was limited to ensure that a full CPU core was available to each docker container. Memory availability was not limited. In the embodiment of HAMLET evaluated below, the bandit logic likewise runs in a docker container restricted to a single CPU core, and the number of parallel experiment runs was again limited so that each docker container had access to one full CPU core.

[0127] Datasets, base learners and hyperparameter tuners: With reference to the traces from the experiments used and discussed below, algorithm selection was performed on 49 small datasets (OpenML datasets: {23, 30, 36, 48, 285, 679, 683, 722, 732, 741, 752, 770, 773, 795, 799, 812, 821, 859, 862, 873, 894, 906, 908, 911, 912, 913, 932, 943, 971, 976, 995, 1020, 1038, 1071, 1100, 1115, 1126, 1151, 1154, 1164, 1412, 1452, 1471, 1488, 1500, 1535, 1600, 4135, 40475}) and 10 large datasets (OpenML datasets: {46, 184, 293, 389, 554, 772, 917, 1049, 1120, 1128}, with 5 repetitions each). Table 3 below, which lists the parameter sets of the validation experiments, records the budgets used in the algorithm selection experiments for the small datasets (denoted Experiment 1) and the large datasets (denoted Experiment 2). The recorded traces cover six base learners: k-nearest neighbors, linear and kernel support vector machines (SVM), AdaBoost, random forest, and multi-layer perceptron.

[0128] Policies and parameterization for comparative evaluation: For HAMLET Variants 1 to 3, time progresses in intervals of Δt = 10 seconds, which is made possible by the ability to freeze and continue the execution of the different arms (e.g., via standard process control mechanisms). For a fair comparison with the other bandit policies, the experiments account for the computation time needed to fit the arms' learning curves. HAMLET Variants 1 to 3 are compared with a simple round-robin strategy ("RoundRobin"), a standard UCB1 bandit ("UCB"), and the BestK-Rewards ("BestKReward-K", where K denotes the parameter choice used) and BestK-Velocity ("BestKVelocity-K") policies, leveraging the BTB library. HAMLET Variant 1 is denoted "MasterLC-ε_1-ε_2", HAMLET Variant 2 is referred to as "MasterLCDecay", and "MasterLC-UCB-ρ" refers to HAMLET Variant 3. For each parameterizable policy (BestK-Rewards, BestK-Velocity, HAMLET Variants 1 and 2), a simple grid search was run to identify the best-performing parameters of that policy across all datasets and all budgets (see Table 3). This simple grid search mimics a realistic setting in which data scientists may not know the optimal policy parameterization in advance and therefore parameterize based on an educated guess.
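The time-sliced execution described here, in which a policy decides every Δt which arm may continue while the others stay frozen, can be mimicked with a small Python simulation. The sketch models freezing and continuing with Python generators instead of real operating-system process control, and all names (`tuner`, `round_robin`, `run_time_sliced`) are hypothetical.

```python
def tuner(scores):
    """Hypothetical arm: each resumed slice reports the best score so far."""
    best = 0.0
    for s in scores:
        best = max(best, s)
        yield best  # the arm is "frozen" here until it is resumed with next()
    while True:
        yield best  # nothing new to try: keep reporting the best score


def round_robin(history):
    """Baseline policy: run the arm that has received the fewest slices."""
    return min(history, key=lambda name: len(history[name]))


def run_time_sliced(arms, policy, budget, dt=10):
    """Give one arm at a time a slice of dt seconds until the budget is spent."""
    history = {name: [] for name in arms}
    for _ in range(budget // dt):
        name = policy(history)                  # bandit decision for this slice
        history[name].append(next(arms[name]))  # continue only the chosen arm
    return history
```

Replacing `round_robin` with a policy that extrapolates each arm's history would turn the same loop into a learning-curve-based bandit.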

[0129] Analysis: The highest accuracies achieved by each policy parameterization per dataset within a given budget were compared using boxplots to identify the most promising choices of K, ε_1, ε_2, and ρ. After identifying the best-performing parameterization of each policy across the different budgets, these best-performing parameterizations were compared with one another. The 95% confidence intervals of the policies' mean ranks in these inter-policy comparisons were then computed. In the figures discussed below, IQR stands for interquartile range. The interquartile range represents the middle 50% of the data; it is the difference between the third quartile and the first quartile. In a ranked dataset, the quartiles are the three values that divide the dataset into four equal parts, each containing 25% of the data. Quartile 1 is the smallest quartile: 25% of the dataset lies below quartile 1 and 75% lies above it, and so on. For ranking, results are sorted so that the best result receives the lowest rank, the second-best result the second-lowest rank, and so on; the worst result receives the highest rank. The experimental results are discussed below.
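The ranking and summary statistics described in this paragraph can be reproduced with the Python standard library. The sketch below is illustrative only: the text does not say how ties were ranked or how the confidence intervals were computed, so the average-rank tie handling and the normal-approximation interval (mean ± 1.96 standard errors) are assumptions, as are the function names.

```python
import statistics


def rank_policies(accuracy):
    """Rank policies on one dataset: best accuracy gets rank 1.

    accuracy : dict mapping policy name -> best accuracy achieved
    Ties share the average of the ranks they span (assumption).
    """
    ordered = sorted(accuracy, key=accuracy.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and accuracy[ordered[j + 1]] == accuracy[ordered[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2.0
        for name in ordered[i:j + 1]:
            ranks[name] = avg
        i = j + 1
    return ranks


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def mean_rank_ci95(ranks):
    """Approximate 95% confidence interval of the mean rank (normal approximation)."""
    mean = statistics.mean(ranks)
    half = 1.96 * statistics.stdev(ranks) / len(ranks) ** 0.5
    return mean - half, mean + half
```

Two policies whose mean-rank intervals do not overlap differ at roughly the 95% level, which is the kind of comparison the aggregated rank figure relies on.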

[0130] Figures 4a, 4b and 4c show boxplots of the ranks of HAMLET Variant 1 in Experiment 1. They confirm that neither a substantial level of constant randomness nor too low a level of stochastic exploration is favorable for the performance of Variant 1. ε_1 = 0.1 and ε_2 = 0.1 were selected from among similarly performing parameterizations for comparison with the other policies. For smaller budgets, the results do not change qualitatively.

[0131] Figures 5a, 5b, 5c and 5d show boxplots of the ranks of HAMLET Variant 3 in Experiment 1. They confirm that medium to large values of ρ for scaling the UCB exploration bonus, as well as deactivating the UCB bonus entirely, are unfavorable for the performance of Variant 3. ρ = 0.05 was selected for comparison with the other policies. For smaller budgets, the results do not change qualitatively.

[0132] Figures 6a, 6b and 6c show selected boxplots of the ranks on the small datasets at different budgets for the inter-policy comparison. In particular, HAMLET Variants 1 and 3 achieve favorable performance across all experiments.

[0133] As in Figures 4a, 4b and 4c, Figures 7a, 7b and 7c confirm that neither a high level of constant randomness nor too low a level of stochastic exploration is favorable for the performance of Variant 1. The selection of ε_1 = 0.1 and ε_2 = 0.1 from among the parameterizations was confirmed.

[0134] As in Figures 5a, 5b, 5c and 5d, Figures 8a, 8b, 8c and 8d confirm that medium to large values of ρ for scaling the UCB exploration bonus are unfavorable for the performance of Variant 3. The choice of ρ = 0.05 was confirmed.

[0135] Figures 9a, 9b, 9c, 9d, 9e, 9f, 9g and 9h show boxplots of the ranks on the larger datasets for all eight budgets. HAMLET Variant 1 (MasterLC-ε_1-ε_2) and Variant 3 (MasterLC-UCB-ρ) achieve favorable performance. At higher budgets, BestKRewards-7 and UCB are able to catch up.

[0136] HAMLET Variants 1 and 3 benefit from their time awareness and learning curve extrapolation capabilities. Encouraging extrapolation is beneficial, but randomness and exploration bonuses must be kept in check, because too high a level of either reduces algorithm selection performance (see Figures 4a-c, 5a-d, 7a-c and 8a-d). In each experiment, HAMLET Variants 1 and 3 perform better than the competing policies for several low to medium budgets. For budgets outside that range, HAMLET Variants 1 and 3 perform at least on par with the BestK-Reward policy.

[0137] The boxplots of Figures 6a-c and 9a-h indicate that the HAMLET variants perform better than the compared policies over a range of budgets. Figure 10 shows the aggregation of all 1485 runs (99 traces × 15 budget levels). Here, HAMLET Variant 3, which combines learning curve extrapolation with an exploration uncertainty bonus, achieves better performance than the other policies (except HAMLET Variant 1) that is statistically significant at the 95% level, since the respective confidence intervals (CIs) do not overlap. It can therefore be concluded from the experiments, and in particular from the performance of HAMLET Variant 3, that the inventors' finding holds: combining learning curve extrapolation with accounting for computation time improves the performance of multi-armed bandits in the algorithm selection problem. Given that the learning curve extrapolation technique applied is straightforward (see equation (1) above), more sophisticated learning curve techniques can also be applied, which should only increase HAMLET's relative advantage over the alternative policy approaches.

[0138] It was also verified that the trend of the results does not change when the best-performing policies of the different policy groups are compared with one another per budget. Finally, it was observed during the experiments that the BestK-Velocity policy usually performs much worse than the BestK-Rewards or UCB strategies.

[0139] In summary, experiments over a range of bandit policy parameterizations show that even a straightforward approach to extrapolating learning curves is a useful improvement for the bandit-based algorithm selection problem. Statistical analysis shows that the HAMLET variants perform at least on par with standard bandit approaches while providing the several improvements and advantages detailed herein. Notably, HAMLET Variant 3, which combines learning curve extrapolation with a scaled UCB exploration bonus, outperforms all non-HAMLET policies, as shown in Figure 10, which depicts the 95% confidence intervals of the mean ranks of the different policies aggregated over all datasets and budgets. Even further performance improvements can be achieved, for example, by using more sophisticated learning curve modeling approaches, such as a BNN-based learning curve predictor, or by integrating meta-learning concepts, such as evolving HAMLET into a contextual bandit.

[0140] While embodiments of the invention have been illustrated and described in detail in the drawings and the foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. It will be understood that changes and modifications may be made by those skilled in the art within the scope of the appended claims. In particular, the present invention covers further embodiments with any combination of features from the different embodiments described above and below. Additionally, statements made herein characterizing the invention refer to an embodiment of the invention and not necessarily all embodiments.

[0141] The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article "a" or "the" in introducing an element should not be interpreted as excluding a plurality of elements. Likewise, the recitation of "or" should be interpreted as inclusive, such that the recitation of "A or B" does not exclude "A and B," unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of "at least one of A, B and C" should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of "A, B and/or C" or "at least one of A, B or C" should be interpreted as including any singular entity from the listed elements, e.g., A, any subset of the listed elements, e.g., A and B, or the entire list of elements A, B and C.

Claims

1. A method for automatically selecting a machine learning algorithm and tuning hyperparameters of the machine learning algorithm, the method comprising:
receiving a dataset and a machine learning task from a user;
controlling execution of a plurality of instantiations of different automated machine learning frameworks on the machine learning task, each as a separate arm, in consideration of available computational resources and a time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed; and
selecting one or more of the plurality of trained models for the machine learning task based on the performance scores.

2. The method of claim 1, wherein the performance score is extrapolated to the remainder of the time budget based on the achieved performance of each of the arms during a time interval of execution that is part of the time budget.

3. The method of claim 2, further comprising allocating the computational resource to the arm during the remaining time of the time budget based on the extrapolated performance score.

4. The method of claim [claim number missing in source], wherein the performance score is extrapolated by fitting a learning curve function to the past rewards of each of the arms and extrapolating the past rewards to the end of the remainder of the time budget.

5. The method of claim 2, further comprising freezing the execution of at least one of the arms based on the extrapolated performance score.

6. The method of claim 5, further comprising resuming said execution of at least one of said arms from the point at which said freezing occurred.

7. The method of claim 1, wherein at least some of the arms are executed by time division multiplexing using a selection mechanism to allocate the computational resources to the arms during the time budget.

8. The method of claim 1, wherein at least some of said arms are executed in parallel.

9. The method of claim 1, further comprising building an ensemble from the plurality of trained models.

10. The method of claim 1, wherein each of said arms is implemented as a microservice component of a cloud computing system architecture in a docker container with a respective container image of said automated machine learning framework.

11. The method of claim 10, wherein the docker container is housed within a larger docker container, the larger docker container housing a separate docker container for a component that controls execution of the arm.

12. The method of claim 1, further comprising:
constructing a learning curve for each of said arms during a time interval of said execution within said time budget;
extrapolating the performance score of each of the arms to the remainder of the time budget; and
freezing or disabling execution of at least some of the arms based on the extrapolated performance score.

13. The method of claim 12, wherein the learning curve is constructed based on maximum performance scores achieved by each of the arms during the time interval.

14. A microservice component encapsulated in a docker container of a cloud computing system architecture, the microservice component comprising one or more processors which, alone or in combination, are configured to provide for execution of a method comprising:
controlling execution of a plurality of instantiations of different automated machine learning frameworks on a machine learning task, each as a separate arm, in consideration of available computational resources and a time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed.

15. A tangible, non-transitory computer-readable medium having instructions thereon which, when executed by one or more processors, alone or in combination, provide for execution of a method comprising:
controlling execution of a plurality of instantiations of different automated machine learning frameworks on a machine learning task, each as a separate arm, in consideration of available computational resources and a time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed.