JP2023520313A

JP2023520313A - Generating Performance Forecasts with Uncertainty Intervals

Info

Publication number: JP2023520313A
Application number: JP2022555680A
Authority: JP
Inventors: Matthew Richard Arnold; Benjamin Tyler Elder; Jiri Navratil; Ganesh Venkataraman
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2020-04-07
Filing date: 2021-02-16
Publication date: 2023-05-17
Also published as: IL295764A; WO2021205244A1; GB2609160A; KR20220163362A; US20210312323A1; GB202216256D0; CA3170297A1; AU2021251463A1; US11989626B2; AU2021251463B2; CN115349129A

Abstract

Generating performance predictions with uncertainty intervals. A technique for generating a performance prediction, with uncertainty intervals, for a machine learning model includes obtaining a first model configured to perform a task and a production dataset. At least one metric predicting the performance of the first model when performing the task on the production dataset is generated using a second model. The second model is a meta-model associated with the first model. At least one value predicting the uncertainty of the at least one metric is generated using a third model. The third model is a meta-meta-model associated with the second model. An indication of the at least one metric predicting the performance of the first model and of the at least one value predicting the uncertainty of the at least one metric is provided.

Description

The present invention relates generally to machine learning, and more particularly to techniques for generating performance predictions, with uncertainty intervals, for machine learning (ML) models.

In recent years, ML models have been increasingly used for a variety of tasks, including image recognition, speech processing, language translation, and object classification, to name a few. In general, the ML models used for these tasks have become increasingly complex and expensive in terms of the computational resources and time required to train and maintain them. Furthermore, because each task has different goals, the models themselves can differ dramatically from one another, such that a model trained for one task or domain is typically unusable for other domains, even when the models are closely related. Because of this, attempts have been made to predict a model's performance on a set of data for a given task. Performance prediction, however, suffers from multiple sources of uncertainty, which can affect the accuracy of the performance prediction.

One embodiment presented herein includes a computer-implemented method for generating performance predictions for machine learning (ML) models. The computer-implemented method generally includes obtaining a first model configured to perform a task and a production dataset including unlabeled data. The computer-implemented method also includes generating, using a second model, at least one metric predicting the performance of the first model when performing the task on the production dataset. The second model is a meta-model associated with the first model. The computer-implemented method further includes generating, using one or more third models, at least one value predicting the uncertainty of the at least one metric predicting the performance of the first model when performing the task on the production dataset. Each of the one or more third models is a meta-meta-model associated with the second model. The computer-implemented method further includes providing an indication of the at least one metric predicting the performance of the first model and of the at least one value predicting the uncertainty of the at least one metric.

Other embodiments include, without limitation, a computer program product that includes a storage medium having computer-readable program code that enables a processing unit to implement one or more aspects of the disclosed methods, as well as a system having a processor, memory, and application programs configured to implement one or more of the disclosed methods.

FIG. 1 is a block diagram illustrating a networked system used to predict the performance of a model on a task and the uncertainty of the model's performance on the task, according to one embodiment.

FIG. 2A illustrates a stacked meta-modeling workflow for generating a performance prediction for a model and uncertainty intervals for the performance prediction, according to one embodiment.

FIG. 2B illustrates a stacked meta-modeling workflow for generating a performance prediction for a model and uncertainty intervals for the performance prediction, according to one embodiment.

FIG. 3 illustrates an example procedure for generating one or more datasets for training a meta-meta-model, according to one embodiment.

FIG. 4 illustrates an example of generating features for training a meta-meta-model, according to one embodiment.

FIG. 5 is a flowchart of a method for predicting the performance of a model on a task and the uncertainty of the model's performance on the task, according to one embodiment.

FIG. 6 is a flowchart of a method for training the meta-model and the meta-meta-model of a stacked meta-modeling workflow for generating a performance prediction for a model and uncertainty intervals for the performance prediction, according to one embodiment.

FIG. 7 is a flowchart of a method for training a meta-meta-model, according to one embodiment.

FIG. 8 illustrates a simulation of example performance predictions and uncertainty intervals, according to one embodiment.

ML tools are increasingly used for a variety of tasks including, but not limited to, regression and classification, optimization, and forecasting. In some cases, however, the performance of a given model on a task may change over time. For example, a model's ability to predict accurately may deteriorate over time, depending on how much the underlying data differs from the model's training data. Many conventional techniques can be used today to predict the performance of a model on a given task. For example, one conventional technique involves detecting the amount of shift in one or more features of the underlying data. Another conventional technique involves tracking the accuracy of the model by measuring the rate of correct predictions (e.g., the number of times any class was correctly predicted, the area under the receiver operating characteristic (ROC) curve, etc.) over time. However, these conventional techniques are used only for performance prediction and do not capture the uncertainty of the model.

In addition, many conventional techniques exist for predicting uncertainty only (e.g., without considering the underlying performance prediction task). Examples of such techniques include, but are not limited to, confidence intervals and probability (p) values. Additionally, in Bayesian modeling there are different types of model uncertainty, including, for example, aleatoric uncertainty and epistemic uncertainty. Aleatoric uncertainty captures the uncertainty inherent in the data, examples of which can include noise, various gaps in the data, confusion, and the like. Epistemic uncertainty captures uncertainty attributable to the model (e.g., model architecture, model parameters, model assumptions, parameter estimation, insufficient training, etc.).

One problem with the methods described above is that they generally assume that a choice over the architecture of the base model is available, which allows the performance of the model, the uncertainty of the model, or both to be predicted. Where that choice is not available, some methods, such as ensemble techniques, can be used to capture uncertainty, but these methods assume that the base model is a white box (e.g., a model whose internal architecture and parameters are visible or known). In many situations, however, such as when a customer-provided model is used, neither of these assumptions holds. Accordingly, it may be desirable to provide techniques for generating both a performance prediction and an uncertainty prediction for a model.

Embodiments of the present disclosure provide techniques for predicting the performance of an ML model on an underlying task, together with an uncertainty interval for the predicted performance of the ML model. More specifically, given a trained ML model and a set of (unlabeled) production data (which may or may not differ significantly from the model's training data), embodiments can predict the performance of the model on the set of production data (e.g., accuracy or another performance- or quality-related metric) along with an uncertainty interval (e.g., a band or error bars around the prediction) that accounts for the uncertainty associated with the model on a particular test instance or batch of test instances.

In one embodiment, described in more detail below, a stacked meta-modeling approach is used to generate the performance prediction and the uncertainty prediction. For example, embodiments generate two levels of meta-models: (1) a first-level meta-model that predicts the performance of the base model and (2) a second-level meta-meta-model that predicts the uncertainty of the first-level meta-model (e.g., with respect to its prediction of the base model's performance). By using the stacked meta-modeling approach, embodiments can operate on models of any architecture (e.g., the approach is agnostic to the model architecture and does not depend on having access to the model's internal parameters) and can capture multiple types of uncertainty (e.g., aleatoric uncertainty, epistemic uncertainty, etc.).
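The two-level stacking described above can be pictured with a minimal Python sketch. The function name `predict_with_interval`, the callable signatures, and the toy stand-in models are all illustrative assumptions rather than anything specified in this disclosure; the base model is treated purely as a black box that returns class scores.

```python
from typing import Callable, List, Sequence, Tuple

Scores = Sequence[float]


def predict_with_interval(
    base_model: Callable[[object], Scores],
    meta_model: Callable[[Scores], float],
    meta_meta_model: Callable[[List[float]], float],
    batch: Sequence[object],
) -> Tuple[float, float, float]:
    """Return (predicted accuracy, lower bound, upper bound) for a batch."""
    # Level 0: the base model is only called, never inspected (black box).
    scores = [base_model(x) for x in batch]
    # Level 1: the meta-model estimates P(base model is correct) per instance.
    p_correct = [meta_model(s) for s in scores]
    predicted_accuracy = sum(p_correct) / len(p_correct)
    # Level 2: the meta-meta-model estimates how far off level 1 may be.
    half_width = meta_meta_model(p_correct)
    lower = max(0.0, predicted_accuracy - half_width)
    upper = min(1.0, predicted_accuracy + half_width)
    return predicted_accuracy, lower, upper


# Toy stand-ins (hypothetical): a two-class "model", a meta-model that trusts
# the top score, and a meta-meta-model that returns a fixed half-width.
toy_base = lambda x: [x, 1.0 - x]
toy_meta = lambda s: max(s)
toy_meta_meta = lambda p: 0.05
acc, low, high = predict_with_interval(toy_base, toy_meta, toy_meta_meta, [0.9, 0.6, 0.8])
```

Because each level only consumes the outputs of the level below it, nothing in the sketch requires knowledge of the base model's architecture or parameters.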

As used herein, a "meta-model" generally refers to a model of a (lower-level) model, and a "meta-meta-model" generally refers to a model of a meta-model. A meta-model captures, for example, patterns observed in the lower-level model as it interacts with data. Note that in various embodiments of the present disclosure, visual tasks (e.g., processing images for action recognition, object detection, face recognition, digit or character classification, etc.) are used as examples to illustrate the functionality of ML models. However, embodiments of the present disclosure are readily applicable to any number of domains, using any type of input (e.g., video, audio, text, etc.). Further, as used herein, the "performance" of a model may refer to one or more accuracy-related metrics of the model on unlabeled data. Similarly, "performance prediction" may refer to a model-based method of predicting one or more performance- or quality-related metrics of an underlying base model on unlabeled data.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

In the following, reference is made to embodiments presented in this disclosure. However, the scope of the present disclosure is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the contemplated embodiments. Furthermore, although embodiments disclosed herein may achieve advantages over other possible solutions or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the scope of the present disclosure. Thus, the following aspects, features, embodiments, and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim. Likewise, reference to "the invention" shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered an element or limitation of the appended claims except where explicitly recited in a claim.

FIG. 1 is a block diagram illustrating a networked system 100 used to generate performance predictions and uncertainty predictions for a model, according to one embodiment. The system 100 includes a computing system 102. The computing system 102 may also be connected to other computers via a network 140. The network 140 may include one or more networks of various types, including a local area network (LAN), a general wide area network (WAN), a telecommunications or cellular network, a public network (e.g., the Internet), or a combination thereof.

The computing system 102 generally includes one or more processors 104 connected via a bus 150 to a memory 106, storage 108, a network interface 110, an input device 152, and an output device 154. The computing system 102 is generally under the control of an operating system (not shown). Examples of operating systems include the UNIX(R) operating system, versions of the Microsoft Windows(R) operating system, and distributions of the Linux(R) operating system (UNIX is a registered trademark of The Open Group in the United States and other countries; Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both; Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both). More generally, any operating system supporting the functions disclosed herein may be used. The processor 104 is included to be representative of a single CPU, multiple CPUs, a single CPU having multiple processing cores, and the like.

The memory 106 may include a variety of computer-readable media selected for performance or other capabilities: volatile and/or non-volatile media, removable and/or non-removable media, and the like. The memory 106 may include a cache, random access memory (RAM), and the like. The storage 108 may be a disk drive or a flash storage device. Although shown as a single unit, the storage 108 may be a combination of fixed and/or removable storage devices, such as fixed disk drives, solid-state drives, removable memory cards, optical storage, network-attached storage (NAS), or a storage area network (SAN). The network interface 110 may be any type of network communications device that allows the computing system 102 to communicate with other computing systems or devices via the network 140.

The input device 152 provides input to the computing system 102. For example, a keyboard and/or a mouse may be used. The output device 154 may be any device for providing output to a user of the computing system 102. For example, the output device 154 may be any conventional display screen. Although shown separately from the input device 152, the output device 154 and the input device 152 may be combined. For example, a display screen with an integrated touch screen may be used.

Here, the storage 108 includes a production dataset 130, a base model 132, a test dataset 134, a training dataset 136, and offline models and datasets 138. The base model 132 is a model used for a particular task (e.g., a customer's model for classifying images). In general, any suitable type of model can be used for the base model 132, examples of which include, but are not limited to, artificial neural networks, decision trees, support vector machines, regression analysis, Bayesian networks, and genetic algorithms. The base model 132 can be a black-box model (e.g., a model whose architecture and/or parameters are unknown) or a white-box model (e.g., a model whose architecture and/or parameters are known).

The base model 132 may be trained/developed on a particular training dataset (e.g., the training dataset 136) and evaluated using a labeled test dataset (e.g., the test dataset 134, which includes classification labels). The production dataset 130 is an unlabeled dataset for which ground truth (e.g., classification labels) is not available. The production dataset 130 may be collected from operating conditions after the base model 132 has been deployed. The offline models and/or datasets 138 include a variety of models and datasets, some of which may be related to the original task of the base model 132 and some of which may be unrelated to it. The offline models and/or datasets 138 are described in more detail below with respect to FIGS. 2B and 3.

The memory 106 includes a prediction engine 120, which is configured to implement one or more of the techniques described herein for generating performance predictions and uncertainty intervals for a model. The prediction engine 120 includes a performance component 122 and an uncertainty component 124, each of which can include software, hardware, or a combination thereof. The prediction engine 120 is configured to generate a performance prediction for (or predict the performance of) the base ML model 132 (via the performance component 122) and to generate an uncertainty interval for the performance prediction (via the uncertainty component 124) using, for example, a stacked meta-modeling approach. For example, the performance component 122 can use a meta-model (of the base model 132) to predict the performance (e.g., one or more accuracy-related metrics) of the base model 132 on the unlabeled production dataset 130. The meta-model may be trained using one or more of the base model 132, the test dataset 134, and the training dataset 136.

Similarly, the uncertainty component 124 can use a meta-meta-model (of the meta-model of the base model 132) to predict the uncertainty of the meta-model of the performance component 122. The meta-meta-model may be trained using one or more of the collection of offline models/datasets 138, the meta-model of the performance component 122, the base model 132, the production dataset 130, the test dataset 134, and the training dataset 136. In one embodiment, the uncertainty component 124 can generate an uncertainty interval (also known as error bars) that represents the uncertainty of the meta-model. The uncertainty interval delineates a band (or tolerance) around the output of the performance component 122 (e.g., the predicted performance of the base model 132). This band (or tolerance) around the prediction can account for the uncertainty associated with the base model 132 on a particular test instance (e.g., of the production dataset 130). The performance component 122 and the uncertainty component 124 are described in more detail below with respect to FIGS. 2A and 2B.
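As a rough illustration of how a band around the predicted performance might be derived from offline data, the Python sketch below replaces the learned meta-meta-model with a much simpler stand-in: an empirical quantile of the meta-model's past absolute errors on labeled offline batches. This substitution, the function names, and the sample numbers are assumptions made for illustration only; the disclosure describes a trained meta-meta-model, not this quantile rule.

```python
import math
from typing import Sequence, Tuple


def fit_half_width(
    predicted: Sequence[float], actual: Sequence[float], coverage: float = 0.9
) -> float:
    """Half-width that covered at least `coverage` of the offline batches."""
    # Absolute error of the meta-model's accuracy prediction on each batch.
    errors = sorted(abs(p - a) for p, a in zip(predicted, actual))
    # Index of the smallest error that covers the requested share of batches.
    k = max(0, math.ceil(coverage * len(errors)) - 1)
    return errors[k]


def uncertainty_interval(prediction: float, half_width: float) -> Tuple[float, float]:
    """Band around a predicted accuracy, clipped to the valid range [0, 1]."""
    return max(0.0, prediction - half_width), min(1.0, prediction + half_width)


# Hypothetical offline batches: meta-model predictions vs. measured accuracy.
offline_predicted = [0.80, 0.70, 0.90, 0.60, 0.75]
offline_actual = [0.78, 0.75, 0.88, 0.70, 0.75]
half_width = fit_half_width(offline_predicted, offline_actual)
band = uncertainty_interval(0.74, half_width)
```

A learned meta-meta-model could instead output a different half-width for each batch, widening the band where the production data looks unlike anything seen offline.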

Note that FIG. 1 illustrates merely one reference example of a system 100 that can generate a performance prediction together with uncertainty intervals for a model, and that other configurations of the system 100 may be adapted to do so. For example, in some embodiments, one or more contents of the storage 108 (e.g., the production dataset 130, the base model 132, the test dataset 134, the training dataset 136, and the offline models/datasets 138), one or more components (e.g., the performance component 122 and the uncertainty component 124), or both may be distributed across one or more computing systems 102 in a network (e.g., a cloud computing environment). In such embodiments, a first computing system 102 may retrieve, from one or more second computing systems 102, one or more of the contents used to generate the performance prediction for the base model 132. Similarly, the first computing system 102 may retrieve, from one or more second computing systems 102, one or more of the contents used to generate the uncertainty interval for the predicted performance of the base model 132. In still other embodiments, the performance prediction and/or the uncertainty interval can be generated by a single computing system 102 or by multiple computing systems 102.

FIGS. 2A and 2B illustrate a stacked meta-modeling workflow 200 for generating a performance prediction for a base model (e.g., the base model 132) and an uncertainty interval for that performance prediction, according to one embodiment. Here, the stacked meta-modeling workflow 200 includes a training phase 210 and a usage (or deployment) phase 212 (shown in FIG. 2A), as well as a training phase 214 and a usage phase 216 (shown in FIG. 2B).

As shown in FIG. 2A, the performance component 122 includes a meta-model 202, which generally learns to predict the probability of success and failure of the base model 132 on an instance-by-instance basis. The workflow 200 shows many different pieces of information that can be considered during the training phase 210. In the depicted embodiment, the base model 132 and the (labeled) test dataset 134 are processed and used as inputs for training the meta-model 202. As further shown, in some embodiments the training dataset 136 can also be used as an additional input for training the meta-model 202.
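To make the training inputs concrete, the Python sketch below fits a deliberately simple meta-model from a black-box base model and a labeled test set. It bins instances by the base model's top class score and records how often the base model was right in each bin. The binning approach and all names (`train_meta_model`, `n_bins`, the toy data) are assumptions chosen for brevity; the disclosure does not prescribe a particular meta-model type.

```python
from typing import Callable, Sequence, Tuple


def train_meta_model(
    base_model: Callable[[object], Sequence[float]],
    test_set: Sequence[Tuple[object, int]],
    n_bins: int = 5,
) -> Callable[[Sequence[float]], float]:
    """Fit a per-instance success estimator from a labeled test set."""
    hits = [0] * n_bins
    counts = [0] * n_bins

    def bin_of(scores: Sequence[float]) -> int:
        # Bin by the top class score; clamp a score of 1.0 into the last bin.
        return min(int(max(scores) * n_bins), n_bins - 1)

    for x, label in test_set:
        scores = base_model(x)
        predicted = max(range(len(scores)), key=scores.__getitem__)
        b = bin_of(scores)
        counts[b] += 1
        hits[b] += int(predicted == label)  # 1 if the base model was correct

    overall = sum(hits) / max(1, sum(counts))

    def meta_model(scores: Sequence[float]) -> float:
        b = bin_of(scores)
        # Fall back to overall test accuracy for bins never seen in training.
        return hits[b] / counts[b] if counts[b] else overall

    return meta_model


# Toy usage (hypothetical): inputs are already score vectors, so the
# "base model" is the identity function.
toy_test_set = [([0.9, 0.1], 0), ([0.95, 0.05], 0), ([0.55, 0.45], 0), ([0.55, 0.45], 1)]
meta = train_meta_model(lambda x: x, toy_test_set)
```

The returned `meta` callable maps a score vector to an estimated probability that the base model's prediction is correct, which is the per-instance quantity the paragraph above describes.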

The training phase 210 depicts a supervised learning approach for training the meta-model 202. The training phase 210 can be performed offline (e.g., when the meta-model 202 is not in use or deployed) or online (e.g., when the meta-model 202 is in use or deployed). More generally, however, any suitable machine learning training mechanism consistent with the functionality described herein can be used. Note further that although the base model 132, the test dataset 134, the training dataset 136, or a combination thereof are shown as being fed directly into the meta-model 202 during the training phase 210, one of ordinary skill in the art will recognize that this information can be processed in a variety of ways before any input data is supplied to the meta-model 202.

Once the meta-model 202 has been trained, the (unlabeled) production dataset 130 (e.g., without classification labels) can be supplied as input to the meta-model 202, and the meta-model 202 can output a performance prediction 204 (e.g., during the usage (deployment) phase 212). In some cases, the output prediction scores of the base model 132 (e.g., softmax scores, logits), the original input features of the base model 132, or both can also be supplied as input to the meta-model 202. Note that although the production dataset 130 is shown as being fed directly into the meta-model 202, one of ordinary skill in the art will recognize that this information can be processed in a variety of ways before any input data is supplied to the meta-model 202.
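One way such meta-model inputs might be assembled from the base model's outputs is sketched below in Python. The specific derived features (top probability, margin between the two highest classes, and entropy) are common choices assumed here for illustration; the disclosure names only the prediction scores and the original input features as possible inputs.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_features(
    logits: Sequence[float], input_features: Sequence[float] = ()
) -> List[float]:
    """Build one meta-model input row from base-model logits and raw features."""
    probs = sorted(softmax(logits), reverse=True)
    top = probs[0]
    # Gap between the two most likely classes; small gaps suggest doubt.
    margin = probs[0] - probs[1] if len(probs) > 1 else probs[0]
    # Shannon entropy of the predicted distribution (natural log).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return [top, margin, entropy, *input_features]


# Hypothetical instance: three-class logits plus two original input features.
row = meta_features([2.0, 0.5, 0.1], input_features=(1.0, 2.0))
```

No labels appear anywhere in this step, which is what allows it to run on the unlabeled production dataset.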

Performance prediction 204 is a predicted measure of how well (or how accurately) base model 132 will perform its task (e.g., image classification) on the (unlabeled) production dataset 130. For example, assuming base model 132 is an object classification model, performance prediction 204 can include a percentage value (e.g., 74%) indicating the success/failure probability of base model 132 when classifying objects from production dataset 130. More generally, performance prediction 204 can include one or more accuracy-related metrics of base model 132 on production dataset 130. Examples of such accuracy-related metrics include, but are not limited to, area under the ROC curve, true positive rate (TPR), F1 score, false positive rate (FPR), R-squared score, and accuracy score (e.g., the percentage of predictions that the model got correct).

Performance prediction 204 can be a batch accuracy prediction or a pointwise accuracy prediction. In some cases, performance prediction 204 can be used for pre-deployment quality checks of base model 132 (e.g., determining whether base model 132 should be deployed). In some cases, performance prediction 204 can be used for quality control/checks at runtime or during deployment of base model 132 (e.g., filtering out inaccurate predictions, identifying drops in production, etc.). However, as noted above, performance prediction 204 can suffer from multiple sources of uncertainty (e.g., aleatoric uncertainty, epistemic uncertainty, etc.) that are not captured in the accuracy-related metrics. To address this, embodiments use uncertainty component 124 to predict an uncertainty interval for performance prediction 204.

As shown in FIG. 2B, uncertainty component 124 includes meta-meta-model 206, which learns to predict a function of the error between the prediction of meta-model 202 and the actual observed performance (e.g., the absolute value of the predicted accuracy minus the actual accuracy). Workflow 200 shows many different pieces of information that can be considered during training phase 214. In the depicted embodiment, offline models/datasets 138 and meta-model 202 are processed and used as inputs for training meta-meta-model 206. As further shown, in some embodiments, production dataset 130 can also be used as an additional input or as the sole input for training meta-meta-model 206.
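As a minimal sketch of the training target mentioned above, assuming the meta-meta-model is fitted to the absolute gap between predicted and observed accuracy, the targets could be computed as follows. The helper name `absolute_error_targets` and the sample numbers are illustrative assumptions and do not come from the disclosure.

```python
def absolute_error_targets(predicted_accuracy, actual_accuracy):
    """Return |predicted - actual| for each evaluated dataset.

    Hypothetical helper: each position pairs the meta-model's predicted
    accuracy with the base model's observed accuracy on labeled data.
    """
    if len(predicted_accuracy) != len(actual_accuracy):
        raise ValueError("inputs must have the same length")
    return [abs(p - a) for p, a in zip(predicted_accuracy, actual_accuracy)]


# Made-up numbers for three offline datasets (assumption, for illustration).
targets = absolute_error_targets([0.74, 0.80, 0.91], [0.70, 0.85, 0.90])
```

Each entry in `targets` would then serve as the label for one training example of the meta-meta-model.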

Training phase 214 depicts a supervised learning approach for training meta-meta-model 206. Training phase 214 can be performed offline for one or more runs. More generally, however, any suitable machine learning training mechanism consistent with the functionality described herein can be used. Note further that although offline models/datasets 138, meta-model 202, and/or production dataset 130 are shown as being fed directly into meta-meta-model 206, those of ordinary skill in the art will recognize that this information can be processed in a variety of ways before any input data is provided to meta-meta-model 206.

Meta-meta-model 206 is configured to learn, from one or more offline runs of training phase 214, when meta-model 202 is likely to be wrong and by how much. In one embodiment, meta-meta-model 206 is trained on multiple background offline models/datasets 138 (excluding production dataset 130) and can learn general features that are indicative of the expected uncertainty of meta-model 202. In another embodiment, meta-meta-model 206 can be trained on production dataset 130 (e.g., without offline models/datasets 138). In yet another embodiment, meta-meta-model 206 can be trained on both offline models/datasets 138 and production dataset 130.

Offline models/datasets 138 may include labeled datasets that are related to the task of base model 132 (e.g., object classification) and/or labeled datasets that are unrelated to the task of base model 132 (e.g., speech-to-text conversion). In some embodiments, at least one (first) dataset of offline models/datasets 138 used to train meta-meta-model 206 can be generated dynamically from one or more (second) datasets in offline models/datasets 138. For example, the first dataset can be generated by resampling one or more features in the second dataset.

FIG. 3 illustrates an example of generating one or more datasets for training meta-meta-model 206, according to one embodiment. As shown, given a (base) labeled dataset 302 (which may be one of offline models/datasets 138), prediction engine 120 can generate one or more training/test datasets 304-1 through 304-6 and one or more production datasets 306-1 through 306-6 for training meta-meta-model 206. In one embodiment, prediction engine 120 can generate training/test datasets 304 and production datasets 306 by resampling (base) labeled dataset 302 based on one or more features of labeled dataset 302.

In one example, training/test datasets 304 and production datasets 306 can be generated by resampling (base) labeled dataset 302 based on various features in order to introduce covariate shift. In this example, the particular features of (base) labeled dataset 302 that are used to resample the distributions of training/test datasets 304 and production datasets 306 can be dynamically selected by prediction engine 120 (e.g., based on feature importance). In another example, training/test datasets 304 and production datasets 306 can be generated by resampling (base) labeled dataset 302 based on the actual class labels in order to introduce prior probability shift.

As shown in FIG. 3, assuming the feature is "feature A" (e.g., "dog") (or "feature B", e.g., "cat"), the proportion of "feature A" (or "feature B") can be varied across each of training/test datasets 304-1 through 304-6 and production datasets 306-1 through 306-6 (e.g., by resampling the feature in labeled dataset 302). Here, for example, training/test dataset 304-1 has a statistical distribution of 100% "feature A" (or 0% "feature B"), and the corresponding production dataset 306-1 has a statistical distribution of 0% "feature A" (or 100% "feature B"). Similarly, at the opposite end, training/test dataset 304-6 has a statistical distribution of 0% "feature A" (or 100% "feature B"), and the corresponding production dataset 306-6 has a statistical distribution of 100% "feature A" (or 0% "feature B").
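The resampling scheme of FIG. 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: samples are `(item, label)` tuples, sampling is done with replacement, and the names `resample_with_ratio` and `make_shifted_pairs` are hypothetical rather than taken from the disclosure.

```python
import random


def resample_with_ratio(samples, label_of, label_a, ratio_a, size, rng):
    """Draw `size` samples (with replacement) so a fraction `ratio_a` has label_a."""
    pool_a = [s for s in samples if label_of(s) == label_a]
    pool_b = [s for s in samples if label_of(s) != label_a]
    n_a = round(ratio_a * size)
    n_b = size - n_a
    if (n_a and not pool_a) or (n_b and not pool_b):
        raise ValueError("base dataset lacks samples for the requested ratio")
    return ([rng.choice(pool_a) for _ in range(n_a)]
            + [rng.choice(pool_b) for _ in range(n_b)])


def make_shifted_pairs(samples, label_of, label_a, steps=6, size=100, seed=0):
    """Build (train/test, production) pairs whose label_a ratios mirror each other."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    rng = random.Random(seed)
    pairs = []
    for i in range(steps):
        ratio = 1.0 - i / (steps - 1)  # 100% label_a in the first pair, 0% in the last
        test = resample_with_ratio(samples, label_of, label_a, ratio, size, rng)
        prod = resample_with_ratio(samples, label_of, label_a, 1.0 - ratio, size, rng)
        pairs.append((test, prod))
    return pairs


# Toy base labeled dataset (assumption): ten "dog" and ten "cat" samples.
data = [(f"img{i}", "dog") for i in range(10)] + [(f"img{i}", "cat") for i in range(10, 20)]
pairs = make_shifted_pairs(data, lambda s: s[1], "dog", steps=6, size=50)
```

With six steps, the first pair plays the role of datasets 304-1/306-1 and the last pair the role of 304-6/306-6; resampling on a label, as here, corresponds to the prior probability shift case, while resampling on an input feature would correspond to covariate shift.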

Referring back to FIG. 2B, in some embodiments, training meta-meta-model 206 (e.g., during training phase 214) can include generating one or more features based on metadata about the input datasets (e.g., offline models/datasets 138, which can include training/test datasets 304, production datasets 306, etc.) and/or other distributional feature space properties of the input datasets. In one embodiment, the one or more features can be based on base model 132 and offline models/datasets 138 (including training and production datasets). For example, prediction engine 120 may compute various distributions over the test dataset and the production dataset and compare the distances between those distributions to create one or more features used to train meta-meta-model 206.

As shown in FIG. 4, for example, a first distribution (histogram 402) can be computed from a test dataset (e.g., test dataset 304), and a second distribution (histogram 404) can be computed from a production dataset (e.g., production dataset 306). The distance 406 (or, more generally, the divergence) between the distributions is computed and can be used as a feature (also referred to as a feature value) 408. Generating features 408 in this manner to train meta-meta-model 206 can account for performance predictions that are significantly noisy (e.g., above a threshold), for scenarios in which the average output from meta-model 202 is stable but the variance is high (e.g., above a threshold), for scenarios in which the sample-wise uncertainty is too large to serve as batch-wise error bars, and the like.

In one embodiment, an example feature 408 may be based on the top confidence scores from base model 132. For example, base model 132 can be used to score the samples in the test dataset and the production dataset. Assuming base model 132 is used for object classification (e.g., between class A and class B), an output prediction score from base model 132 can be obtained for each sample (or data point) in the test dataset and for each sample (or data point) in the production dataset. As an example, for a first sample in the test dataset, base model 132 may output 95% class A, and for a first sample in the production dataset, base model 132 may output 90% class A.

After all samples in each of the test dataset and the production dataset have been scored, a subset of the scored samples in both datasets can be used to build histograms. For example, a (first) histogram of the top confidence scores from the scored samples in the test dataset (e.g., falling within certain threshold ranges, such as 90-95%, 95-98%, 98-100%, etc.) can be generated. Similarly, a (second) histogram of the top confidence scores (e.g., falling within certain threshold ranges) from the scored samples in the production dataset can be generated. The two histograms can then be compared using a divergence function (e.g., a similarity metric or a dissimilarity metric). In one embodiment, the distance 406 (e.g., Hellinger distance) between the two histograms can be computed, and the value of distance 406 (e.g., in [0, 1]) can be used as feature 408 for meta-meta-model 206.
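The histogram comparison described above can be illustrated with a short sketch. It assumes the standard discrete Hellinger distance, H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1]; the bin edges, the confidence values, and the helper names `histogram` and `hellinger` are assumptions chosen for illustration.

```python
import bisect
import math


def histogram(values, edges):
    """Normalized histogram over half-open bins [e_i, e_{i+1}); the last bin is closed.

    Values outside [edges[0], edges[-1]] are ignored.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        if v == edges[-1]:
            idx = n_bins - 1  # include the right-most edge in the last bin
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Hypothetical top confidence scores from the base model (illustrative values).
edges = [0.0, 0.90, 0.95, 0.98, 1.0]
test_conf = [0.99, 0.97, 0.96, 0.93, 0.99]
prod_conf = [0.91, 0.80, 0.96, 0.85, 0.92]
distance = hellinger(histogram(test_conf, edges), histogram(prod_conf, edges))
```

The resulting `distance` would be one feature value; the same two helpers could be reused for the other histogram-based features, with only the quantity being binned changing.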

In one embodiment, an example feature 408 can be based on the top confidence scores of a shadow model. In contrast to the base model top confidence scores, in this embodiment another (proxy) model is trained on the same training dataset used to train base model 132 (e.g., training dataset 136). The top confidence scores on the test dataset and the production dataset are then computed using the proxy model (e.g., as opposed to base model 132). Histograms can be generated based on the top confidence scores, and the distance between the histograms can be used as feature 408 for meta-meta-model 206.

In one embodiment, an example feature 408 can be based on a class frequency distance. In this embodiment, for example, histograms can be created of the percentage of samples in the test dataset and in the production dataset that base model 132 predicts to belong to each class. The distance 406 (e.g., Hellinger distance) between the histograms can be computed, and the distance value can be used as feature 408 for meta-meta-model 206.

In one embodiment, an example feature 408 can be based on a top feature distance. In this embodiment, for example, a shadow random forest model can be trained and used to identify the data feature with the highest feature importance (e.g., satisfying a predetermined condition). Once identified, histograms of the test dataset and the production dataset can be projected onto this dimension (e.g., a compressed one-dimensional feature space). The distance 406 (e.g., Hellinger distance) between the histograms can be computed, and the distance value can be used as feature 408 for meta-meta-model 206.

In one embodiment, an example feature 408 can be based on meta-model predictions (e.g., performance prediction 204). In this embodiment, the change in base model accuracy between the test dataset and the production dataset, as predicted by meta-model 202, can be used as feature 408 for meta-meta-model 206. In one embodiment, an example feature 408 can be based on a statistical hypothesis test between a first statistical distribution (of a first dataset) and a second statistical distribution (of a second dataset).
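The disclosure does not name a particular hypothesis test. As one possible illustration, the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical cumulative distribution functions) could be computed and used as a feature value. The choice of test and the function name `ks_statistic` are assumptions made for this sketch.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the ECDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Illustrative values of a single input feature in two datasets (assumption).
same = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
```

A statistic near 0 suggests the two samples follow similar distributions, while a value near 1 suggests a strong shift between them.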

Note that the above features are provided merely as reference examples of features that can be used to train meta-meta-model 206, and that any combination of other features, or of features based on metadata of the input datasets, consistent with the functionality described herein can be used. In some embodiments, for example, multiple meta-meta-models can be generated/trained based on different features. In these embodiments, the particular meta-meta-model used during execution phase 216 can be dynamically selected at runtime based on properties of the data (e.g., metadata about the dataset, number of features, distributional feature space properties, etc.).

Once meta-meta-model 206 is trained, one or more of base model 132, test dataset 134, training dataset 136, meta-model 202, production dataset 130, and performance prediction 204 can be provided as input to meta-meta-model 206, and meta-meta-model 206 can output uncertainty prediction 208 (e.g., during usage (deployment) phase 216). Uncertainty prediction 208 is a predicted amount of uncertainty in performance prediction 204 output from meta-model 202. For example, assuming base model 132 is a model for object classification, uncertainty prediction 208 can indicate an interval (or tolerance) (e.g., ±4, ±7, +2) representing the amount of uncertainty in performance prediction 204. In some embodiments, uncertainty prediction 208 can be used for additional pre-deployment quality checks of base model 132 and/or for quality control/checks at runtime or during deployment of base model 132.

FIG. 5 is a flowchart of a method 500 for predicting the performance of a model on a task and the uncertainty of that performance, according to one embodiment. Method 500 may be performed by a prediction engine of a computing system (e.g., prediction engine 120 of computing system 102).

Method 500 may begin at block 502, where the prediction engine obtains a base model (e.g., base model 132) and a set of production data (e.g., production dataset 130). At block 504, the prediction engine uses a meta-model (e.g., meta-model 202) to predict the performance of the base model on the set of production data (e.g., performance prediction 204). For example, the prediction engine may generate one or more accuracy-related metrics of the base model on the set of production data. In one particular example, the accuracy-related metric may indicate the success/failure probability (e.g., X% success) of the base model at its task using the set of production data.

At block 506, the prediction engine uses a meta-meta-model (e.g., meta-meta-model 206) to predict one or more uncertainty intervals (e.g., uncertainty prediction 208) for the meta-model's performance prediction on the set of production data. For example, the prediction engine can generate a tolerance (or error band) (e.g., ±Y) indicating the amount of uncertainty in the predicted performance of the base model at its task using the set of production data.

In one embodiment, the uncertainty interval (predicted at block 506) may be asymmetric. For example, the prediction engine can predict a signed value (e.g., + or -) of the error that represents the uncertainty of the predicted performance. In another example, a first uncertainty interval can be generated for an upper band/range around the predicted performance, and a second uncertainty interval can be generated for a lower band/range around the predicted performance.
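One way to represent an asymmetric interval is sketched below. The class name `UncertaintyInterval`, its fields, and the clamping to [0, 1] (which presumes an accuracy-like metric) are assumptions introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertaintyInterval:
    """Predicted performance with separate lower and upper margins (hypothetical)."""
    prediction: float    # e.g., predicted accuracy in [0, 1]
    lower_margin: float  # non-negative width of the band below the prediction
    upper_margin: float  # non-negative width of the band above the prediction

    def __post_init__(self):
        if self.lower_margin < 0 or self.upper_margin < 0:
            raise ValueError("margins must be non-negative")

    def bounds(self):
        """Return (low, high), clamped to [0, 1] on the assumption of an accuracy metric."""
        low = max(0.0, self.prediction - self.lower_margin)
        high = min(1.0, self.prediction + self.upper_margin)
        return low, high


# Illustrative values: 74% predicted accuracy, 7 points down and 2 points up.
interval = UncertaintyInterval(prediction=0.74, lower_margin=0.07, upper_margin=0.02)
low, high = interval.bounds()
```

A symmetric band is the special case in which both margins are equal.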

In some embodiments, multiple meta-meta-models can be used to generate multiple uncertainty intervals (e.g., a first meta-meta-model for a first (upper) uncertainty interval and a second meta-meta-model for a second (lower) uncertainty interval). In some embodiments, multiple uncertainty intervals can be generated by a single meta-meta-model. For example, a single meta-meta-model can include two sub-modules (or components) for predicting the upper and lower uncertainty intervals.

At block 508, the prediction engine provides an indication of the performance of the base model and the one or more uncertainty intervals. In one embodiment, the prediction engine can provide the indication on a user interface of a display of a computing device (e.g., computing system 102). In one embodiment, the prediction engine can provide the indication in response to a request for a performance prediction and an uncertainty prediction of the base model for the set of production data. For example, a user may use the prediction engine to determine which base model (of multiple base models) to use on the set of production data. In another example, the prediction engine may monitor the deployment/runtime of the base model and continually provide the indication to a computing device (e.g., at one or more predetermined time intervals). The indication can then be used to determine when the base model should be improved and/or replaced. In some cases, the prediction engine may run the meta-meta-model as frequently as the meta-model.
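The flow of method 500 can be summarized with a small sketch. The three models are passed in as plain callables, and the stand-ins used here (mean confidence as the "meta-model", spread of confidences as the "meta-meta-model") are placeholders chosen only so the example runs; they are not the models described in the disclosure, and all names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import NamedTuple


class Forecast(NamedTuple):
    performance: float  # predicted metric, e.g., accuracy in [0, 1]
    uncertainty: float  # predicted width of the error band around it


def forecast(base_model, production_data, meta_model, meta_meta_model):
    """Blocks 502-508 in miniature: score, predict performance, predict uncertainty."""
    scores = [base_model(sample) for sample in production_data]  # unlabeled data
    performance = meta_model(scores)                             # block 504
    uncertainty = meta_meta_model(scores, performance)           # block 506
    return Forecast(performance, uncertainty)                    # block 508


# Placeholder stand-ins (assumptions, for illustration only).
def toy_base_model(sample):
    return sample["confidence"]


def toy_meta_model(scores):
    return fmean(scores)


def toy_meta_meta_model(scores, performance):
    return pstdev(scores)


production = [{"confidence": c} for c in (0.9, 0.7, 0.8)]
result = forecast(toy_base_model, production, toy_meta_model, toy_meta_meta_model)
```

In a monitoring setting, `forecast` would simply be called again on each new batch of production data, so that the performance and uncertainty values are refreshed together.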

FIG. 6 is a flowchart of a method 600 for training the meta-model and the meta-meta-model of a stacked meta-model workflow for generating performance predictions of a model and uncertainty intervals for those performance predictions, according to one embodiment. Method 600 may be performed by a prediction engine of a computing system (e.g., prediction engine 120 of computing system 102).

Method 600 may begin at block 602, where the prediction engine retrieves a base model (e.g., base model 132), a test dataset (e.g., test dataset 134), and one or more additional datasets (e.g., offline models/datasets 138). At block 604, the prediction engine trains a meta-model (e.g., meta-model 202) to predict the performance of the base model on a task, based on the base model and the test dataset. At block 606, the prediction engine trains a meta-meta-model to predict the uncertainty of the performance of the base model on the task, based on the meta-model and the one or more additional datasets.

FIG. 7 is a flowchart of a method 700 for training a meta-meta-model, according to one embodiment. Method 700 may be performed by a prediction engine of a computing system (e.g., prediction engine 120 of computing system 102).

Method 700 may begin at block 702, where the prediction engine obtains a labeled dataset (e.g., labeled dataset 302). At block 704, the prediction engine generates one or more additional datasets (e.g., training/test datasets 304, production datasets 306) based on the labeled dataset. At block 706, the prediction engine determines (or computes) one or more features (e.g., feature 408) based on an evaluation of the one or more additional datasets. At block 708, the prediction engine trains one or more meta-meta-models (e.g., meta-meta-model 206), each configured to predict an uncertainty interval of a meta-model (e.g., meta-model 202) based at least in part on the one or more features.
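To make block 708 concrete, the sketch below fits a very simple stand-in for a meta-meta-model: a one-variable least-squares line that maps a single feature value (for example, a distribution distance) to the meta-model's absolute error. The disclosure does not prescribe a model family, so the linear form, the function names, and the numbers are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one input feature."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def predict_uncertainty(model, feature_value):
    """Apply the fitted line; clip at zero because an error magnitude cannot be negative."""
    slope, intercept = model
    return max(0.0, slope * feature_value + intercept)


# Made-up training pairs: one feature value and one observed absolute
# meta-model error per generated (train/test, production) dataset pair.
distances = [0.0, 0.5, 1.0]
abs_errors = [0.01, 0.06, 0.11]
toy_meta_meta_model = fit_line(distances, abs_errors)
margin = predict_uncertainty(toy_meta_meta_model, 0.8)
```

A richer model trained on several features would follow the same pattern: features computed at block 706 go in, and a predicted error magnitude comes out.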

FIG. 8 shows example simulations 802 and 804 of performance predictions of a base model and uncertainty intervals for the performance predictions, according to one embodiment.

More specifically, simulation 802 depicts performance predictions of the base model across different test/production datasets 1-K. As shown, the base model accuracy (e.g., the actual accuracy of the base model) is represented by line 808, the predicted accuracy of the base model is represented by line 810, and the error bars for the predicted accuracy are represented by 806. Each test/production dataset 1-K may have a different statistical distribution of features/labels. For example, test/production dataset 1 may include a statistical distribution of 100% "feature A" in the test dataset and 0% "feature A" in the production dataset. Similarly, at the opposite end, test/production dataset K may include a statistical distribution of 0% "feature A" in the test dataset and 100% "feature A" in the production dataset.

Simulation 804, which corresponds to simulation 802, shows the meta-model error (represented by 840), fixed error bars (or uncertainty values) 820, and the delta 830 between the meta-model error and the actual base model output. As shown, by using the stacked meta-model workflow to generate uncertainty intervals for a model's performance predictions, embodiments can generate more accurate uncertainty intervals than when fixed uncertainty values 820 are used.
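One way to quantify "more accurate" in a comparison like this is empirical coverage: the fraction of datasets on which the actual accuracy falls inside the predicted band. The sketch below uses made-up numbers that are not read from FIG. 8, and the function name `coverage` is hypothetical.

```python
def coverage(predicted, actual, margins):
    """Fraction of cases in which |predicted - actual| is within the given margin."""
    if not (len(predicted) == len(actual) == len(margins)) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= m for p, a, m in zip(predicted, actual, margins))
    return hits / len(predicted)


# Illustrative values only (assumption): the error grows as the shift increases.
predicted = [0.90, 0.85, 0.80, 0.70]
actual = [0.89, 0.80, 0.70, 0.50]

fixed_margins = [0.05] * len(predicted)     # one fixed error bar for every dataset
adaptive_margins = [0.02, 0.06, 0.12, 0.22] # per-dataset predicted error bars

fixed_coverage = coverage(predicted, actual, fixed_margins)
adaptive_coverage = coverage(predicted, actual, adaptive_margins)
```

In this toy data the fixed band misses the strongly shifted datasets, whereas the per-dataset margins widen where the meta-model error is larger. Coverage alone is not sufficient, since an arbitrarily wide band always covers, so band width would normally be reported alongside it.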

In the foregoing, reference is made to embodiments presented in this disclosure. However, the scope of the present disclosure is not limited to specific described embodiments. Instead, any combination of features and elements, whether related to different embodiments or not, is contemplated to implement and practice the contemplated embodiments. Furthermore, although embodiments disclosed herein may achieve advantages over other possible solutions or over the prior art, whether or not a particular advantage is achieved by a given embodiment does not limit the scope of the present disclosure. Thus, the aspects, features, embodiments, and advantages discussed herein are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim. Likewise, reference to "the invention" shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered an element or limitation of the appended claims except where explicitly recited in a claim.

Aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system."

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy (R) disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, a wireless network, or a combination thereof. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, edge servers, or a combination thereof. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk (R) or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, other devices, or a combination thereof to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

Embodiments of the invention may be provided to end users through a cloud computing infrastructure. Cloud computing generally refers to the provision of scalable computing resources as a service over a network. More formally, cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Thus, cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in "the cloud," without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.

Typically, cloud computing resources are provided to a user on a pay-per-use basis, where users are charged only for the computing resources actually used (e.g., an amount of storage space consumed by a user or a number of virtualized systems instantiated by the user). A user can access any of the resources that reside in the cloud at any time and from anywhere across the Internet. In the context of the present invention, a user may access applications (e.g., prediction engine 120) or related data (e.g., base model 132, test data set 134, training data set 136, offline models/data sets 138, manufacturing data set 130, etc.) available in the cloud. For example, the prediction engine 120 could execute on a computing system in the cloud and predict the performance of a base model, along with an uncertainty interval for the predicted performance of the base model. In such a case, the application could retrieve one or more items of input information used to generate the performance prediction and/or uncertainty prediction from a storage location in the cloud, and store the performance prediction and/or uncertainty prediction at a storage location in the cloud.
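The flow attributed above to the prediction engine 120 (a performance prediction for the base model together with an uncertainty interval) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `predict_accuracy`, `predict_interval`, `forecast`, and `toy_model` are hypothetical, mean top-class confidence stands in for a trained meta-model, and a fixed rule based on data set size stands in for a trained meta-meta-model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: a base model maps one input row to a list of class probabilities.
BaseModel = Callable[[Sequence[float]], List[float]]


@dataclass
class PerformanceForecast:
    accuracy: float  # predicted accuracy of the base model on unlabeled data
    lower: float     # lower end of the uncertainty interval
    upper: float     # upper end of the uncertainty interval


def predict_accuracy(base_model: BaseModel, rows: Sequence[Sequence[float]]) -> float:
    """Hypothetical meta-model: mean top-class confidence over unlabeled rows."""
    confidences = [max(base_model(row)) for row in rows]
    return sum(confidences) / len(confidences)


def predict_interval(accuracy: float, n_rows: int) -> Tuple[float, float]:
    """Hypothetical meta-meta-model: a stand-in rule whose interval widens
    for small data sets. A real one would be a trained model."""
    half_width = min(0.5, 1.0 / (n_rows ** 0.5))
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


def forecast(base_model: BaseModel, rows: Sequence[Sequence[float]]) -> PerformanceForecast:
    """Combine the performance prediction with its uncertainty interval."""
    accuracy = predict_accuracy(base_model, rows)
    lower, upper = predict_interval(accuracy, len(rows))
    return PerformanceForecast(accuracy, lower, upper)


def toy_model(row: Sequence[float]) -> List[float]:
    # Toy base model: more confident when the first feature is far from zero.
    p = min(0.99, 0.5 + abs(row[0]) / 2)
    return [p, 1 - p]


unlabeled = [[0.8], [0.2], [-0.6], [0.4]]
result = forecast(toy_model, unlabeled)
print(result)
```

No labels are needed at forecast time, which matches the unlabeled manufacturing data set described in the disclosure; only the two stand-in functions would need to be replaced by trained models.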

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims

1. A computer-implemented method for generating a performance prediction for a machine learning (ML) model, comprising:
obtaining a first model configured to perform a task and a manufacturing data set comprising unlabeled data;
generating, using a second model, at least one metric that predicts the performance of the first model when performing the task on the manufacturing data set, wherein the second model is a meta-model associated with the first model;
generating, using one or more third models, at least one value that predicts uncertainty of the at least one metric that predicts the performance of the first model when performing the task on the manufacturing data set, wherein each of the one or more third models is a meta-meta-model associated with the second model; and
providing an indication of the at least one metric that predicts the performance of the first model and the at least one value that predicts the uncertainty of the at least one metric.

2. The computer-implemented method of claim 1, wherein the first model is a black box model.

3. The computer-implemented method of claim 1, wherein the first model is a white box model.

4. The computer-implemented method of claim 1, wherein at least one of the one or more third models is trained on (i) a training data set for the first model and (ii) the second model.

5. The computer-implemented method of claim 1, wherein at least one of the one or more third models is trained on (i) a training data set for the first model, (ii) one or more additional data sets, and (iii) the second model.

6. The computer-implemented method of claim 1, wherein at least one of the one or more third models is trained on (i) one or more additional data sets and (ii) the second model.

7. The computer-implemented method of claim 6, wherein the at least one of the one or more third models is trained offline on the one or more additional data sets and the second model.

8. The computer-implemented method of claim 6, wherein:
the one or more additional data sets comprise at least one data set comprising a first set of labels for classifying data in the at least one data set; and
the first set of labels is different from a second set of labels for classifying data in a data set used to train the second model.

9. The computer-implemented method of claim 6, wherein the one or more additional data sets comprise at least one labeled data set, the computer-implemented method further comprising:
generating a plurality of test data sets and a plurality of manufacturing data sets based on the at least one labeled data set; and
including the plurality of test data sets and the plurality of manufacturing data sets in the one or more additional data sets.

10. The computer-implemented method of claim 9, wherein generating the plurality of test data sets and the plurality of manufacturing data sets comprises resampling one or more features of the labeled data set such that each test data set and corresponding manufacturing data set includes a different ratio of the one or more features relative to the labeled data set.

11. The computer-implemented method of claim 6, further comprising generating at least one feature value for training the one or more third models based at least in part on a first statistical distribution of a first data set in the one or more additional data sets and a second statistical distribution of a second data set in the one or more additional data sets, wherein the at least one of the one or more third models is further trained with the at least one feature value.

12. The computer-implemented method of claim 11, wherein the at least one feature value is further generated based on a divergence function between the first statistical distribution and the second statistical distribution.

13. The computer-implemented method of claim 11, wherein the at least one feature value is further generated based on a statistical hypothesis test between the first statistical distribution and the second statistical distribution.

14. The computer-implemented method of claim 1, wherein the one or more third models are a subset of a plurality of third models, the computer-implemented method further comprising selecting the one or more third models from the plurality of third models based at least in part on one or more features of the one or more additional data sets.

15. The computer-implemented method of claim 1, wherein the at least one value that predicts the uncertainty of the at least one metric comprises an interval range.

16. The computer-implemented method of claim 15, wherein the interval range is an asymmetric interval range comprising a first upper range and a second lower range.

17. The computer-implemented method of claim 16, wherein the first upper range and the second lower range are generated via one of the one or more third models.

18. The computer-implemented method of claim 16, wherein:
the one or more third models comprise a plurality of third models;
the first upper range is generated via a first of the plurality of third models; and
the second lower range is generated via a second of the plurality of third models.

19. The computer-implemented method of claim 16, wherein:
the first upper range is generated via a first component of one of the one or more third models; and
the second lower range is generated via a second component of the one of the one or more third models.

20. A system comprising:
one or more computer processors; and
a memory containing a program that, when executed by the one or more computer processors, performs acts for generating a performance prediction for a machine learning (ML) model, the acts comprising:
obtaining a first model configured to perform a task and a manufacturing data set comprising unlabeled data;
generating, using a second model, at least one metric that predicts the performance of the first model when performing the task on the manufacturing data set, wherein the second model is a meta-model associated with the first model;
generating, using a third model, at least one value that predicts uncertainty of the at least one metric that predicts the performance of the first model when performing the task on the manufacturing data set, wherein the third model is a meta-meta-model associated with the second model; and
providing an indication of the at least one metric that predicts the performance of the first model and the at least one value that predicts the uncertainty of the at least one metric.

21. A computer readable storage medium having computer readable program code embodied therewith, the computer readable program code executable by one or more computer processors to perform acts for generating a performance prediction for a machine learning (ML) model, the acts comprising:
obtaining a first model configured to perform a task and a manufacturing data set comprising unlabeled data;
generating, using a second model, at least one metric that predicts the performance of the first model when performing the task on the manufacturing data set, wherein the second model is a meta-model associated with the first model;
generating, using a third model, at least one value that predicts uncertainty of the at least one metric that predicts the performance of the first model when performing the task on the manufacturing data set, wherein the third model is a meta-meta-model associated with the second model; and
providing an indication of the at least one metric that predicts the performance of the first model and the at least one value that predicts the uncertainty of the at least one metric.
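
As an illustration of the data preparation recited in the dependent method claims (resampling a labeled data set into test and manufacturing data sets with different feature ratios, then deriving a feature value from a divergence between the two distributions), the following sketch may help. It is offered under stated assumptions only: the helper names `resample_by_ratio`, `distribution`, and `kl_divergence` are hypothetical, the toy records and the `region` feature are invented for the example, and Kullback-Leibler divergence is one possible choice of divergence function, not one the claims name.

```python
import math
import random
from collections import Counter


def resample_by_ratio(rows, feature, value, ratio, size, rng):
    """Draw `size` rows (with replacement) so that a fraction `ratio`
    of them have rows[feature] == value."""
    hits = [r for r in rows if r[feature] == value]
    misses = [r for r in rows if r[feature] != value]
    n_hits = round(size * ratio)
    return rng.choices(hits, k=n_hits) + rng.choices(misses, k=size - n_hits)


def distribution(rows, feature):
    """Empirical distribution of a categorical feature."""
    counts = Counter(r[feature] for r in rows)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the union of categories; eps replaces missing mass."""
    keys = set(p) | set(q)
    return sum(
        p.get(k, eps) * math.log(p.get(k, eps) / q.get(k, eps)) for k in keys
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical labeled data set with one categorical feature.
labeled = [{"region": "north", "label": 1}, {"region": "south", "label": 0}] * 50

# A test data set and a manufacturing data set with different feature ratios.
test_set = resample_by_ratio(labeled, "region", "north", 0.5, 200, rng)
prod_set = resample_by_ratio(labeled, "region", "north", 0.9, 200, rng)

# One candidate feature value for training a meta-meta-model.
shift_feature = kl_divergence(
    distribution(test_set, "region"), distribution(prod_set, "region")
)
print(shift_feature)
```

Repeating the resampling over many ratios would yield many test/manufacturing pairs, each with its own divergence value, which is the kind of training signal a third model could use to learn how uncertainty grows as the two distributions move apart.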