JP2022056412A

JP2022056412A - Method, system, and program (mobile AI) for selecting a machine learning model

Info

Publication number: JP2022056412A
Application number: JP2021158544A
Authority: JP
Inventors: Umar Asif; Stefan Von Cavallar; Jianbin Tang; Stefan Harrer
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2020-09-29
Filing date: 2021-09-28
Publication date: 2022-04-08

Abstract

To allow a machine learning model to be optimized for deployment on a device based on the device's hardware specifications. SOLUTION: An existing model is acquired and pruned to reduce its hardware resource consumption. The pruned model is then trained on training data and on a collection of teacher models. Finally, the performance of the trained model is evaluated and compared with performance requirements, which can be based on the device's hardware specifications. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to machine learning and, more specifically, to optimizing the performance of machine learning models.

Machine learning models such as neural networks are being used in an increasingly diverse range of applications. While machine learning models can be beneficial, they still carry associated drawbacks and costs. For example, training a model can be particularly resource-intensive, and there is often a trade-off between accuracy and the computing resources required.

Specifically, even state-of-the-art systems cannot automatically configure and deploy machine learning models on relatively low-power devices while ensuring that the deployed model meets both accuracy and resource performance requirements. For example, machine learning models used on low-power devices are often limited in performance (because of the hardware overhead that more advanced machine learning models require). In addition, in many fields (such as medicine), only a limited amount of training data may be available, which makes training new models particularly difficult. Therefore, even if a given task (such as classification) can be reliably accomplished by prior-art machine learning models, getting such a model to run within the hardware specifications of a given low-power device is likely to require considerable effort. Similarly, many prior-art models can function with limited hardware resources, but as a result of their "lightweight" configuration, their performance (e.g., accuracy) is typically compromised.

Some embodiments of the present disclosure may be exemplified as a first method. The first method includes receiving hardware specifications of a device. The first method also includes determining performance requirements based on the hardware specifications of the device. The first method also includes acquiring a machine learning model having a set of layers. The first method also includes acquiring a teacher model. The first method also includes removing one or more layers of the set of layers from the machine learning model, resulting in a student model having a pruned set of layers. The first method also includes training the student model based on training data and the teacher model. The first method also includes evaluating the performance of the student model and comparing the performance of the student model with the performance requirements. The first method also includes making a determination, based on the comparison, to select the model having the best performance. This first method advantageously enables the model to be configured to function within requirements based on the hardware specifications of the device while the model is trained to meet performance criteria.
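The first method can be sketched as a search loop. The following Python sketch is an illustration under stated assumptions, not the disclosed implementation: the helper callables `to_requirements`, `prune`, and `train_and_evaluate`, the `Requirements` and `Result` types, and the choice of a memory limit plus an accuracy floor as the requirements are all hypothetical. The loop keeps only students that satisfy the requirements and returns the most accurate one.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Requirements:
    max_memory_mb: float  # resource-based requirement (assumed)
    min_accuracy: float   # output-based requirement (assumed)


@dataclass(frozen=True)
class Result:
    layers: Tuple[str, ...]  # layers kept in the trained student
    accuracy: float          # evaluated accuracy
    memory_mb: float         # measured or estimated memory footprint


def select_student(
    hw_spec: dict,
    base_layers: Sequence[str],
    to_requirements: Callable[[dict], Requirements],
    prune: Callable[[Sequence[str], Requirements], Tuple[str, ...]],
    train_and_evaluate: Callable[[Tuple[str, ...]], Result],
    trials: int = 5,
) -> Optional[Result]:
    """Prune, train, and evaluate several students; return the best one that
    meets the requirements, or None if no student does."""
    req = to_requirements(hw_spec)            # requirements from hardware specs
    best: Optional[Result] = None
    for _ in range(trials):
        student = prune(base_layers, req)     # remove layers -> student model
        result = train_and_evaluate(student)  # train (data + teacher), then evaluate
        meets = (result.memory_mb <= req.max_memory_mb
                 and result.accuracy >= req.min_accuracy)
        if meets and (best is None or result.accuracy > best.accuracy):
            best = result
    return best
```

Passing the pruning, training, and evaluation steps in as callables keeps the selection logic independent of any particular framework.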

Some embodiments of the present disclosure may be exemplified as a second method. The second method includes the first method discussed above, wherein the training (of the first method) includes inputting training data into the student model and the teacher model, receiving student features from the student model, receiving teacher features from the teacher model, performing a first comparison between the student features and the teacher features, receiving a student output from the student model, performing a second comparison between the student output and the training data, and adjusting at least one layer of the pruned set of layers of the student model based on the first comparison and the second comparison. This second method advantageously enables the student model to be configured to perform similarly to the advanced teacher model, even though the student model has significantly reduced resource overhead compared with the teacher model.
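A minimal sketch of the two comparisons in the second method, assuming mean squared error for the feature comparison, softmax cross-entropy for the output comparison, and a hypothetical weighting parameter `alpha`. The disclosure does not specify these loss functions; they are common choices in knowledge distillation. In practice the combined value would be minimized with a gradient-based optimizer to adjust the student's remaining layers.

```python
import math
from typing import Sequence


def feature_loss(student_feats: Sequence[float], teacher_feats: Sequence[float]) -> float:
    """First comparison: mean squared error between student and teacher features."""
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / len(student_feats)


def output_loss(student_logits: Sequence[float], label: int) -> float:
    """Second comparison: softmax cross-entropy between student output and the label."""
    m = max(student_logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in student_logits))
    return log_sum_exp - student_logits[label]


def distillation_loss(student_feats, teacher_feats, student_logits, label, alpha=0.5):
    """Weighted sum of both comparisons; alpha is an assumed hyperparameter."""
    return (alpha * feature_loss(student_feats, teacher_feats)
            + (1.0 - alpha) * output_loss(student_logits, label))
```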

Some embodiments of the present disclosure may be exemplified as a third method. The third method includes the first method discussed above, wherein removing the one or more layers (in the first method) includes determining, based on the performance requirements, a number of layers to remove, randomly selecting one or more layers based on that number, and removing the selected layers. This third method advantageously enables the model to be pruned based on the hardware specifications of the device with improved efficiency.
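The pruning step of the third method might look like the following sketch. It assumes, hypothetically, that the performance requirements have already been reduced to a maximum layer count `max_layers`; layers are represented by name only, and the surviving layers keep their original order.

```python
import random
from typing import List, Optional, Sequence


def layers_to_remove(num_layers: int, max_layers: int) -> int:
    """How many layers must be deleted so that at most max_layers remain."""
    return max(0, num_layers - max_layers)


def prune_random(layers: Sequence[str], max_layers: int, seed: Optional[int] = None) -> List[str]:
    """Randomly delete layers until the student has at most max_layers layers."""
    rng = random.Random(seed)
    k = layers_to_remove(len(layers), max_layers)
    drop = set(rng.sample(range(len(layers)), k))  # indices of layers to delete
    return [layer for i, layer in enumerate(layers) if i not in drop]  # original order kept
```

Passing a `seed` makes a pruning run reproducible, which helps when several random students are compared.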

Some embodiments of the present disclosure may be exemplified as a fourth method. The fourth method includes acquiring labeled data. The fourth method also includes acquiring unlabeled data. The fourth method also includes comparing the labeled data with the unlabeled data. The fourth method also includes labeling the unlabeled data based on the comparison, resulting in weakly labeled data. The fourth method also includes retraining a model based on the weakly labeled data, resulting in a retrained model. This fourth method advantageously enables the model to be retrained based on initially unlabeled data.
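One simple way to implement the comparison-based labeling of the fourth method is a nearest-centroid rule: each unlabeled feature vector receives the label of the closest class centroid computed from the labeled data. This is an assumed technique chosen for illustration (the disclosure elsewhere mentions transductive learning), and the labels `"normal"` and `"seizure"` in the test are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def class_centroids(labeled: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the labeled feature vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc) for label, acc in sums.items()}


def weak_label(labeled: Sequence[Tuple[Vector, str]],
               unlabeled: Sequence[Vector]) -> List[Tuple[Vector, str]]:
    """Give each unlabeled vector the label of the nearest class centroid."""
    centroids = class_centroids(labeled)
    return [
        (vec, min(centroids, key=lambda c: math.dist(vec, centroids[c])))
        for vec in unlabeled
    ]
```

The resulting pairs are "weak" labels: they are only as reliable as the distance rule, which is why the retraining step may also draw on other signals.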

Some embodiments of the present disclosure may be exemplified as a fifth method. The fifth method includes the fourth method discussed above. The fifth method also includes obtaining a teacher output from a teacher model, wherein the retraining (of the fourth method) is further based on the strongly labeled data and the teacher output. This fifth method advantageously enables the model to be retrained, with improved performance, based on initially unlabeled data.
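For the fifth method, one plausible way to combine a label with the teacher output is to blend the one-hot label with the teacher's class probabilities into a single soft training target. The blending rule and the weight `beta` are assumptions for illustration, not the disclosed method.

```python
from typing import List, Sequence


def retraining_target(label: int, teacher_probs: Sequence[float], beta: float = 0.5) -> List[float]:
    """Blend a one-hot label with the teacher's class probabilities.

    beta = 1.0 uses only the label; beta = 0.0 uses only the teacher output.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    one_hot = [1.0 if i == label else 0.0 for i in range(len(teacher_probs))]
    return [beta * h + (1.0 - beta) * p for h, p in zip(one_hot, teacher_probs)]
```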

Some embodiments of the present disclosure may be exemplified as a sixth method. The sixth method includes the fourth method discussed above, wherein acquiring the labeled data (in the fourth method) includes receiving labeled input data and generating, via the model, labeled feature data based on the labeled input data, the labeled feature data being the labeled data. Further, in the sixth method, acquiring the unlabeled data (in the fourth method) includes collecting unlabeled input data and generating, via the model, unlabeled feature data based on the unlabeled input data, the unlabeled feature data being the unlabeled data. This sixth method advantageously enables the model to be retrained based on initially unlabeled data, with improved privacy and security as well as improved effectiveness.
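The privacy benefit of the sixth method comes from keeping only feature data. A minimal sketch, assuming the model's feature-extraction layers are available as a list of Python callables (toy layers are used in the test): raw inputs are passed through those layers and only the resulting feature vectors are returned for later labeling and retraining.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def to_features(raw_inputs: Sequence[Sequence[float]],
                feature_layers: Sequence[Layer]) -> List[List[float]]:
    """Pass raw inputs through the feature-extraction layers and keep only the features."""
    features: List[List[float]] = []
    for x in raw_inputs:
        h = list(x)
        for layer in feature_layers:  # each layer consumes the previous layer's output
            h = layer(h)
        features.append(h)
    return features  # raw inputs are not returned, so they need not be stored
```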

Some embodiments of the present disclosure may also be exemplified as a computer program product comprising a computer-readable storage medium having program instructions embodied therewith, the program instructions being executable by a computer to cause the computer to perform any of the methods discussed above. This advantageously enables the model to be configured to function within requirements based on the hardware specifications of the device while the model is trained to meet performance criteria.

Some embodiments of the present disclosure may be exemplified as a system. The system may comprise a memory and a central processing unit (CPU). The CPU may be configured to execute instructions to perform any of the methods discussed above. This advantageously enables the model to be configured to function within requirements based on the hardware specifications of the device while the model is trained to meet performance criteria.

The above summary is not intended to describe each illustrated embodiment or every implementation of the present disclosure.

The drawings included in the present application are incorporated into, and form part of, the specification. They illustrate embodiments of the present disclosure and, along with the description, serve to explain the principles of the disclosure. The drawings are only illustrative of certain embodiments and do not limit the disclosure. Features and advantages of various embodiments of the claimed subject matter will become apparent as the following Detailed Description proceeds, and upon reference to the drawings, in which like numerals indicate like parts.

A high-level automated composite knowledge distillation method, consistent with some embodiments of the present disclosure.

An example diagram of model selection and pruning to generate a student model, consistent with some embodiments of the present disclosure.

An example diagram of a data flow for training a student model based in part on training data and multiple teacher models, consistent with some embodiments of the present disclosure.

A high-level method for updating a machine learning model based on unlabeled data, consistent with some embodiments of the present disclosure.

A diagram depicting an illustrative example of how unlabeled observation data may be automatically labeled based on a known dataset, consistent with some embodiments of the present disclosure.

A high-level method for updating a machine learning model based on anonymized feature data derived from unlabeled input data, consistent with some embodiments of the present disclosure.

A diagram depicting an example of labeling unlabeled input data via transductive learning, consistent with some embodiments of the present disclosure.

A high-level block diagram of an example computer system that may be used in implementing embodiments of the present disclosure.

While the invention is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit the invention to the particular embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Aspects of the present disclosure relate to systems and methods for optimizing a model with respect to device specifications. More specific aspects relate to a system that acquires a machine learning model structure, compresses the structure, trains the compressed model, evaluates the performance of the trained model, and selects the model with the best performance.

Further aspects of the present disclosure relate to systems and methods for improving a model's performance based on measured data. More specific aspects relate to a system that receives unlabeled data, compares the unlabeled data with known data, labels the unlabeled data based on the comparison, and retrains the model based on the newly labeled data.

The systems and methods described in this disclosure enable automated compact model compression via "composite knowledge distillation," advantageously allowing a system to optimize a machine learning model such as a neural network based on hardware specifications and/or related performance requirements. In other words, whereas most state-of-the-art systems accept a trade-off between accuracy/speed and resource requirements, models trained via the systems and methods of the present disclosure can be optimized to maximize accuracy and/or speed despite running on relatively low-power hardware, such as a mobile device or a wrist-worn device.

Moreover, systems and methods consistent with the present disclosure can advantageously enable a model to be trained to operate reliably on lower-power devices, even with a relatively limited training dataset.

Throughout this disclosure, reference is made to "machine learning models," abbreviated to "models" for simplicity. Models may include, for example, artificial neural networks such as convolutional neural networks (CNNs), multilayer perceptrons (MLPs), recurrent neural networks (RNNs), and long short-term memory (LSTM) RNNs. Models can be used for various purposes, such as classification and prediction. Typically, a model receives input data and passes that data through several "layers." For example, the data may be input to a first layer, which manipulates the data (based on the layer's configuration) and produces an output of one or more "features" (often grouped into a "feature vector"). A layer's configuration includes various "channels" and "filters" responsible for lower-level aspects of the data manipulation performed by the layer.

As an example, the input data may be an image; the first layer may identify edges in the image and generate a list of data (a "vector") describing all of the edges (the "features") detected in the image. The features generated by the first layer are then sent to a second layer, which performs further manipulation to generate a second feature vector, and so on. The final layer produces the model's "output." As used herein, unless otherwise stated, "output" refers to the final output/determination/classification of a model. The layout/configuration and order of the various layers are commonly referred to as the model's "structure."

Machine learning models are often "trained," whereby one or more layers are adjusted to change how they modify their inputs. As a simple example, a first layer may initially double its input, but during training the first layer may be adjusted to multiply its input by 1.5 instead. Of course, more complex layers can be adjusted in more nuanced ways.
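The "multiply by 2, adjusted to 1.5" example can be reproduced with a one-parameter layer trained by gradient descent on mean squared error. The training pairs, learning rate, and epoch count below are illustrative assumptions.

```python
from typing import Sequence, Tuple


def train_scale(pairs: Sequence[Tuple[float, float]], w: float = 2.0,
                lr: float = 0.01, epochs: int = 200) -> float:
    """Fit a one-parameter layer y = w * x by gradient descent on mean squared error.

    w starts at 2.0 (the layer doubles its input) and moves toward the value
    that best fits the training pairs.
    """
    n = len(pairs)
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in pairs) / n  # d(MSE)/dw
        w -= lr * grad
    return w


# Targets generated by a "multiply by 1.5" rule.
print(round(train_scale([(1.0, 1.5), (2.0, 3.0), (3.0, 4.5)]), 4))  # prints 1.5
```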

Machine learning models trained and/or optimized according to the systems and methods of the present disclosure may be optimized to satisfy one or more performance requirements. As used herein, performance requirements may include resource-based requirements (maximum memory footprint, maximum CPU utilization, maximum estimated power consumption, etc.), output-based requirements (accuracy rating, inference speed, etc.), or a combination of the two.
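The two kinds of requirements can be represented as upper bounds (resource-based) and lower bounds (output-based). A sketch under assumed field names and units (MB, percent, milliwatts, inferences per second), none of which come from the disclosure:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Upper bounds (resource-based): requirement field -> measured key.
UPPER = {"max_memory_mb": "memory_mb", "max_cpu_percent": "cpu_percent", "max_power_mw": "power_mw"}
# Lower bounds (output-based): requirement field -> measured key.
LOWER = {"min_accuracy": "accuracy", "min_inferences_per_s": "inferences_per_s"}


@dataclass
class PerformanceRequirements:
    max_memory_mb: Optional[float] = None  # None means "no limit"
    max_cpu_percent: Optional[float] = None
    max_power_mw: Optional[float] = None
    min_accuracy: Optional[float] = None
    min_inferences_per_s: Optional[float] = None


def violations(req: PerformanceRequirements, measured: Dict[str, float]) -> List[str]:
    """Names of requirements the measured model fails. Every limit that is set
    must have a matching entry in `measured`."""
    failed = []
    for field, key in UPPER.items():
        limit = getattr(req, field)
        if limit is not None and measured[key] > limit:
            failed.append(field)
    for field, key in LOWER.items():
        limit = getattr(req, field)
        if limit is not None and measured[key] < limit:
            failed.append(field)
    return failed
```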

In many fields, even a well-trained and thoroughly vetted model can lose accuracy over time. Although a model can be configured to keep learning over time (which can sometimes inadvertently degrade the model), another common cause of declining accuracy is a change in the input data supplied to the model. As an example, a patient's electroencephalogram (EEG) data can change naturally over time, independent of any change in the patient's condition. An EEG of a healthy patient recorded today may differ significantly from an EEG of the same healthy patient recorded several years (or months) earlier, even though the EEG is relatively consistent over shorter periods. Notably, this is not (necessarily) caused by any factor other than the patient's own physiology; the patient's brainwave patterns may simply "drift" over time. Patterns associated with various conditions (such as seizures) may remain relatively consistent, but the "background" or "baseline" state may change so much that the model can no longer analyze the data well enough to detect the presence (or absence) of those important patterns. As a result, a model that was initially accurate at classifying a patient's EEG may eventually become obsolete. Models can be retrained or improved over time, but known datasets may be relatively scarce, particularly in narrow fields (such as neurology).

Furthermore, because background/baseline patterns and/or noise can vary widely between patients, known datasets may not be sufficient to train (or retrain) a model for a particular patient. One solution in the art is to periodically record datasets from the patient, have an expert (a healthcare provider, specialist, etc.) evaluate and manually annotate/label the recorded datasets, and use those patient-specific datasets to retrain the model. However, this can be prohibitively time-consuming, inconvenient, and/or expensive for both the patient and the expert. It can also raise concerns about the sensitive, personally identifiable nature of the patient's data.

The systems and methods described in this disclosure enable further improvement in the accuracy of a machine learning model based on observed (i.e., initially unlabeled) data, advantageously improving the model's performance without requiring the observed data to be manually annotated. As a result, the model can gradually grow more robust over time, even as the patient's physiology changes and new patterns (i.e., patterns not represented in the model's initial training dataset or, in some cases, in any known dataset) are observed.

FIG. 1 illustrates a high-level automated composite knowledge distillation method 100, consistent with some embodiments of the present disclosure. Method 100 advantageously enables machine learning models to be trained and optimized with respect to device-specific performance requirements.

Method 100 comprises collecting performance requirements and/or hardware specifications at operation 102. Operation 102 may include, for example, receiving, from a device such as a mobile device, data describing the device's hardware (available memory, processor manufacturer/model, etc.). Performance requirements (in particular, resource-based requirements such as a maximum memory footprint) can be derived from this data. In some embodiments, operation 102 may include receiving output-based requirements beyond the hardware specifications. For example, operation 102 may include receiving, from a user or the device, a minimum accuracy rating, a minimum inference speed, etc., based on the circumstances in which the device is typically used.

Method 100 further comprises acquiring a model at operation 104. Operation 104 may include, for example, downloading a machine learning model from a repository of models known to be accurate. The model acquired at operation 104 may be one of multiple different types of models (e.g., a CNN, an RNN, etc.). Notably, the acquired model may have been built for a different purpose than the model being created and optimized via method 100. As an example, method 100 may be performed to train and optimize a seizure detection model intended for deployment to a wrist-mounted computing device. Even so, operation 104 may include selecting an established CNN (e.g., an EfficientNet model) that was previously trained to perform image recognition on input images. This is possible because, for the purposes of method 100, the structure of the selected model (e.g., the composition of its individual layers) is often more important than the selected model's actual original purpose (e.g., face recognition, playing chess, etc.). The repository accessed at operation 104 may be publicly available, stored in a cloud database, stored in local long-term storage, and so on.

Method 100 further comprises pruning the selected machine learning model based on the performance requirements at operation 106. Operation 106 may include, for example, removing one or more layers from the selected model. In some embodiments, operation 106 may include removing one or more blocks (contiguous groups of layers) from the selected model. The layers and/or blocks selected for removal at operation 106 may be chosen randomly, in some cases subject to constraints such as selecting layers that are adjacent to (or, in some embodiments, not adjacent to) other selected layers. In some embodiments, layers may be removed based (at least in part) on the performance requirements received at operation 102. Operation 106 may include determining the number of layers in the selected model and removing layers until that number falls below a threshold.

The total number of layers in a model (the model's "layer count") is a major factor in its resource requirements (other factors may include the size of the filters used in each layer, the resolution/dimensionality of each layer's output, etc.). For example, a model with 300 layers generally consumes significantly more memory and CPU overhead than a model with 100 layers. With this in mind, the threshold may be determined based on, for example, the maximum memory and/or CPU model of the device's hardware. The threshold may limit particular characteristics of the model (e.g., the maximum number of layers, the maximum number of channels per layer, etc.). As an example, a device may have 512 megabytes (MB) of memory (such as random access memory (RAM)) and a CPU operating at 120 megahertz (MHz). These hardware specifications may be taken into account when generating the threshold. As a simple example, the maximum number of layers in a model deployed on the device may be the lower of r/10 layers (where r is the device's total memory in MB) and c/2 layers (where c is the device's CPU frequency in MHz), rounded down to the nearest integer. Thus, the maximum number of layers for a model deployed on the example device above would be 51 (since 512/10 = 51.2 < 120/2 = 60). Note that these metrics do not necessarily reflect a model's actual resource requirements (e.g., a 3-layer model does not necessarily require 30 MB of RAM, nor does it necessarily consume about half the resources of a 6-layer model). In some embodiments, the threshold may represent the maximum amount of memory the model may consume, the maximum amount of CPU overhead, and so on.
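The simple threshold rule above can be written directly as code. This sketch only reproduces the example rule from the text (the lower of r/10 and c/2, rounded down); the function names are hypothetical, and a real threshold would likely depend on more than two hardware figures.

```python
def max_layers(ram_mb: int, cpu_mhz: int) -> int:
    """Example rule from the text: the lower of r/10 and c/2, rounded down."""
    return min(ram_mb // 10, cpu_mhz // 2)


def layers_over_budget(num_layers: int, ram_mb: int, cpu_mhz: int) -> int:
    """How many layers would have to be removed to satisfy the example rule."""
    return max(0, num_layers - max_layers(ram_mb, cpu_mhz))


print(max_layers(512, 120))               # prints 51
print(layers_over_budget(300, 512, 120))  # prints 249
```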

The pruned selected model is referred to herein as the "student model." Notably, in some embodiments, the remaining layers of the student model (i.e., the layers not removed as part of operation 106) may remain relatively unchanged. For example, the internal composition of the layers (such as individual "neurons") may be unaffected, so the student model retains part of the structure of the originally selected model. However, the connections between the removed layers and the remaining layers are severed, which may reduce the student model's accuracy or render it completely inoperable. In some embodiments, instead of (or in addition to) pruning layers, channels and/or filters within one or more layers may be removed to reduce the overall complexity of those layers.

As an illustrative example, a model may have six layers ("layers 1-6"). The output of each layer is fed to the input of the next layer (i.e., each of layers 1-5 is "connected" to the next layer). In other words, layer 1 is connected to layer 2 (the output of layer 1 is used as the input of layer 2), layer 2 is connected to layer 3 (the output of layer 2 is used as the input of layer 3), and so on. The layers are generally not identical: each layer manipulates its input differently to produce its own output. The layers themselves evolve over the course of training (recall that the model acquired in operation 104 has already been trained). For example, if the input data includes an image, layer 1 may manipulate the input data to generate an output feature vector representing the edges in the image. Layer 2 may then manipulate that feature vector to generate an output feature vector representing shapes recognized in the image, and so on. Therefore, if operation 106 involves removing layer 1, the input data may be supplied directly to layer 2. The input to layer 2 is then no longer a feature vector representing the edges of the image; instead, it is the image itself. However, since layer 2 itself has not changed, it performs the same operations on this input as before. As a result, the output feature vector of layer 2 is unlikely to accurately represent the shapes in the image, and may instead be largely meaningless "garbage data." Because layers 3-6 are likewise unchanged, this mismatch may cascade through the rest of the model. Consequently, the removal in operation 106 may impose an accuracy penalty on the model's final output. Notably, some layers may make only minor changes to their input. Depending on which layers are removed, the overall accuracy penalty may therefore range from relatively small to severe enough to render the model functionally inoperable.
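The cascade described above can be illustrated with a toy stand-in. In this sketch, a "model" is an ordered list of scalar functions rather than real neural-network layers. The helper names `forward` and `prune` are hypothetical, and nothing here is the disclosed implementation. Removing layer 1 leaves layers 2-6 untouched, but layer 2 now receives the raw input, so the final output shifts.

```python
from typing import AbstractSet, Callable, List, Sequence

Layer = Callable[[float], float]


def forward(layers: Sequence[Layer], x: float) -> float:
    """Feed x through the layers in order; each output is the next layer's input."""
    for layer in layers:
        x = layer(x)
    return x


def prune(layers: Sequence[Layer], remove: AbstractSet[int]) -> List[Layer]:
    """Return a new model without the layers at the given (0-based) indices.
    Surviving layers are reused unchanged, so the layer after a removed one
    now receives whatever the removed layer used to receive."""
    return [layer for i, layer in enumerate(layers) if i not in remove]


# Hypothetical six-layer "model" of simple scalar transforms (layers 1-6).
model: List[Layer] = [
    lambda x: x * 2,   # layer 1
    lambda x: x + 3,   # layer 2
    lambda x: x * x,   # layer 3
    lambda x: x - 1,   # layer 4
    lambda x: x / 2,   # layer 5
    lambda x: x + 10,  # layer 6
]

student = prune(model, remove={0})  # delete layer 1
print(forward(model, 1.0))    # 22.0
print(forward(student, 1.0))  # 17.5: layer 2 now sees the raw input
```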

Although pruning may impose a significant accuracy penalty on the model as a whole, the structure of the model may remain largely intact. In other words, some training that "fills the gap" left by the removed layers may be enough to restore the model's functionality. Continuing the example above, layers 3-6 may still function as before. The model may therefore again produce (relatively) accurate results if layer 2 can be adjusted to convert the input image into an output feature vector representing the shapes recognized in the image.

Method 100 further comprises training the student model in operation 108. Operation 108 may include, for example:

- inputting known training data into the student model;
- receiving an output from the student model;
- comparing that output with the known result; and
- adjusting one or more layers of the student model based on the comparison.

Operation 108 may further include inputting the training data into one or more "teacher" models (other established, known-good models). Operation 108 may include selecting the teacher models from a repository. For example, the teacher models may be other models in the repository from which the model was selected in operation 104. In some cases, operation 108 may use all of the teacher models in a given repository. In other cases, operation 108 may select several teacher models (e.g., randomly). As an example, operation 104 may include randomly selecting a model from a repository of 15 models, and operation 108 may include using some or all of the remaining 14 models. In some cases, the model selected in operation 104 may also be selected as a teacher model in operation 108.
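The following is a minimal sketch of picking the base model and the teachers, assuming the repository is just a list of model names. The function and parameter names are hypothetical. By default, every other model becomes a teacher. Optionally, a random subset of that pool is used, or the base model is included too, mirroring the options listed above.

```python
import random
from typing import List, Optional, Sequence, Tuple


def pick_student_and_teachers(
    repository: Sequence[str],
    num_teachers: Optional[int] = None,
    include_base_as_teacher: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[str, List[str]]:
    """Pick a base model at random (operation 104) and teachers (operation 108).

    By default every other model in the repository is used as a teacher;
    num_teachers restricts this to a random subset of that size."""
    rng = rng or random.Random()
    base = rng.choice(list(repository))
    pool = [m for m in repository if include_base_as_teacher or m != base]
    if num_teachers is not None:
        pool = rng.sample(pool, min(num_teachers, len(pool)))
    return base, pool


repo = [f"model_{i}" for i in range(1, 16)]  # hypothetical 15-model repository
base, teachers = pick_student_and_teachers(repo, rng=random.Random(0))
print(len(teachers))  # 14: all of the remaining models
```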

In some embodiments, the adjustments made to the layers of the student model in operation 108 may also be based on a comparison between the teacher models' outputs and the student model's output. In general, the output of the student model ("student output") is compared with the output of a teacher model ("teacher output"). The difference between them is then minimized during training. For example, if the student output differs significantly from the teacher output, the layers of the student model may be adjusted even if the student output is relatively close to the expected output. A student output of "35" may be sufficiently close to an expected output of "36", but the layers of the student model may still be adjusted if the teacher output is "37".
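The source does not specify a loss function. The following sketch assumes a common knowledge-distillation style objective: a weighted sum of squared error against the label and mean squared difference from the teachers' outputs. The weighting `alpha` and the name `combined_loss` are assumptions. Using the numbers from the example, the teacher disagreement raises the loss even though the student is close to the label.

```python
from typing import Sequence


def combined_loss(student_out: float, target: float,
                  teacher_outs: Sequence[float], alpha: float = 0.5) -> float:
    """Weighted sum of (a) squared error against the known label and
    (b) mean squared difference from the teachers' outputs.
    alpha is a hypothetical weighting knob; alpha=1 ignores the teachers."""
    label_term = (student_out - target) ** 2
    teacher_term = sum((student_out - t) ** 2 for t in teacher_outs) / len(teacher_outs)
    return alpha * label_term + (1 - alpha) * teacher_term


# Example from the text: student 35, expected 36, teacher 37.
print(combined_loss(35, 36, [37], alpha=1.0))  # 1.0 (label only)
print(combined_loss(35, 36, [37], alpha=0.5))  # 2.5 (teacher disagreement adds to the loss)
```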

In some embodiments, instead of (or in addition to) comparing the outputs of the student and teacher models, operation 108 may include comparing the features output by "intermediate" layers of the student and teacher models. In some embodiments, the student model layers used for comparison may be selected randomly. In other embodiments, layers may be selected based on the pruning. For example, the first unadjusted layer after the last adjusted (or removed) layer may be selected. As an example, the model selected in operation 104 may have four layers A, B, C, and D, and operation 106 may include removing layer B. In that case, operation 108 may include comparing the feature vector output by layer C of the student model with the feature vector output by layer C of a four-layer teacher model. As an additional example, the last unadjusted layer before the first adjusted (or removed) layer may be selected (e.g., the output of the student's layer A may be compared with the output of the teacher's layer B). Teacher models may have a different number of layers than the model selected in operation 104, and the pruning of operation 106 changes the student's depth as well. In such cases, layers may be selected for output comparison based on their position within each model; for example, the feature vectors output by the second-to-last layer of each model may be compared. This may advantageously allow the student model to be trained to perform similarly to the teacher models despite being substantially simpler (due to the pruning of operation 106).
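The layer-pairing rules above can be sketched with layers represented by their names. The helper names are hypothetical, and real models would need a mapping from names to feature tensors. The sketch covers three rules: the first layer after the last removed one, the last layer before the first removed one, and a position-based choice for models of different depth.

```python
from typing import AbstractSet, Sequence


def first_unadjusted_after(layers: Sequence[str], removed: AbstractSet[str]) -> str:
    """First layer that follows the last removed layer."""
    last = max(i for i, name in enumerate(layers) if name in removed)
    if last + 1 >= len(layers):
        raise ValueError("no layer follows the removed layers")
    return layers[last + 1]


def last_unadjusted_before(layers: Sequence[str], removed: AbstractSet[str]) -> str:
    """Last layer that precedes the first removed layer."""
    first = min(i for i, name in enumerate(layers) if name in removed)
    if first == 0:
        raise ValueError("no layer precedes the removed layers")
    return layers[first - 1]


def penultimate(layers: Sequence[str]) -> str:
    """Position-based choice for models of different depth."""
    return layers[-2]


original = ["A", "B", "C", "D"]
student = [name for name in original if name != "B"]  # operation 106 removes B
print(first_unadjusted_after(original, {"B"}))  # C
print(last_unadjusted_before(original, {"B"}))  # A
print(penultimate(student), penultimate(["t1", "t2", "t3", "t4", "t5"]))  # C t4
```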

Method 100 further comprises evaluating, in operation 110, the performance of the pruned and trained student model. Operation 110 may include, for example, inputting different known data into the student model (e.g., a second set of known data distinct from the training data). The student model's performance in determining outputs is then analyzed. As an example, operation 110 may include determining any of the following for the student model:

- accuracy;
- inference speed (the time the student model takes to determine an output);
- memory footprint (the amount of memory the student model requires);
- peak CPU usage;
- estimated power consumption; and so on.
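The following is a simplified sketch of operation 110 in pure Python. It measures accuracy on a held-out labelled set and mean wall-clock time per prediction with `time.perf_counter`. As a rough memory proxy, it also records peak Python heap allocations via `tracemalloc`. That proxy is an assumption for illustration: it does not capture native or accelerator memory. Peak CPU use and power draw would need platform-specific tools and are omitted.

```python
import time
import tracemalloc
from typing import Callable, Dict, Sequence, Tuple


def evaluate(predict: Callable[[object], object],
             held_out: Sequence[Tuple[object, object]]) -> Dict[str, float]:
    """Score a student on a held-out labelled set (operation 110, simplified)."""
    correct = 0
    elapsed = 0.0
    tracemalloc.start()
    try:
        for x, label in held_out:
            start = time.perf_counter()
            prediction = predict(x)
            elapsed += time.perf_counter() - start
            correct += int(prediction == label)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    n = len(held_out)
    return {
        "accuracy": correct / n,
        "seconds_per_inference": elapsed / n,
        "peak_python_heap_bytes": float(peak),
    }


# Hypothetical "student": predicts whether a number is even.
held_out = [(2, True), (3, False), (4, True), (7, True)]  # last label deliberately wrong
report = evaluate(lambda x: x % 2 == 0, held_out)
print(report["accuracy"])  # 0.75
```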

Method 100 further comprises determining, in operation 112, whether the trained student model meets the performance requirements. Operation 112 may include, for example, comparing the student model's performance, as determined in operation 110, with the performance requirements collected in operation 102. As an example, operation 112 may include determining whether the student model's memory usage is below a maximum threshold dictated by the device hardware. It may also include determining whether the student model's inference speed meets a minimum set by the user, and so on.
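One way to express the check in operation 112 is to give each requirement a direction ("max" or "min") and a bound, then list every requirement the student fails. The requirement format, the metric names, and `unmet_requirements` are hypothetical. Throughput is expressed here as inferences per second, so "higher is better".

```python
from typing import Dict, List, Tuple

# Hypothetical requirement format: metric name -> ("max" or "min", bound).
Requirements = Dict[str, Tuple[str, float]]


def unmet_requirements(metrics: Dict[str, float], requirements: Requirements) -> List[str]:
    """Return the names of requirements the student fails (empty list = 112 "Yes")."""
    failed = []
    for name, (kind, bound) in requirements.items():
        if kind not in ("max", "min"):
            raise ValueError(f"unknown requirement kind: {kind}")
        value = metrics[name]
        if (kind == "max" and value > bound) or (kind == "min" and value < bound):
            failed.append(name)
    return failed


requirements: Requirements = {
    "memory_mb": ("max", 512.0),        # device memory ceiling
    "inferences_per_s": ("min", 30.0),  # user-set minimum throughput
    "accuracy": ("min", 0.90),
}
metrics = {"memory_mb": 300.0, "inferences_per_s": 25.0, "accuracy": 0.93}
print(unmet_requirements(metrics, requirements))  # ['inferences_per_s']
```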

If the student model does not meet one or more of the performance requirements (112 "No"), method 100 proceeds to generate a new student model. It starts over by acquiring a model in operation 105. Operation 105 may be performed in substantially the same manner as operation 104. Notably, the model selected in operation 105 may be either of the following:

- the same model selected in operation 104 (or in a previous iteration of operation 105); or
- a "new" model (e.g., one that method 100 has not yet pruned, trained, and evaluated).

Operation 105 may include, for example, selecting a model from a repository or other external source (possibly including the source accessed in operation 104). In some embodiments, a copy of the model selected in operation 104 may be cached, in which case operation 105 may include loading that copy. In some embodiments, the model selected in operation 105 may be based on the comparison of operation 112. For example, if the student model falls significantly short of the performance requirements, operation 105 may select a new model rather than reusing the previous one. In some embodiments, trends may be tracked. For example, student models generated from a particular model may be consistently inferior regardless of pruning and training. In that case, the particular model may be excluded from selection in subsequent iterations of operation 105.
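The trend tracking mentioned above could be implemented in many ways. This sketch assumes one interpretation: "consistently inferior" means failing the requirements a fixed number of times in a row. Under that interpretation, a base model is dropped from the pool once its failure streak reaches `max_failures`. The class name, the threshold, and the streak rule are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


class BaseModelPicker:
    """Chooses base models for operation 105 and drops any model whose
    students have failed the requirements max_failures times in a row."""

    def __init__(self, repository: Sequence[str], max_failures: int = 3, seed: int = 0):
        self.repository = list(repository)
        self.max_failures = max_failures
        self.failures: Dict[str, int] = defaultdict(int)
        self.rng = random.Random(seed)

    def eligible(self) -> List[str]:
        return [m for m in self.repository if self.failures[m] < self.max_failures]

    def record(self, model: str, passed: bool) -> None:
        # A success resets the streak; a failure extends it.
        self.failures[model] = 0 if passed else self.failures[model] + 1

    def pick(self) -> Optional[str]:
        candidates = self.eligible()
        return self.rng.choice(candidates) if candidates else None


picker = BaseModelPicker(["m1", "m2", "m3"], max_failures=2)
picker.record("m2", passed=False)
picker.record("m2", passed=False)
print(picker.eligible())  # ['m1', 'm3']
```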

Once a model is selected in operation 105, method 100 returns to operations 106-110 to prune, train, and evaluate the model, generating a new student model. In some embodiments, operation 105 may select a new model at each iteration until all available models have been "tried." Several approaches to generating a new student model are possible:

- Same model, different pruning. The same model may be selected and pruned differently, such as by removing more, fewer, or different layers, while training stays the same. Even if only the pruning differs, the result is still a different trained student model. When it is evaluated in operation 110, it may therefore be found to perform better (or worse) than the previous one.
- Same model, same pruning, different training. The same model may be selected and pruned in the same way; for example, operation 106 may remove the same layers/blocks as in its previous iteration. Training in operation 108 then differs, such as by using a different dataset from the one used in the previous iteration of operation 108.
- Different model, same pruning and training. A different model may be selected, while the same layers are pruned and training proceeds in the same way.

Thus, multiple different approaches for generating new student models may be implemented and are contemplated.

If the student model meets all performance requirements (112 "Yes"), method 100 further comprises determining, in operation 114, whether the model's performance is inferior to that of previous models. Operation 114 may include, for example, comparing the performance of the student model currently under consideration (the "current" student model) with the performance of a set of previous student models. The current student model may be compared with each previous student model in the set, or with the average or maximum performance of the set. Depending on the embodiment/use case, the set of previous student models may exclude student models that did not meet the performance requirements of operation 112. Likewise, the set may include all previous student models or only some of them (e.g., only the immediately preceding student model, the last three student models, etc.).

As an illustrative example, 35 student models may be generated and evaluated (referred to as "model 1," "model 2," and so on). In this example, model 35 is the "current" student model. Models 2, 32, 33, and 35 may each meet all performance requirements, while models 1, 3-31, and 34 each fail at least one of them. Depending on the use case and/or embodiment, operation 114 may therefore compare the performance of model 35 with any of the following:

- model 34;
- models 32-34;
- model 33;
- models 2, 32, and 33;
- models 1-34; or
- the average and/or maximum performance of any of the listed groups.
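The comparison sets in this example can be reproduced with a small helper. The sketch assumes each past student is recorded as a dict with an id, a pass/fail flag, and a single "higher is better" score. The scores below are placeholders, and the helper names are hypothetical. Failing students are filtered out first (if requested), and then only the most recent entries are kept, which yields the groups listed above.

```python
from statistics import mean
from typing import Dict, List, Sequence


def comparison_pool(history: Sequence[Dict], only_passing: bool = True,
                    last_n: int = 0) -> List[Dict]:
    """Choose which previous students the current one is compared against.
    history is ordered oldest first; last_n = 0 keeps every entry."""
    pool = [h for h in history if h["passed"]] if only_passing else list(history)
    return pool[-last_n:] if last_n > 0 else pool


def summarize(pool: Sequence[Dict], mode: str = "max") -> float:
    """Reduce the pool to one reference score (maximum or average)."""
    scores = [h["score"] for h in pool]
    if mode == "max":
        return max(scores)
    if mode == "mean":
        return mean(scores)
    raise ValueError(f"unknown mode: {mode}")


def ids(pool: Sequence[Dict]) -> List[int]:
    return [h["id"] for h in pool]


# The 35-model example: models 2, 32, 33 and 35 pass; model 35 is current.
passing = {2, 32, 33, 35}
history = [{"id": i, "passed": i in passing, "score": i / 100} for i in range(1, 35)]
print(ids(comparison_pool(history, only_passing=False, last_n=1)))  # [34]
print(ids(comparison_pool(history, only_passing=False, last_n=3)))  # [32, 33, 34]
print(ids(comparison_pool(history, last_n=1)))                      # [33]
print(ids(comparison_pool(history)))                                # [2, 32, 33]
```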

Performance comparisons may be organized by "category." For example, operation 114 may include comparing the memory consumed, the inference speed, the accuracy, and so on of the different models. Depending on the embodiment/use case, operation 114 may include determining whether the current student model outperforms the set of previous student models in every performance category, in a majority of the categories, or in a particular group of one or more categories. For example, the current student model's performance may be considered "superior" if it has higher accuracy, faster inference, and so on. In some embodiments, the performance in each category may be used to compute an overall performance rating (e.g., a number from 0 to 1), and operation 114 may include comparing the performance ratings of the various models.
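The category-wise comparison and the 0-1 overall rating could look like the following sketch. The category names, their "better" directions, and the weighted-mean rating are assumptions for illustration, and the source does not fix any of them. The "all" and "majority" rules correspond to two of the options listed above.

```python
from typing import Dict

# Assumed direction of "better" for each hypothetical category.
HIGHER_IS_BETTER = {"accuracy": True, "inferences_per_s": True, "memory_mb": False}


def wins_by_category(current: Dict[str, float], previous: Dict[str, float]) -> Dict[str, bool]:
    """True where the current student beats the previous reference."""
    return {
        name: (current[name] > previous[name]) if higher else (current[name] < previous[name])
        for name, higher in HIGHER_IS_BETTER.items()
    }


def is_superior(current: Dict[str, float], previous: Dict[str, float], rule: str = "all") -> bool:
    wins = wins_by_category(current, previous)
    if rule == "all":
        return all(wins.values())
    if rule == "majority":
        return sum(wins.values()) > len(wins) / 2
    raise ValueError(f"unknown rule: {rule}")


def overall_rating(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-category ratings that are already scaled to 0-1."""
    return sum(ratings[k] * w for k, w in weights.items()) / sum(weights.values())


current = {"accuracy": 0.92, "inferences_per_s": 40.0, "memory_mb": 300.0}
previous = {"accuracy": 0.90, "inferences_per_s": 45.0, "memory_mb": 350.0}
print(is_superior(current, previous, "all"), is_superior(current, previous, "majority"))  # False True
print(overall_rating({"accuracy": 1.0, "speed": 0.5}, {"accuracy": 1.0, "speed": 1.0}))  # 0.75
```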

If the current student model outperforms the set of previous models (114 "No"), method 100 returns to operation 105 to generate and evaluate another student model. In this way, student models are continually generated (through model selection, pruning, and training) and evaluated for as long as performance keeps improving. If the current student model is the first student model generated, operation 114 defaults to "No." If the current student model is inferior to the previous models (114 "Yes"), the loop may end. Method 100 may then conclude by selecting the best-performing model in operation 116. As used herein, "best-performing" may refer to the model with the highest performance in one or more of the evaluated categories (e.g., fastest inference, highest accuracy, lowest resource requirements, etc.). If different models excel in different categories, one category (e.g., accuracy) may be prioritized over the others. This applies provided the corresponding best model meets the minimum performance requirements of the remaining categories. Other criteria for selecting the "best-performing" model will be apparent to those skilled in the art. For example, the student model's performance in each category may be assigned a quantifiable rating (e.g., in the range 0 to 1). The performance requirements may be expressed as ratings in a similar manner. The difference between each minimum required rating and each model's rating may then be determined. The model with the largest total difference (or the largest average difference, etc.) may be selected.
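The rating-margin criterion at the end of the paragraph can be sketched as follows. The sketch assumes every category is already rated on a 0-1 scale where higher is better. The category names and `select_best` are hypothetical, and the candidate ratings are made-up values for illustration only.

```python
from typing import Dict


def select_best(candidates: Dict[str, Dict[str, float]],
                required: Dict[str, float], use_mean: bool = False) -> str:
    """Return the candidate whose 0-1 ratings exceed the required ratings by the
    largest total (or, with use_mean=True, average) margin."""
    def margin(ratings: Dict[str, float]) -> float:
        diffs = [ratings[k] - required[k] for k in required]
        return sum(diffs) / len(diffs) if use_mean else sum(diffs)

    return max(candidates, key=lambda name: margin(candidates[name]))


required = {"accuracy": 0.7, "speed": 0.5, "memory": 0.4}
candidates = {
    "model_2": {"accuracy": 0.8, "speed": 0.6, "memory": 0.5},    # margin ~0.30
    "model_33": {"accuracy": 0.9, "speed": 0.55, "memory": 0.6},  # margin ~0.45
}
print(select_best(candidates, required))  # model_33
```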

Looping through operations 105-114 advantageously enables a system performing method 100 to generate a machine learning model optimized for specific performance requirements, such as a device's hardware specifications. In some embodiments, the loop must execute a minimum number of times before method 100 proceeds to operation 116. In some embodiments, other conditions for ending the loop may exist, such as:

- whether the current student model has exceeded a set of performance targets (e.g., thresholds higher than those of the performance requirements);
- whether a maximum number of models has been tried; or
- whether the performance requirements have been exceeded.

Different conditions may be used in different use cases.
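The loop of operations 105-116 can be outlined as below. This is a sketch under stated assumptions. `make_student` is a hypothetical callback standing in for selection, pruning, training, and evaluation. Each student is summarized by a single "higher is better" score. The "previous set" is taken to be the best passing student so far, which is only one of the options described. A minimum iteration count, a target score, and a maximum iteration count serve as the extra termination conditions.

```python
from typing import Callable, Optional, Tuple

Trial = Tuple[str, float, bool]  # (name, score, meets_requirements)


def search(make_student: Callable[[int], Trial], min_iterations: int = 1,
           max_iterations: int = 50, target_score: Optional[float] = None) -> Tuple[str, float]:
    """Outline of operations 105-116 with hypothetical stopping rules."""
    best: Optional[Tuple[str, float]] = None
    for i in range(max_iterations):
        name, score, passed = make_student(i)  # operations 105-110 (and 112)
        if not passed:
            continue                           # 112 "No": try another student
        if best is not None and score <= best[1] and i + 1 >= min_iterations:
            break                              # 114 "Yes": no longer improving
        if best is None or score > best[1]:
            best = (name, score)
        if target_score is not None and score >= target_score:
            break                              # optional early-exit target
    if best is None:
        raise RuntimeError("no student met the performance requirements")
    return best                                # operation 116


trials = [("student_0", 0.60, False), ("student_1", 0.70, True),
          ("student_2", 0.75, True), ("student_3", 0.72, True),
          ("student_4", 0.90, True)]
print(search(lambda i: trials[i], max_iterations=len(trials)))  # ('student_2', 0.75)
```

Raising `min_iterations` keeps the loop running past a temporary dip, so a later, better student (here `student_4`) can still be found.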

FIG. 2 is an exemplary diagram 200 of model selection and pruning to generate a student model, consistent with some embodiments of the present disclosure. Machine learning model repository 202 includes a set of machine learning models 220, 240, 260, and 280 (collectively, "models 220-280"). Repository 202 may include, for example, an online database of vetted, trained models. Models 220-280 may each be a different type of model. For example, model 220 may be a convolutional neural network and model 280 may be a recurrent neural network. Alternatively, some (or all) of models 220-280 may be the same type of model. Models 220-280 may be trained to perform various tasks such as image recognition, natural language processing, and so on.

Each of models 220-280 includes one or more layers:

- model 220 includes layers 222, 224, and 226;
- model 240 includes layers 242, 244, and 246;
- model 260 includes layers 262, 264, and 266; and
- model 280 includes layers 282, 284, and 286 (collectively, layers 222-286).

Layers 222-286 are configured to receive input data, manipulate the data, and generate an output. The input to a first layer (i.e., layer 222, 242, 262, or 282) is the initial input dataset. The input to every other layer is the feature vector output by the previous layer.

One of models 220-280 may be selected for use as the student model. In the example shown in FIG. 2, model 240 is selected, as indicated by dashed box 201. In some embodiments, various selection criteria may be implemented in an attempt to select an "ideal" initial model. Such criteria include the ratio of accuracy to number of layers, memory footprint, inference speed, and so on. In some embodiments, however, random selection may be beneficial because it is relatively simple to implement and fast to execute. Moreover, because the effects of the pruning and training processes are unpredictable, random selection may still be relatively effective. The selection of model 240 may be performed in a manner similar to that described for operation 104 of method 100, discussed above with reference to FIG. 1.

Once model 240 is selected, it is pruned. In the example shown in FIG. 2, layer 244 of model 240 is removed, so the resulting student model 241 has only layers 242 and 246. As a result, student model 241 may have a smaller size and lower resource requirements than the "original" model 240. The selection of layers to remove may be performed in a manner similar to operation 106 of method 100, discussed above with reference to FIG. 1. The specific layers removed may be selected randomly or based on one or more criteria. For example, one or more contiguous blocks of layers may be selected at random, each with a minimum size of 10% of the total number of layers. Alternatively, the middle 20 layers may be selected, and so on. The number of layers to remove may also be determined randomly, but limits may be set according to the performance requirements. For example, consider a 150-layer model selected as the basis of a student model to be deployed on a mobile device. For that model, 100 layers may be removed, with the exact layers chosen at random.
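The three layer-selection strategies mentioned above can be sketched as index generators over a model with `num_layers` layers, using 0-based indices. The function names and the rule of always keeping at least one layer are assumptions for illustration. The sketch covers a random contiguous block covering at least 10% of the model, the middle block, and a random non-contiguous set.

```python
import math
import random
from typing import List, Optional


def random_contiguous_block(num_layers: int, min_fraction: float = 0.10,
                            rng: Optional[random.Random] = None) -> List[int]:
    """Random run of consecutive layer indices covering at least min_fraction
    of the model, always leaving at least one layer in place."""
    rng = rng or random.Random()
    min_size = max(1, math.ceil(num_layers * min_fraction))
    size = rng.randint(min_size, num_layers - 1)
    start = rng.randint(0, num_layers - size)
    return list(range(start, start + size))


def central_block(num_layers: int, count: int) -> List[int]:
    """The `count` layers in the middle of the model."""
    start = (num_layers - count) // 2
    return list(range(start, start + count))


def random_layers(num_layers: int, count: int,
                  rng: Optional[random.Random] = None) -> List[int]:
    """`count` distinct layers chosen at random (not necessarily contiguous)."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(num_layers), count))


print(central_block(150, 20)[0], central_block(150, 20)[-1])  # 65 84
print(len(random_layers(150, 100, random.Random(0))))         # 100
```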

It is important to note that models 220-280 may have significantly more than three layers, since machine learning models commonly have hundreds of layers. Models 220-280 are shown with only three layers each for ease of explanation. Similarly, although student model 241 has fewer layers than model 240, it may still have significantly more than two layers (in some cases, hundreds). Conversely, student model 241 is shown missing only a single layer compared with models 220, 240, 260, and 280. In practice, it may include significantly fewer layers than any of models 220-280. Further, models 220-280 do not necessarily share the same structure. As an illustrative example, a use case need not involve four three-layer models and a two-layer student model. It may instead involve 15 teacher models, each with a different number of layers (but all with at least 200 layers), and a 30-layer student model.

Pruning may significantly improve resource-based performance. In other words, after pruning, student model 241 may be significantly "lighter" than model 240; for example, it may consume significantly less memory to operate. However, pruning may also impose a significant accuracy penalty on student model 241 relative to model 240. This is because the connections and order of layers are important factors in a model's effectiveness, and even a small change to the layer order can significantly affect accuracy. Because pruning may involve removing multiple layers, the accuracy of student model 241 may drop significantly as a result. However, these penalties may be advantageously mitigated by the specialized training described below with reference to FIG. 3.

FIG. 3 is an exemplary diagram 300 of a data flow for training a student model 340 based in part on training data 302 and multiple teacher models 320 and 360, consistent with some embodiments of the present disclosure. This training may significantly improve the accuracy of the student model, advantageously mitigating the accuracy penalty incurred by pruning. Training data 302 may be annotated data, for example:

- a series of X-ray images showing particular medical conditions (possible cancerous growths, fractures, etc.);
- a series of electroencephalogram (EEG) charts of patients with diagnosed conditions; and so on.

Teacher models 320 and 360 may be established, pre-trained models selected from a model repository. For example, teacher models 320 and 360 may correspond to models 220 and 260 of model repository 202, as described above with reference to FIG. 2.

Data 302 may be input into several models, such as teacher models 320 and 360 and student model 340. For example, data 302 may be input into the first layer 322 of model 320. Layer 322 may generate features 323 (e.g., in the form of a feature vector), which may then be input into layer 324. Layer 324 may in turn manipulate features 323 and generate output features 325. Features 325 may then be input into the final layer 326, resulting in output 328. In some embodiments, output 328 may be compared with the labels of data 302 to check the accuracy of teacher model 320. However, the accuracy of teacher model 320 may simply be assumed sufficient for training student model 340, eliminating the need to compare output 328 with data 302.

Similarly, data 302 may be input into the first layer 362 of model 360. Layer 362 may generate output features 363, which may then be input into layer 364. Layer 364 may in turn manipulate features 363 and generate output features 365. Features 365 may then be input into the final layer 366, resulting in output 368. In some embodiments, output 368 may be compared with data 302 to check the accuracy of teacher model 360. However, the accuracy of teacher model 360 may simply be assumed sufficient for training student model 340, eliminating the need to compare output 368 with data 302.

Data 302 may also be input into the first layer 342 of student model 340. First layer 342 may manipulate input data 302 to generate first output features 343. These features may then be input into layer 346 to generate student output 348. Output 348 may be compared with the known values of input data 302, and layers 342 and 346 may be adjusted based on that comparison. In addition, features 343 may be compared with one or more of features 323, 325, 363, and 365. The adjustments to layers 342 and 346 may be based, at least in part, on these comparisons as well. For example, if the difference between features 343 and features 323 is relatively large, the change applied to the weights of layer 342 may be correspondingly large. This may advantageously enable student model 340 to function similarly to teacher models 320 and 360, even though the teacher models are considerably more complex and have more layers. It may do so at minimal cost in accuracy, because the comparison of output 348 with data 302 is also taken into account when layers 342 and 346 are adjusted.

Throughout this disclosure, reference is made to various forms of data, in particular "labeled" and "unlabeled" data. "Labeled" data is further divided into "strongly" labeled data (typically received from external sources) and "weakly" labeled data. In general, strongly labeled data refers to data that has already been classified, annotated, or is otherwise known and understood. For example, in the context of image recognition, known data may be an image accompanied by information (i.e., metadata) detailing the items or objects shown in the image. As an additional example, an electrocardiogram (ECG or EKG) received from a public health database together with metadata describing the ECG as a normal cardiac rhythm (as reviewed by one or more medical professionals) may be strongly labeled data. Strongly labeled data may be published by a variety of sources, including model training databases and the artificial intelligence (AI) community. Data without annotations or labels is referred to as "unlabeled" data. As a non-limiting example, raw sensor data may be unlabeled data.

As used herein, "weakly" labeled data refers to data that was previously unlabeled and has been labeled by means other than manual annotation by human experts, such as via systems and methods consistent with this disclosure. The term "weak" reflects the fact that the data has not been labeled by conventional methods such as manual annotation, and so has not been confirmed to be correctly labeled.

Further, as used herein, "data" may refer to "input data" (such as data recorded by a device) or "feature data" (such as output from an intermediate layer of a machine learning model). As an example, an ECG ("input data") may be input to the first layer of a model, and the first layer may output a feature vector containing a plurality of features ("feature data"). That feature vector may then be input to the second layer of the model, which may output a second feature vector whose features are also feature data, and so on. Input data is either labeled or unlabeled, and feature data generated from labeled input data may inherit the same label. For example, a labeled ECG may be transformed by the first model layer into a first feature vector, which may then be labeled according to the label of the ECG to obtain labeled feature data. Accordingly, acquiring data may refer either to receiving input data (e.g., by collecting patient data or downloading labeled data) or to generating feature data (by inputting data into the model and fetching the feature data output by a layer of the model).
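A toy sketch of how feature data may inherit the label of the input data it was derived from. The two layers below are hypothetical stand-ins for trained model layers, and `extract_labeled_features` is an illustrative helper, not an API from the disclosure.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def extract_labeled_features(layers: Sequence[Layer], x: Vector,
                             label: Optional[str]) -> List[Tuple[int, Vector, Optional[str]]]:
    """Run input data through each layer in turn and tag every intermediate
    feature vector with the label of the input (None if the input is unlabeled)."""
    features = []
    current = x
    for index, layer in enumerate(layers):
        current = layer(current)
        features.append((index, current, label))
    return features


# Two hypothetical layers standing in for a trained model.
def layer_1(v: Vector) -> Vector:
    return [max(0.0, 2.0 * e - 1.0) for e in v]  # scaled ReLU


def layer_2(v: Vector) -> Vector:
    return [sum(v)]  # sum pooling
```

For example, `extract_labeled_features([layer_1, layer_2], [1.0, 0.25], "normal")` returns `[(0, [1.0, 0.0], "normal"), (1, [1.0], "normal")]`: both feature vectors carry the input's label.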

FIG. 4 illustrates a high-level method 400 for updating a machine learning model based on unlabeled data, consistent with some embodiments of the present disclosure. Method 400 may be performed by a device on which a machine learning model is installed, such as a computer. Method 400 comprises collecting data at operation 402. Operation 402 may include, for example, recording data via one or more sensors. The data collected at operation 402 may be unlabeled input data. For example, operation 402 may include receiving electroencephalography data (e.g., images of EEG readings from a set of electrodes). Because the data is unlabeled, it is initially unknown whether the EEG data indicates normal brain activity or, for example, a seizure.

Method 400 further comprises receiving known data at operation 404. Operation 404 may include, for example, downloading a dataset from an online database, loading a dataset from a local storage device, and the like. The known data may include a set of measurements annotated by experts, specialists, or the like. This known data is also referred to herein as "strongly" labeled data. Operation 404 may include receiving a single dataset or multiple datasets.

Method 400 further comprises comparing the labeled data with the unlabeled data at operation 406. Operation 406 may include, for example, calculating differences between data points of the unlabeled dataset and data points of the labeled dataset. In some embodiments, operation 406 may include clustering the data. As a non-limiting, high-level example, each data point in a dataset may be represented as an X-Y coordinate pair, and operation 406 may include plotting each data point in order to recognize groups, or "clusters," of data points that are close to one another in the X-Y plane. As a more specific example, a group of labeled, publicly available EEG images indicating various states of brain function (each image serving as a "data point" corresponding to a single reading) may be plotted such that similarly labeled EEG images are close to one another. In other words, EEG images labeled as indicating normal brain function may be plotted in a first cluster, EEG images labeled as indicating a first type of seizure may be plotted in a second cluster, EEG images labeled as indicating a second type of seizure may be plotted in a third cluster, and so on. The algorithm used to plot the labeled images may be iteratively modified until the images are clustered according to their labels. Images from the unlabeled data may then be plotted and organized into clusters using the same algorithm. An example of such a comparison is shown in diagram 500 of FIG. 5. Operation 406 may include comparing the labeled and unlabeled data via a transduction algorithm.

Some clusters identified via operation 406 may not contain any data points from the unlabeled dataset. As an example, a first cluster may contain known data points that are each an ECG labeled "normal cardiac rhythm," a second cluster may contain known data points that are each an ECG labeled "atrial fibrillation," and a third cluster may contain known data points that are each an ECG labeled "ventricular fibrillation," while the unlabeled data may be a set of ECGs measured for a particular patient. The unlabeled dataset (e.g., the patient's ECGs) may include some data points in the first cluster and some in the second cluster, but none in the third cluster. If the patient is relatively healthy and has no history of heart disease, the unlabeled data would not be expected to include an ECG indicating ventricular fibrillation (because ventricular fibrillation is typically followed by cardiac arrest and is usually fatal).

Method 400 further comprises labeling the collected data at operation 408. Operation 408 may include determining, based on the comparison of operation 406, a label for each data point included in the unlabeled dataset received via operation 402. For example, operation 408 may include determining the label of each data point based on the label of the most similar data point in the received known dataset. In some embodiments, operation 408 may be performed by a computer system implementing a transduction algorithm. Because labels determined via operation 408 are not determined by "conventional" methods (such as manual analysis and annotation by experts), they may be considered "weak" labels.
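Labeling each point with the label of the most similar known point amounts to 1-nearest-neighbor labeling. A minimal sketch, assuming that data points are already represented as coordinate tuples and that similarity is measured by Euclidean distance (both assumptions). The `weak` flag mirrors the metadata indication, described for operation 408, that a label is weak.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LabeledPoint:
    coords: Tuple[float, ...]
    label: str
    weak: bool = False  # True when the label was inferred rather than annotated


def nearest_neighbor_labels(unlabeled: Sequence[Tuple[float, ...]],
                            strongly_labeled: Sequence[LabeledPoint]) -> List[LabeledPoint]:
    """Give each unlabeled point the label of the closest strongly labeled
    point, and mark the resulting label as weak."""
    if not strongly_labeled:
        raise ValueError("need at least one strongly labeled point")
    result = []
    for p in unlabeled:
        nearest = min(strongly_labeled, key=lambda q: math.dist(p, q.coords))
        result.append(LabeledPoint(tuple(p), nearest.label, weak=True))
    return result
```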

In some embodiments, the output of a teacher model may be treated as a weak label. As an illustrative example, unlabeled data may be input to a teacher model that outputs a vector of seven entries representing the probabilities of seven target classes (e.g., for a seven-class seizure type classification). An argmax operation may be applied to this vector to find the index of the highest probability. Because this index may be regarded as the most likely class of the unlabeled data, it may be treated as a weak (synthetic) label.
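The argmax step can be sketched as follows; the helper name and the example probability vector are hypothetical.

```python
from typing import Sequence


def weak_label_from_teacher(probabilities: Sequence[float]) -> int:
    """Return the index of the most probable class in a teacher output vector.

    The index is treated as a weak (synthetic) label for the unlabeled input.
    Ties resolve to the lowest index.
    """
    if not probabilities:
        raise ValueError("empty probability vector")
    return max(range(len(probabilities)), key=lambda i: probabilities[i])
```

For example, `weak_label_from_teacher([0.02, 0.05, 0.60, 0.10, 0.08, 0.05, 0.10])` returns `2`, the index of the largest of the seven class probabilities.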

As an example, a first unlabeled EEG image (e.g., collected patient data acquired via operation 402) may lie within a first cluster that also includes several labeled images (e.g., EEG images from the known, "strongly" labeled dataset received via operation 404). Operation 408 may include identifying that each of the labeled EEG images in the first cluster is labeled "normal brain activity" and therefore assigning that label to the first unlabeled image (and to any other unlabeled data points in the first cluster). Thus, the first unlabeled EEG image may be assigned a "weak" label identifying it as representing "normal brain activity."
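A sketch of the cluster-based assignment in this example, assuming that cluster membership has already been computed (e.g., via operation 406). The text describes the case in which every labeled member of a cluster shares one label; how a cluster with conflicting labels should be handled is not specified, so this sketch leaves such clusters unlabeled as an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Set, Tuple


def label_clusters(assignments: List[Tuple[Hashable, Optional[str]]]) -> Dict[Hashable, Optional[str]]:
    """Map each cluster id to the weak label its unlabeled members should get.

    `assignments` pairs a cluster id with a point's label (None for an
    unlabeled point). A cluster receives a label only if all of its labeled
    members agree; clusters with conflicting or no labels map to None.
    """
    labels_by_cluster: Dict[Hashable, Set[str]] = defaultdict(set)
    clusters = set()
    for cluster_id, label in assignments:
        clusters.add(cluster_id)
        if label is not None:
            labels_by_cluster[cluster_id].add(label)
    result = {}
    for cluster_id in clusters:
        found = labels_by_cluster.get(cluster_id, set())
        result[cluster_id] = next(iter(found)) if len(found) == 1 else None
    return result
```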

Operation 408 may further include assigning the determined labels to the dataset (e.g., by adding metadata containing the labels to the dataset). The metadata may further indicate that each such label is a "weak" label.

In some cases, not every data point can be labeled; for example, an unlabeled data point may lie outside the clusters identified via operation 406, or directly between clusters (an "unlabeled outlier"). The procedure for handling unlabeled outliers may vary by embodiment or use case. For example, in some embodiments, operation 408 may further include discarding unlabeled outliers from the unlabeled dataset. In other embodiments, operation 408 may include labeling unlabeled outliers based on a "best guess" (e.g., the cluster closest to the outlier), or performing an additional "pass" that attempts to label the outliers, such as by implementing a different algorithm (e.g., nearest neighbor). A data point may be considered an outlier based on a threshold comparison, in which a difference threshold is defined as the maximum allowable difference between an unlabeled data point and a labeled data point (or a cluster center). If an unlabeled data point differs from the closest labeled data point (or the closest cluster of labeled data points) by more than the difference threshold, the unlabeled data point may be considered an outlier.
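The difference-threshold test for outliers can be sketched as below, assuming Euclidean distance to cluster centers as the measure of difference (an assumption; the text does not fix a metric). A return value of `None` marks an outlier, which an embodiment could then discard, label by best guess, or re-process in another pass.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, ...]


def label_or_flag_outlier(point: Point, centers: Dict[str, Point],
                          max_difference: float) -> Optional[str]:
    """Return the label of the nearest cluster center, or None if the point
    differs from that nearest center by more than `max_difference`."""
    label, center = min(centers.items(), key=lambda item: math.dist(point, item[1]))
    if math.dist(point, center) > max_difference:
        return None  # unlabeled outlier
    return label
```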

Method 400 further comprises retraining the machine learning model at operation 410. The machine learning model may already be trained to analyze or classify input data, such as the data collected via operation 402. As the nature of the input data changes, the model's accuracy may degrade over time; retraining can mitigate this. Specifically, operation 410 may include using the weakly labeled data as training data for retraining the machine learning model. For example, operation 410 may include inputting weakly labeled data into the model, receiving an output from the model, calculating the error between the output and the label, adjusting the layers of the model based on the error, and repeating until the error stops decreasing. In some embodiments, operation 410 further comprises training the model using the strongly labeled dataset in addition to the weakly labeled data. Further, in some embodiments, the strongly labeled data may be input to a teacher model to generate teacher outputs, and the student model may be retrained based on the strongly and weakly labeled data while taking the teacher outputs into account, in a manner similar to operation 108 of method 100.
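The retraining loop of operation 410 (compute outputs, measure the error against the labels, adjust, and repeat until the error stops decreasing) is sketched below using binary logistic regression trained by gradient descent as a stand-in for the machine learning model. The model type, learning rate, and stopping tolerance are assumptions, and the teacher-output term mentioned for method 100 is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, 0/1 label)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of the positive class under a logistic-regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(weights: Sequence[float], bias: float,
                  data: Sequence[Example]) -> float:
    """Average error between model outputs and (strong or weak) labels."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_until_plateau(data: Sequence[Example], weights: Sequence[float],
                        bias: float, lr: float = 0.5, tol: float = 1e-6,
                        max_epochs: int = 1000) -> Tuple[List[float], float, float]:
    """Adjust the model on labeled data until the error stops decreasing.

    Returns (weights, bias, final mean log-loss).
    """
    weights = list(weights)
    previous = mean_log_loss(weights, bias, data)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in data:
            error = predict(weights, bias, x) - y  # output minus label
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / len(data)
            grad_b += error / len(data)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
        current = mean_log_loss(weights, bias, data)
        improvement = previous - current
        previous = current
        if improvement < tol:  # error is no longer meaningfully decreasing
            break
    return weights, bias, previous
```

Strongly and weakly labeled examples can simply be concatenated into one list before calling `train_until_plateau`, which corresponds to training on both kinds of data at once.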

Operation 410 may be performed on a "working copy" of the model, leaving the "live" version unchanged, in case retraining fails to improve (or even degrades) performance. In some cases, the weakly labeled data and the strongly labeled data may be combined into an aggregate training dataset that is used to retrain the model. In other cases, the weakly labeled data may be used to train a first student model and the strongly labeled data may be used to train a second student model, and the two student models may later be combined into an ensemble model.
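A sketch of two options from this paragraph: retraining on an independent working copy, and combining two student models into an ensemble. Representing a model as a dictionary of parameter lists, and ensembling by averaging class probabilities, are illustrative assumptions; other combination rules (e.g., majority voting) are equally possible.

```python
import copy
from typing import Callable, Dict, List, Sequence

Model = Dict[str, List[float]]  # hypothetical: parameters keyed by layer name


def make_working_copy(live_model: Model) -> Model:
    """Return an independent deep copy so retraining cannot alter the live model."""
    return copy.deepcopy(live_model)


def ensemble_predict(predictors: Sequence[Callable[[List[float]], List[float]]],
                     x: List[float]) -> List[float]:
    """Average the class-probability vectors produced by several student models."""
    outputs = [p(x) for p in predictors]
    return [sum(column) / len(outputs) for column in zip(*outputs)]
```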

Method 400 further comprises evaluating the retrained model at operation 412. Operation 412 may include, for example, inputting strongly labeled data into the retrained model, receiving the outputs, and determining the accuracy of the retrained model. Operation 412 may include evaluating the model using "evaluation data," i.e., strongly labeled data that was not used to retrain the model as part of operation 410.

Method 400 further comprises determining, at operation 414, whether the performance of the retrained model is improved or degraded compared with that of the previous version of the model (the "live" model). Operation 414 may include, for example, comparing the accuracy of the retrained model with the accuracy of the live model. For example, the retrained model may correctly classify 98% of the evaluation data at operation 412. If the accuracy of the live model is 95%, the retrained model may be considered superior because of its higher accuracy (414 "Yes"). Performance metrics other than accuracy, such as F1 score, sensitivity, and specificity, may also be considered. For example, operation 414 may include comparing the sensitivity of the retrained model with the sensitivity of the live model.
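The metrics named for operation 414 can be computed from binary predictions as sketched below. Treating label `1` as the positive class (e.g., a seizure) and comparing the models on a single chosen metric are simplifying assumptions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), specificity, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(n: float, d: float) -> float:
        return n / d if d else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def is_improvement(retrained: Dict[str, float], live: Dict[str, float],
                   metric: str = "accuracy") -> bool:
    """Decision 414: True if the retrained model beats the live model on `metric`."""
    return retrained[metric] > live[metric]
```

With the figures from the example above, `is_improvement({"accuracy": 0.98}, {"accuracy": 0.95})` returns `True`.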

If the retrained model provides improved performance (414 "Yes"), method 400 further comprises replacing the live model with the retrained model at operation 416. Operation 416 may include, for example, updating the layers of the model based on the changes made when retraining the model at operation 410. In some use cases, older versions of the model may be archived to enable rollback if needed. As an illustrative example, because the evaluation at 412 does not necessarily test every aspect of the model's performance exhaustively, the retrained model's performance may degrade in some contexts (e.g., accuracy may be relatively low when presented with a particular subtype of input data that did not appear in the evaluation data, such as an ECG showing a rare presentation of atrial fibrillation). It may therefore be advantageous to keep backups of one or more previous iterations of the model in case the retrained model later proves inadequate. In some cases, however, storage constraints on the device running the model may prevent keeping backups on that device (although backups may still be maintained elsewhere, such as on a cloud server).

If the retrained model does not provide improved performance compared with the live model (414 "No"), method 400 further comprises discarding the retrained model and continuing to use the live model at operation 418.
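Operations 416 and 418, together with the optional archive for rollback, can be sketched as a small registry. `ModelRegistry` and its `max_archived` limit (which models the device storage constraint) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical holder for the live model and archived earlier versions."""
    live: Any
    archive: List[Any] = field(default_factory=list)
    max_archived: Optional[int] = 3  # assumed on-device storage limit; None = unlimited

    def promote_or_discard(self, retrained: Any, improved: bool) -> bool:
        """Operation 416 if improved (archive the live model, promote the
        retrained one); otherwise operation 418 (discard the retrained model).
        Returns True if the retrained model was promoted."""
        if not improved:
            return False
        self.archive.append(self.live)
        if self.max_archived is not None and len(self.archive) > self.max_archived:
            # Keep only the most recent backups.
            self.archive = self.archive[len(self.archive) - self.max_archived:]
        self.live = retrained
        return True

    def rollback(self) -> bool:
        """Restore the most recently archived model, if one exists."""
        if not self.archive:
            return False
        self.live = self.archive.pop()
        return True
```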

Method 400 may be performed in response to user input (e.g., a user may cause a device or system to perform method 400). In some embodiments, method 400 may be performed periodically, e.g., weekly or every three months. Retraining that uses weakly labeled data advantageously enables the system performing method 400 to update the machine learning model. Specifically, the "real-world" nature of weakly labeled data allows the model to be updated so that it remains robust and accurate as the input data shifts. For example, as a user's physiology changes over time, the data input to the model may include patterns on which the model was not trained. Because the underlying patterns may remain consistent, however, method 400 can advantageously update the model to maintain accuracy when new patterns are presented, without requiring additional expert data analysis or annotation.

In some embodiments, the retrained model may also be compared with resource requirements based on the hardware specifications of the device. If the resource overhead of the retrained model exceeds the requirements, the retrained model may be pruned and retrained again. This may be performed in a manner similar to operations 112, 105, and 106 of method 100 (discussed above with reference to FIG. 1). This advantageously allows an existing model to be retrained to perform better (e.g., with higher accuracy) while still operating within the resource requirements.
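The prune-and-retrain cycle can be sketched as follows. The sketch uses the number of nonzero weights as a stand-in for resource overhead and magnitude pruning as the pruning method; both are assumptions, since the actual requirement is derived from the device's hardware specifications. The `retrain` callback is assumed to keep pruned weights at zero.

```python
import math
from typing import Callable, List


def magnitude_prune(weights: List[float], fraction: float) -> List[float]:
    """Zero out the smallest-magnitude `fraction` of the currently nonzero weights."""
    nonzero = [i for i, w in enumerate(weights) if w != 0.0]
    n_prune = math.ceil(len(nonzero) * fraction)
    nonzero.sort(key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in nonzero[:n_prune]:
        pruned[i] = 0.0
    return pruned


def fit_to_budget(weights: List[float], max_nonzero: int,
                  retrain: Callable[[List[float]], List[float]],
                  fraction: float = 0.2, max_rounds: int = 20) -> List[float]:
    """Prune and retrain until the nonzero-weight count (a proxy for resource
    overhead) is within the device budget."""
    current = list(weights)
    for _ in range(max_rounds):
        if sum(1 for w in current if w != 0.0) <= max_nonzero:
            return current
        current = retrain(magnitude_prune(current, fraction))
    raise RuntimeError("could not meet the resource budget")
```

With an identity `retrain`, `fit_to_budget([0.5, -0.1, 0.05, 2.0, -0.7], 2, lambda w: w)` prunes one weight per round and returns `[0.0, 0.0, 0.0, 2.0, -0.7]`.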

FIG. 5 is a diagram 500 illustrating an example of how unlabeled observed data may be automatically labeled based on known datasets, consistent with some embodiments of the present disclosure. Diagram 500 may depict a semi-supervised machine learning process such as transduction. Diagram 500 includes shapes from three different datasets: an observed, unlabeled dataset including data points 512, 514, and 516 (shown as squares in FIG. 5); a first set of known, "strongly" labeled data points 522, 523, and 526 (shown as triangles); and a second set of known, strongly labeled data points 532, 534, 535, and 536 (shown as circles). For example, the observed dataset may be a patient's EEG scans (i.e., data point 512 may be a first set of EEG data from a first recording of the patient, data point 514 may be a second set of EEG data from a second recording of the same patient, and so on). The first set of strongly labeled data points may be a first group of publicly available EEG datasets with known results. For example, data point 522 may be an EEG scan identified as indicating a first type of seizure, data point 523 may be a different EEG scan also identified by an expert as indicating the first type of seizure, and data point 526 may be an EEG scan identified as indicating normal brain activity. The second set of strongly labeled data points may be a different group of annotated EEG scans (possibly from a different source). The use of three or more labeled datasets is also contemplated and may aid labeling and retraining. In some embodiments, the first dataset may be used to retrain the model, and the second dataset may be used to evaluate the performance (i.e., accuracy) of the retrained model.

Diagram 500 may represent, for example, a 2D plot of data points generated by assigning (X, Y) coordinates to the various data points. The X and Y values may be assigned via an algorithm, and the algorithm may be iteratively modified until the data points of the labeled groups are plotted in distinct clusters of similarly labeled data points. For example, a set of labeled EEGs may include several EEGs labeled as indicating a first type of seizure ("seizure EEGs") and several EEGs labeled as indicating normal brain activity ("normal EEGs"). A first iteration of the algorithm might plot each EEG using an X coordinate determined from the EEG's file size and a Y coordinate determined from the EEG's creation date, but this is unlikely to plot the seizure EEGs near one another (or the normal EEGs near one another). The algorithm may therefore be modified to determine the (X, Y) coordinates differently and then re-evaluated. This process may be repeated until all seizure EEGs are plotted within a first region of a specified radius (a first cluster) with no normal EEGs nearby, and all normal EEGs are plotted within a second region of the same radius (a second cluster) with no seizure EEGs nearby. Once the algorithm reliably plots similarly labeled data points near one another, unlabeled data points may be assumed to be plotted appropriately as well (e.g., an unlabeled EEG representing normal brain activity may be assumed to be plotted in the second cluster).
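The iterative "modify and re-evaluate" search can be sketched by trying candidate coordinate-assignment functions in turn and accepting the first one under which each label's points fall within a given radius of that label's center, with no differently labeled points inside that radius. Representing the algorithm modifications as a fixed list of candidate projections is an assumption made for illustration; the acceptance criterion follows the radius condition described above.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
Projection = Callable[[Sequence[float]], Point2D]
Labeled = Sequence[Tuple[Sequence[float], str]]


def _centroid(points: List[Point2D]) -> Point2D:
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def separates(projection: Projection, labeled: Labeled,
              radius: float) -> Optional[Dict[str, Point2D]]:
    """Return per-label cluster centers if, under `projection`, every labeled
    point lies within `radius` of its own label's center and outside the
    radius of every other label's center; otherwise return None."""
    by_label: Dict[str, List[Point2D]] = {}
    for x, label in labeled:
        by_label.setdefault(label, []).append(projection(x))
    centers = {label: _centroid(pts) for label, pts in by_label.items()}
    for label, pts in by_label.items():
        for other, center in centers.items():
            for p in pts:
                inside = math.dist(p, center) <= radius
                if (other == label) != inside:
                    return None
    return centers


def find_projection(candidates: Sequence[Projection], labeled: Labeled,
                    radius: float) -> Tuple[Projection, Dict[str, Point2D]]:
    """Try candidate coordinate assignments until one clusters the labeled data by label."""
    for projection in candidates:
        centers = separates(projection, labeled, radius)
        if centers is not None:
            return projection, centers
    raise ValueError("no candidate projection separated the labeled data")
```

Once a projection is accepted, an unlabeled point can be projected with it and assigned to the nearest cluster center.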

Note that the (X, Y) coordinates do not necessarily represent characteristic properties of the EEGs, and they may appear arbitrary to an outside observer. In other words, diagram 500 need not be a plot of data against well-understood metrics such as mean, frequency, or magnitude. Put simply, the axes are whatever they need to be: as long as the algorithm plots data points with the same label in the same cluster, the system may have relatively high confidence that the same algorithm will plot unlabeled data in the appropriate cluster.

Notably, the data points of diagram 500 are all relatively well separated into distinct groups, or "clusters," such that the members of a given cluster are located relatively close to one another. In example diagram 500, data points 512, 522, 523, and 532 are in cluster 502; data points 514, 534, and 535 are in cluster 504; and data points 516, 526, and 536 are in cluster 506. Each strongly labeled data point also has one of three different labels: "A," "B," or "C." Further, within each cluster of diagram 500, the known labels are all identical; for example, data points 522, 523, and 532 are all labeled "A." This may be a result of how the data points are plotted.

Coordinates may be assigned to the various data points by inputting the value of a data point into the algorithm, receiving coordinates as the algorithm's output, plotting the data point according to those coordinates, and adjusting the algorithm until data points with the same label are plotted relatively close to one another. In this way, when the value of an unlabeled data point is input to the algorithm, the cluster containing the resulting position is likely to correspond to the "correct" label of that data point. For example, because unlabeled data point 512 is plotted within cluster 502, and the plotting algorithm tends to plot data points labeled "A" within cluster 502, it can be inferred that data point 512 should be labeled "A." This process may be repeated for each data point in the unlabeled dataset, resulting in a "weakly" labeled dataset and advantageously enabling the model to be retrained based on the observed data.

As a non-limiting example, "A" may refer to a first type of seizure, "B" to normal brain activity, and "C" to a second type of seizure. The algorithm may thus plot all EEGs indicating the first type of seizure in cluster 502, all EEGs indicating normal brain activity in cluster 504, and all EEGs indicating the second type of seizure in cluster 506. Even if the exact nature of this sorting appears arbitrary or is not well understood (e.g., the X and Y axes do not correspond to well-known properties of the data), it can be verified that the algorithm sorts the EEGs reliably. Therefore, if the same algorithm plots an unlabeled EEG within cluster 502, it can be inferred that the unlabeled EEG indicates the first type of seizure.

FIG. 6 illustrates a high-level method 600 for updating a machine learning model based on anonymized feature data derived from unlabeled input data, consistent with some embodiments of the present disclosure. In general, method 600 includes several operations that may be substantially similar to those performed in method 400. However, method 600 retrains the model based on unlabeled "anonymized" features (i.e., feature data). These features are themselves derived from collected data (i.e., input data). This may be advantageous, for example, when the data collected by the device performing method 600 is sensitive, such as personally identifiable information (PII) or health information.

Method 600 comprises acquiring labeled input data and unlabeled input data at operation 602. Operation 602 may include, for example, recording unlabeled data via one or more sensors, downloading a labeled dataset from an online database, or loading a dataset from a local storage device. For example, operation 602 may include receiving unlabeled electroencephalography (EEG) data from a set of electrodes and downloading one or more labeled EEG datasets from a public database.

Method 600 further comprises inputting the unlabeled input data into the machine learning model at operation 604. Operation 604 may be performed in a manner substantially similar to (or as part of) the normal operation of the machine learning model. An example is a machine learning model executing on the device that performs method 600.

Method 600 further comprises generating unlabeled feature data (i.e., one or more unlabeled features) at operation 606. A machine learning model may include multiple "intermediate" layers. Each layer receives input features, operates on them, and produces output features that are input to the subsequent layer. Operation 606 may include receiving the features output by one of these layers.

In general, the features themselves may be relatively "abstract." They may be essentially opaque or meaningless except as input to the next layer of the machine learning model. This follows from how machine learning models are trained, namely by iteratively adjusting the various layers. To a human observer, the intermediate layers of many machine learning models (sometimes called "hidden layers") act as a "black box." However, these layers are typically deterministic (although this may vary from model to model), and trends in the input data are generally reflected in the intermediate features. The intermediate features of a machine learning model therefore present a viable alternative to the raw data when the raw data may be sensitive.

In addition, features generated in the higher or intermediate layers of the model may capture useful information from the raw input data (e.g., edges, corners, homogeneous patches). Such features may therefore distinguish between target classes more effectively than the input data itself. In some embodiments, the machine learning model may be configured to output a particular feature vector in addition to, or instead of, its typical output. For example, a machine learning model having 50 layers may be configured to output the feature vector generated by layer 37 in addition to its final output. The layer-37 feature vector may then be used as the "generated unlabeled feature data" of operation 606.
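The following toy sketch shows one way a model could expose an intermediate feature vector alongside its final output, as in the layer-37 example above. The 50 "layers" are hypothetical stand-ins that each apply a scale and shift followed by ReLU. `forward` and its `capture` parameter are illustrative names, not an API from the disclosure. A real framework would typically provide its own mechanism for this, such as forward hooks in PyTorch.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]

def make_toy_layer(index: int) -> Layer:
    """Hypothetical stand-in for a trained layer: scale, shift, then ReLU."""
    def layer(x: Vector) -> Vector:
        return [max(0.0, 0.9 * v + 0.01 * index) for v in x]
    return layer

def forward(layers: Sequence[Layer], x: Vector,
            capture: Optional[int] = None) -> Tuple[Vector, Optional[Vector]]:
    """Run the layers in order and also return the output of layer `capture` (1-indexed)."""
    captured: Optional[Vector] = None
    for i, layer in enumerate(layers, start=1):
        x = layer(x)
        if i == capture:
            captured = list(x)  # copy the intermediate feature vector
    return x, captured

model = [make_toy_layer(i) for i in range(1, 51)]  # a 50-layer toy model
final_output, layer37_features = forward(model, [0.5, -1.0, 2.0], capture=37)

# Resuming from layer 38 with the captured features reproduces the final output.
resumed = layer37_features
for layer in model[37:]:
    resumed = layer(resumed)
```

The toy layers are deterministic. Feeding the captured layer-37 features through layers 38-50 therefore reproduces the final output. Operation 614 relies on this property when it resumes training at the "next layer."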

Method 600 further comprises inputting the labeled input data into the same machine learning model at operation 608. Operation 608 may be performed in a manner substantially similar to operation 604. The label of each input data point may be retained or cached so that the corresponding generated feature data can be labeled.

Method 600 further comprises generating labeled feature data at operation 610. Operation 610 may be performed in a manner substantially similar to operation 606. In addition, each generated feature may be labeled based on the label associated with its corresponding input data.

Method 600 further comprises, at operation 612, labeling the unlabeled feature data based on a comparison between the labeled features and the unlabeled features. Operation 612 may include, for example, comparing the unlabeled features to the labeled features and calculating the differences between them. This may be done in a manner substantially similar to operation 406 of method 400. Operation 612 may further include determining a label for each of the unlabeled features generated via operation 606, in a manner substantially similar to operation 408 of method 400. The features may differ significantly from the input data from which they were generated. Because of the nature of the machine learning model, however, comparing labeled and unlabeled features remains just as useful. The unlabeled feature data can therefore be labeled in a similar manner, for example by plotting the features into clusters as in the example discussed with reference to FIG. 5.
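A k-nearest-neighbor vote is one simple way to realize the comparison of operation 612. It computes the difference between each unlabeled feature vector and every labeled feature vector, then adopts the majority label among the k closest. Here the difference is measured as Euclidean distance, which is an assumption. The sketch is illustrative only: `knn_weak_labels`, k=3, and the sample vectors are hypothetical, and the disclosure does not prescribe this particular rule.

```python
import math
from collections import Counter
from typing import List, Sequence

def knn_weak_labels(unlabeled: Sequence[Sequence[float]],
                    labeled: Sequence[Sequence[float]],
                    labels: Sequence[str], k: int = 3) -> List[str]:
    """Weakly label each unlabeled feature vector by majority vote of its k nearest labeled vectors."""
    weak = []
    for u in unlabeled:
        # Indices of labeled vectors, closest first (Euclidean distance as the "difference").
        order = sorted(range(len(labeled)), key=lambda j: math.dist(u, labeled[j]))
        votes = Counter(labels[j] for j in order[:k])
        # Counter keeps first-seen order on ties, so a tie goes to the closer neighbor's label.
        weak.append(votes.most_common(1)[0][0])
    return weak

labeled_features = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
feature_labels = ["A", "A", "A", "B", "B", "B"]
print(knn_weak_labels([[0.15, 0.1], [3.1, 3.0]], labeled_features, feature_labels))
```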

Method 600 further comprises, at operation 614, retraining the machine learning model using the newly labeled ("weakly" labeled) feature data. In some embodiments, operation 614 may also include retraining using strongly labeled feature data and feature data from one or more teacher models. Operation 614 may include, for example, inputting the weakly labeled features into the "next layer" of the model. As an example, the feature data generated at operation 606 (and subsequently labeled) may have been output from layer 37 of a 50-layer model. Operation 614 may then include the following steps:

- inputting the weakly labeled feature data into layer 38 of the model
- receiving an output from the model
- calculating an error between the output and the label
- adjusting one or more of layers 38-50 of the model based on the error
- repeating these steps until the error no longer decreases

In some embodiments, operation 614 further includes training the model using a strongly labeled dataset in addition to the weakly labeled feature data.
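The sketch below makes the retraining loop concrete under a deliberate simplification. Layers 38-50 are collapsed into a single logistic (sigmoid) output unit. That unit is trained by gradient descent on weakly labeled layer-37 features with binary labels, and training stops once an epoch no longer reduces the error. `retrain_tail`, the learning rate, the tolerance, and the sample data are assumptions for illustration. In practice, the remaining layers of the real model would be adjusted, typically with a framework optimizer.

```python
import math
from typing import List, Sequence, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of class 1 from the simplified single-unit 'tail'."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def mean_bce(w: Sequence[float], b: float, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Mean binary cross-entropy between predictions and (weak) labels."""
    eps = 1e-12
    return -sum(yi * math.log(predict(w, b, xi) + eps)
                + (1 - yi) * math.log(1 - predict(w, b, xi) + eps)
                for xi, yi in zip(X, y)) / len(X)

def retrain_tail(X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5,
                 max_epochs: int = 1000, tol: float = 1e-6) -> Tuple[List[float], float, float]:
    """Gradient descent on the tail, stopping when an epoch no longer lowers the loss by `tol`."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    prev = mean_bce(w, b, X, y)
    loss = prev
    for _ in range(max_epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # gradient of BCE w.r.t. the pre-sigmoid score
            for j, xj in enumerate(xi):
                gw[j] += err * xj / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
        loss = mean_bce(w, b, X, y)
        if prev - loss < tol:  # error no longer decreasing
            break
        prev = loss
    return w, b, loss

# Hypothetical weakly labeled layer-37 features (2-D for readability) and binary labels.
X = [[-1.0, -0.5], [-0.5, -1.0], [1.0, 0.5], [0.5, 1.0]]
y = [0, 0, 1, 1]
w, b, final_loss = retrain_tail(X, y)
```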

Operation 614 may be performed on a "working copy" of the model, without altering the "live" version of the model. This accounts for the possibility that retraining may fail to improve (or may even degrade) performance. The model may then be evaluated and updated as needed, in a manner substantially similar to operations 412-416 of method 400. In some embodiments, the strongly labeled data may be input into a teacher model to generate a teacher output. The student model may then be retrained using the union of the strongly labeled data and the weakly labeled data, together with the teacher output. In some embodiments, operation 614 may include retraining using compound knowledge distillation, such as via method 100.
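The working-copy safeguard can be sketched as follows. A deep copy of the live model is retrained and promoted only if it scores better under an evaluation function. `Model`, `update_if_better`, and the toy `evaluate` and retrain callables are hypothetical placeholders. The promotion rule (a strictly better score) is also an assumption, and operations 412-416 of method 400 may apply different criteria.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Model:
    """Hypothetical model holding named parameters."""
    params: Dict[str, float] = field(default_factory=dict)

def update_if_better(live: Model,
                     retrain: Callable[[Model], None],
                     evaluate: Callable[[Model], float]) -> Model:
    """Retrain a working copy; return it only if it outperforms the live model."""
    working_copy = copy.deepcopy(live)  # the live model is never modified here
    retrain(working_copy)
    return working_copy if evaluate(working_copy) > evaluate(live) else live

# Toy stand-ins: the "ideal" bias is 1.0, and the score is higher the closer a model gets to it.
def evaluate(m: Model) -> float:
    return -abs(m.params["bias"] - 1.0)

def helpful_retrain(m: Model) -> None:
    m.params["bias"] += 1.0

def harmful_retrain(m: Model) -> None:
    m.params["bias"] -= 1.0

live = Model({"bias": 0.0})
promoted = update_if_better(live, helpful_retrain, evaluate)
kept = update_if_better(live, harmful_retrain, evaluate)
```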

FIG. 7 is a diagram 700 illustrating an example of labeling unlabeled input data (such as sensor data) via transductive learning, consistent with some embodiments of the present disclosure. Diagram 700 is presented as an illustrative example of how method 600 may be performed. Unlabeled input data 702 may be collected via observation, such as from one or more sensors. The unlabeled input data 702 is input into machine learning model 701, which may output unlabeled feature data 712.

Known data 704 may include multiple datasets, such as first labeled input data 706 and second labeled input data 708. Both input datasets 706 and 708 are input into model 701, resulting in labeled feature data 716 and labeled feature data 718, respectively. Note that model 701 may be executed multiple times per dataset. For example, each of input datasets 702, 706, and 708 may include five data points. In that case, model 701 may be executed 15 times:

- once for each unlabeled input data point of unlabeled input dataset 702, yielding five features 712
- once for each labeled input data point of known input dataset 706, yielding five labeled features 716
- once for each labeled input data point of known input dataset 708, yielding five labeled features 718

These executions of model 701 may be performed in any order.

Model 701 may typically be configured to output a classification (not shown in FIG. 7). For the purpose of updating model 701, however, the model may be reconfigured to output particular intermediate features (such as unlabeled feature data 712) in addition to, or instead of, the classification. This reconfiguration may be temporary or permanent. In some embodiments, the system may be configured to fetch the output of a layer without reconfiguring model 701. The feature data that is output may differ between update attempts, as long as the same features are used for every dataset (unknown and known) within a given attempt. For example, a first update attempt (such as a first iteration of method 600) may be performed in which model 701 is configured to output the feature vector generated by layer 42. A second update attempt (such as a second iteration of method 600) may then be performed in which model 701 is configured to output the feature vector generated by layer 37.

The unlabeled feature data 712, strongly labeled feature data 716, and strongly labeled feature data 718 are input into a transformation algorithm 705. The transformation algorithm 705 may, for example, organize features 712, 716, and 718 into clusters, with each cluster containing features that have a particular label. Each of the unlabeled features 712 can therefore be labeled by inference, i.e., based on the cluster in which it is plotted. The result is a set of "weakly" labeled features 722. These are combined with strongly labeled features 726 and strongly labeled features 728 to form retraining features 720, which can be used to retrain and evaluate model 701.
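The flow through transformation algorithm 705 can be sketched in three steps. First, pool the strongly labeled features (716, 718). Next, weakly label each unlabeled feature (712) from that pool. Finally, concatenate everything into one retraining set (720) that records whether each label is weak or strong. Several details are assumptions made for illustration: a 1-nearest-neighbor rule stands in for the clustering step, and the function names and sample vectors are invented.

```python
import math
from typing import List, Sequence, Tuple

Feature = List[float]
Record = Tuple[Feature, str, str]  # (feature vector, label, "weak" or "strong")

def nearest_label(f: Feature, pool: Sequence[Feature], tags: Sequence[str]) -> str:
    """Stand-in for the clustering of algorithm 705: copy the label of the closest labeled feature."""
    j = min(range(len(pool)), key=lambda i: math.dist(f, pool[i]))
    return tags[j]

def build_retraining_set(unlabeled: Sequence[Feature],
                         strong_sets: Sequence[Tuple[Sequence[Feature], Sequence[str]]]) -> List[Record]:
    """Combine weakly labeled features with every strongly labeled set."""
    pool = [list(f) for feats, _ in strong_sets for f in feats]
    tags = [t for _, labels in strong_sets for t in labels]
    weak = [(list(f), nearest_label(f, pool, tags), "weak") for f in unlabeled]
    strong = [(f, t, "strong") for f, t in zip(pool, tags)]
    return weak + strong

features_712 = [[0.1, 0.0], [2.9, 3.0]]
set_716 = ([[0.0, 0.0], [0.2, 0.1]], ["A", "A"])
set_718 = ([[3.0, 3.0], [3.1, 2.8]], ["B", "B"])
retraining_720 = build_retraining_set(features_712, [set_716, set_718])
```

Keeping the "weak"/"strong" tag on each record lets a training loop weight or filter weak labels separately. This is an optional design choice, not something FIG. 7 requires.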

Referring now to FIG. 8, shown is a high-level block diagram of an example computer system 800 that may be configured to perform various aspects of the present disclosure, including, for example, methods 100, 400, and 600. The example computer system 800 may be used in implementing one or more of the methods or modules described herein, and any related functions or operations (e.g., using one or more processor circuits or computer processors of the computer), in accordance with embodiments of the present disclosure. In some embodiments, the major components of the computer system 800 may comprise one or more CPUs 802, a memory subsystem 808, a terminal interface 816, a storage interface 818, an I/O (input/output) device interface 820, and a network interface 822. All of these may be communicatively coupled, directly or indirectly, for inter-component communication via a memory bus 806, an I/O bus 814, and an I/O bus interface unit 812.

The computer system 800 may contain one or more general-purpose programmable central processing units (CPUs) 802. Some or all of these may include one or more cores 804A, 804B, 804C, and 804D, herein generically referred to as the CPU 802. In some embodiments, the computer system 800 may contain multiple processors typical of a relatively large system. In other embodiments, the computer system 800 may alternatively be a single-CPU system. Each CPU 802 may execute instructions stored in the memory subsystem 808 on a CPU core 804 and may include one or more levels of on-board cache.

In some embodiments, the memory subsystem 808 may include random-access semiconductor memory, storage devices, or storage media (either volatile or non-volatile) for storing data and programs. In some embodiments, the memory subsystem 808 may represent the entire virtual memory of the computer system 800. It may also include the virtual memory of other computer systems coupled to the computer system 800 or connected via a network. The memory subsystem 808 may conceptually be a single monolithic entity. In some embodiments, however, it may be a more complex arrangement, such as a hierarchy of caches and other memory devices. For example, memory may exist in multiple levels of caches, and these caches may be further divided by function. One cache might then hold instructions while another holds non-instruction data used by the processor or processors. Memory may be further distributed and associated with different CPUs or sets of CPUs, as is known in any of various so-called non-uniform memory access (NUMA) computer architectures. In some embodiments, the main memory or memory subsystem 808 may contain elements for control and flow of memory used by the CPU 802, which may include a memory controller 810.

FIG. 8 shows the memory bus 806 as a single bus structure providing a direct communication path among the CPU 802, the memory subsystem 808, and the I/O bus interface 812. In some embodiments, however, the memory bus 806 may include multiple different buses or communication paths. These may be arranged in any of various forms, such as:

- point-to-point links in hierarchical, star, or web configurations
- multiple hierarchical buses
- parallel and redundant paths
- any other appropriate type of configuration

Similarly, the I/O bus interface 812 and the I/O bus 814 are each shown as a single unit. In some embodiments, however, the computer system 800 may contain multiple I/O bus interface units 812, multiple I/O buses 814, or both. Further, multiple I/O interface units are shown, which separate the I/O bus 814 from the various communication paths running to the various I/O devices. In other embodiments, some or all of the I/O devices may be connected directly to one or more system I/O buses.

In some embodiments, the computer system 800 may be a multi-user mainframe computer system or a single-user system. It may also be a server computer or similar device that has little or no direct user interface but receives requests from other computer systems (clients). Further, in some embodiments, the computer system 800 may be implemented as a desktop computer, portable computer, laptop or notebook computer, tablet computer, pocket computer, telephone, smartphone, mobile device, or any other appropriate type of electronic device.

Note that FIG. 8 is intended to depict the representative major components of an exemplary computer system 800. In some embodiments, however, individual components may have greater or lesser complexity than as represented in FIG. 8. Components other than or in addition to those shown in FIG. 8 may also be present, and the number, type, and configuration of such components may vary.

The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following:

- a portable computer diskette
- a hard disk
- a random access memory (RAM)
- a read-only memory (ROM)
- an erasable programmable read-only memory (EPROM or Flash memory)
- a static random access memory (SRAM)
- a portable compact disc read-only memory (CD-ROM)
- a digital versatile disk (DVD)
- a memory stick
- a floppy disk
- a mechanically encoded device, such as punch-cards or raised structures in a groove having instructions recorded thereon
- any suitable combination of the foregoing

A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se. Examples of such signals include radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), and electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium. Alternatively, they can be downloaded to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network. It then forwards those instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or configuration data for integrated circuitry. They may also be source code or object code written in any combination of one or more programming languages. These include object-oriented programming languages such as Smalltalk® and C++, and procedural programming languages such as the "C" programming language or similar languages.

The computer readable program instructions may execute in any of the following ways:

- entirely on the user's computer
- partly on the user's computer, as a stand-alone software package
- partly on the user's computer and partly on a remote computer
- entirely on the remote computer or server

In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions. It does so by utilizing state information of the instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a computer or other programmable data processing apparatus to produce a machine. The instructions, which execute via the processor of the computer or other programmable data processing apparatus, thereby create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner. In that case, the computer readable storage medium having instructions stored therein comprises an article of manufacture. That article includes instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device. This causes a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process. The instructions which execute on the computer, other programmable apparatus, or other device then implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions. Such a block comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be accomplished as one step or executed concurrently. They may also be executed substantially concurrently or in a partially or wholly temporally overlapping manner. The blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustration, and combinations of such blocks, can also be implemented by special purpose hardware-based systems. Such systems perform the specified functions or acts, or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration. They are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments and their practical application or technical improvement over technologies found in the marketplace. It was also chosen to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

A method of selecting a machine learning model, the method comprising:
receiving hardware specifications of a device;
determining performance requirements based on the hardware specifications of the device;
acquiring a set of machine learning models, the set of machine learning models comprising:
a first machine learning model having a first set of layers; and
a teacher model;
removing one or more layers of the first set of layers from the first machine learning model, thereby creating a student model having a set of pruned layers;
training the student model based on:
training data; and
the teacher model;
evaluating performance of the student model;
comparing the performance of the student model with the performance requirements; and
determining, based at least on the comparing, to select the model with the highest performance.
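The outer loop of claim 1 (derive requirements from the hardware, compare each evaluated model against them, keep the best) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the hardware specification is reduced to a hypothetical weight-memory budget, "performance" is reduced to a parameter count plus a measured accuracy, and `HardwareSpec`, `Candidate`, `derive_requirements`, and `select_best` are illustrative names. Pruning, training, and evaluation are assumed to have already produced each candidate's numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HardwareSpec:
    memory_bytes: int         # memory available for model weights (hypothetical field)
    bytes_per_param: int = 4  # assumes float32 weights


@dataclass
class Requirements:
    max_params: int
    min_accuracy: float


@dataclass
class Candidate:
    name: str
    layer_params: List[int]   # parameter count of each remaining layer
    accuracy: float           # measured after training, e.g. on a validation split

    @property
    def num_params(self) -> int:
        return sum(self.layer_params)


def derive_requirements(spec: HardwareSpec, min_accuracy: float) -> Requirements:
    # Translate the device memory budget into a parameter budget.
    return Requirements(spec.memory_bytes // spec.bytes_per_param, min_accuracy)


def meets(c: Candidate, req: Requirements) -> bool:
    return c.num_params <= req.max_params and c.accuracy >= req.min_accuracy


def select_best(candidates: List[Candidate], req: Requirements) -> Optional[Candidate]:
    # Keep only candidates that satisfy the requirements, then take the most accurate.
    feasible = [c for c in candidates if meets(c, req)]
    return max(feasible, key=lambda c: c.accuracy, default=None)
```

Returning `None` when nothing is feasible leaves the caller free to prune differently and try again, rather than silently deploying a model that violates the hardware budget.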

The method of claim 1, further comprising receiving additional performance requirements from a user, wherein the comparing further comprises comparing the performance of the student model with the additional performance requirements.
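One way to read claim 2 is that user-supplied requirements are checked alongside the hardware-derived ones. The sketch below assumes a hypothetical representation in which each requirement maps a metric name to a `("max" | "min", bound)` pair; when both sources constrain the same metric, the stricter bound is kept. The metric names and the merge rule are illustrative assumptions, not taken from the source.

```python
from typing import Dict, Tuple

# A requirement maps a metric name to (kind, bound): kind "max" means the
# measured value must not exceed bound, "min" means it must reach bound.
Requirements = Dict[str, Tuple[str, float]]


def merge_requirements(hardware: Requirements, user: Requirements) -> Requirements:
    merged = dict(hardware)
    for metric, (kind, bound) in user.items():
        if metric in merged:
            old_kind, old_bound = merged[metric]
            if old_kind != kind:
                raise ValueError(f"conflicting bound kinds for {metric!r}")
            # When both sources constrain a metric, keep the stricter bound.
            bound = min(bound, old_bound) if kind == "max" else max(bound, old_bound)
        merged[metric] = (kind, bound)
    return merged


def satisfies(performance: Dict[str, float], reqs: Requirements) -> bool:
    # A missing metric raises KeyError: every constrained metric must be measured.
    for metric, (kind, bound) in reqs.items():
        value = performance[metric]
        if kind == "max" and value > bound:
            return False
        if kind == "min" and value < bound:
            return False
    return True
```

For example, a student with 25 ms latency passes a hardware-derived 50 ms ceiling but fails once a user tightens the ceiling to 20 ms.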

The method of claim 1 or 2, wherein removing the one or more layers comprises:
determining a number of layers to remove based on the performance requirements;
selecting the one or more layers based on the number; and
removing the selected layers.
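The three steps of claim 3 (count, select, remove) can be sketched as below. This is a minimal sketch built on hypothetical assumptions: the performance requirement is a parameter budget, layers are assumed to cost roughly the same so the count is estimated from the mean layer size, the first and last layers are always kept, and the chosen layers are spread evenly across the depth. None of these heuristics comes from the source.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def num_layers_to_remove(layer_params: Sequence[int], max_params: int) -> int:
    """Estimate how many layers must go for the model to fit within max_params.

    Hypothetical heuristic: layers are assumed to cost about the same, and the
    first and last layers are never counted as removable.
    """
    total = sum(layer_params)
    if total <= max_params:
        return 0
    mean = total / len(layer_params)
    removable = max(0, len(layer_params) - 2)
    return min(removable, math.ceil((total - max_params) / mean))


def select_layers(n_layers: int, k: int) -> List[int]:
    """Choose k interior layer indices spread evenly across the depth."""
    m = max(n_layers - 2, 0)  # number of interior layers
    if not 0 <= k <= m:
        raise ValueError("k must be between 0 and the number of interior layers")
    return [1 + (2 * j + 1) * m // (2 * k) for j in range(k)]


def remove_layers(layers: Sequence[T], indices: Sequence[int]) -> List[T]:
    # Keep every layer whose index was not selected for removal.
    drop = set(indices)
    return [layer for i, layer in enumerate(layers) if i not in drop]
```

With ten 100-parameter layers and a 750-parameter budget, the excess of 250 parameters rounds up to three layers, and removing them leaves 700 parameters.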

The method of claim 3, wherein the one or more layers are randomly selected.
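Claim 4 replaces a deterministic choice with random selection. A minimal sketch, assuming (as an illustration, not from the source) that the first and last layers are protected and that a seed is passed so a given pruning can be reproduced; `select_layers_randomly` is a hypothetical name.

```python
import random
from typing import List, Optional


def select_layers_randomly(n_layers: int, k: int, seed: Optional[int] = None) -> List[int]:
    """Randomly choose k interior layer indices to remove, in ascending order.

    The first and last layers are excluded (an assumption, so the input and
    output interfaces of the model stay intact).
    """
    interior = range(1, max(n_layers - 1, 1))
    if not 0 <= k <= len(interior):
        raise ValueError("k must be between 0 and the number of interior layers")
    rng = random.Random(seed)  # private generator: does not disturb global random state
    return sorted(rng.sample(interior, k))
```

Calling this with several seeds yields several differently pruned students, which can then be trained and compared against the performance requirements.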

The method according to any one of claims 1 to 4, wherein removing the one or more layers comprises removing one or more blocks, each block having a set of consecutive layers.
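Claim 5 prunes at block granularity. The sketch below assumes a hypothetical representation in which each block is a half-open `(start, end)` range of consecutive layer indices; dropping a block removes all of its layers together. One plausible motivation (our reading, not stated in the source) is that removing a whole residual block whose input and output shapes match leaves the neighboring layers shape-compatible.

```python
from typing import List, Sequence, Tuple


def remove_blocks(layers: Sequence[str],
                  blocks: Sequence[Tuple[int, int]],
                  drop: Sequence[int]) -> List[str]:
    """Remove whole blocks of consecutive layers.

    blocks[b] = (start, end) is the half-open range of layer indices that
    form block b; every layer in a dropped block is removed together.
    """
    removed = set()
    for b in drop:
        start, end = blocks[b]
        removed.update(range(start, end))
    return [layer for i, layer in enumerate(layers) if i not in removed]
```

Usage: with layers `["stem", "b1.conv1", "b1.conv2", "b2.conv1", "b2.conv2", "head"]` and blocks `[(1, 3), (3, 5)]`, dropping block 1 removes both `b2` layers at once.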

The method according to any one of claims 1 to 5, wherein the training comprises:
inputting the training data into the student model and the teacher model;
receiving student features from the student model;
receiving teacher features from the teacher model;
performing a first comparison between the student features and the teacher features;
receiving a student output from the student model;
performing a second comparison between the student output and the training data; and
adjusting at least one layer of the set of pruned layers of the student model based on the first comparison and the second comparison.
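The two comparisons in claim 6 resemble a knowledge-distillation objective: a feature-matching term between student and teacher, plus a supervised term against the labels in the training data. The sketch below is an illustration under assumptions, not the patented loss: mean squared error is assumed for the first comparison, cross-entropy for the second, and `alpha` is a hypothetical weighting. The "adjusting" step would backpropagate this scalar through the pruned layers with an autograd framework, which is not shown.

```python
import math
from typing import Sequence


def feature_loss(student_feats: Sequence[float], teacher_feats: Sequence[float]) -> float:
    """First comparison: mean squared error between student and teacher features."""
    if len(student_feats) != len(teacher_feats):
        # Real models often need a learned projection when the widths differ.
        raise ValueError("feature dimensions differ")
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / len(student_feats)


def output_loss(student_logits: Sequence[float], label: int) -> float:
    """Second comparison: cross-entropy of the student output against the true label."""
    m = max(student_logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in student_logits))
    return log_sum_exp - student_logits[label]


def combined_loss(student_feats: Sequence[float], teacher_feats: Sequence[float],
                  student_logits: Sequence[float], label: int, alpha: float = 0.5) -> float:
    # alpha weights feature matching against label supervision (assumed value).
    return (alpha * feature_loss(student_feats, teacher_feats)
            + (1 - alpha) * output_loss(student_logits, label))
```

Keeping the two terms separate makes it easy to log how closely the student tracks the teacher independently of how well it fits the labels.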

The method according to any one of claims 1 to 6, wherein each of the set of machine learning models is selected from a model repository.
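Claim 7 draws both the model to prune and the teacher from a model repository. The entry fields and the selection policy below are hypothetical: the most accurate entry for the task becomes the teacher, and the smallest remaining entry becomes the starting point for pruning, on the assumption that it is closest to the hardware budget.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RepoEntry:
    name: str
    task: str
    num_params: int
    accuracy: float


def pick_from_repository(repo: List[RepoEntry], task: str) -> Tuple[RepoEntry, RepoEntry]:
    """Return (teacher, first_model) for a task under a hypothetical policy."""
    matches = [e for e in repo if e.task == task]
    if len(matches) < 2:
        raise LookupError(f"need at least two models for task {task!r}")
    teacher = max(matches, key=lambda e: e.accuracy)
    # The model to prune is the smallest entry other than the teacher.
    first = min((e for e in matches if e is not teacher), key=lambda e: e.num_params)
    return teacher, first
```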

A system, comprising:
a memory; and
a central processing unit (CPU) coupled to the memory, wherein the CPU is configured to execute instructions to:
receive hardware specifications of a device;
determine performance requirements based on the hardware specifications;
acquire a set of machine learning models, the set of machine learning models comprising a first machine learning model having a first set of layers and a teacher model;
remove one or more layers of the first set of layers from the first machine learning model, thereby creating a first student model having a first set of pruned layers;
train the first student model based on:
training data; and
the teacher model;
evaluate performance of the first student model;
compare the performance of the first student model with the performance requirements; and
determine, based at least on the comparing, to generate a second student model.
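Claim 8 ends by deciding, from the comparison, to generate a second student model. A minimal sketch of that loop, assuming a hypothetical strategy that removes one more layer on each round until a student passes: `make_student(k)` is an assumed callback that prunes `k` layers, trains, and evaluates, and `meets_requirements` is an assumed predicate. Increasing the pruning is only sensible when size or latency is the binding constraint; a search driven by accuracy might remove fewer layers instead.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

S = TypeVar("S")


def iterate_students(make_student: Callable[[int], S],
                     meets_requirements: Callable[[S], bool],
                     max_removed: int) -> Tuple[Optional[S], List[S]]:
    """Generate successive students, pruning one more layer each round.

    Returns the first student that meets the requirements (or None) together
    with every student tried, so the caller can still pick the best of them.
    """
    tried: List[S] = []
    for k in range(1, max_removed + 1):
        student = make_student(k)
        tried.append(student)
        if meets_requirements(student):
            return student, tried
    return None, tried
```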

The system according to claim 8, wherein the CPU is further configured to receive additional performance requirements from a user, and wherein the comparing further comprises comparing the performance of the first student model with the additional performance requirements.

The system of claim 8 or 9, wherein removing the one or more layers comprises:
determining a number of layers to remove based on the performance requirements;
selecting the one or more layers based on the number; and
removing the selected layers.

The system of claim 10, wherein the one or more layers are randomly selected.

13. The system described.

The system according to any one of claims 8 to 12, wherein the training comprises:
inputting the training data into the first student model and the teacher model;
receiving student features from the first student model;
receiving teacher features from the teacher model;
performing a first comparison between the student features and the teacher features;
receiving a student output from the first student model;
performing a second comparison between the student output and the training data; and
adjusting at least one layer of the first set of pruned layers of the first student model based on the first comparison and the second comparison.

The system according to any one of claims 8 to 13, wherein each of the set of machine learning models is selected from a model repository.

A computer program that causes a computer to execute:
a procedure of receiving hardware specifications of a device;
a procedure of determining performance requirements based on the hardware specifications of the device;
a procedure of acquiring a set of machine learning models, the set of machine learning models comprising:
a first machine learning model having a first set of layers; and
a teacher model;
a procedure of removing one or more layers of the first set of layers from the first machine learning model, thereby creating a student model having a set of pruned layers;
a procedure of training the student model based on:
training data; and
the teacher model;
a procedure of evaluating performance of the student model;
a procedure of comparing the performance of the student model with the performance requirements; and
a procedure of determining, based at least on the comparing, to select the model with the highest performance.

15. 15. The listed computer program.

The computer program of claim 15 or 16, wherein the procedure of removing the one or more layers comprises:
a procedure of determining a number of layers to remove based on the performance requirements;
a procedure of selecting the one or more layers based on the number; and
a procedure of removing the selected layers.

13. The listed computer program.

The computer program according to any one of claims 15 to 18, wherein the procedure of training comprises:
a procedure of inputting the training data into the student model and the teacher model;
a procedure of receiving student features from the student model;
a procedure of receiving teacher features from the teacher model;
a procedure of performing a first comparison between the student features and the teacher features;
a procedure of receiving a student output from the student model;
a procedure of performing a second comparison between the student output and the training data; and
a procedure of adjusting at least one layer of the set of pruned layers of the student model based on the first comparison and the second comparison.

The computer program according to any one of claims 15 to 19, wherein each of the set of machine learning models is selected from a model repository.