JP2023011312A

JP2023011312A - Machine learning device, machine learning method, and machine learning program

Info

Publication number: JP2023011312A
Application number: JP2021115105A
Authority: JP
Inventors: 真季高見; Maki Takami; 英樹竹原; Hideki Takehara; 晋吾木田; Shingo Kida; 尹誠楊; Yincheng Yang
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2021-07-12
Filing date: 2021-07-12
Publication date: 2023-01-24

Abstract

To provide a machine learning technique capable of improving classification accuracy and memory use efficiency in incremental learning that combines an initial session for clustering base classes with an additional session for clustering new classes. SOLUTION: A feature extraction unit 10 extracts a feature vector of an input sample using a trained model. A clustering unit 40 clusters the input samples based on the feature vector of each input sample and the class to which the input sample belongs, and generates centroids for each class. The clustering unit 40 uses different clustering parameters for an initial session, which clusters samples of the training data set of the base classes used to generate the trained model, and for an additional session, which clusters samples of the training data set of a newly added new class. SELECTED DRAWING: Figure 2

Description

The present invention relates to machine learning technology.

Humans can learn new knowledge through long-term experience and retain old knowledge without forgetting it. In contrast, the knowledge of a convolutional neural network (CNN) depends on the data set used for training, and adapting to a change in the data distribution requires retraining the CNN parameters on the entire data set. As a CNN learns new tasks, its estimation accuracy on old tasks decreases. Thus, when a CNN is trained continually, catastrophic forgetting, in which the results learned for old tasks are forgotten while new tasks are being learned, is unavoidable.

As a technique for avoiding catastrophic forgetting, class-incremental learning has been proposed, in which a trained model additionally learns new classes so that it can classify the new classes along with the existing classes.

Patent Document 1 proposes a technique called CBCL (Centroid-Based Concept Learning) as a cognitively inspired model for incremental learning. CBCL clusters the features extracted from data samples, stores them in the form of centroids, and predicts labels by weighted voting based on nearby centroids.

A. Ayub and A. R. Wagner, "Cognitively-Inspired Model for Incremental Learning Using a Few Examples," 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 897-906.

CBCL uses a pre-trained model to cluster samples of the training data sets of new classes, but it does not cluster samples of the training data set of the base classes that was used to generate the pre-trained model. When CBCL is extended to incremental learning that combines an initial session for clustering the base classes with additional sessions for clustering new classes, using the same clustering parameters in the initial session and the additional sessions is likely to produce an imbalance between the number of base class centroids and the number of new class centroids. As a result, centroid-based voting is not expected to be fair between base classes and new classes, and classification accuracy is expected to degrade. In addition, because the optimal number of centroids is considered to differ between base classes and new classes, memory used for storing centroids is also expected to be wasted.

The present invention has been made in view of these circumstances, and its object is to provide a machine learning technique that can improve classification accuracy and make memory use more efficient in incremental learning that combines an initial session for clustering the base classes with an additional session for clustering new classes.

To solve the above problem, a machine learning device according to one aspect of the present invention includes a feature extraction unit that extracts a feature vector of an input sample using a trained model, and a clustering unit that clusters the input samples based on the feature vector of each input sample and the class to which the input sample belongs, and generates centroids for each class. The clustering unit uses different clustering parameters for an initial session, which clusters samples of the training data set of the base classes used to generate the trained model, and for an additional session, which clusters samples of the training data set of a newly added new class.

Another aspect of the present invention is a machine learning method. The method includes a feature extraction step of extracting a feature vector of an input sample using a trained model, and a clustering step of clustering the input samples based on the feature vector of each input sample and the class to which the input sample belongs, and generating centroids for each class. The clustering step uses different clustering parameters for an initial session, which clusters samples of the training data set of the base classes used to generate the trained model, and for an additional session, which clusters samples of the training data set of a newly added new class.

Any combination of the above components, and any conversion of the expression of the present invention among a method, a device, a system, a recording medium, a computer program, and the like, are also valid as aspects of the present invention.

According to the present invention, classification accuracy can be improved and memory use can be made more efficient in incremental learning that combines an initial session for clustering the base classes with an additional session for clustering new classes.

FIG. 1 is a diagram for explaining the flow of incremental learning processing according to the first embodiment.

FIG. 2 is a configuration diagram of the machine learning device according to the first embodiment.

FIGS. 3(a) and 3(b) are flowcharts for explaining the machine learning procedure performed by the machine learning device of FIG. 1.

FIG. 4 is a configuration diagram of the inference device according to the second embodiment.

FIG. 5 is a flowchart for explaining the inference procedure performed by the inference device of FIG. 4.

(First embodiment)
CBCL uses a pre-trained model as a feature extractor without updating it. CBCL does not presuppose classifying the pre-trained classes that were used to generate the pre-trained model (called "base classes"); it assumes continual learning of new classes unrelated to the pre-training (called "new classes").

On the other hand, many incremental learning techniques generally train a model so that it can additionally classify new classes while retaining its ability to classify the base classes. Algorithms have been devised with attention to preventing catastrophic forgetting, so that not only do new classes become classifiable but the classification accuracy for base classes also does not degrade severely.

CBCL is not an algorithm that presupposes clustering of base classes, but the first embodiment assumes that CBCL is extended to cluster both base classes and new classes, as in general incremental learning.

FIG. 1 is a diagram for explaining the flow of processing when CBCL is extended to incremental learning that combines clustering of base classes with clustering of new classes.

A pre-trained model generated using the base class training data set 300 is selected in advance as the feature extractor (S10). The pre-trained model is fixed and is not updated in the subsequent processing.

The series of processes that applies the CBCL algorithm to the base classes is called the "initial session". In the initial session, the base class training data set 300 is input and centroids are generated for each base class (S20). The data samples input to the initial session are preferably all or part of the training data used in pre-training, or data of the same classes as, and with features similar to, the training data used in pre-training. The number of base classes is affected by the number of classes that can be classified by pre-training.

The series of processes that applies the CBCL algorithm to new classes is called an "additional session". After the initial session, each additional session takes a new class training data set 310, 320, or 330 as input and generates centroids for each new class (S21, S22, S23). The number of new classes is not limited. Additional sessions are assumed to be repeated continually, session by session: additional session S1 for new class C1 (S21), additional session S2 for new class C2 (S22), additional session S3 for new class C3 (S23), and so on. In each additional session, centroids are generated for each new class. Each session inherits the per-class centroids of the preceding session. The new classes handled in a later session are mainly classes that have not appeared before, but a class that has already appeared may also be handled again.

FIG. 2 is a configuration diagram of the machine learning device 100 according to the first embodiment. The machine learning device 100 includes a feature extraction unit 10, a trained model storage unit 20, a centroid generation unit 30, a centroid storage unit 50, and a label prediction unit 60. The centroid generation unit 30 includes a clustering unit 40, and the label prediction unit 60 includes a neighborhood centroid selection unit 70 and a weighted voting unit 80.

The machine learning device 100 receives training data as input, learns clustering by the CBCL technique, generates centroids for each class, and predicts labels for unknown data samples based on the centroids. If a series of learning processes on one training data set is regarded as a session, CBCL assumes that learning is repeated continually, session by session. The number of classes in the training data set of each session is not limited. The processing of a session to which the CBCL algorithm is applied is described below with reference to FIG. 1; the processing is the same in the initial session and in each additional session.

In each session, when a training data set to be learned is input to the machine learning device 100, the feature extraction unit 10 reads the pre-selected pre-trained model from the trained model storage unit 20 and, using the pre-trained model as a feature extraction means, extracts a feature vector from each input sample of the training data set. An image is one example of a sample, but samples are not limited to images.

In the learning phase, the feature extraction unit 10 supplies the extracted feature vectors of the input samples to the centroid generation unit 30. The centroid generation unit 30 clusters the input samples based on the feature vector of each input sample and the label of the class to which the input sample belongs, generates centroids for each class, and stores the generated per-class centroids in the centroid storage unit 50.

The centroid generation unit 30 includes the clustering unit 40 and generates centroids for each class by repeating, as one example, a machine learning procedure called Agg-Var clustering. Each class has one or more centroids, and the number is not the same for all classes.

The clustering unit 40 calculates the distance between the feature vector of the input sample and each centroid of the class to which the input sample belongs, and selects, from among the centroids of that class, the nearest centroid, that is, the one closest to the feature vector of the input sample. By comparing the distance dmin between the input sample and the nearest centroid with a threshold D, the clustering unit 40 decides whether to merge the feature vector of the input sample into the nearest centroid or to add it as a new centroid of the class to which the input sample belongs. Specifically, when dmin is less than the threshold D (dmin < D), the clustering unit 40 merges the feature vector of the input sample into the nearest centroid; when dmin is equal to or greater than the threshold D (dmin ≥ D), it adds the feature vector of the input sample as a new centroid of the class to which the input sample belongs.
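The threshold decision described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `agg_var_update`, the per-centroid sample counts, the running-mean merge rule, and the handling of a class that has no centroid yet are all assumptions, since the text says only that the feature vector is "merged" into the nearest centroid.

```python
import math

def agg_var_update(centroids, counts, x, threshold):
    """Apply one clustering step for a sample x of a single class.

    centroids: feature vectors (lists of floats) of the class x belongs to
    counts:    samples merged into each centroid (assumed bookkeeping)
    threshold: the distance threshold D
    """
    if not centroids:
        # Assumption: the first sample of a class becomes its first centroid.
        centroids.append(list(x))
        counts.append(1)
        return
    # Find the nearest centroid within the sample's own class.
    dists = [math.dist(c, x) for c in centroids]
    i = min(range(len(dists)), key=dists.__getitem__)
    if dists[i] < threshold:
        # dmin < D: merge x into the nearest centroid (running mean, assumed).
        n = counts[i]
        centroids[i] = [(n * ci + xi) / (n + 1) for ci, xi in zip(centroids[i], x)]
        counts[i] = n + 1
    else:
        # dmin >= D: add x as a new centroid of its class.
        centroids.append(list(x))
        counts.append(1)

centroids, counts = [], []
for sample in [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]:
    agg_var_update(centroids, counts, sample, threshold=2.0)
print(centroids)  # [[0.5, 0.0], [10.0, 0.0]]
```

In this toy run the first two samples lie within the threshold of each other and collapse into one centroid, while the third is far enough away to start a second centroid of the same class.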

When Agg-Var clustering has been applied to the entire training data set of a session, the centroids of all classes in that session are fixed. Because the centroids of the preceding sessions are inherited, the centroids are updated in each later session while still including the centroids of the classes learned up to the preceding session. As noted above, when a later session handles a class that has already appeared, Agg-Var clustering is performed with the centroids already generated for that class as the initial state.

In the inference phase, the feature extraction unit 10 supplies the feature vector extracted from an unknown sample to the label prediction unit 60. The label prediction unit 60 predicts the class label of each sample based on the fixed centroids stored in the centroid storage unit 50.

In the label prediction unit 60, the neighborhood centroid selection unit 70 selects the n nearest centroids to the feature vector of the unknown sample, taking into account its distances to all centroids of every class.

The weighted voting unit 80 performs weighted voting: for the candidate classes to which the n centroids selected by the neighborhood centroid selection unit 70 belong, it calculates a prediction score Pred(y) for each candidate class y, weighting each vote according to the distance between the feature vector of the unknown sample and the corresponding one of the n nearest centroids. The reciprocal of the distance, for example, is used as the weight. That is, the shorter the distance, the greater the weight given to the vote.

The weighted voting unit 80 predicts the label assigned to the candidate class with the largest prediction score Pred(y) as the class label of the unknown sample.
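The neighbor selection and weighted voting described above can be sketched as follows, using the reciprocal of the distance as the weight, as the text suggests. The function name `predict_label`, the `(label, vector)` storage layout, and the small constant `eps` that guards against division by zero are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def predict_label(x, centroids, n_neighbors, eps=1e-12):
    """Predict a label for x from (class_label, feature_vector) centroid pairs."""
    # Rank every stored centroid, across all classes, by distance to x.
    ranked = sorted((math.dist(vec, x), label) for label, vec in centroids)
    scores = defaultdict(float)
    for dist, label in ranked[:n_neighbors]:
        # Each of the n nearest centroids votes for its class with weight 1/distance.
        scores[label] += 1.0 / (dist + eps)
    # The class with the largest prediction score Pred(y) is the predicted label.
    return max(scores, key=scores.get), dict(scores)

stored = [
    ("cat", [0.0, 0.0]),
    ("cat", [1.0, 0.0]),
    ("dog", [4.0, 0.0]),
    ("dog", [9.0, 0.0]),
]
label, scores = predict_label([3.0, 0.0], stored, n_neighbors=3)
print(label)  # dog
```

Here the three nearest centroids lie at distances 1 ("dog"), 2 ("cat"), and 3 ("cat"), so "dog" scores about 1.0 against about 0.83 for "cat", even though "cat" contributes two of the three votes.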

The methods described in Patent Document 1 can be used for the CBCL processing, such as Agg-Var clustering and the calculation of prediction scores.

In conventional CBCL, the threshold D, which is a parameter of Agg-Var clustering, takes a single value per training run and is a fixed value that is not specifically controlled. Consider the problems that arise when the processing of each session described with reference to FIGS. 1 and 2 is performed with the threshold D fixed, as in conventional CBCL.

Training data sets are usually selected so that, within each set, there is no large bias in the number of samples per class or in the feature distribution. However, when the base class data set is compared with a new class data set, the number of samples per class and the feature distribution are likely to differ.

In the CBCL algorithm, it is desirable to use a trained model with strong feature extraction capability, so the trained model should be based on big data, and the base class training data is assumed to contain, for example, several hundred samples or more per class. The new class training data, on the other hand, may be big data but is preferably small data, that is, it preferably has few samples. In particular, as described in Patent Document 1, the few-shot setting, in which the training data contains, for example, only a few samples per class, is ideal. Learning from big data has the practical problems that preparing many samples in advance takes effort and that training takes a long time. Incremental learning with small data is therefore preferable, also so that the system can learn gradually from a small number of repetitions, as humans do.

Therefore, in FIG. 1, if the threshold D, which is a parameter of Agg-Var clustering, is shared by the initial session and the additional sessions, the number of centroids is likely to be imbalanced between base classes and new classes. Broadly, a base class built from big data is expected to have many centroids, and a new class built from small data is expected to have few. In that case, centroid-based voting in label prediction favors the base classes, voting is not fair between base classes and new classes, and classification accuracy is expected to degrade. In addition, many base class centroids are stored, and because they include more unnecessary centroids than the new class centroids do, memory use efficiency is expected to decrease.

Therefore, in the first embodiment, the threshold D, which is a parameter of Agg-Var clustering, is set to a suitable value separately for the initial session, which clusters the base class training data set, and for the additional sessions, which cluster the new class training data sets. With Df denoting the threshold for the initial session and Di the threshold for the additional sessions, Df is set larger than Di so that the number of base class centroids is kept as small as possible. Equivalently, Di is set smaller than Df so that the number of new class centroids is kept as large as possible. Compared with sharing the threshold D between base classes and new classes, this improves the fairness of centroid-based voting and can improve classification accuracy. In addition, because fewer base class centroids are generated, unnecessary centroids need not be stored, and memory use becomes more efficient.

The threshold Df for the initial session and the threshold Di for the additional sessions are desirably set so that the distribution of distances between centroids within a class is comparable. For example, the thresholds Df and Di are set based on the probability distribution of feature data, where the feature data is the original training data or features extracted from it using the pre-trained model. Assuming that the probability distribution is normal, the thresholds are set so that the larger the mean of the square of the per-class variance σ of the base class training data, the more strongly Df >> Di holds, that is, the larger the difference between Df and Di. This makes it possible to reduce the number of centroids per base class as much as possible.

FIGS. 3(a) and 3(b) are flowcharts for explaining the machine learning procedure performed by the machine learning device 100 of FIG. 1.

FIG. 3(a) shows the flow of clustering processing in the learning stage. The threshold D, which is the parameter used for clustering, is set to the initial session threshold Df (S30). The clustering processing of the initial session is executed (S32). The threshold D is set to the additional session threshold Di (S34). The clustering processing of an additional session is executed (S36). If a further additional session is to be clustered (Y in S38), the procedure returns to step S36; otherwise (N in S38), it ends.
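The session flow of FIG. 3(a) can be sketched as a loop that keeps one centroid store across sessions and switches the threshold between the initial session and the additional sessions. The function `cluster_session`, the dictionary layout of the store, the running-mean merge rule, and the numeric values of `D_F` and `D_I` are all hypothetical; only the relation Df > Di comes from the text.

```python
import math

def cluster_session(store, samples, threshold):
    """Cluster (label, vector) samples into store: label -> list of [vector, count]."""
    for label, x in samples:
        entries = store.setdefault(label, [])
        # Nearest centroid within the sample's own class, if the class has any.
        nearest = min(entries, key=lambda e: math.dist(e[0], x), default=None)
        if nearest is not None and math.dist(nearest[0], x) < threshold:
            vec, n = nearest
            nearest[0] = [(n * v + xi) / (n + 1) for v, xi in zip(vec, x)]
            nearest[1] = n + 1
        else:
            entries.append([list(x), 1])

D_F, D_I = 4.0, 0.5  # illustrative values only, chosen so that Df > Di

store = {}  # centroids are inherited from one session to the next
base_samples = [("base", [float(v), 0.0]) for v in range(6)]
cluster_session(store, base_samples, D_F)      # S30, S32: initial session

additional_sessions = [
    [("new1", [0.0, 5.0]), ("new1", [1.0, 5.0])],
    [("new2", [0.0, 9.0]), ("new2", [0.2, 9.0])],
]
for session in additional_sessions:            # S34, S36, repeated via S38
    cluster_session(store, session, D_I)

centroid_counts = {label: len(entries) for label, entries in store.items()}
print(centroid_counts)  # {'base': 1, 'new1': 2, 'new2': 1}
```

With the larger threshold, the six base class samples collapse into a single centroid, while the smaller threshold lets the two samples of "new1" remain separate centroids.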

FIG. 3(b) shows the detailed procedure of the initial session clustering processing in step S32 and the additional session clustering processing in step S36 of FIG. 3(a). The clustering processing is the same in the initial session and the additional sessions. In the clustering processing of the initial session, samples of the base class training data set are input; in the clustering processing of an additional session, samples of a new class training data set are input.

The feature extraction unit 10 extracts the feature vector of the input sample using the trained model (S40).

The clustering unit 40 selects, from among the centroids of the class to which the input sample belongs, the nearest centroid, that is, the one closest to the feature vector of the input sample (S42). The clustering unit 40 compares the distance dmin between the feature vector of the input sample and the nearest centroid with the threshold D (S44).

If dmin < D (Y in S46), the clustering unit 40 merges the feature vector of the input sample into the nearest centroid (S48). If dmin ≥ D (N in S46), it adds the feature vector of the input sample as a new centroid of the class to which the sample belongs (S50).

(Second embodiment)
FIG. 4 is a configuration diagram of an inference device 200 according to the second embodiment. The inference device 200 includes a feature extraction unit 10, a trained model storage unit 20, a centroid generation unit 30, a centroid storage unit 50, and a label prediction unit 60. The centroid generation unit 30 includes a clustering unit 40, and the label prediction unit 60 includes a neighborhood centroid selection unit 70 and a weighted voting unit 80.

The machine learning device 100 of the first embodiment and the inference device 200 of the second embodiment differ in the configuration and operation of the clustering unit 40 and the weighted voting unit 80; the other configurations and operations are the same.

The clustering unit 40 of the machine learning device 100 of the first embodiment uses different values of the threshold D, the parameter used for clustering, in the initial session and the additional sessions. In the inference device 200 of the second embodiment, by contrast, the base classes need not be clustered, as in conventional CBCL, and the clustering parameter may be shared by all additional sessions. Alternatively, when the inference device 200 of the second embodiment also clusters an initial session as in the first embodiment, its clustering unit 40 may likewise use different values of the threshold D in the initial session and the additional sessions.

The configuration and operation of the weighted voting unit 80 in the label prediction unit 60 of the inference device 200 of the second embodiment are described below.

For the candidate classes to which the n centroids selected by the neighborhood centroid selection unit 70 belong, the weighted voting unit 80 calculates a prediction score Pred(y) for each candidate class y, weighting each vote according to the distance between the feature vector of the unknown sample and the centroid. It then multiplies the prediction score Pred(y) by a correction coefficient A to calculate a corrected prediction score Pred(y)', and predicts the label assigned to the class with the largest corrected prediction score Pred(y)' as the label of the unknown sample.

In conventional CBCL, this correction coefficient A is the reciprocal of the number of samples N_y in the training data of the class. That is, A = 1/N_y and Pred(y)' = (1/N_y) Pred(y).

This correction coefficient A = 1/N_y rests on the premise that a class with many samples is likely to have many centroids as well, so that deciding by the maximum of Pred(y) alone could favor classes with many samples, which would be unfair.

A training data set is usually chosen so that, within each set, the number of samples per class and the feature distribution are not strongly biased. In an ideal application, however, sessions are repeated semi-permanently, so the number of samples per class and the feature distribution are not necessarily uniform across all sessions. Moreover, conventional CBCL does not presuppose classification of the base classes and assumes continual learning of new classes, but many applications can be envisaged that extend CBCL to classify both base classes and new classes. In such cases the base classes are often big data (for example, several hundred samples or more per class), while the new classes are small data, ideally in few-shot form (for example, a few samples per class).

If the number of samples and the feature distribution are uniform across all classes in all sessions, the correction coefficient A = 1/N_y of conventional CBCL poses no problem. Otherwise, setting A to 1/N_y makes the corrected prediction score Pred(y)′ of a candidate class with more samples smaller, so that class is less likely to be chosen as the label, and makes Pred(y)′ of a candidate class with fewer samples larger, so that class is more likely to be chosen. Voting is then skewed by the per-class sample count, and classification accuracy is expected to deteriorate.

In the second embodiment, therefore, the correction coefficient A by which Pred(y) is multiplied reflects the number of centroids of each class. Let N*_y denote the number of centroids of class y. If, for example, A is set to 1/N*_y, the reciprocal of the number of centroids, the influence, or energy, that each centroid exerts on the class-label prediction becomes comparable across classes. As a result, even when the number of samples or the feature distribution is not uniform across classes, the fairness of centroid-based voting improves and deterioration of classification accuracy can be prevented.

Thus, in the second embodiment, the weighted voting unit 80 multiplies the prediction score Pred(y) of each class y by the correction coefficient A = 1/N*_y to obtain the corrected prediction score Pred(y)′, and predicts the class whose corrected prediction score Pred(y)′ is largest as the class label of the unknown sample.
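The effect of the two correction coefficients can be illustrated with a small numeric sketch. All values below (raw scores, sample counts, centroid counts) and the helper names `corrected_scores` and `predict` are hypothetical and chosen only to show how A = 1/N_y and A = 1/N*_y can lead to different labels when a base class has far more samples than a new class.

```python
def corrected_scores(pred, counts):
    """Multiply each class's raw score Pred(y) by A = 1 / counts[y]."""
    return {y: pred[y] / counts[y] for y in pred}

def predict(pred, counts):
    """Return the class whose corrected score Pred(y)' is largest."""
    scores = corrected_scores(pred, counts)
    return max(scores, key=scores.get)

# Hypothetical raw prediction scores for two candidate classes.
pred = {"base": 4.0, "new": 0.5}
# Hypothetical counts: a large base class and a few-shot new class.
n_samples = {"base": 500, "new": 5}    # N_y
n_centroids = {"base": 10, "new": 2}   # N*_y

# Conventional CBCL: A = 1 / N_y, which heavily penalizes the large class.
label_by_samples = predict(pred, n_samples)
# Second embodiment: A = 1 / N*_y, based on the number of centroids.
label_by_centroids = predict(pred, n_centroids)
```

In this example the sample-count correction selects the few-shot class even though its raw score is much lower, whereas the centroid-count correction selects the base class.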

FIG. 5 is a flowchart illustrating the inference procedure performed by the inference device 200 of FIG. 4.

The feature extraction unit 10 extracts the feature vector of the unknown sample using the trained model (S70).

The neighborhood centroid selection unit 70 selects the n nearest centroids for the feature vector of the unknown sample (S72).

The weighted voting unit 80 calculates the prediction score of each candidate class to which the n nearest centroids belong, applying weights according to the distances between the feature vector of the unknown sample and the n nearest centroids (S74). The weighted voting unit 80 corrects the prediction score of each candidate class with a correction coefficient that depends on the number of centroids of that class (S76). The weighted voting unit 80 predicts the class with the largest corrected prediction score as the class label (S78).
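Steps S72 to S78 can be sketched as follows, assuming the feature vector of step S70 has already been extracted. The function name `predict_label` and the inverse-distance weight are assumptions; the document states only that the weight depends on the distance between the feature vector and each centroid.

```python
import numpy as np
from collections import defaultdict

def predict_label(feature, centroids, labels, n=3, eps=1e-12):
    """Predict a class label from stored centroids (sketch of S72 to S78)."""
    centroids = np.asarray(centroids, dtype=float)
    feature = np.asarray(feature, dtype=float)

    # S72: select the n centroids nearest to the feature vector.
    dists = np.linalg.norm(centroids - feature, axis=1)
    nearest = np.argsort(dists)[:n]

    # S74: accumulate a distance-weighted score per candidate class.
    # Assumed weight: inverse distance (eps avoids division by zero).
    pred = defaultdict(float)
    for i in nearest:
        pred[labels[i]] += 1.0 / (dists[i] + eps)

    # S76: correct each score by A = 1 / N*_y, where N*_y is the total
    # number of stored centroids of class y.
    corrected = {y: score / labels.count(y) for y, score in pred.items()}

    # S78: the class with the largest corrected score is the prediction.
    return max(corrected, key=corrected.get)

# Hypothetical centroids: three for class "a", one for class "b".
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
labels = ["a", "a", "a", "b"]
label_near_b = predict_label([4.5, 4.5], centroids, labels, n=2)
label_near_a = predict_label([0.2, 0.1], centroids, labels, n=3)
```

Here `labels` is a plain Python list holding the class of each centroid, so `labels.count(y)` gives the per-class centroid count directly.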

The various processes of the machine learning device 100 and the inference device 200 described above can of course be realized as a device using hardware such as a CPU and memory, and can also be realized by firmware stored in a ROM (read-only memory), flash memory, or the like, or by software running on a computer or the like. The firmware program or software program may be provided recorded on a computer-readable recording medium, may be transmitted to and received from a server over a wired or wireless network, or may be transmitted and received as a data broadcast of terrestrial or satellite digital broadcasting.

As described above, according to the machine learning device 100 of the first embodiment, when CBCL is extended to incremental learning that combines an initial session for clustering the base classes with additional sessions for clustering new classes, using different clustering parameters for the base classes and the new classes improves the classification accuracy of both and reduces the number of unnecessary centroids, making memory use more efficient.

According to the inference device 200 of the second embodiment, reflecting the number of centroids in the correction coefficient used when voting for candidate classes in CBCL label prediction prevents classification accuracy from deteriorating even when the number of training samples or the feature distribution is biased across classes. In particular, when CBCL is extended to cluster both base classes and new classes, the number of samples and the feature distribution often differ greatly between the base classes and the new classes, but correcting the votes for the candidate classes based on the number of centroids ensures fairness.

The present invention has been described above based on embodiments. The embodiments are illustrative, and those skilled in the art will understand that various modifications are possible in the combinations of their constituent elements and processing steps, and that such modifications also fall within the scope of the present invention.

10 feature extraction unit, 20 trained model storage unit, 30 centroid generation unit, 40 clustering unit, 50 centroid storage unit, 60 label prediction unit, 70 neighborhood centroid selection unit, 80 weighted voting unit, 100 machine learning device, 200 inference device.

Claims

1. A machine learning device comprising: a feature extraction unit that extracts a feature vector of an input sample using a trained model; and
a clustering unit that clusters the input sample based on the feature vector of the input sample and the class to which the input sample belongs, to generate centroids for each class,
wherein the clustering unit uses different values of a parameter used in clustering between an initial session, in which samples of a training data set of a base class used in generating the trained model are clustered, and an additional session, in which samples of a training data set of a new class to be newly added are clustered.

2. The machine learning device according to claim 1, wherein the clustering unit selects, from among the centroids of the class to which the input sample belongs, the nearest-neighbor centroid closest to the feature vector of the input sample; integrates the feature vector of the input sample into the nearest-neighbor centroid if the distance between the input sample and the nearest-neighbor centroid is less than a threshold, which is the parameter; and newly adds the feature vector of the input sample as a centroid of the class to which the input sample belongs if the distance is greater than or equal to the threshold.

3. The machine learning device according to claim 2, wherein the clustering unit sets the threshold used in the initial session to be greater than the threshold used in the additional session.

4. A machine learning method comprising: a feature extraction step of extracting a feature vector of an input sample using a trained model; and
a clustering step of clustering the input sample based on the feature vector of the input sample and the class to which the input sample belongs, to generate centroids for each class,
wherein the clustering step uses different values of a parameter used in clustering between an initial session, in which samples of a training data set of a base class used in generating the trained model are clustered, and an additional session, in which samples of a training data set of a new class to be newly added are clustered.

5. A machine learning program causing a computer to execute: a feature extraction step of extracting a feature vector of an input sample using a trained model; and
a clustering step of clustering the input sample based on the feature vector of the input sample and the class to which the input sample belongs, to generate centroids for each class,
wherein the clustering step uses different values of a parameter used in clustering between an initial session, in which samples of a training data set of a base class used in generating the trained model are clustered, and an additional session, in which samples of a training data set of a new class to be newly added are clustered.