JP7266075B2

JP7266075B2 - Data selection support device, data selection support method, and data selection support program

Info

Publication number: JP7266075B2
Application number: JP2021154504A
Authority: JP
Inventors: 勝人伊佐野; 尭理中尾; 光義山足; 紘和阿部
Original assignee: Mitsubishi Electric Information Systems Corp
Current assignee: Mitsubishi Electric Information Systems Corp
Priority date: 2021-09-22
Filing date: 2021-09-22
Publication date: 2023-04-27
Anticipated expiration: 2041-09-22
Also published as: JP2023045892A

Description

The present disclosure relates to a technology for supporting the selection of training data used to train a model.

Training a model uses a large amount of training data as input. The prepared training data may include data that prevents training from improving the model's accuracy. A typical example is mislabeled training data, such as an image of a dog labeled "cat". It is desirable to remove such training data before training.

One existing method screens the training data by applying an optimization algorithm that repeatedly trains and validates the model. This method requires thousands to tens of thousands of rounds of model training and validation.

With deep learning, a single training run takes a long time. Applying such an optimization algorithm to deep learning is therefore impractical except in a special environment, namely one in which a GPU (Graphics Processing Unit) cluster is available.

Non-Patent Document 1 describes Dataset Cartography (hereinafter, DC). DC is a method that classifies the images used as training data by observing how the model's confidence on each image changes over time during deep-learning training.
In DC, the mean and standard deviation of each image's confidence across the epochs are computed. Based on the mean and standard deviation, each image is classified as easy to learn, as improving generalization performance, or as hard to learn. Images classified as hard to learn are likely to hinder improvement of model accuracy.
In DC, images can be classified by training and validating the model only once. DC can therefore be applied even when training is performed by deep learning.
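To make the DC statistics concrete, here is a minimal sketch in plain Python. It assumes the per-epoch confidences for the correct label have already been recorded as a table with one row per epoch and one column per training image, and, following our reading of Non-Patent Document 1, it uses the population standard deviation over epochs. The function name `cartography_stats` is our own.

```python
from statistics import fmean, pstdev


def cartography_stats(confidences):
    """Per-sample Dataset Cartography statistics.

    confidences[e][i] is the model's probability for the correct label of
    sample i after epoch e (assumed layout: epochs x samples).
    Returns one (confidence, variability) pair per sample: the mean and the
    population standard deviation of that sample's values across epochs.
    """
    num_samples = len(confidences[0])
    stats = []
    for i in range(num_samples):
        trajectory = [epoch[i] for epoch in confidences]  # sample i over time
        stats.append((fmean(trajectory), pstdev(trajectory)))
    return stats
```

A sample with a low mean and low variability would fall in DC's "hard to learn" region; the thresholds separating the regions are not given in this description.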

Non-Patent Document 1: Swayamdipta, Swabha, et al. "Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
Non-Patent Document 2: Rafael Muller, Simon Kornblith, and Geoffrey E. Hinton. "When does label smoothing help?" In Advances in Neural Information Processing Systems, pp. 4694-4703, 2019.

In DC, once the model has been sufficiently trained, images belonging to a label with few data samples are likely to be classified as improving generalization performance even when they should be classified as hard to learn (see Non-Patent Document 1).

In addition, models trained by deep learning tend to output high confidence (see Non-Patent Document 2), so training data tends to be classified as easy to learn.

For these reasons, DC may not correctly classify images that should be classified as hard to learn, and may therefore fail to extract the images that hinder improvement of model accuracy.

An object of the present disclosure is to provide a configuration capable of appropriately extracting training data that hinders improvement of model accuracy.

A data selection support device according to the present disclosure includes:
a confidence average calculation unit that calculates, for any epoch e equal to or less than the number of epochs E, the average value μ_e of the model's confidence over the plurality of training data; and
a divergence calculation unit that calculates, for each of the plurality of training data, the degree of divergence between the model's confidence at the epoch e and the average value μ_e calculated by the confidence average calculation unit.

The confidence average calculation unit calculates, for each epoch e equal to or less than the number of epochs E, the average value μ_e of the model's confidence over the plurality of training data, and
the divergence calculation unit calculates, for each of the plurality of training data, the degree of divergence between the model's confidence at each epoch e and the average value μ_e calculated by the confidence average calculation unit.

The divergence calculation unit calculates the average value μ_i of the degrees of divergence over the epochs e.

The data selection support device further includes
a confidence variation calculation unit that calculates, for each epoch e, the variation of the model's confidence over the plurality of training data, and
the divergence calculation unit calculates the average value μ_i of the degree of divergence by using the confidence variation calculated by the confidence variation calculation unit.

The confidence average calculation unit calculates, for each epoch e, the average value μ_e of the confidence p_{i,θ(e)} for the correct label y_i^* of the training image x_i over every integer i = 1, ..., N, as shown in Equation 1;
the confidence variation calculation unit calculates, for each epoch e, the confidence variation σ_e as shown in Equation 2; and
the divergence calculation unit calculates, for the training data of each integer i = 1, ..., N, the average value μ_i of the degree of divergence as shown in Equation 3.
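The equation images for Equations 1 to 3 did not survive extraction. The following is a hedged reconstruction from the surrounding definitions only. It assumes that σ_e is the population standard deviation over the N training data and that μ_i averages the σ_e-normalized divergence over all E epochs, so it may differ in detail from the published formulas.

```latex
\mu_e = \frac{1}{N}\sum_{i=1}^{N} p_{i,\theta(e)}\left(y_i^{*} \mid x_i\right) \qquad (1)

\sigma_e = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(p_{i,\theta(e)}\left(y_i^{*} \mid x_i\right) - \mu_e\right)^{2}} \qquad (2)

\mu_i = \frac{1}{E}\sum_{e=1}^{E}\frac{p_{i,\theta(e)}\left(y_i^{*} \mid x_i\right) - \mu_e}{\sigma_e} \qquad (3)
```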

The data selection support device further includes
a divergence variation calculation unit that calculates the variation of the degree of divergence calculated by the divergence calculation unit.

The data selection support device further includes
a divergence variation calculation unit that calculates, as shown in Equation 4, the variation σ_i of the degree of divergence calculated by the divergence calculation unit.
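Equation 4 is likewise missing from the extracted text. Assuming, by analogy with the confidence variation σ_e, that σ_i is the population standard deviation over the E epochs of the normalized divergence d_{i,e} around its mean μ_i, it would read as follows (a reconstruction, not the published formula):

```latex
\sigma_i = \sqrt{\frac{1}{E}\sum_{e=1}^{E}\left(d_{i,e} - \mu_i\right)^{2}},
\qquad
d_{i,e} = \frac{p_{i,\theta(e)}\left(y_i^{*} \mid x_i\right) - \mu_e}{\sigma_e}
```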

The data selection support device further includes
a data selection unit that selects training data to be deleted from the plurality of training data based on at least one of the degree of divergence and the variation of the degree of divergence.
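The description does not give a concrete selection rule. As one hypothetical example that mirrors DC's "hard to learn" region (low confidence, low variability), the sketch below flags samples whose mean divergence μ_i is at or below a threshold and, optionally, whose divergence variation σ_i is also small. The function name and both thresholds are assumptions for illustration.

```python
from statistics import fmean, pstdev


def select_for_deletion(d, mu_max=-1.0, sigma_max=None):
    """Return indices of training samples proposed for deletion.

    d[e][i] is the normalized divergence d_{i,e} of sample i at epoch e.
    Hypothetical rule: flag sample i when its mean divergence mu_i is at most
    mu_max and, if sigma_max is given, its divergence variation sigma_i is at
    most sigma_max (i.e. it stays consistently below the epoch average).
    """
    selected = []
    for i in range(len(d[0])):
        trajectory = [row[i] for row in d]  # d_{i,e} for e = 1..E
        mu_i = fmean(trajectory)
        sigma_i = pstdev(trajectory, mu_i)  # population std (assumed form)
        if mu_i <= mu_max and (sigma_max is None or sigma_i <= sigma_max):
            selected.append(i)
    return selected
```

The optional `sigma_max` check separates a sample that is persistently below average from one that only dips below average in some epochs.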

The data selection unit selects the training data to be deleted based on the feature distance between each piece of training data and the other training data.

The data selection support device further includes
a feature display unit that displays a feature map in which the plurality of training data are plotted with one of the degree of divergence and its variation on the vertical axis and the other on the horizontal axis.

The feature display unit displays, for each piece of training data, its feature distance to the other training data.

A data selection support method according to the present disclosure includes:
calculating, by a computer, for any epoch e equal to or less than the number of epochs E, the average value μ_e of the model's confidence over the plurality of training data; and
calculating, by the computer, for each of the plurality of training data, the degree of divergence between the model's confidence at the epoch e and the average value μ_e.

A data selection support program according to the present disclosure causes a computer to function as a data selection support device that performs:
a confidence average calculation process of calculating, for any epoch e equal to or less than the number of epochs E, the average value μ_e of the model's confidence over the plurality of training data; and
a divergence calculation process of calculating, for each of the plurality of training data, the degree of divergence between the model's confidence at the epoch e and the average value μ_e calculated by the confidence average calculation process.

In the present disclosure, the average value μ_e of the model's confidence over the plurality of training data is calculated for an epoch e, and the degree of divergence between the model's confidence and μ_e is calculated for each piece of training data. Using this degree of divergence makes it possible to realize a configuration that can appropriately extract images that hinder improvement of model accuracy.

FIG. 1 is a configuration diagram of the data selection support device 10 according to Embodiment 1.
FIG. 2 is a flowchart showing the overall flow of the operation of the data selection support device 10 according to Embodiment 1.
FIG. 3 is a flowchart showing details of the operation of the data selection support device 10 according to Embodiment 1.
FIG. 4 is an explanatory diagram of the degree of divergence d_{i,e} according to Embodiment 1.
FIG. 5 is an explanatory diagram of the degree of divergence d_{i,e} according to Embodiment 1.
FIG. 6 is an explanatory diagram of the effect of Embodiment 1.
FIG. 7 is a configuration diagram of the data selection support device 10 according to Modification 1.
FIG. 8 is a configuration diagram of the data selection support device 10 according to Embodiment 2.
FIG. 9 is a flowchart showing details of the operation of the data selection support device 10 according to Embodiment 2.
FIG. 10 is an explanatory diagram of the effect of Embodiment 2.
FIG. 11 is a diagram showing the change in confidence for images of a label with many data and images of a label with few data.
FIG. 12 is a diagram showing the change in confidence for images of a label with few data.
FIG. 13 is a diagram showing the degree of divergence d_{i,e} of images that hinder accuracy improvement among images of a label with few data.
FIG. 14 is a diagram showing the change in confidence for images of a label with many data.
FIG. 15 is a diagram showing the degree of divergence d_{i,e} of images that hinder accuracy improvement among images of a label with many data.
FIG. 16 is a configuration diagram of the data selection support device 10 according to Embodiment 3.
FIG. 17 is a flowchart showing details of the operation of the data selection support device 10 according to Embodiment 3.
FIG. 18 is an explanatory diagram of an image for which it is difficult to tell which label is correct.
FIG. 19 is a configuration diagram of the data selection support device 10 according to Embodiment 4.
FIG. 20 is a flowchart showing details of the operation of the data selection support device 10 according to Embodiment 4.

Embodiment 1.
*** Description of the configuration ***
The configuration of the data selection support device 10 according to Embodiment 1 will be described with reference to FIG. 1.
The data selection support device 10 is a computer.
The data selection support device 10 includes, as hardware, a processor 11, a memory 12, a storage 13, and a communication interface 14. The processor 11 is connected to the other hardware via signal lines and controls the other hardware.

The processor 11 is an IC (Integrated Circuit) that performs processing. Specific examples of the processor 11 are a CPU (Central Processing Unit), a DSP (Digital Signal Processor), and a GPU (Graphics Processing Unit).

The memory 12 is a storage device that temporarily stores data. Specific examples of the memory 12 are SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory).

The storage 13 is a storage device that stores data. A specific example of the storage 13 is an HDD (Hard Disk Drive). The storage 13 may also be a portable recording medium such as an SD (Secure Digital, registered trademark) memory card, CompactFlash (registered trademark), NAND flash, a flexible disk, an optical disk, a compact disc, a Blu-ray (registered trademark) disc, or a DVD (Digital Versatile Disk).

The communication interface 14 is an interface for communicating with external devices. Specific examples of the communication interface 14 are Ethernet (registered trademark), USB (Universal Serial Bus), and HDMI (registered trademark; High-Definition Multimedia Interface) ports.

The data selection support device 10 includes, as functional components, an input reception unit 21 and an accuracy calculation unit 22. The accuracy calculation unit 22 includes a confidence average calculation unit 221, a confidence variation calculation unit 222, and a divergence calculation unit 223. The functions of the functional components of the data selection support device 10 are implemented by software.
The storage 13 stores a program that implements the functions of the functional components of the data selection support device 10. The processor 11 loads this program into the memory 12 and executes it, thereby implementing the functions of the functional components of the data selection support device 10.

FIG. 1 shows only one processor 11. However, there may be a plurality of processors 11, and the plurality of processors 11 may cooperate to execute the programs that implement the functions.

*** Description of the operation ***
The operation of the data selection support device 10 according to Embodiment 1 will be described with reference to FIGS. 2 to 5.
The operation procedure of the data selection support device 10 according to Embodiment 1 corresponds to the data selection support method according to Embodiment 1. A program that implements the operation of the data selection support device 10 according to Embodiment 1 corresponds to the data selection support program according to Embodiment 1.

The overall flow of the operation of the data selection support device 10 according to Embodiment 1 will be described with reference to FIG. 2.
(Step S1: Learning process)
The learning device 31 receives a plurality of training data 41 and a plurality of verification data 42 as input, performs training by deep learning, and updates the model 32. The training data 41 are data for training the model 32. The verification data 42 are data for verifying the accuracy of the trained model 32.
When training is performed by deep learning, a recognition result is obtained for each piece of training data 41, and a confidence for the recognition result is calculated. In this embodiment, the confidence is assumed to take a value in the range 0 to 1. The recognition result is, for example, whether an image contains a recognition target. The confidence expresses how certain the recognition is, that is, how likely the model considers the i-th training data x_i to have the correct label y_i^*.
In deep learning, training is repeated multiple times over each of the plurality of training data 41. The number of times training is repeated with the same training data 41 is called the number of epochs E, where E is 2 or more. The e-th round of training is called epoch e. That is, training from epoch 1 to epoch E is performed using the same training data 41.
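A minimal sketch of how the per-epoch confidences p_{i,θ(e)}(y_i^* | x_i) might be collected, assuming the model outputs class logits that are converted to probabilities with a softmax (the description only states that the confidence lies between 0 and 1). The helper names and the epochs-by-samples layout are hypothetical.

```python
import math


def gold_label_confidence(logits, gold):
    """Softmax probability that the model assigns to the correct label.

    logits: per-class scores for one training sample at one epoch.
    gold: index of the correct label y_i^*.
    """
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[gold] / sum(exps)


def confidence_matrix(logits_per_epoch, labels):
    """Build confidences[e][i] for every epoch e and training sample i.

    logits_per_epoch[e][i] holds the class logits of sample i recorded after
    epoch e; labels[i] is the index of its correct label.
    """
    return [
        [gold_label_confidence(z, y) for z, y in zip(epoch_logits, labels)]
        for epoch_logits in logits_per_epoch
    ]
```

Recording this table once during the single training run of step S1 is all that the later steps need.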

(Step S2: Selection process)
The data selection support device 10 acquires the confidences obtained in step S1 and calculates the degree of divergence from them. The training data 41 are then screened using the degree of divergence.

The processing of steps S1 and S2 is repeated as necessary. From the second iteration onward, training uses the selected training data 41 as input. As a result, a highly accurate model 32 is generated.
The training in step S1 need not use deep learning; any machine learning method that yields a confidence for each epoch may be used.

Details of the operation of the data selection support device 10 according to Embodiment 1 will be described with reference to FIG. 3.
(Step S11: Input reception process)
The input reception unit 21 acquires the confidence for each piece of training data 41 obtained in step S1 of FIG. 2. Specifically, the input reception unit 21 acquires, for each piece of training data 41, the confidence at each epoch e equal to or less than the number of epochs E.

(Step S12: Confidence average calculation process)
The confidence average calculation unit 221 calculates, for any epoch e equal to or less than the number of epochs E, the average value μ_e of the confidence of the model 32 over the plurality of training data 41, as shown in Equation 11.

N is the number of training data 41. p_{i,θ(e)}(y_i^* | x_i) is the confidence, calculated at epoch e by the model with parameters θ, for the correct label y_i^* of the image x_i of the i-th training data 41.

As a specific example, the confidence average calculation unit 221 calculates the average value μ_E of the confidence of the model 32 for epoch E, the final epoch. The confidence average calculation unit 221 is not limited to the final epoch and may instead calculate μ_e for an intermediate epoch e (e < E). It may also calculate μ_e for a plurality of epochs e.

In Embodiment 1, the confidence average calculation unit 221 calculates the average value μ_e of the confidence of the model 32 for every epoch e equal to or less than the number of epochs E, that is, for each of the epochs e = 1, ..., E.

(Step S13: Confidence variation calculation process)
The confidence variation calculation unit 222 calculates, for the epoch e, the variation σ_e of the confidence of the model 32 over the plurality of training data 41, as shown in Equation 12. The epoch e is the epoch for which the average value μ_e was calculated in step S12.

Here, the standard deviation is calculated as the variation. However, the variance may be calculated instead.

In Embodiment 1, μ_e is calculated for every epoch e equal to or less than the number of epochs E. The confidence variation calculation unit 222 therefore calculates the variation σ_e of the confidence of the model 32 for every epoch e equal to or less than E.
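Steps S12 and S13 can be sketched as follows. Because the images of Equations 11 and 12 are missing, the sketch assumes the plain mean and the population standard deviation over the N training data, with confidences stored epochs-by-samples; `epoch_stats` is a hypothetical helper, and the `use_variance` flag reflects the option of using the variance instead.

```python
from statistics import fmean, pstdev, pvariance


def epoch_stats(confidences, use_variance=False):
    """Return (mu_e, sigma_e) for every epoch e.

    confidences[e][i] is p_{i,theta(e)}(y_i^* | x_i).
    mu_e: mean over the N training samples (assumed form of Equation 11).
    sigma_e: population standard deviation over the N samples (assumed form
    of Equation 12), or the population variance when use_variance is True.
    """
    stats = []
    for row in confidences:
        mu = fmean(row)
        spread = pvariance(row, mu) if use_variance else pstdev(row, mu)
        stats.append((mu, spread))
    return stats
```

Passing the already computed mean to `pstdev`/`pvariance` avoids recomputing it for every epoch.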

(Step S14: Divergence calculation process)
The divergence calculation unit 223 calculates, for each of the plurality of training data 41, the degree of divergence d_{i,e} between the confidence of the model 32 at the epoch e and the average value μ_e, as shown in Equation 13. The average value μ_e is the value calculated in step S12, and the epoch e is the epoch for which μ_e was calculated in step S12.

As shown in FIG. 4, the degree of divergence d_{i,e} is the distance between the confidence for the training data 41 being processed and the average value μ_e of the confidences of all training data 41. That is, d_{i,e} indicates how much higher or lower the confidence for the training data 41 being processed is than μ_e.

The degree of divergence d_{i,e} calculated by Equation 13 may be used as is. In Embodiment 1, however, the divergence calculation unit 223 calculates d_{i,e} by dividing the distance between the confidence for the training data 41 being processed and the average value μ_e of the confidences of all training data 41 by the confidence variation σ_e, as shown in Equation 14.

As a result, d_{i,e} becomes large in absolute value when the variation in confidence is small, and small in absolute value when the variation is large. In other words, even when the confidence for the training data 41 being processed is close to μ_e, d_{i,e} becomes large if the variation in confidence is small. Conversely, even when that confidence is far from μ_e, d_{i,e} becomes small if the variation in confidence is large.
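A sketch of step S14 under the same assumptions (epochs-by-samples table, population standard deviation), covering both the plain distance of Equation 13 and the σ_e-normalized form of Equation 14, both reconstructed from the text since the equation images are missing. The zero-spread guard is our own addition; the description does not say what happens when σ_e is zero.

```python
from statistics import fmean, pstdev


def divergences(confidences, normalize=True):
    """Return d[e][i] for each epoch e and training sample i.

    normalize=False: d_{i,e} = p_{i,theta(e)} - mu_e              (Equation 13)
    normalize=True:  d_{i,e} = (p_{i,theta(e)} - mu_e) / sigma_e  (Equation 14)
    """
    result = []
    for row in confidences:
        mu_e = fmean(row)
        sigma_e = pstdev(row, mu_e)
        if normalize:
            # Assumption: if every sample has the same confidence (sigma_e == 0),
            # every deviation is zero, so report 0.0 instead of dividing by zero.
            result.append([(p - mu_e) / sigma_e if sigma_e > 0 else 0.0 for p in row])
        else:
            result.append([p - mu_e for p in row])
    return result
```

For one epoch with confidences 0.2, 0.4 and 0.6, the plain distances are about -0.2, 0 and 0.2, while the normalized values are about -1.22, 0 and 1.22; shrinking the spread while keeping the same ordering would push the normalized values further from zero, as the paragraph above describes.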

In Embodiment 1, μ_e is calculated for every epoch e equal to or less than the number of epochs E. The divergence calculation unit 223 therefore calculates the degree of divergence d_{i,e} for each of the plurality of training data 41 at every epoch e equal to or less than E. Then, as shown in Equation 15, the divergence calculation unit 223 calculates, for each of the plurality of training data 41, the average value μ_i of the degrees of divergence d_{i,e} over all epochs e equal to or less than E. Instead of the average value μ_i, the divergence calculation unit 223 may calculate another statistic, such as the median of the degrees of divergence d_{i,e}.

That is, as shown in FIG. 5, the degree of divergence d_{i,e} is calculated at each epoch e, and then, for each of the plurality of training data 41, the average value μ_i of d_{i,e} over all epochs e is calculated.
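Aggregating over epochs could look like the following sketch, which treats Equation 15 as a plain mean over the E epochs (an assumption, since the equation image is missing). The input is the epochs-by-samples table of d_{i,e} values; `mean_divergence` is a hypothetical name, and the median option mirrors the alternative statistic mentioned above.

```python
from statistics import fmean, median


def mean_divergence(d, use_median=False):
    """Return mu_i for each training sample i from d[e][i] (epochs x samples).

    By default, averages d_{i,e} over all epochs e = 1..E (assumed form of
    Equation 15); with use_median=True, the median over epochs is returned.
    """
    num_samples = len(d[0])
    aggregate = median if use_median else fmean
    return [aggregate([epoch[i] for epoch in d]) for i in range(num_samples)]
```

The median is less sensitive to a single epoch in which a sample's divergence spikes, for example early in training.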

*** Effects of Embodiment 1 ***
As described above, the data selection support device 10 according to Embodiment 1 calculates, for each of the plurality of training data, the degree of divergence d_{i,e} between the model's confidence and the average value μ_e. Using d_{i,e} makes it possible to realize a configuration that can appropriately extract images that hinder improvement of the accuracy of the model 32.

Suppose that many training data 41 have low confidence at epoch e. If the training data 41 are evaluated simply by their confidence, many of them may be judged inappropriate. For example, as shown in FIG. 6, suppose that the average confidence μ_e is 0.35, that 0.40 is used as the confidence threshold, and that training data 41 with confidence at or below the threshold are judged inappropriate. In this case, many training data 41 are judged inappropriate.
When the degree of divergence d_{i,e} is used, however, only the training data 41 with particularly low confidence can be extracted, even when confidence is low overall. Overall confidence may be low because, for example, the model 32 has not yet been trained for enough epochs, or because the target label has fewer data than other labels. Even in such cases, images that hinder improvement of the accuracy of the model 32 can likely be extracted appropriately.
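The FIG. 6 scenario can be illustrated with hypothetical numbers: ten confidences at one epoch e whose mean is 0.35. The fixed threshold of 0.40 comes from the text; the cutoff of -1 on the normalized divergence is an assumption chosen only for illustration.

```python
from statistics import fmean, pstdev

# Hypothetical confidences of ten training samples at one epoch e (mean 0.35).
conf = [0.38, 0.36, 0.37, 0.39, 0.35, 0.36, 0.38, 0.37, 0.15, 0.39]

THRESHOLD = 0.40  # fixed confidence threshold from the FIG. 6 discussion
Z_CUTOFF = -1.0   # hypothetical cutoff on the normalized divergence d_{i,e}

mu_e = fmean(conf)
sigma_e = pstdev(conf, mu_e)

# Fixed threshold: every sample at or below 0.40 is judged inappropriate.
by_threshold = [i for i, p in enumerate(conf) if p <= THRESHOLD]
# Divergence: only samples far below this epoch's own average are flagged.
by_divergence = [i for i, p in enumerate(conf) if (p - mu_e) / sigma_e < Z_CUTOFF]

print(f"mu_e={mu_e:.2f}, sigma_e={sigma_e:.3f}")
print("flagged by threshold:", by_threshold)    # all ten samples
print("flagged by divergence:", by_divergence)  # only sample 8
```

The threshold rule rejects the whole epoch because the model is uniformly unsure, whereas the divergence rule singles out the one sample (confidence 0.15) that lags well behind the others.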

In particular, in Embodiment 1, the degree of divergence d_{i,e} is calculated by dividing the distance by the confidence variation σ_e. This allows images that hinder improvement of the accuracy of the model 32 to be extracted appropriately while taking the confidence variation σ_e into account.

Further, in Embodiment 1, the average value μ_i of the degrees of divergence d_{i,e} over all epochs e equal to or less than the number of epochs E is calculated. This allows images that hinder improvement of the accuracy of the model 32 to be extracted appropriately while taking the change of d_{i,e} over time into account.

***Other Configurations***
<Modification 1>
In Embodiment 1, each functional component is implemented in software. However, as Modification 1, each functional component may be implemented in hardware. The differences between Modification 1 and Embodiment 1 are described below.

The configuration of the data selection support device 10 according to Modification 1 will be described with reference to FIG. 7.
When each functional component is implemented in hardware, the data selection support device 10 includes an electronic circuit 15 instead of the processor 11, the memory 12, and the storage 13. The electronic circuit 15 is a dedicated circuit that implements the functions of each functional component, the memory 12, and the storage 13.

The electronic circuit 15 may be a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, a logic IC, a GA, an ASIC, or an FPGA. GA is an abbreviation for Gate Array. ASIC is an abbreviation for Application Specific Integrated Circuit. FPGA is an abbreviation for Field-Programmable Gate Array.
The functional components may be implemented by a single electronic circuit 15, or may be distributed among a plurality of electronic circuits 15.

<Modification 2>
As Modification 2, some functional components may be implemented in hardware and the others in software.

The processor 11, the memory 12, the storage 13, and the electronic circuit 15 are collectively referred to as processing circuitry. That is, the functions of the functional components are implemented by the processing circuitry.

Embodiment 2.
Embodiment 2 differs from Embodiment 1 in that the variation σ_i of the divergence d_{i,e} is calculated. Only this difference is described for Embodiment 2; description of the common points is omitted.

***Configuration Description***
The configuration of the data selection support device 10 according to Embodiment 2 will be described with reference to FIG. 8.
The data selection support device 10 differs from the data selection support device 10 shown in FIG. 1 in that it includes a divergence variation calculation unit 224 as a functional component. Like the other functional components, the function of the divergence variation calculation unit 224 is implemented in software or hardware.

***Description of Operation***
The operation of the data selection support device 10 according to Embodiment 2 will be described with reference to FIG. 9.
The processing of steps S21 to S24 is the same as that of steps S11 to S14 in FIG. 3.

(Step S25: Divergence variation calculation process)
For each of the plurality of training data 41, the divergence variation calculation unit 224 calculates the variation σ_i of the divergences d_{i,e} calculated in step S24, as shown in Equation 16.

Here, the standard deviation is calculated as the variation. However, the variance may be calculated instead.
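A sketch of step S25 follows, using a hypothetical layout `d[e][i]` for the divergence of sample i at epoch e. It uses the population standard deviation by default and the variance as the alternative mentioned above; whether Equation 16 divides by E or by E − 1 is not stated in this section, so the population form is an assumption.

```python
from statistics import pstdev, pvariance

def divergence_variation(d, use_variance=False):
    """sigma_i for each sample i: spread of d[e][i] over the epochs e = 1..E.

    d[e][i] is the divergence of sample i at epoch e (hypothetical layout).
    The population standard deviation is used by default; set
    use_variance=True to return the variance instead.
    """
    spread = pvariance if use_variance else pstdev
    n_samples = len(d[0])
    return [spread([row[i] for row in d]) for i in range(n_samples)]

# Sample 0 keeps the same divergence every epoch; sample 1 fluctuates.
d = [[-3.0, 0.0],
     [-3.0, 4.0]]
print(divergence_variation(d))                     # [0.0, 2.0]
print(divergence_variation(d, use_variance=True))  # [0.0, 4.0]
```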

***Effect of Embodiment 2***
As described above, the data selection support device 10 according to Embodiment 2 calculates the variation σ_i of the divergence d_{i,e}. Using σ_i makes it possible to realize a configuration that can appropriately extract images that contribute to improving the accuracy of the model 32.

FIG. 10 shows a feature map in which the plurality of training data 41 are plotted with the average μ_i of the divergence d_{i,e} on the vertical axis and the variation σ_i of d_{i,e} on the horizontal axis. FIG. 10 assumes that the model 32 identifies whether each training data 41 is normal or abnormal. A point is plotted as normal when the recognition result at the final epoch is normal, and as abnormal when it is abnormal.
When μ_i is equal to or greater than a first threshold (−2.0 in FIG. 10), the image is classified as easy to learn. When σ_i is equal to or greater than a second threshold (3 in FIG. 10), the image is classified as one that improves generalization performance. When μ_i is less than the first threshold, the image is classified as difficult to learn.
In this manner, the images constituting the training data 41 can be classified on the basis of μ_i and σ_i. For example, images classified as difficult to learn may be deleted from the training data 41. For images classified as improving generalization performance, similar images may be added to the training data 41. Images classified as easy to learn may be thinned out to reduce their number.
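The classification rule of FIG. 10 can be written as a small function. The thresholds −2.0 and 3 come from the FIG. 10 example. The order of the checks is an assumption: the text does not say which class applies when an image has μ_i ≥ −2.0 and σ_i ≥ 3 at the same time, and this sketch gives the σ_i test priority.

```python
def classify(mu_i, sigma_i, first_threshold=-2.0, second_threshold=3.0):
    """Assign one training image to a FIG. 10 region (check order is assumed)."""
    if sigma_i >= second_threshold:
        return "improves generalization"  # divergence changes a lot over epochs
    if mu_i >= first_threshold:
        return "easy to learn"
    return "difficult to learn"

for mu_i, sigma_i in [(-0.5, 0.8), (-3.2, 0.4), (-1.0, 3.5)]:
    print(mu_i, sigma_i, classify(mu_i, sigma_i))
```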

The effect of the data selection support device 10 according to Embodiment 2 will be described by comparison with the DC described in Non-Patent Document 1.
In Non-Patent Document 1, as shown in Equation 17, the average μ̂_i of the confidence over the epochs is calculated for each of the plurality of training data 41. Further, as shown in Equation 18, the variation σ̂_i of the confidence is calculated for each of the plurality of training data 41.

Non-Patent Document 1 presents a feature map in which the plurality of training data 41 are plotted with the average confidence μ̂_i on the vertical axis and the confidence variation σ̂_i on the horizontal axis. In Non-Patent Document 1, images with a high μ̂_i are classified as easy to learn, images with a high σ̂_i are classified as improving generalization performance, and images with a low μ̂_i are classified as difficult to learn.

In deep learning, confidence tends to be high, so training data tend to be classified as easy to learn (see Non-Patent Document 2). In addition, when the model has been trained sufficiently, images of a label with few data may reach high confidence, and their variation may also become large. As a result, images of a label with few data that should be classified as difficult to learn may instead be classified as easy to learn or as improving generalization performance. Therefore, with the DC described in Non-Patent Document 1, it is difficult to appropriately extract images that hinder improvement of model accuracy.

In FIG. 11, the dotted line shows how the average confidence of images of a label with many data changes from epoch to epoch, and the solid line shows the same for images of a label with few data. As shown in FIG. 11, confidence evolves differently over the epochs for the two groups. Images of a label with many data reach high confidence in early epochs, because there is more training data and the model can learn many features. In contrast, as shown in FIG. 11, the confidence of images of a label with few data either starts to rise only after a certain number of epochs, never rises, or behaves somewhere in between. Moreover, when the confidence of such images does rise, the model may have learned features other than the label. For example, a model that should be learning whether an image contains a tank may instead learn whether it contains a tree. Likewise, an anomaly detection model that should learn the anomalous region often ends up learning something other than that region.
This happens because, to the model, features other than the label appear more prominent than the label itself. Images from which the model has learned something other than the label are not useful as training data. Therefore, images of a label with few data, from which something other than the label may have been learned, need to be classified as difficult to learn.
However, in FIG. 11, if the number of epochs E is 50, the average confidence μ̂_i in the DC described in Non-Patent Document 1 is about 0.5. Moreover, because the confidence ranges from low to high values, the confidence variation σ̂_i is high. Consequently, images of a label with few data are not classified as difficult to learn, because their confidence is not low; and because their confidence moves from a low state to a high state, they are classified as improving generalization performance. Also in FIG. 11, if the number of epochs E is 100, μ̂_i in the DC is about 0.8. Because most confidence values are then high, σ̂_i is low, and the images are classified as easy to learn.
As FIG. 11 shows, even if the confidence of images of a label with few data ultimately reaches only about 0.5 rather than near 1.0, μ̂_i still increases when E is large and the model is trained sufficiently, so these images are not classified as difficult to learn. DC therefore leads to erroneous decisions.

FIG. 12 shows the transition of the confidence of images of a label with few data in a format different from that of FIG. 11. The confidence of these images starts to rise after a certain number of epochs and is close to 1.0 by epoch 50. Therefore, as described above, images of a label with few data are not classified as difficult to learn.
FIG. 13 adds to FIG. 12 the confidence and the divergence d_{i,e} of one particular training data 41. This training data 41 is an image, among the images of the label with few data, that hinders accuracy improvement. As shown in FIG. 13, the confidence of even this training data 41 increases as the epochs progress. Therefore, in DC, this training data 41, like the other images of the label with few data, is not classified as difficult to learn.
However, the confidence of this training data 41 is lower than the average confidence μ_e of the other images. Therefore, the divergence d_{i,e} takes a low value (a large negative value), and so does its average μ_i. Accordingly, the data selection support device 10 according to Embodiment 2 can classify the training data 41 that hinders accuracy improvement as difficult to learn.
Note that even if the confidence changes as the epochs progress, the variation σ_i does not increase as long as d_{i,e} itself does not change. Therefore, such training data is not classified as improving generalization performance either.
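The situation of FIG. 13 can be reproduced with synthetic numbers. In this hypothetical data set of ten samples, the epoch average μ_e rises from 0.30 to 0.90, nine samples sit 0.02 above it, and one hindering sample (index 9) sits 0.18 below it. The divergence is again assumed to be a z-score with the population standard deviation, and the "DC-style" statistics are taken to be the plain mean and population standard deviation of the confidence over epochs, which may differ in detail from Equations 17 and 18.

```python
from statistics import mean, pstdev

# Hypothetical data: nine ordinary samples plus one hindering sample (index 9).
# The epoch mean rises with training; sample 9 always sits 9*s below it.
s = 0.02
epoch_means = [0.30, 0.45, 0.60, 0.75, 0.90]
conf = [[m + s] * 9 + [m - 9 * s] for m in epoch_means]  # conf[e][i]

# Divergence per epoch and sample (assumed z-score form).
d = [[(c - mean(row)) / pstdev(row) for c in row] for row in conf]

target = [row[9] for row in conf]
dc_mu, dc_sigma = mean(target), pstdev(target)  # DC-style statistics
dv = [row[9] for row in d]
mu_i, sigma_i = mean(dv), pstdev(dv)            # Embodiment 2 statistics

print(f"DC-style: mean confidence {dc_mu:.2f}, variability {dc_sigma:.2f}")
print(f"divergence: mu_i {mu_i:.2f}, sigma_i {sigma_i:.2f}")
```

For sample 9 the DC-style statistics are a mean confidence of about 0.42 and a variability of about 0.21, driven only by the overall rise in confidence, whereas its divergence is −3 at every epoch, so μ_i = −3 and σ_i is essentially 0.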

FIG. 14 shows the transition of the confidence of images of a label with many data in a format different from that of FIG. 11. The confidence of these images is high from the early epochs. Therefore, as described above, images of a label with many data are classified as easy to learn rather than difficult to learn.
FIG. 15 adds to FIG. 14 the confidence and the divergence d_{i,e} of one particular training data 41. This training data 41 is an image, among the images of the label with many data, that hinders accuracy improvement; images of a label with many data may also include such training data 41. As shown in FIG. 15, the confidence of even this training data 41 may increase as the epochs progress, because confidence tends to be high in deep learning. Therefore, in DC, this training data 41, like the other images of the label with many data, is classified as easy to learn rather than difficult to learn.
However, the confidence of this training data 41 is lower than the average confidence μ_e of the other images. Therefore, d_{i,e} takes a low value, and so does its average μ_i. Accordingly, the data selection support device 10 according to Embodiment 2 can classify the training data 41 that hinders accuracy improvement as difficult to learn.

The average μ_i of the divergence d_{i,e} indicates the difficulty of learning, and the variation σ_i of d_{i,e} indicates how that difficulty changes. By using both μ_i and σ_i, the data selection support device 10 according to Embodiment 2 can appropriately determine whether training data is easy or difficult to learn.

Embodiment 3.
Embodiment 3 differs from Embodiments 1 and 2 in that the training data 41 is selected on the basis of the results calculated by the accuracy calculation unit 22. Only this difference is described for Embodiment 3; description of the common points is omitted.
Embodiment 3 describes a case where a function is added to Embodiment 2. However, the function can also be added to Embodiment 1.

***Configuration Description***
The configuration of the data selection support device 10 according to Embodiment 3 will be described with reference to FIG. 16.
The data selection support device 10 differs from the data selection support device 10 shown in FIG. 8 in that it includes a data selection unit 23 as a functional component. Like the other functional components, the function of the data selection unit 23 is implemented in software or hardware.

***Description of Operation***
The operation of the data selection support device 10 according to Embodiment 3 will be described with reference to FIG. 17.
The processing of steps S31 to S35 is the same as that of steps S21 to S25 in FIG. 9.

(Step S36: Data selection process)
The data selection unit 23 selects the training data 41 on the basis of the average μ_i of the divergences d_{i,e} calculated in step S34 and the variation σ_i of the divergences d_{i,e} calculated in step S35.
As a specific example, the data selection unit 23 classifies the images as described with reference to FIG. 10, that is, into images that are easy to learn, images that improve generalization performance, and images that are difficult to learn. The data selection unit 23 may then delete images classified as difficult to learn from the training data 41. The data selection unit 23 may also thin out images classified as easy to learn to reduce their number; in doing so, it may thin out more images the higher their μ_i is.
For images classified as improving generalization performance, a notification may be issued to add similar images to the training data 41.
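Step S36 can be sketched as a filter over images that have already been classified (for example, with the FIG. 10 rule). The input format, the function name `select_training_data`, and the thinning rule are hypothetical: the text only says that more easy-to-learn images may be thinned the higher their μ_i, so this sketch drops a fixed fraction of the easy-to-learn images, starting from the highest μ_i.

```python
def select_training_data(samples, thin_fraction=0.5):
    """samples maps an image id to (mu_i, category), where category is
    "easy", "generalization", or "difficult".

    Returns (kept_ids, ids_to_notify):
      - "difficult" images are deleted (never kept);
      - "easy" images are thinned, highest mu_i first;
      - "generalization" images are kept and reported so that similar
        images can be added.
    """
    easy = sorted((sid for sid, (_, cat) in samples.items() if cat == "easy"),
                  key=lambda sid: samples[sid][0])  # ascending mu_i
    n_thin = int(len(easy) * thin_fraction)
    kept_easy = easy[:len(easy) - n_thin]           # drop the highest-mu_i tail
    general = sorted(sid for sid, (_, cat) in samples.items()
                     if cat == "generalization")
    return sorted(kept_easy + general), general

samples = {
    "a": (1.5, "easy"), "b": (0.2, "easy"), "c": (-0.5, "easy"),
    "d": (-1.0, "easy"), "e": (-3.0, "difficult"), "f": (0.0, "generalization"),
}
kept, notify = select_training_data(samples)
print(kept, notify)  # ['c', 'd', 'f'] ['f']
```

A probabilistic rule, in which the chance of removal grows with μ_i, would also fit the description; the deterministic version is used here only because it is easier to inspect.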

***Effect of Embodiment 3***
As described above, the data selection support device 10 according to Embodiment 3 selects the training data 41 on the basis of the average μ_i and the variation σ_i of the divergence d_{i,e}. This makes it possible to delete unnecessary training data 41 automatically, without manual intervention.
Unnecessary training data 41 means training data 41 that is likely to hinder improvement of the accuracy of the model 32, and training data 41 that is not needed to improve the accuracy of the model 32.

***Other Configurations***
<Modification 4>
Incorrectly labeled training data 41 has a low average μ_i of the divergence d_{i,e}. However, μ_i can be low even when the labeling is correct. For example, μ_i may be low when features are similar across different labels, or for a rare type of image with unique features.
The training data 41 shown in FIG. 18(A) is an image of a dog labeled "dog", and is appropriate training data 41. However, some images of a dog X are hard to distinguish from the mop image shown in FIG. 18(B). Therefore, if the training data 41 contains few images like the dog X in (A) and many mop images like (B), the images of dog X are influenced by the training data 41 labeled "mop", and their confidence is unlikely to become high.
Rather than deleting images such as dog X from the training data 41, it is desirable to add images of dogs with a similar appearance to the training data 41. In other words, dog X has features that are unique compared with other dogs, so it is desirable to add images of dogs with similar features to the training data 41 for learning. However, if, as described in Embodiment 3, the decision to delete is made simply by thresholding μ_i, images such as dog X may be deleted.

It is desirable to delete only incorrectly labeled images. As an example, an image can be regarded as incorrectly labeled if it satisfies all of the following conditions (1) to (3): (1) its confidence does not increase even as training proceeds; (2) its features are similar to those of images with different labels; (3) its features are not similar to those of images with the same label.
Condition (1) can be judged from the transition of the divergence d_{i,e} as the epochs progress. Conditions (2) and (3) can be judged from the distance between features; a specific example of the distance is the cosine distance. Similar features give a short distance, and dissimilar features give a long distance.

Therefore, the data selection unit 23 may perform the following procedure to determine whether an image is incorrectly labeled.
Procedure 1. The data selection unit 23 prepares a trained model 32.
Procedure 2. Using the trained model 32, the data selection unit 23 extracts features of about 512 dimensions from each image.
Procedure 3. The data selection unit 23 calculates the distance (such as the cosine distance) between the images.
Procedure 4. For each image, the data selection unit 23 identifies the minimum distance between that image and the images whose label differs from its own.
Procedure 5. For each image, the data selection unit 23 identifies the minimum distance between that image and the other images that have the same label as its own.
Procedure 6. The data selection unit 23 extracts images that satisfy all of the following conditions C1 to C3 as images that are difficult to learn (images with incorrect labels). (C1) The average μ_i of the divergences d_{i,e} is equal to or less than a reference divergence (for example, −2). (C2) The minimum distance from Procedure 4 is equal to or less than a first distance (for example, 0.1). (C3) The minimum distance from Procedure 5 is equal to or greater than a second distance (for example, 0.3).
Procedure 7. The data selection unit 23 determines that the extracted images are to be deleted.
Here, the minimum distance to the images with the target label is used. However, the average distance may be used instead of the minimum distance.
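Procedures 3 to 7 can be sketched in plain Python, assuming the feature vectors from Procedure 2 are already available as lists of floats; the tiny 2-D vectors below are hypothetical stand-ins for the roughly 512-dimensional features. The cosine distance is taken as 1 minus the cosine similarity, zero-length feature vectors are assumed not to occur, and the hypothetical `aggregate` parameter covers the note above: pass `min` for the minimum distance or a mean function for the average distance.

```python
import math

def cos_distance(u, v):
    """Cosine distance = 1 - cosine similarity (vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def find_mislabeled(features, labels, mu, aggregate=min,
                    ref_divergence=-2.0, first_distance=0.1, second_distance=0.3):
    """Return indices of images that satisfy conditions C1, C2 and C3."""
    flagged = []
    for i, f_i in enumerate(features):
        other = [cos_distance(f_i, f_j) for j, f_j in enumerate(features)
                 if labels[j] != labels[i]]             # Procedure 4
        same = [cos_distance(f_i, f_j) for j, f_j in enumerate(features)
                if j != i and labels[j] == labels[i]]   # Procedure 5
        d_other = aggregate(other) if other else math.inf
        d_same = aggregate(same) if same else math.inf
        if (mu[i] <= ref_divergence             # C1: divergence stays low
                and d_other <= first_distance   # C2: close to another label
                and d_same >= second_distance):  # C3: far from its own label
            flagged.append(i)
    return flagged  # Procedure 7: candidates for deletion

# Hypothetical data: image 3 is labeled "mop" but looks like the dogs.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.99, 0.05], [0.1, 1.0]]
labels = ["dog", "dog", "mop", "mop", "mop"]
mu = [0.5, 0.3, 0.4, -3.0, -2.5]
print(find_mislabeled(features, labels, mu))  # [3]
```

Image 4 also has a low μ_i, but it is far from every dog image, so it fails C2 and is kept; this is the kind of correctly labeled but hard image that a μ_i threshold alone would delete.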

Embodiment 4.
Embodiment 4 differs from Embodiments 1 and 2 in that a feature map or the like is displayed so that the user selects the training data 41. Only this difference is described for Embodiment 4; description of the common points is omitted.
Embodiment 4 describes a case where a function is added to Embodiment 2. However, the function can also be added to Embodiment 1.

***Configuration Description***
The configuration of the data selection support device 10 according to Embodiment 4 will be described with reference to FIG. 19.
The data selection support device 10 differs from the data selection support device 10 shown in FIG. 8 in that it includes a feature display unit 24 as a functional component. Like the other functional components, the function of the feature display unit 24 is implemented in software or hardware.

***Description of Operation***
The operation of the data selection support device 10 according to Embodiment 4 will be described with reference to FIG. 20.
The processing of steps S41 to S45 is the same as that of steps S21 to S25 in FIG. 9.

(Step S46: Feature display process)
The feature display unit 24 displays the feature map shown in FIG. 10 and lets the user select the training data 41. The feature display unit 24 is not limited to the feature map form shown in FIG. 10; it only needs to display, for each training data 41, the average μ_i and the variation σ_i of the divergence d_{i,e}.
The selection is made by the user checking the classified images. In other words, the decision is made by looking at the images rather than by processing them uniformly with a threshold.

***Effect of Embodiment 4***
As described above, in Embodiment 4, a feature map or the like is displayed so that the user selects the training data 41. This prevents training data 41 that should not be deleted from being deleted.

For an image such as dog X in FIG. 18(A), it is desirable to add images of dogs with a similar appearance to the training data 41 rather than to delete the image from the training data 41. However, if, as described in Embodiment 3, the decision to delete is made simply by thresholding the average μ_i of the divergence d_{i,e}, images such as dog X may be deleted. In contrast, displaying a feature map or the like and letting the user select the training data 41, as in Embodiment 4, prevents images such as dog X from being deleted. It can also prompt the user to decide to add images like dog X.

***Other Configurations***
<Modification 5>
It is difficult to decide whether to delete an image based solely on the average value μ_i of the degrees of divergence d_{i,e}. Therefore, the feature display unit 24 may display the distances described in Modification 4 together with the average value μ_i. That is, the feature display unit 24 may display at least one of the minimum distance in procedure 4 and the minimum distance in procedure 5 together with the average value μ_i of the degrees of divergence d_{i,e}.
As a specific example, the feature display unit 24 plots the information of each image in a two-dimensional space whose vertical axis is the average value μ_i of the degrees of divergence d_{i,e} and whose horizontal axis is the minimum distance in procedure 4. Alternatively, the feature display unit 24 may plot the information of each image in a two-dimensional space whose vertical axis is the average value μ_i and whose horizontal axis is the minimum distance in procedure 5. The feature display unit 24 may also plot the information of each image in a three-dimensional space whose X axis is the minimum distance in procedure 4, whose Y axis is the minimum distance in procedure 5, and whose Z axis is the average value μ_i.
As a result, whether to delete an image can be decided by considering the distances described in Modification 4 together with the average value μ_i of the degrees of divergence d_{i,e}.
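The axis assignments of Modification 5 can be sketched as a small Python helper that turns per-image values into plot coordinates. The minimum distances of procedures 4 and 5 are defined in Modification 4, which is not part of this section. Here they are simply taken as precomputed inputs. The function and parameter names are hypothetical.

```python
from typing import Optional, Sequence


def feature_display_points(
    mu: Sequence[float],
    dist4: Optional[Sequence[float]] = None,
    dist5: Optional[Sequence[float]] = None,
) -> list[tuple[float, ...]]:
    """Build plot coordinates for each image (hypothetical helper).

    - dist4 only: 2-D points (minimum distance of procedure 4, mu_i)
    - dist5 only: 2-D points (minimum distance of procedure 5, mu_i)
    - both:       3-D points (procedure 4 -> X, procedure 5 -> Y, mu_i -> Z)
    """
    if dist4 is not None and dist5 is not None:
        return [(x, y, m) for x, y, m in zip(dist4, dist5, mu)]
    horizontal = dist4 if dist4 is not None else dist5
    if horizontal is None:
        raise ValueError("at least one of dist4 or dist5 is required")
    # Horizontal axis: the chosen minimum distance; vertical axis: mu_i.
    return [(x, m) for x, m in zip(horizontal, mu)]


print(feature_display_points([0.5, -1.0], dist4=[1.0, 3.0]))
print(feature_display_points([0.5, -1.0], dist4=[1.0, 3.0], dist5=[2.0, 0.4]))
```

The returned tuples can be unzipped into per-axis sequences and passed to any plotting library. One example is matplotlib's `Axes.scatter`, using an axes created with `projection="3d"` for the three-dimensional case.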

<Modification 6>
The above description has dealt with selecting the training data 41. However, the same technique can also be used to screen the validation data 42 instead of the training data 41.

Note that "unit" in the above description may be read as "circuit", "step", "procedure", "process", or "processing circuitry".

The embodiments and modifications of the present disclosure have been described above. Some of these embodiments and modifications may be implemented in combination. Any one or more of them may also be implemented partially. The present disclosure is not limited to the above embodiments and modifications, and various changes may be made as necessary.

10 data selection support device, 11 processor, 12 memory, 13 storage, 14 communication interface, 15 electronic circuit, 21 input reception unit, 22 accuracy calculation unit, 221 confidence average calculation unit, 222 confidence variation calculation unit, 223 divergence calculation unit, 224 divergence variation calculation unit, 23 data selection unit, 24 feature display unit, 31 learning device, 32 model, 41 training data, 42 validation data.

Claims

1. A data selection support device comprising:
a confidence average calculation unit that calculates, for each epoch e equal to or less than the number of epochs E, the average value μ_e of the model's confidence over a plurality of training data; and
a divergence calculation unit that calculates, for each of the plurality of training data, the degree of divergence between the model's confidence at each epoch e and the average value μ_e calculated by the confidence average calculation unit.

2. The data selection support device according to claim 1, wherein the divergence calculation unit calculates the average value μ_i of the degrees of divergence at the respective epochs e.

3. The data selection support device according to claim 2, further comprising
a confidence variation calculation unit that calculates, for each epoch e, the variation of the model's confidence over the plurality of training data,
wherein the divergence calculation unit calculates the average value μ_i of the degrees of divergence using the variation of confidence calculated by the confidence variation calculation unit.

4. The data selection support device according to claim 3, wherein
the confidence average calculation unit calculates, for each epoch e, the average value μ_e of the model's confidence in the correct label y_i^* of the training data image x_i over each integer i = 1, ..., N;
the confidence variation calculation unit calculates the variation of confidence σ_e for each epoch e as shown in Equation 2; and
the divergence calculation unit calculates the average value μ_i of the degrees of divergence for the training data of each integer i = 1, ..., N as shown in Equation 3.
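The quantities recited in claims 1 to 4 can be sketched in Python as below. Equations 1 to 3 are not reproduced in this section, so the formulas here are assumptions, not the patent's definitions:

- μ_e is taken as the plain mean of the confidences over the N training data.
- σ_e is taken as their population standard deviation.
- The divergence is taken as the standardized difference d_{i,e} = (p_{i,e} - μ_e)/σ_e. Here p_{i,e} is the model's confidence in the correct label y_i^* of image x_i after epoch e.

The input layout `conf[e][i]` and the function names are hypothetical.

```python
import math
from typing import Sequence


def epoch_stats(conf: Sequence[Sequence[float]]) -> tuple[list[float], list[float]]:
    """Return (mu_e, sigma_e) for every epoch e.

    conf[e][i] is the model's confidence in the correct label y_i^* of
    training image x_i after epoch e (hypothetical layout).
    """
    mus: list[float] = []
    sigmas: list[float] = []
    for row in conf:
        n = len(row)
        mu = sum(row) / n  # assumed form of mu_e: mean over the N training data
        # Assumed form of sigma_e: population standard deviation over the N data.
        sigma = math.sqrt(sum((p - mu) ** 2 for p in row) / n)
        mus.append(mu)
        sigmas.append(sigma)
    return mus, sigmas


def mean_divergence(conf: Sequence[Sequence[float]]) -> list[float]:
    """Return mu_i, the average over epochs of the divergence d_{i,e}."""
    mus, sigmas = epoch_stats(conf)
    num_epochs = len(conf)
    num_data = len(conf[0])
    result: list[float] = []
    for i in range(num_data):
        total = 0.0
        for e in range(num_epochs):
            # Assumed divergence: standardized gap between sample i's confidence
            # and the epoch average; contributes 0 if all confidences are equal.
            if sigmas[e] > 0:
                total += (conf[e][i] - mus[e]) / sigmas[e]
        result.append(total / num_epochs)
    return result


conf = [[0.9, 0.8, 0.1], [0.95, 0.9, 0.2]]  # E = 2 epochs, N = 3 images
print(mean_divergence(conf))
```

Under this sign convention, a sample whose confidence consistently lags behind the rest of the data gets a negative μ_i. Because the standardized divergences sum to zero within each epoch, the μ_i values also sum to zero.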

5. The data selection support device according to any one of claims 1 to 4, further comprising a divergence variation calculation unit that calculates the variation of the degrees of divergence calculated by the divergence calculation unit.

6. The data selection support device according to claim 4, further comprising a divergence variation calculation unit that calculates the variation σ_i of the degrees of divergence calculated by the divergence calculation unit, as shown in Equation 4.
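Claim 6 adds the variation σ_i of the divergence. The following is a minimal sketch. It assumes Equation 4 is the population standard deviation of d_{i,e} over the E epochs, taken around the mean μ_i. The equation itself is not shown in this section, and the input layout `d[i][e]` and the function name are hypothetical.

```python
import math
from typing import Sequence


def divergence_variation(d: Sequence[Sequence[float]]) -> list[float]:
    """Return sigma_i for every training sample i.

    d[i][e] is the divergence d_{i,e} of sample i at epoch e (hypothetical
    layout). Assumed form of Equation 4: population standard deviation of
    d_{i,e} over the epochs, taken around the sample's mean divergence mu_i.
    """
    result: list[float] = []
    for row in d:
        num_epochs = len(row)
        mu_i = sum(row) / num_epochs
        result.append(math.sqrt(sum((x - mu_i) ** 2 for x in row) / num_epochs))
    return result


# Sample 0 diverges by the same amount every epoch; sample 1 swings between 0 and 2.
print(divergence_variation([[1.0, 1.0, 1.0], [0.0, 2.0]]))
```

A large σ_i marks a sample whose divergence swings from epoch to epoch. This is the quantity that claim 9 pairs with μ_i on the axes of the feature map.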

7. The data selection support device according to claim 5, further comprising a data selection unit that selects training data to be deleted from the plurality of training data based on at least one of the degree of divergence and the variation of the degree of divergence.

8. The data selection support device according to claim 7, wherein the data selection unit selects the training data to be deleted based on a feature distance between each piece of training data and the other training data.
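Claims 7 and 8 leave the selection rule open. The sketch below is one hypothetical rule, not the patent's, and rests on three assumptions:

- The divergence is signed, so a low μ_i marks a sample whose confidence lags behind the others.
- The feature distance is the Euclidean distance between per-sample feature vectors, for example embeddings extracted from the model.
- Sample i is flagged for deletion only if μ_i is below `mu_threshold` and its nearest other sample lies within `max_nn_dist`.

The parameter names and threshold values are illustrative.

```python
import math
from typing import Sequence


def nearest_neighbor_distance(features: Sequence[Sequence[float]]) -> list[float]:
    """Minimum Euclidean distance from each sample's feature vector to any other sample's."""
    n = len(features)
    result: list[float] = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if j != i:
                best = min(best, math.dist(features[i], features[j]))
        result.append(best)
    return result


def select_for_deletion(
    mu: Sequence[float],
    features: Sequence[Sequence[float]],
    mu_threshold: float,
    max_nn_dist: float,
) -> list[int]:
    """Hypothetical rule: flag sample i if mu_i < mu_threshold AND it is not isolated.

    mu_i is assumed signed, so low values mean confidence lags the epoch average.
    """
    nn = nearest_neighbor_distance(features)
    return [
        i for i in range(len(mu))
        if mu[i] < mu_threshold and nn[i] <= max_nn_dist
    ]


mu = [0.8, 0.6, -1.4, -1.2]
features = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [10.0, 10.0]]
print(select_for_deletion(mu, features, mu_threshold=-1.0, max_nn_dist=2.0))
```

The distance condition reflects the concern raised for dog X. A low-μ_i sample that is far from every other sample may be a rare but correctly labeled example, so this rule does not flag it, leaving it for the user to review. The pairwise loop is O(N²), which is adequate for a sketch but not for large datasets.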

9. The data selection support device according to any one of claims 5 to 8, further comprising a feature display unit that displays a feature map in which the plurality of training data are plotted with one of the degree of divergence and the variation of the degree of divergence on the vertical axis and the other on the horizontal axis.

10. The data selection support device according to claim 9, wherein the feature display unit displays the feature distance between each piece of training data and the other training data.

11. A data selection support method in which:
a computer calculates, for each epoch e equal to or less than the number of epochs E, the average value μ_e of the model's confidence over a plurality of training data; and
the computer calculates, for each of the plurality of training data, the degree of divergence between the model's confidence at each epoch e and the average value μ_e.

12. A data selection support program that causes a computer serving as a data selection support device to execute:
confidence average calculation processing that calculates, for each epoch e equal to or less than the number of epochs E, the average value μ_e of the model's confidence over a plurality of training data; and
divergence calculation processing that calculates, for each of the plurality of training data, the degree of divergence between the model's confidence at each epoch e and the average value μ_e calculated by the confidence average calculation processing.