JP7395396B2

JP7395396B2 - Information processing device, information processing method and program

Info

Publication number: JP7395396B2
Application number: JP2020051326A
Authority: JP
Inventors: 鳴鏑蘇; 遼平田中; 信太郎高橋
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2020-03-23
Filing date: 2020-03-23
Publication date: 2023-12-11
Anticipated expiration: 2040-03-23
Also published as: JP2021149818A

Description

Embodiments of the present invention relate to an information processing device, an information processing method, and a program.

Uncertainty is a concept that originated in fields such as statistics, economics, and the natural sciences, and various ways of expressing it (degrees of uncertainty, uncertainty measures) have been proposed. In machine learning, the degree of uncertainty has many uses, such as selecting training data for active learning and selecting training data for training highly accurate models.

JP2019-061642A (Japanese Unexamined Patent Application Publication No. 2019-061642)

However, with conventional techniques, the degree of uncertainty sometimes cannot be calculated with high accuracy.

An information processing device according to an embodiment includes a dividing unit, a calculating unit, and an integrating unit. The dividing unit divides input data, which is the target of a process that receives data and outputs a processing result, into a plurality of pieces of partial data. The calculating unit executes the process on each piece of partial data and calculates a plurality of degrees of uncertainty, each indicating the uncertainty of the process. The integrating unit integrates the plurality of degrees of uncertainty and outputs the result as the degree of uncertainty of the input data.
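To make the division of labor among the three units concrete, here is a minimal Python sketch. The names (`estimate_uncertainty`, `chunks_of_two`) and the toy per-part score are hypothetical illustrations, not taken from the source, and the mean is used as the default integration only as one example of a possible rule.

```python
from statistics import mean
from typing import Callable, Sequence, TypeVar

D = TypeVar("D")  # type of one piece of input data
P = TypeVar("P")  # type of one part (piece of partial data)


def estimate_uncertainty(
    data: D,
    divide: Callable[[D], Sequence[P]],
    part_uncertainty: Callable[[P], float],
    integrate: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Divide the input, score every part, then integrate the part scores."""
    parts = divide(data)                           # role of the dividing unit
    scores = [part_uncertainty(p) for p in parts]  # role of the calculating unit
    return integrate(scores)                       # role of the integrating unit


def chunks_of_two(xs: list) -> list:
    """Toy divider: consecutive pairs."""
    return [xs[i:i + 2] for i in range(0, len(xs), 2)]


# Toy score: the spread inside each pair stands in for a per-part uncertainty.
print(estimate_uncertainty([1.0, 3.0, 5.0, 5.0], chunks_of_two,
                           lambda p: max(p) - min(p)))  # 1.0
```

Passing the three steps in as callables keeps the sketch neutral about the data type (image, feature map, or sequence) and about which indicator and integration rule are used.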

FIG. 1 is a diagram showing an example of the data distribution in a two-class classification problem.
FIG. 2 is a block diagram showing an example of the configuration of the information processing device according to the first embodiment.
FIG. 3 is a diagram illustrating an example of a method for dividing data into parts.
FIG. 4 is a flowchart illustrating an example of the learning process in the first embodiment.
FIG. 5 is a flowchart illustrating an example of the data selection process in the first embodiment.
FIG. 6 is a diagram showing an example of image data and the recognition result for that image data.
FIG. 7 is a diagram showing an example of image data and the recognition result for that image data.
FIG. 8 is a block diagram showing an example of the configuration of the information processing device according to the second embodiment.
FIG. 9 is a flowchart illustrating an example of the learning process in the second embodiment.
FIG. 10 is a diagram for explaining the relationship between training data and selection target data.
FIG. 11 is an explanatory diagram showing an example of the hardware configuration of the information processing device according to the first or second embodiment.

Preferred embodiments of the information processing device according to the present invention are described in detail below with reference to the accompanying drawings. The description uses as an example an information processing device (learning device) that calculates degrees of uncertainty, selects training data based on them, and trains a machine learning model (hereinafter, a model) using the selected training data. Examples of the model include neural network models, random forests, support vector machines (SVM), conditional random fields (CRF), and logistic regression.

The process that uses the degree of uncertainty is not limited to training; it may be any process. For example, the embodiments may be applied to a device that calculates the degree of uncertainty of a process on data and outputs it (for example, by displaying it).

Active learning is a machine learning framework in which a classifier (an example of a model) is trained while querying an expert. Typical supervised learning requires a large amount of labeled data (training data), often several thousand samples or more. For many supervised learning tasks, however, labeled data is very difficult to obtain or costly in time and money. Active learning reduces this labeling (teaching) cost by preferentially selecting the training data expected to be most effective for learning and presenting it to the user who provides the labels.

In theory, uncertainty is an index of how ambiguous a model's processing result (prediction) for a piece of data is. Hereinafter, the value expressing how uncertain a result is will be called the degree of uncertainty. Training on data with high uncertainty (data whose prediction is most ambiguous) can improve the model's prediction accuracy and shorten the training time needed to reach the desired prediction performance.

FIG. 1 shows an example of the data distribution in a two-class classification problem. White and black triangles represent data belonging to one or the other of the two classes. The solid line represents the model's classification boundary. The closer a data point is to the boundary, the more uncertain it is considered to be.

The following are reasons why using highly uncertain data as training data improves prediction accuracy.
(F1) Data in the central regions 11 and 12 of each class's distribution is learned less often, which prevents overfitting to that data.
(F2) Data 21, 22, and 23, which lie far from the boundary and have incorrect labels, can be excluded.
(F3) Highly uncertain data has large information entropy (information content) and is considered to lie in region 31, close to the boundary. Learning intensively from such data allows boundary 32 to be constructed more accurately.

The following methods exist for selecting data with high uncertainty. Here, a probability is a value indicating the accuracy of the process (for example, prediction accuracy); in a recognition process, for instance, the recognition rate corresponds to the probability.
(M1) LeastConfidence: select the data for which the probability of the most probable label is lowest.
(M2) MarginSampling: select the data with the smallest difference between the probabilities of the most probable and second most probable labels.
(M3) EntropyBased: select the data whose predicted distribution has the largest entropy.
(M4) Query-By-Committee: run prediction on the data with several different models (a committee) trained from different initial values, and select the data for which the variation in predictions, or the error (loss value), among the models is largest.

In each of the methods above, the following indicator corresponds to the degree of uncertainty. Depending on the indicator, either a larger value or a smaller value means higher uncertainty (a more ambiguous processing result).
(I1) How low the probability of the most probable label is
(I2) The difference between the probabilities of the most probable and second most probable labels
(I3) The magnitude of the entropy of the predicted distribution
(I4) The variation in predictions, or the magnitude of the error, among multiple models
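The indicators (I1) to (I4) can be sketched in Python as follows, assuming each prediction is a list of class probabilities. The function names are hypothetical, and the form of (I4) shown here (mean per-class variance across committee members) is one common choice assumed for illustration; the source does not fix a formula. Each docstring notes the direction in which the value signals higher uncertainty.

```python
import math
from statistics import pvariance
from typing import Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """(I1) Probability of the most probable label; LOWER = more uncertain."""
    return max(probs)


def margin(probs: Sequence[float]) -> float:
    """(I2) Top-1 minus top-2 probability; LOWER = more uncertain."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def entropy(probs: Sequence[float]) -> float:
    """(I3) Shannon entropy of the predicted distribution (nats); HIGHER = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def committee_disagreement(committee: Sequence[Sequence[float]]) -> float:
    """(I4) Mean over classes of the variance of the members' probabilities;
    HIGHER = more uncertain. (One assumed form of 'variation among models'.)"""
    n_classes = len(committee[0])
    return sum(pvariance([member[c] for member in committee])
               for c in range(n_classes)) / n_classes


p = [0.75, 0.25]
print(least_confidence(p), margin(p))                    # 0.75 0.5
print(round(entropy([0.5, 0.5]), 4))                     # 0.6931
print(committee_disagreement([[1.0, 0.0], [0.0, 1.0]]))  # 0.25
```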

There are also methods that divide data into a plurality of pieces of partial data (hereinafter sometimes called parts) according to a predetermined rule and perform prediction or similar processing on them. For example, in connectionist temporal classification (CTC), sequence data is convolved, the resulting feature map is divided into one-pixel parts, the prediction information of the parts is integrated, and a final prediction is output. Such methods, however, are swayed by local uncertainty and cannot select data whose overall uncertainty is high.

(First embodiment)
The information processing device according to the first embodiment therefore calculates the degree of uncertainty of data so that overall uncertainty is taken into account rather than being dominated by local uncertainty. The process whose degree of uncertainty is calculated may be any process, such as prediction, classification, or object recognition.

FIG. 2 is a block diagram showing an example of the configuration of the information processing device 100 according to the first embodiment. As shown in FIG. 2, the information processing device 100 includes a preprocessing unit 101, a learning unit 102, an output control unit 103, a selection unit 110, and a storage unit 121.

The preprocessing unit 101 performs preprocessing for the data selection process and the learning process. If a trained model already exists, preprocessing consists of preparing a trained model suited to the purpose as a base model. The preprocessing unit 101 may use the trained model as the base model as is, or may generate the base model by applying to it preprocessing that requires no training, such as pruning.

If no trained model exists, preprocessing includes dividing the input data into training data and selection target data according to a predetermined rule, and training a model on the training data to generate a base model. The training data is data used to train the model in advance. The selection target data is data subjected to the data selection process by the selection unit 110. Examples of rules are listed below.
(R1) Divide the data randomly.
(R2) Refer to the labels (teaching information) attached to the data, and divide the data so that data with the same label is not concentrated on one side.
(R3) Refer to classification information (clustering result information) that assigns the data to a plurality of clusters, and divide the data so that data in the same cluster is not concentrated on one side.
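A minimal sketch of rules (R1) to (R3), assuming that a grouped random split is an acceptable reading of "not concentrated on one side": each group (a label for R2, a cluster ID for R3, or a single shared group for R1) is shuffled and split in the same ratio. `split_by_group`, `train_ratio`, and `seed` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def split_by_group(items: Sequence, groups: Sequence[Hashable],
                   train_ratio: float, seed: int = 0) -> Tuple[List, List]:
    """Return (training data, selection target data).

    Each group is shuffled and split in the same ratio, so no group ends up
    concentrated on one side. Use one shared group for (R1), labels for (R2),
    or cluster IDs for (R3).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for item, group in zip(items, groups):
        by_group[group].append(item)
    train, pool = [], []
    for members in by_group.values():
        rng.shuffle(members)
        k = round(len(members) * train_ratio)
        train.extend(members[:k])
        pool.extend(members[k:])
    return train, pool


labels = ["a"] * 6 + ["b"] * 4
train, pool = split_by_group(list(range(10)), labels, train_ratio=0.5)
print(len(train), len(pool))  # 5 5
```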

The preprocessing unit 101 trains a model on the training data to generate a base model. If a trained model is prepared in advance and can be used as is, the information processing device 100 need not include the preprocessing unit 101.

The learning unit 102 uses training data to train a model that executes a process of receiving data and outputting a processing result. For example, the learning unit 102 trains the model using, as training data, the data assigned to the training data during preprocessing together with the data selected by the selection unit 110 from the selection target data. The data selection process performed by the selection unit 110 is described later.

The output control unit 103 controls the output of various kinds of information by the information processing device 100. For example, it outputs information about the model trained by the learning unit 102 to an external device that uses the model. The output control unit 103 may also display the calculated degree of uncertainty on a display device.

The selection unit 110 executes the data selection process on the selection target data. The selection unit 110 includes a dividing unit 111, a calculating unit 112, and an integrating unit 113.

The dividing unit 111 divides the selection target data (input data) into a plurality of pieces of partial data (parts). It may divide the input selection target data as is, or it may divide data obtained by processing the selection target data, for example by computing its feature values.

Whether to process the selection target data may be decided, for example, according to the input format of the model being trained. When the model takes image data as input, the dividing unit 111 may divide the image data (the selection target data) as is. When the model performs recognition on the feature values of image data, the dividing unit 111 may compute the feature values of the image data and divide the resulting feature values.

Image data is, for example, two-dimensional data. Its feature values form, for example, a feature map computed by convolving the two-dimensional image data; such a feature map can be computed by a neural network that takes image data as input and outputs a feature map. A feature map is, for example, data in which a pixel value representing a feature value is set at each two-dimensional pixel position. For image data or a feature map, the dividing unit 111 generates parts by, for example, dividing the data every fixed number of pixels in the vertical or horizontal direction.
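Dividing a 2-D feature map every fixed number of pixels can be sketched as follows. The list-of-rows layout and the names `split_feature_map`, `width`, and `by` are assumptions for illustration; a real implementation would more likely slice an array or tensor.

```python
from typing import List

FeatureMap = List[List[float]]  # assumed layout: fmap[row][column]


def split_feature_map(fmap: FeatureMap, width: int, by: str = "columns") -> List[FeatureMap]:
    """Cut a 2-D map into parts every `width` pixels.

    by="columns": left-to-right cut, each part keeps all rows and `width` columns.
    by="rows":    top-to-bottom cut, each part keeps all columns and `width` rows.
    """
    if by == "rows":
        return [fmap[r:r + width] for r in range(0, len(fmap), width)]
    if by != "columns":
        raise ValueError(f"unknown direction: {by!r}")
    n_cols = len(fmap[0])
    return [[row[c:c + width] for row in fmap] for c in range(0, n_cols, width)]


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(split_feature_map(fmap, 2))          # [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
print(split_feature_map(fmap, 1, "rows"))  # [[[1, 2, 3, 4]], [[5, 6, 7, 8]]]
```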

When the model takes one-dimensional sequence data such as character strings or audio as input, the dividing unit 111 generates parts by dividing the sequence at fixed intervals or at unspecified intervals. An unspecified interval is, for example, one that is determined randomly.
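For 1-D sequences, a sketch of both options follows: fixed intervals, and random cut positions as one reading of "unspecified intervals". `split_sequence`, `interval`, `n_parts`, and `seed` are hypothetical names.

```python
import random
from typing import List, Optional, Sequence


def split_sequence(seq: Sequence, interval: Optional[int] = None,
                   n_parts: int = 3, seed: int = 0) -> List[Sequence]:
    """Split 1-D sequence data into parts.

    interval given: cut every `interval` elements (fixed interval).
    interval None:  cut at n_parts - 1 distinct random positions, so the
                    parts have random, non-zero lengths.
    """
    if interval is not None:
        return [seq[i:i + interval] for i in range(0, len(seq), interval)]
    if not 1 <= n_parts <= len(seq):
        raise ValueError("n_parts must be between 1 and len(seq)")
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, len(seq)), n_parts - 1))
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(split_sequence("aabbcc", interval=2))   # ['aa', 'bb', 'cc']
print(split_sequence("abcdefgh", n_parts=3))  # three random-length pieces
```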

FIG. 3 shows an example of how data is divided into parts. Images 301 and 302 are examples of data before division, and each is converted into a feature map 311. The dividing unit 111 divides the feature map 311 in the horizontal or vertical direction to generate parts.

The calculating unit 112 calculates the degree of uncertainty of each part. For example, it inputs each part into the model and calculates, from the processing result, a degree of uncertainty such as those listed in (I1) to (I4) above.
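A sketch of the per-part calculation, assuming the model returns a label-to-probability mapping for each part and using (I2) as the indicator. `part_uncertainties` and `toy_predict` are hypothetical; `toy_predict` only stands in for the model and is not part of the source.

```python
from typing import Callable, Dict, List, Sequence

Probs = Dict[str, float]  # class label -> predicted probability


def part_uncertainties(parts: Sequence, predict: Callable[[object], Probs]) -> List[float]:
    """Feed each part to the model and score it with (I2), the gap between the
    two most probable labels (a smaller gap means higher uncertainty)."""
    scores = []
    for part in parts:
        top1, top2 = sorted(predict(part).values(), reverse=True)[:2]
        scores.append(top1 - top2)
    return scores


# Hypothetical stand-in for the model: a part is just a noise level in [0, 1],
# and more noise pulls the prediction for the true label "a" towards 0.5.
def toy_predict(noise: float) -> Probs:
    p_a = 1.0 - 0.5 * noise
    return {"a": p_a, "b": 1.0 - p_a}


print(part_uncertainties([0.0, 0.5, 1.0], toy_predict))  # [1.0, 0.5, 0.0]
```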

The integrating unit 113 integrates the degrees of uncertainty calculated for the individual parts into a degree of uncertainty for the selection target data before division, and outputs it. First, the integrating unit 113 chooses one of several integration methods based on state information indicating the state of the model on validation data.

The integration methods include, for example, the following.
(MM1) Calculate the mean or median of all the degrees of uncertainty.
(MM2) Calculate the mean or median of a subset of the degrees of uncertainty.
(MM3) Calculate a weighted sum of the degrees of uncertainty.

(MM2) is, for example, a method of calculating the mean or median of only those degrees of uncertainty whose values fall within a predetermined range. The predetermined range is, for example, one that excludes outlying degrees of uncertainty.
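The three integration methods might look like this in Python. The function names are hypothetical; normalizing the (MM3) weighted sum by the total weight, and falling back to all values when (MM2) excludes everything, are assumptions added so that the functions always return a value on the same scale.

```python
from statistics import mean, median
from typing import Sequence


def integrate_all(u: Sequence[float], use_median: bool = False) -> float:
    """(MM1) Mean or median of all part uncertainties."""
    return median(u) if use_median else mean(u)


def integrate_in_range(u: Sequence[float], low: float, high: float,
                       use_median: bool = False) -> float:
    """(MM2) Mean or median of only the uncertainties inside [low, high]."""
    kept = [x for x in u if low <= x <= high]
    if not kept:  # assumption: fall back to MM1 when every value is excluded
        kept = list(u)
    return median(kept) if use_median else mean(kept)


def integrate_weighted(u: Sequence[float], weights: Sequence[float]) -> float:
    """(MM3) Weighted sum, divided here by the total weight (an added assumption)."""
    return sum(w * x for w, x in zip(weights, u)) / sum(weights)


parts = [0.5, 0.5, 0.5, 5.0]                       # one outlying part
print(integrate_all(parts))                        # 1.625
print(integrate_in_range(parts, 0.0, 1.0))         # 0.5
print(integrate_weighted([1.0, 0.0], [1.0, 3.0]))  # 0.25
```

The example shows why (MM2) exists: a single outlying part pulls the plain mean from 0.5 to 1.625, while the range-restricted mean ignores it.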

The validation data is, for example, data that was processed in advance with the model to be trained, or data on which the model was trained in advance. The state information is, for example, the model's prediction accuracy on the validation data, or the error obtained when the model was trained on the validation data.

For example, if the model's recognition rate (prediction accuracy) on the validation data differs greatly between categories, the integrating unit 113 chooses (MM3). It then integrates the degrees of uncertainty by, for example, giving smaller weights to parts with higher recognition rates and larger weights to parts with lower recognition rates.

If, for example, the recognition rate or the error on the validation data fluctuates greatly during training, the integrating unit 113 chooses (MM2). It then removes the parts whose degrees of uncertainty are outliers, by excluding those outside the predetermined range, and integrates the remaining degrees of uncertainty by taking their median or mean.

If the state information of the model cannot be obtained, the integrating unit 113 chooses (MM1).
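The rules above for choosing an integration method can be sketched as a small decision function. What counts as "differs greatly" or "fluctuates greatly" is not specified in the source, so `acc_gap`, `fluct_std`, and the use of the population standard deviation are hypothetical choices. Returning (MM1) when state information exists but neither condition holds is also an assumption. `mm3_weights` shows one way (1 minus the recognition rate) to give lower weights to better-recognized parts.

```python
from statistics import pstdev
from typing import List, Optional, Sequence


def choose_integration(category_acc: Optional[Sequence[float]] = None,
                       val_history: Optional[Sequence[float]] = None,
                       acc_gap: float = 0.2,
                       fluct_std: float = 0.05) -> str:
    """Pick "MM3", "MM2" or "MM1" from the model's state on validation data.

    category_acc: recognition rate per category on the validation data.
    val_history:  recognition rate (or error) recorded over training.
    acc_gap, fluct_std: hypothetical thresholds for "differs/fluctuates greatly".
    """
    if category_acc and max(category_acc) - min(category_acc) > acc_gap:
        return "MM3"  # category rates differ greatly: weighted sum
    if val_history and len(val_history) > 1 and pstdev(val_history) > fluct_std:
        return "MM2"  # unstable training: drop outliers, then mean/median
    return "MM1"      # no (or unremarkable) state information: plain mean/median


def mm3_weights(part_recognition_rates: Sequence[float]) -> List[float]:
    """Lower weight for parts recognized more reliably (1 - rate is one choice)."""
    return [1.0 - r for r in part_recognition_rates]


print(choose_integration([0.95, 0.60]))                     # MM3
print(choose_integration([0.90, 0.88], [0.80, 0.50, 0.90]))  # MM2
print(choose_integration())                                  # MM1
```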

Each of the above units (the preprocessing unit 101, the learning unit 102, the output control unit 103, and the selection unit 110) is implemented by, for example, one or more processors. Each unit may be implemented by software, that is, by having a processor such as a CPU (Central Processing Unit) execute a program; by hardware, that is, by a processor such as a dedicated IC (Integrated Circuit); or by a combination of software and hardware. When multiple processors are used, each processor may implement one of the units or two or more of them.

The storage unit 121 stores various information used in the processes performed by the information processing device 100, such as training data, selection target data, and data representing the model. The storage unit 121 can be any commonly used storage medium, such as flash memory, a memory card, RAM (Random Access Memory), an HDD (Hard Disk Drive), or an optical disc.

Next, the learning process performed by the information processing device 100 according to the first embodiment, configured as described above, is explained. FIG. 4 is a flowchart illustrating an example of the learning process in the first embodiment.

The preprocessing unit 101 divides the input data into training data and selection target data (step S101), and trains a model on the training data to generate a base model (step S102). If a trained model is prepared in advance, steps S101 and S102 can be omitted.

The selection unit 110 executes the data selection process on the selection target data (step S103). The data selection process is described in detail later.

The learning unit 102 determines whether the data selection process yielded data with high uncertainty (step S104). If it did (step S104: Yes), the learning unit 102 adds the selected data to the training data and trains the model (step S105).

The learning unit 102 may use any training method. For example, if the training data is unlabeled, the learning unit 102 trains the model by active learning; if it is labeled, the learning unit 102 trains the model by supervised learning.

After the model is trained, the process returns to step S103 and repeats. If no data with high uncertainty is obtained (step S104: No), the learning unit 102 outputs the model trained so far (step S106) and ends the learning process.
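The loop of steps S103 to S106 can be sketched as below. `select` and `fit` are hypothetical callables standing in for the selection unit 110 and the learning unit 102; removing selected items from the pool and the `max_rounds` safety bound are assumptions not stated in the source. In an active learning setting, labels for the selected items would be obtained from the user before `fit` is called.

```python
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")  # model type


def active_learning_loop(model: M, train: List, pool: List,
                         select: Callable[[M, Sequence], List],
                         fit: Callable[[M, List], M],
                         max_rounds: int = 100) -> M:
    """Repeat S103-S105 until no uncertain data is selected, then return (S106)."""
    for _ in range(max_rounds):
        picked = select(model, pool)   # S103: data selection process
        if not picked:                 # S104: No -> stop
            break
        for x in picked:               # S105: move selected data to training data
            pool.remove(x)
            train.append(x)
        model = fit(model, train)      # S105: retrain on the enlarged set
    return model                       # S106: output the trained model


# Toy run: the "model" is just the training-set size; anything below 3 is "uncertain".
train, pool = [], [0, 1, 2, 5]
final = active_learning_loop(0, train, pool,
                             select=lambda m, p: [x for x in p if x < 3][:1],
                             fit=lambda m, t: len(t))
print(final, train, pool)  # 3 [0, 1, 2] [5]
```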

Next, the data selection process of step S103 is described in detail. FIG. 5 is a flowchart illustrating an example of the data selection process in the first embodiment.

The dividing unit 111 divides each piece of input selection target data into a plurality of parts (step S201). The calculating unit 112 calculates the degree of uncertainty of each part (step S202). The integrating unit 113 refers to the model's state information to decide how to integrate the degrees of uncertainty (step S203), integrates them by the chosen method to obtain the degree of uncertainty of the selection target data before division (step S204), and outputs the selection target data whose calculated degree of uncertainty indicates high uncertainty (for example, uncertainty at or above a threshold) (step S205).

When a degree of uncertainty for which a smaller value means higher uncertainty is used, such as (I2) (the difference between the probabilities of the most probable and second most probable labels), the integrating unit 113 outputs the selection target data whose degree of uncertainty is at or below the threshold. When a degree of uncertainty for which a larger value means higher uncertainty is used, such as (I3) (the magnitude of the entropy of the predicted distribution), the integrating unit 113 outputs the selection target data whose degree of uncertainty is at or above the threshold.

The threshold and the termination condition may be predetermined, or may be settable by the user. They may also be changed dynamically during processing. For example, if no selection target data is obtained with a given threshold, the integrating unit 113 may change the threshold so that data is obtained more easily.

The differences between this embodiment and a comparative example are described below. For simplicity, the example uses a model that recognizes image data containing a character string made up of only two categories, "a" and "b", and outputs the recognized categories. Parts are obtained by dividing the feature map produced by inputting the image data into a neural network. In this embodiment, data with high uncertainty is selected by (M2) (MarginSampling). In general, uncertainty relates to factors such as noise and how neatly the characters are written, but here uncertainty is represented by noise alone.

FIGS. 6 and 7 show examples of image data and the recognition results for that image data. The wedge shapes in FIGS. 6 and 7 indicate noise in the image. In these examples, image data 602 and 702, each containing the character string "aab", are each divided into three parts P1, P2, and P3. In image data 602, only part P3 contains noise; that is, image data 602 has high local uncertainty. In image data 702, every part contains noise; that is, image data 702 has high overall uncertainty.

The corresponding recognition results 601 and 701 are shown above image data 602 and 702, respectively. The numbers in recognition results 601 and 701 are the probabilities that the character in each part belongs to category "a" or "b".

In image data 602, the "aa" in parts P1 and P2 is free of noise and is recognized with high probability, but the "b" in part P3 is hard to recognize because of heavy noise. In image data 702, all of "aab" in parts P1, P2, and P3 contains slight noise, so the data is moderately hard to recognize throughout.

In the comparative example, the probability of each label string is the product of the per-part category probabilities, and the difference between label-string probabilities is used as the degree of uncertainty. A label string is a sequence of characters (labels) that could be the recognition results of the parts. In the examples of FIGS. 6 and 7, eight label strings are obtained: "aaa", "aab", "aba", "abb", "baa", "bab", "bba", and "bbb".

In the comparative example, the image data with the smallest difference (= degree of uncertainty) between the probabilities of the most probable label string and the second most probable label string is selected. In the comparative example, the degree of uncertainty for image data 602 is calculated, for example, as follows.
Label string "aab": 1 × 1 × 0.5 = 0.5
Label string "aaa": 1 × 1 × 0.5 = 0.5
Uncertainty = 0.5 - 0.5 = 0

Likewise, in the comparative example, the degree of uncertainty for image data 702 is calculated, for example, as follows.
Label string "aab": 0.75 × 0.75 × 0.75 ≈ 0.42
Label string "aaa": 0.75 × 0.75 × 0.25 ≈ 0.14
Uncertainty = 0.42 - 0.14 = 0.28

Therefore, as the data with high uncertainty (a small uncertainty value), the comparative example selects image data 602 (uncertainty = 0), whose uncertainty is only local, rather than image data 702 (uncertainty = 0.28), whose uncertainty is high overall.
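The comparative calculation can be reproduced with a short Python sketch. It enumerates every label string, scores each one as the product of its per-part probabilities, and returns the gap between the two highest scores. The per-part probabilities are taken from the worked numbers above (for image data 702, part P3 is read as 0.25 for "a" and 0.75 for "b"); the function and variable names are illustrative only, not part of the described device.

```python
from itertools import product
from math import prod


def label_string_margin(part_probs):
    """Comparative example: gap between the two most probable label strings.

    part_probs is a list with one dict per part, mapping label -> probability.
    Every label string (one label per part) is enumerated, so the work grows
    as (number of classes) ** (number of parts).
    """
    labels_per_part = [list(p.keys()) for p in part_probs]
    scores = []
    for labels in product(*labels_per_part):
        # Probability of a label string = product of its per-part probabilities.
        scores.append(prod(p[label] for p, label in zip(part_probs, labels)))
    scores.sort(reverse=True)
    return scores[0] - scores[1]


# Per-part probabilities from the worked example of FIGS. 6 and 7.
image_602 = [{"a": 1.0, "b": 0.0}, {"a": 1.0, "b": 0.0}, {"a": 0.5, "b": 0.5}]
image_702 = [{"a": 0.75, "b": 0.25}, {"a": 0.75, "b": 0.25}, {"a": 0.25, "b": 0.75}]

print(label_string_margin(image_602))            # 0.0
print(round(label_string_margin(image_702), 2))  # 0.28
```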

In contrast, in this embodiment, the degree of uncertainty for image data 602 is calculated, for example, as follows.
Uncertainty of part P1: 1 - 0 = 1
Uncertainty of part P2: 1 - 0 = 1
Uncertainty of part P3: 0.5 - 0.5 = 0
Uncertainty of image data 602: (1 + 1 + 0) / 3 ≈ 0.67

Likewise, in this embodiment, the degree of uncertainty for image data 702 is calculated, for example, as follows.
Uncertainty of part P1: 0.75 - 0.25 = 0.5
Uncertainty of part P2: 0.75 - 0.25 = 0.5
Uncertainty of part P3: 0.75 - 0.25 = 0.5
Uncertainty of image data 702: (0.5 + 0.5 + 0.5) / 3 = 0.5

Therefore, because 0.5 < 0.67, this embodiment selects image data 702, whose uncertainty is high overall, as the data with high uncertainty (a small uncertainty value).
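The per-part calculation of this embodiment can be sketched in the same way. Each part's value is the gap between its top two class probabilities, and the values are integrated by averaging, as in the example above. As before, the names are illustrative and the probabilities are the worked numbers from FIGS. 6 and 7.

```python
from statistics import mean


def part_margin(probs):
    """Per-part degree of uncertainty: top-1 minus top-2 class probability.

    A smaller value means the part is more uncertain.
    """
    top1, top2 = sorted(probs.values(), reverse=True)[:2]
    return top1 - top2


def data_uncertainty(part_probs):
    """Integrate the per-part values by averaging them."""
    return mean(part_margin(p) for p in part_probs)


image_602 = [{"a": 1.0, "b": 0.0}, {"a": 1.0, "b": 0.0}, {"a": 0.5, "b": 0.5}]
image_702 = [{"a": 0.75, "b": 0.25}, {"a": 0.75, "b": 0.25}, {"a": 0.25, "b": 0.75}]

scores = {"602": data_uncertainty(image_602), "702": data_uncertainty(image_702)}
print({k: round(v, 2) for k, v in scores.items()})  # {'602': 0.67, '702': 0.5}

# The datum with the smallest value (highest uncertainty) is selected.
print(min(scores, key=scores.get))  # 702
```

Because each part is handled independently, the averaging step could be replaced by another integration rule without changing the per-part calculation.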

As described above, in the comparative example, if the difference between the probabilities of the first and second candidates is small for even one of the parts, the probability difference (margin) between the label strings is necessarily small as well, because the label string that differs from the best one only in that part has almost the same probability. In other words, local uncertainty has a large influence on the result.
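This effect can be stated as a bound. Assume, as in the comparative example, that a label string's probability is the product of its per-part probabilities. Write p_i^(1) ≥ p_i^(2) for the two highest class probabilities of part i (i = 1, ..., L), m_i = p_i^(1) - p_i^(2) for the per-part value used in this embodiment, and M for the label-string margin; this notation is introduced here for illustration. Replacing the label of any single part j in the best label string by its second candidate yields a different label string, so the second-best probability P^(2) is at least that string's probability:

```latex
\begin{aligned}
P^{(1)} &= \prod_{i=1}^{L} p_i^{(1)}, \qquad
P^{(2)} \ge p_j^{(2)} \prod_{i \ne j} p_i^{(1)} \quad \text{for every } j, \\
M &= P^{(1)} - P^{(2)}
   \le \Bigl(\prod_{i \ne j} p_i^{(1)}\Bigr)\bigl(p_j^{(1)} - p_j^{(2)}\bigr)
   \le p_j^{(1)} - p_j^{(2)} = m_j, \\
M &\le \min_{j} m_j .
\end{aligned}
```

Since each factor p_i^(1) is at most 1, the label-string margin can never exceed the smallest per-part value. For image data 602, m_3 = 0 forces M = 0 no matter how clearly parts P1 and P2 are recognized.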

In this embodiment, on the other hand, the degree of uncertainty of the data is obtained by calculating a degree of uncertainty for each part and integrating the per-part values (here, by taking their average). The result therefore reflects the overall uncertainty and is more accurate, which makes it possible to select data whose uncertainty is high overall.

This embodiment can also reduce the amount of computation, as follows. In the comparative example, calculating the degree of uncertainty of the data requires computing the probability of every label string (every combination of categories) as the product of the per-part category probabilities. The computational cost of the comparative example is therefore O((number of category classes)^(sequence length)); that is, the cost grows exponentially with the number of parts.

In this embodiment, by contrast, the classes only need to be compared within each part, so the computational cost is O((number of category classes) × (sequence length)), which grows linearly with the number of parts. This embodiment can therefore select data with less computation than the comparative example.

For example, in the case of FIG. 3, the comparative example must calculate and compare the probabilities of eight character strings, "aaa", "aab", "aba", "abb", "baa", "bab", "bba", and "bbb", to obtain the degree of uncertainty, whereas this embodiment only needs to compare the probabilities of "a" and "b" for each of the three parts, three comparisons in total.
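Counting the work for two classes ("a" and "b") shows how quickly the gap opens as the string gets longer. The function names are illustrative, and the counts follow the O(...) expressions above rather than any measured runtime.

```python
def label_strings_to_score(num_classes: int, num_parts: int) -> int:
    """Comparative example: every combination of categories must be scored."""
    return num_classes ** num_parts


def probabilities_to_scan(num_classes: int, num_parts: int) -> int:
    """This embodiment: class probabilities are compared only within each part."""
    return num_classes * num_parts


for parts in (3, 10, 20):
    print(parts, label_strings_to_score(2, parts), probabilities_to_scan(2, parts))
# 3 8 6
# 10 1024 20
# 20 1048576 40
```

For three parts this matches the example: eight label strings versus three per-part comparisons of two probabilities each.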

As described above, the information processing device according to the first embodiment can calculate the degree of uncertainty with higher accuracy.

(Second embodiment)
The information processing device according to the second embodiment further includes a function for performing model learning more efficiently.

FIG. 8 is a block diagram showing an example of the configuration of an information processing device 100-2 according to the second embodiment. As shown in FIG. 8, the information processing device 100-2 includes a preprocessing unit 101-2, a learning unit 102-2, an output control unit 103, a selection unit 110, and a storage unit 121.

In the second embodiment, the functions of the preprocessing unit 101-2 and the learning unit 102-2 differ from those in the first embodiment. The other components and functions are the same as in FIG. 1, the block diagram of the information processing device 100 according to the first embodiment; they are given the same reference numerals, and their description is omitted here.

The preprocessing unit 101-2 differs from the preprocessing unit 101 of the first embodiment in that it further has a function of setting the categories to be recognized. For character recognition, for example, the preprocessing unit 101-2 sets characters or radicals as the categories. For speech recognition, it sets basic phonemes as the categories. For object recognition, it sets each object to be recognized as a category. The preprocessing unit 101-2 sets, for example, categories specified by the user as the categories to be recognized.

The learning unit 102-2 differs from the learning unit 102 of the first embodiment in that it further has a function of excluding data with low uncertainty from the training data (a forgetting function) and a function of determining whether learning has converged.

Next, the learning process performed by the information processing device 100-2 according to the second embodiment configured as described above will be described with reference to FIG. 9. FIG. 9 is a flowchart showing an example of the learning process in the second embodiment.

The preprocessing unit 101-2 sets the categories used in the recognition process (step S301).

Steps S302 and S303 are the same as steps S101 and S102 in the information processing device 100 according to the first embodiment, so their description is omitted.

In this embodiment, the learning unit 102-2 determines whether learning with the training data has finished (converged) (step S304). For example, the learning unit 102-2 uses the error and the error rate computed during learning to make this determination, and judges that learning has converged when, for example, the error or the error rate falls to or below a threshold.

If learning has not finished (step S304: No), the learning unit 102-2 returns to step S303 and repeats the process. If learning has finished (step S304: Yes), the learning unit 102-2 moves part of the training data to the selection target data (step S305). For example, the learning unit 102-2 randomly chooses some of the training data and moves it to the selection target data.
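Step S305 can be sketched as a random move of part of the training data into the selection pool. The `fraction` parameter and the seeded `random.Random` generator are assumptions made for illustration; the text only says that some of the data is chosen at random.

```python
import random


def move_to_selection_pool(training: list, pool: list, fraction: float,
                           rng: random.Random) -> list:
    """Hypothetical S305: move a random `fraction` of `training` into `pool`.

    Both lists are modified in place; the moved items are returned.
    """
    k = int(len(training) * fraction)
    chosen = set(rng.sample(range(len(training)), k))
    moved = [x for i, x in enumerate(training) if i in chosen]
    # Keep the remaining training data in its original order.
    training[:] = [x for i, x in enumerate(training) if i not in chosen]
    pool.extend(moved)
    return moved


training = list(range(10))  # stands in for base model training data 1002
pool = []                   # stands in for selection target data 1003
moved = move_to_selection_pool(training, pool, fraction=0.5, rng=random.Random(0))
print(len(training), len(pool))  # 5 5
```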

FIG. 10 is a diagram explaining the relationship between the training data and the selection target data. Data 1001 corresponds to the data (input data) processed in step S302. In FIG. 10, the training data split off in step S302 is labeled base model training data 1002, and the selection target data split off in step S302 corresponds to selection target data 1003.

The part of selection target data 1003 chosen by the selection process (selected: Yes) is added to training data 1004. Training data 1004 is used together with base model training data 1002 to train the model in step S308. The part of selection target data 1003 not chosen by the selection process (selected: No) remains in selection target data 1003.
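The split of selection target data 1003 into "selected: Yes" and "selected: No" can be sketched as below. The selection criterion itself is defined elsewhere in the specification; this sketch assumes, hypothetically, a margin-style degree of uncertainty in which a value at or below a threshold counts as high uncertainty.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_selection_pool(pool: List[T], uncertainty: Callable[[T], float],
                         threshold: float) -> Tuple[List[T], List[T]]:
    """Return (selected, remaining).

    Items whose value is at or below `threshold` (small margin, i.e. high
    uncertainty) are selected; all other items stay in the pool.
    """
    selected: List[T] = []
    remaining: List[T] = []
    for item in pool:
        (selected if uncertainty(item) <= threshold else remaining).append(item)
    return selected, remaining


# Per-image values from the worked example of this embodiment.
values = {"602": 2 / 3, "702": 0.5}
training_1004: List[str] = []
selected, pool = split_selection_pool(list(values), values.__getitem__, threshold=0.6)
training_1004.extend(selected)
print(training_1004, pool)  # ['702'] ['602']
```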

In this embodiment, each time the model training in step S308 is repeated, part of base model training data 1002 is moved to the selection target data in step S305. If base model training data 1002 contains data with low uncertainty and step S305 moves that data into selection target data 1003, the selection process will not choose it, so it is excluded from training data 1004. The model can therefore be trained more efficiently.

Returning to FIG. 9: steps S306 to S309 are the same as steps S103 to S106 in the information processing device 100 according to the first embodiment, so their description is omitted.

After step S308, the learning unit 102-2 determines whether learning with the selected data has finished (converged) (step S310). As in step S304, this determination uses the error, the error rate, and similar values computed during learning.

If learning has not finished (step S310: No), the learning unit 102-2 returns to step S308 and repeats the process. If learning has finished (step S310: Yes), the learning unit 102-2 returns to step S305 and repeats the process.
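The control flow of FIG. 9 from step S303 onward can be summarized in a hypothetical sketch in which each step is passed in as a callable. Convergence is judged by comparing the error rate returned by a training pass with a threshold, as in steps S304 and S310. Because the flowchart returns to step S305 indefinitely, the sketch adds an assumed `max_rounds` limit, and `max_iters` is an assumed safeguard against non-convergence; neither appears in the text.

```python
from typing import Callable


def learning_process(
    train: Callable[[], float],             # one training pass; returns the error rate
    threshold: float,                       # convergence threshold for S304 and S310
    move_part: Callable[[], None],          # S305
    score_and_select: Callable[[], None],   # S306-S307
    max_rounds: int,                        # assumed stop; FIG. 9 loops back to S305
    max_iters: int = 1000,                  # assumed safeguard
) -> None:
    """Hypothetical sketch of the FIG. 9 loop from step S303 onward."""

    def train_until_converged() -> None:
        # Repeat training until the error rate is at or below the threshold.
        for _ in range(max_iters):
            if train() <= threshold:
                return
        raise RuntimeError("training did not converge within max_iters passes")

    train_until_converged()         # S303-S304
    for _ in range(max_rounds):
        move_part()                 # S305
        score_and_select()          # S306-S307
        train_until_converged()     # S308, S310


# Usage with stub steps that only record what ran.
events = []
errors = iter([0.5, 0.05, 0.3, 0.01, 0.02])


def train() -> float:
    events.append("train")
    return next(errors)


learning_process(train, 0.1, lambda: events.append("move"),
                 lambda: events.append("select"), max_rounds=2)
print(events)
```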

In this way, the information processing device according to the second embodiment further has a function of excluding data with low uncertainty from the training data and a function of determining whether learning has converged, so model learning can be performed more efficiently.

As described above, according to the first and second embodiments, the degree of uncertainty can be calculated with high accuracy.

Next, the hardware configuration of the information processing device according to the first or second embodiment will be described with reference to FIG. 11. FIG. 11 is an explanatory diagram showing an example of the hardware configuration of the information processing device according to the first or second embodiment.

The information processing device according to the first or second embodiment includes a control device such as a CPU (Central Processing Unit) 51, storage devices such as a ROM (Read Only Memory) 52 and a RAM (Random Access Memory) 53, a communication I/F 54 that connects to a network for communication, and a bus 61 that connects these units.

The program executed by the information processing device according to the first or second embodiment is provided pre-installed in the ROM 52 or the like.

The program executed by the information processing device according to the first or second embodiment may instead be provided as a computer program product, recorded as a file in an installable or executable format on a computer-readable recording medium such as a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disk).

Furthermore, the program executed by the information processing device according to the first or second embodiment may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program may also be provided or distributed via a network such as the Internet.

The program executed by the information processing device according to the first or second embodiment can cause a computer to function as each unit of the information processing device described above. In this computer, the CPU 51 can read the program from a computer-readable storage medium into the main storage device and execute it.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

100, 100-2 Information processing device
101, 101-2 Preprocessing unit
102, 102-2 Learning unit
103 Output control unit
110 Selection unit
111 Dividing unit
112 Calculation unit
113 Integration unit
121 Storage unit

Claims

An information processing device comprising:
a dividing unit that divides input data, which is the target of a process that receives data and outputs a processing result, into a plurality of pieces of partial data;
a calculation unit that executes the process on each of the plurality of pieces of partial data and calculates a plurality of degrees of uncertainty indicating the uncertainty of the process; and
an integration unit that integrates the plurality of degrees of uncertainty and outputs the result as the degree of uncertainty of the input data,
wherein the integration unit determines one of a plurality of integration methods based on status information indicating a state of the process on verification data, and integrates the plurality of degrees of uncertainty using the determined integration method, and
the plurality of integration methods include a method of calculating an average value or a median value of the plurality of degrees of uncertainty, a method of calculating an average value or a median value of some of the plurality of degrees of uncertainty, and a method of calculating a weighted sum of the plurality of degrees of uncertainty.

The information processing device according to claim 1, wherein the some of the plurality of degrees of uncertainty are those degrees of uncertainty whose values fall within a predetermined range.
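The integration methods listed in the two claims above can be sketched as a single dispatch function. How the method is chosen from the status information on verification data is not detailed in this section, so the sketch simply takes the method name as an argument; the method names and the `value_range` and `weights` parameters are hypothetical.

```python
from statistics import mean, median
from typing import Optional, Sequence, Tuple


def integrate(
    values: Sequence[float],
    method: str,
    value_range: Optional[Tuple[float, float]] = None,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Integrate per-part degrees of uncertainty into one value for the input data."""
    if method == "mean":
        return mean(values)
    if method == "median":
        return median(values)
    if method in ("subset_mean", "subset_median"):
        # The subset is the values lying within a predetermined range.
        if value_range is None:
            raise ValueError("value_range is required for subset methods")
        low, high = value_range
        subset = [v for v in values if low <= v <= high]
        if not subset:
            raise ValueError("no values fall within value_range")
        return mean(subset) if method == "subset_mean" else median(subset)
    if method == "weighted_sum":
        if weights is None or len(weights) != len(values):
            raise ValueError("weights must match values in length")
        return sum(w * v for w, v in zip(weights, values))
    raise ValueError(f"unknown integration method: {method}")


margins = [1.0, 1.0, 0.0]  # per-part values for image data 602 in the worked example
print(integrate(margins, "median"))                                   # 1.0
print(integrate(margins, "subset_mean", value_range=(0.0, 0.5)))      # 0.0
print(integrate(margins, "weighted_sum", weights=[0.5, 0.25, 0.25]))  # 0.75
```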

The information processing device according to claim 1, further comprising a learning unit that trains a machine learning model for executing the process, using as training data a plurality of pieces of input data for which the uncertainty indicated by the degree of uncertainty output by the integration unit is equal to or greater than a threshold.

An information processing method executed by an information processing device, the method comprising:
a dividing step of dividing input data, which is the target of a process that receives data and outputs a processing result, into a plurality of pieces of partial data;
a calculation step of executing the process on each of the plurality of pieces of partial data and calculating a plurality of degrees of uncertainty indicating the uncertainty of the process; and
an integration step of integrating the plurality of degrees of uncertainty and outputting the result as the degree of uncertainty of the input data,
wherein, in the integration step, one of a plurality of integration methods is determined based on status information indicating a state of the process on verification data, and the plurality of degrees of uncertainty are integrated using the determined integration method, and
the plurality of integration methods include a method of calculating an average value or a median value of the plurality of degrees of uncertainty, a method of calculating an average value or a median value of some of the plurality of degrees of uncertainty, and a method of calculating a weighted sum of the plurality of degrees of uncertainty.

A program causing a computer to execute:
a dividing step of dividing input data, which is the target of a process that receives data and outputs a processing result, into a plurality of pieces of partial data;
a calculation step of executing the process on each of the plurality of pieces of partial data and calculating a plurality of degrees of uncertainty indicating the uncertainty of the process; and
an integration step of integrating the plurality of degrees of uncertainty and outputting the result as the degree of uncertainty of the input data,
wherein, in the integration step, one of a plurality of integration methods is determined based on status information indicating a state of the process on verification data, and the plurality of degrees of uncertainty are integrated using the determined integration method, and
the plurality of integration methods include a method of calculating an average value or a median value of the plurality of degrees of uncertainty, a method of calculating an average value or a median value of some of the plurality of degrees of uncertainty, and a method of calculating a weighted sum of the plurality of degrees of uncertainty.