JP2023086053A

JP2023086053A - Learning data management method, learning data generation apparatus, and machine learning method

Info

Publication number: JP2023086053A
Application number: JP2021200450A
Authority: JP
Inventors: 智也太田; Tomoya Ota
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2023-06-21

Abstract

To provide a learning data management method that reflects, in re-learning, data that was useful for training models of past generations, together with a learning data generation apparatus and a machine learning method. SOLUTION: In a learning data management method, whether an inference result obtained by a first machine learning model trained with first learning data has been corrected is recorded as inference result management information. An inference result management unit of an inference system serving as a learning data generation apparatus records the inference result management information. A dataset management unit extracts, as a first correction data list, data for which inference results obtained by the first machine learning model have been corrected, based on the inference result management information. A learning data output unit outputs, as learning data, data including at least one piece of data included in the first correction data list. A machine learning method trains a machine learning model using the learning data output from the learning data output unit of the learning data generation apparatus. SELECTED DRAWING: Figure 1

Description

The present invention relates to learning data used for machine learning.

In recent years, inference using machine learning models has been put to practical use. As an example, inspection systems using machine learning models have been proposed for the visual inspection of products in factories and the like, partly to address labor shortages. For example, an inspection system that uses image data infers labels such as scratches, defects, and distortions for the subject shown in the image data, computes a confidence between 0 and 1 for each label, and outputs as the result the label with the highest confidence at or above a threshold.

In inspection systems that use machine learning in this way, the model must be updated (re-learned) continuously to maintain and improve its inference accuracy. In re-learning, for example, a new model is generated by starting from an existing model and giving it new learning data. Data for which the model's inference confidence was low, and data for which the model gave high confidence to a wrong label, are used as new learning data because they are useful for improving the accuracy of the model.

In other words, to perform re-learning efficiently in terms of time and data volume, data considered useful for re-learning must be extracted and managed efficiently.

For example, Patent Document 1 treats data with low certainty as data the model handles poorly and therefore as data useful for improving the accuracy of the model. It discloses a method that accumulates the inspection results of an inspection system, extracts low-certainty data from the accumulated data by sampling, and assigns correct labels to the extracted data to obtain new learning data.

Patent Document 1: US2020/0151613 A1

Patent Document 1 does not consider data used for learning in past generations. Because it collects low-certainty data from the inference results of the inspection system, only data that depends on the model currently used in the inspection system is extracted.

However, because a model is re-learned repeatedly, data that was useful in training models of past generations may not be reflected in re-learning.

An object of the present invention is to reflect, in re-learning, data that was useful in training models of past generations.

A preferred aspect of the present invention is a learning data management method that executes a first step of recording, as inference result management information, whether an inference result produced by a first machine learning model trained with first learning data has been corrected.

Another preferred aspect of the present invention is a learning data generation apparatus comprising an inference result management unit, a dataset management unit, and a learning data output unit. The inference result management unit records, as inference result management information, whether an inference result produced by a first machine learning model trained with first learning data has been corrected. The dataset management unit extracts, based on the inference result management information, data for which the inference result produced by the first machine learning model has been corrected, as a first correction data list. The learning data output unit outputs, as learning data, data including at least one piece of data included in the first correction data list.

Another preferred aspect of the present invention is a machine learning method that trains a machine learning model using the learning data output from the learning data output unit of the learning data generation apparatus.

Data that was useful in training models of past generations can be reflected in re-learning.

A block diagram showing an example of the system configuration.
A table showing an example of the dataset management table.
A table showing an example of the model management table.
A table showing an example of the inference result management table.
An image diagram showing an example of the screen of the dataset management unit.
An example of a flowchart of the processing of the dataset management unit.
An example of a flowchart of the data extraction processing.
An example of a flowchart of the processing of the learning data output unit.
A table showing an execution example illustrating the operation of the embodiment.
A table showing an execution example illustrating the operation of the embodiment.
A table showing an execution example illustrating the operation of the embodiment.
A table showing an example of data extracted by the dataset management unit in the execution example.
A table showing an example of data extracted by the dataset management unit in the execution example.
A table showing an example of data extracted by the dataset management unit in the execution example.
An image diagram showing an example of the screen of the dataset management unit in the execution example.
A block diagram showing an example of the configuration of a computer.

Embodiments of the present invention are described below with reference to the drawings. The present invention is not limited by the following description.

In the configurations of the embodiments described below, the same reference numerals are used across different drawings for identical parts or parts having similar functions, and redundant descriptions may be omitted.

When there are multiple elements having the same or similar functions, they may be described using the same reference numeral with different suffixes. When there is no need to distinguish between the elements, the suffixes may be omitted.

Notations such as "first", "second", and "third" in this specification are used to identify constituent elements and do not necessarily limit their number, order, or content. Numbers identifying constituent elements are used per context, and a number used in one context does not necessarily denote the same element in another context. Furthermore, an element identified by one number is not precluded from also serving the function of an element identified by another number.

To facilitate understanding of the invention, the position, size, shape, range, and the like of each component shown in the drawings may not represent the actual position, size, shape, range, and the like. The present invention is therefore not necessarily limited to the positions, sizes, shapes, ranges, and the like disclosed in the drawings.

The publications, patents, and patent applications cited in this specification form part of the description of this specification as they are.

Elements expressed in the singular in this specification include the plural unless the context clearly indicates otherwise.

As described in detail in the embodiments below, the inventors' study found that the technique of Patent Document 1 has two problems. The first is that it does not consider data used for learning in past generations. Because low-certainty data is collected from the inference results of the inspection system, only data that depends on the model currently used in the inspection system is extracted. However, because a model is re-learned repeatedly, data that was useful in the learning of past generations may be omitted.

The second problem is that the results of labeling are not reflected in the extraction of datasets. Inference results include data that has been given a wrong label even though the certainty is high. Such data is identified through manual labeling work, but the technique of Patent Document 1 cannot extract it as learning data.

One embodiment manages the learning history of models and the history of label corrections made to inference result data. Using these histories, it extracts two kinds of data: data whose labels were corrected for inference results produced by the model on which re-learning is based or by the earlier-generation models from which that model was derived, and label-corrected data contained in the learning data of the base model.

With this configuration, label-corrected data can be extracted efficiently, including data associated with models of past generations. This makes the extraction of learning data for re-learning more efficient. In addition, extracting only useful data reduces the amount of data required for learning, which can be expected to shorten learning time.

This embodiment relates to a method of managing the learning data required for re-learning in inference processing that infers whether input data belongs to a certain class. Defect inspection (labeling) using image data is described as an example. The embodiment is only an example and is applicable to other image analysis such as segmentation and classification, as well as to character recognition and speech recognition. Although the embodiment describes a machine learning model that handles a classification problem, the same approach can be applied to a machine learning model that handles a regression problem.

FIG. 1 shows an example of a system configuration that implements this embodiment. The system consists of an inference system 101, which executes inference, manages learning data, and performs re-learning, and a system user terminal 102 used by users of the inference system 101. The two are connected via a network 103.

The system can basically be built from an information processing device such as a server and, like a general computer, includes an input device, an output device, a storage device, and a processing device (CPU: Central Processing Unit). The following description omits the components that an information processing device naturally includes and focuses on the functions of this embodiment.

The inference system 101 consists of an inference unit 110, a data input unit 111, an inference result management unit 112, a learning data management unit 113, a re-learning unit 114, and an input/output unit 115. In this embodiment, these components are implemented by the processing device executing software stored in the storage device of the information processing device. However, functions equivalent to those implemented in software can also be implemented in hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). Such forms are also within the scope of the embodiment.

The system can also use an input data group 131, a dataset management table 132, an inference result management table 133, a model group 135, and a model management table 134 as databases. The databases can be stored, for example, in a storage device such as a magnetic disk device.

The system configuration described above may be built on a single computer, or any part of it may be built on other computers connected via a network.

The user first inputs to the inference system 101, via the input/output unit 115, the trained model that the inference unit 110 will use initially and the learning data that was used to train that model. Network structures such as a DNN (Deep Neural Network) are generally known as models. One piece of learning data generally consists of a pair of an explanatory variable (the question) and an objective variable (the correct answer). Training such a network by machine learning with learning data is also known. Descriptions of known parts are omitted in this embodiment.

The input learning data is recorded in the input data group 131, and groups of that input data are managed as datasets in the dataset management table 132. Models are recorded in the model group 135 and managed in the model management table 134.

FIG. 2 is an example of the dataset management table 132. The dataset ID 201 is an ID that identifies a dataset. The data list 202 holds, as a list, the IDs of the input data included in the dataset. For example, row 204 indicates that the dataset with ID 1 consists of the data with IDs 1 to 50 in the input data group 131. Hereinafter, the dataset with ID 1 is referred to as dataset 1, and the data with ID 1 is referred to as data 1.
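The table described above can be pictured as a mapping from a dataset ID to a list of data IDs. The following is a minimal sketch under that assumption; the names `DatasetRow`, `dataset_table`, and `data_ids_of` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetRow:
    dataset_id: int            # corresponds to dataset ID 201
    data_ids: Tuple[int, ...]  # corresponds to data list 202


# Row 204 of FIG. 2 as described in the text: dataset 1 holds data 1 to 50.
dataset_table = {1: DatasetRow(dataset_id=1, data_ids=tuple(range(1, 51)))}


def data_ids_of(dataset_id: int) -> Tuple[int, ...]:
    """Return the data IDs that make up the given dataset."""
    return dataset_table[dataset_id].data_ids
```

Storing only IDs keeps the dataset rows small, since the data itself lives in the input data group 131.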

The dataset management table 132 can manage datasets of learning data and of the correction data described later. It may also manage test data for models and data subject to inference during actual operation. A flag identifying these data types may be added.

FIG. 3 is an example of the model management table 134. The model ID 301 is, as one example, an ID that identifies the generation of a model. For a given model, the model management table 134 holds the learning dataset ID 302, which identifies the dataset used to train the model; the correction dataset ID 303, which identifies the group of data that could not be judged correctly by inference with the model; the next model ID 304, which indicates the derivation relationship created by re-learning; and the creation time 305 of the model. For example, row 306 indicates that the model with model ID 1 was trained using the dataset with dataset ID 1.
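One row of the model management table could be represented as follows. This is only an illustrative sketch: the class name `ModelRow`, the field names, and the creation time value are assumptions, and the sketch assumes one next model ID per row.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ModelRow:
    model_id: int                         # model ID 301
    learning_dataset_id: int              # learning dataset ID 302
    correction_dataset_id: Optional[int]  # correction dataset ID 303
    next_model_id: Optional[int]          # next model ID 304
    created_at: datetime                  # creation time 305


# Row 306 as described in the text: model 1 was trained with dataset 1.
# The creation time is an illustrative value, not taken from FIG. 3.
model_table = {
    1: ModelRow(
        model_id=1,
        learning_dataset_id=1,
        correction_dataset_id=None,
        next_model_id=None,
        created_at=datetime(2021, 1, 1),
    ),
}
```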

In the inference unit 110, the input/output unit 115 receives data subject to inference or test data from the user via the network 103, and inference is executed using a trained model registered in the model management table 134. The data subject to inference is generally an explanatory variable (the question), and the inference unit 110 outputs the estimated objective variable. In this embodiment, inference is assumed to use the single model with the latest creation time.

Inference processing outputs, for example, a similarity between 0 and 1 between the input image data and each predetermined image feature. For product defect inspection, labels representing features such as "scratch", "chip", and "distortion" are defined, and a similarity value for each label is output for the input image data. Among the labels whose value exceeds a fixed threshold (for example, 0.8), the label with the largest value is determined to be the label for that image data. The inference unit 110 assigns a new ID to the input image data, records it in the input data group 131, and records the inference result in the inference result management table 133.
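The label decision described above amounts to a thresholded maximum. The sketch below assumes a strict comparison, following the wording "exceeds" in this paragraph, and returns `None` when no label passes the threshold; the specification does not state what happens in that case, so that behavior is an assumption. The function name `decide_label` is hypothetical.

```python
from typing import Dict, Optional

THRESHOLD = 0.8  # example value given in the text


def decide_label(scores: Dict[str, float], threshold: float = THRESHOLD) -> Optional[str]:
    """Return the highest-scoring label if its score exceeds the threshold."""
    # Assumes `scores` is non-empty; max() raises ValueError otherwise.
    label, score = max(scores.items(), key=lambda item: item[1])
    # Returning None for "no label passed" is an assumption of this sketch.
    return label if score > threshold else None
```

For instance, `decide_label({"scratch": 0.91, "chip": 0.85, "distortion": 0.10})` selects `"scratch"`, because both "scratch" and "chip" pass the threshold and "scratch" has the larger value.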

Besides the labeling (classification) described above, object detection labels objects in an image and detects their positions (XY coordinates). Skeleton detection labels specific parts of an object in an image and detects their positions. Segmentation attaches label information to each pixel of an image and color-codes the regions. Speech recognition maps characters to speech data. In each case, the learning dataset basically takes the form of question-and-answer pairs.

FIG. 4 is an example of the inference result management table 133. The data ID 401 is the ID of data registered in the input data group 131. The model ID 402 records the ID of the model that performed inference on the data identified by the data ID 401, and the label 403 records the label output as the result of that inference. For example, row 406 indicates that inference was performed on the data with data ID 51 using model 1 and that the data was judged "normal". Row 407 indicates that inference was performed on the data with data ID 52 using model 1 and that the data was judged "chip".

The inference unit 110 performs inference on input data sequentially, but not every inference result is correct. An error in an inference result may therefore come to light after the inference processing has been executed. For example, when inspecting products on a factory line for defects, a double check by a person or by another inference system may be performed alongside the automatic diagnosis by the inference system, or a defect may be found in a later process such as product shipment. In such cases, the inference result management unit 112 records the newly identified correct label in the inference result management table 133.

For example, for data 53 shown in row 408 of FIG. 4, the inference unit 110 judged the data to be "scratch" using model 1, but the data was later found to be "normal" and the label was corrected. For data 54 shown in row 409, the judgment "normal" was later corrected to "scratch".
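The rows discussed above, and the recording of a later correction, can be sketched as follows. The structure mirrors the columns named in the description (data ID 401, model ID 402, label 403, and a correction result column, numbered 404 in the description); the names `InferenceRow` and `record_correction` are hypothetical, and the English label strings are one possible rendering of the labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InferenceRow:
    data_id: int                           # data ID 401
    model_id: int                          # model ID 402
    label: str                             # label 403
    corrected_label: Optional[str] = None  # correction result column


# Rows 406 to 409 of FIG. 4 as described in the text, before any correction.
inference_table = {
    51: InferenceRow(51, 1, "normal"),
    52: InferenceRow(52, 1, "chip"),
    53: InferenceRow(53, 1, "scratch"),
    54: InferenceRow(54, 1, "normal"),
}


def record_correction(data_id: int, correct_label: str) -> None:
    """Store the label that was later found to be correct."""
    inference_table[data_id].corrected_label = correct_label


record_correction(53, "normal")   # row 408: "scratch" corrected to "normal"
record_correction(54, "scratch")  # row 409: "normal" corrected to "scratch"

corrected_ids = [
    row.data_id for row in inference_table.values() if row.corrected_label is not None
]
```

Keeping the original label next to the corrected one preserves the fact that the model was wrong on that item, which is the information later extraction steps rely on.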

If few inference results have accumulated in the inference result management table 133, input data can be added using the data input unit 111. In the data input unit 111, the input/output unit 115 receives input data from the user via the network 103 and registers it in the input data group 131. A correction result 404 is registered for the registered data via the inference result management unit 112. The data input unit 111 may also use a model registered in the model management table 134 to record a provisional inference result in the label 403 column. In that case, the user checks the label 403 created by the data input unit 111 and, if it is wrong, writes the correction result 404.

The learning data management unit 113 uses the data accumulated in the inference result management table 133 by the inference unit 110 to extract datasets that are candidates for use in re-learning. The learning data management unit 113 is run by a user of the inference system 101 when enough inference results from the inference unit 110 have accumulated, when the proportion of label corrections among the inference results of the inference unit 110 reaches or exceeds a certain level, or periodically.
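The first two conditions above could be checked with a small helper such as the one below. This is a sketch only: in the description the unit is run by a user, so a check like this would merely prompt that decision. The threshold values `min_results` and `min_rate` and both function names are assumptions.

```python
from typing import Optional, Sequence, Tuple

# Each row is (inferred label, corrected label or None).
Row = Tuple[str, Optional[str]]


def correction_rate(rows: Sequence[Row]) -> float:
    """Fraction of inference results whose label was later corrected."""
    if not rows:
        return 0.0
    corrected = sum(1 for _, fixed in rows if fixed is not None)
    return corrected / len(rows)


def should_extract(rows: Sequence[Row], min_results: int = 100, min_rate: float = 0.05) -> bool:
    """True when enough results have accumulated or corrections are frequent."""
    return len(rows) >= min_results or correction_rate(rows) >= min_rate
```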

In the learning data management unit 113, the dataset management unit 121 first extracts the correction data for the model specified through the user interface and displays the result. The user specifies, for the displayed datasets, the number or proportion of data to include in the output dataset, and the learning data output unit 122 extracts data according to that specification and creates a new dataset.
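As a rough illustration of creating an output dataset from user-specified counts, the sketch below draws the requested number of items from each displayed group. Random sampling is an assumption of this sketch; the paragraph above only says that data is extracted according to the user's specification. The function name, group names, and data IDs are hypothetical.

```python
import random
from typing import Dict, List


def build_output_dataset(
    groups: Dict[str, List[int]], counts: Dict[str, int], seed: int = 0
) -> List[int]:
    """Pick the requested number of data IDs from each group."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected: List[int] = []
    for name, ids in groups.items():
        k = min(counts.get(name, 0), len(ids))  # never ask for more than exists
        selected.extend(rng.sample(ids, k))
    return sorted(selected)


# Hypothetical groups: two corrected items and 300 uncorrected new items.
groups = {"corrected": [53, 54], "new": list(range(55, 355))}
dataset = build_output_dataset(groups, {"corrected": 2, "new": 10})
```

Specifying counts per group lets the user keep every corrected item while taking only a small share of the uncorrected data.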

FIG. 5 shows an example of the user interface 501. Such a GUI (Graphical User Interface) can be displayed, for example, on an image output device. The pull-down menu 502 displays the models registered in the model management table 134 as a list, and the user selects one model.

According to the selected model, the dataset management unit 121 displays data in the learning data information display section 503, the corrected data information display section 504, and the other input data information area 505.

The learning data information display section 503 displays information about the learning data of the model selected in the pull-down menu 502. The corrected learning data information 506 represents the learning data whose labels have been corrected, and the other learning data information 507 represents the learning data whose labels have not been corrected.

The corrected data information display section 504 covers the model selected in the pull-down menu 502, the models from which that model was trained, and the models created from that model. For each of these, it displays the data inferred by the model whose labels have since been corrected. The corrected data information 508 is extracted separately for each model in this lineage, so the number of models shown in the corrected data information 508 grows with the learning history of the model.

The other input data information area 505 displays, as new data information 509, information on the data inferred by the model selected in the pull-down menu 502 that is not included in the corrected data information 508.

The example of FIG. 5 shows that, for the 50 pieces of learning data used to train model 1, none of model 1's estimation results were corrected, and that, among model 1's estimation results for new data, 2 were corrected and the other 300 were not.

FIG. 6 is an example of a flowchart of the dataset management unit 121, which extracts data and generates the information displayed in FIG. 5. The processing can start (S601) at any time. First, a model is selected from the pull-down menu 502 (S602).

Next, the data to be shown in the learning data information display section 503 is extracted and displayed (S603). In this step, the dataset management unit 121 extracts and records the data with label corrections from the learning dataset of the selected model. It also extracts and records the data without label corrections. The respective counts are displayed as the corrected learning data information 506 and the other learning data information 507.

Specifically, the dataset management unit 121 searches the model management table 134 using the model ID 301 of the selected model and extracts the learning dataset ID 302 used to train the selected model. It then searches the dataset management table 132 using the learning dataset ID 302 and extracts the dataset for the selected model.

Next, among the data included in the dataset, the data for which the correction result 404 column of the inference result management table 133 is registered is extracted as corrected learning data, and its count is displayed in the corrected learning data information 506. The remaining data in the dataset is also extracted, and its count is displayed in the other learning data information 507.
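Step S603 as described in the preceding paragraphs is a pair of table lookups followed by a split. The sketch below uses plain dictionaries as stand-ins for the three tables; the key name `learning_dataset_id`, the function name, and the dictionary layout are assumptions made for illustration.

```python
def split_learning_data(model_id, model_table, dataset_table, corrections):
    """Split a model's learning data into corrected and uncorrected data IDs."""
    # Model management table: model ID -> learning dataset ID.
    dataset_id = model_table[model_id]["learning_dataset_id"]
    # Dataset management table: dataset ID -> data IDs.
    data_ids = dataset_table[dataset_id]
    # Inference result management table: a registered correction marks the data.
    corrected = [d for d in data_ids if d in corrections]
    others = [d for d in data_ids if d not in corrections]
    return corrected, others


model_table = {1: {"learning_dataset_id": 1}}
dataset_table = {1: list(range(1, 51))}       # dataset 1 = data 1 to 50
corrections = {53: "normal", 54: "scratch"}   # data ID -> corrected label

corrected, others = split_learning_data(1, model_table, dataset_table, corrections)
print(len(corrected), len(others))  # prints "0 50"
```

With these values, none of model 1's learning data has a correction, which matches the counts given for the FIG. 5 example. Because corrections are keyed by data ID, a piece of data corrected several times is still counted once.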

The corrected learning data information 506 covers, without omission, every piece of data that has been corrected at least once in the past. A piece of data that has been corrected multiple times is counted once.

Re-learning repeats a cycle of training with a new dataset, evaluating the result, and training again with a different dataset if the evaluation is poor. In that cycle it is important to handle new data, but it is equally important not to forget what was learned in the past. If the evaluation shows that past learning has been forgotten, the corrected learning data information 506 is used to increase the share of this data when creating a new learning dataset. This makes it possible to create a new learning dataset more efficiently by including data that was taken in during past learning.

Next, the information to be shown in the corrected data information display section 504 is generated (S604, S605, S606). Starting from the model selected in S602 and using the next model ID 304 column of the model management table 134, the models from which the selected model was derived and the models derived from the selected model are extracted one after another, and a model dependency list ordered from newest to oldest is created. The creation time 305 of the selected model is set as the reference time (S604).
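Step S604 can be sketched as a walk along the next model ID links in both directions. The sketch assumes each model has at most one next model and that the links contain no cycle; the specification does not state either point. The table contents and the function name `dependency_list` are hypothetical.

```python
from datetime import datetime

# model ID -> (next model ID, creation time); hypothetical three-generation chain
model_table = {
    1: (2, datetime(2021, 1, 1)),
    2: (3, datetime(2021, 4, 1)),
    3: (None, datetime(2021, 7, 1)),
}


def dependency_list(selected_id, table):
    """Return the selected model with its ancestors and descendants, newest first."""
    # Invert the next-model links so ancestors can be found.
    prev_of = {nxt: mid for mid, (nxt, _) in table.items() if nxt is not None}
    chain = [selected_id]
    # Walk back to the models the selected model was derived from.
    current = selected_id
    while current in prev_of:
        current = prev_of[current]
        chain.append(current)
    # Walk forward to the models derived from the selected model.
    current = table[selected_id][0]
    while current is not None:
        chain.append(current)
        current = table[current][0]
    return sorted(chain, key=lambda mid: table[mid][1], reverse=True)


selected = 2
models = dependency_list(selected, model_table)  # [3, 2, 1]
reference_time = model_table[selected][1]        # creation time of the selected model
```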

For each model in the dependency list, the data on which that model performed inference and whose labels were corrected after the reference time are extracted, and their count is displayed in the corrected data information 508 (S605-S606).

The corrected data information 508 holds data whose labels were corrected after the creation time (reference time) of the model selected in S602, including data from related earlier or later generations. These data may include new input data that no longer fit the model because circumstances changed after the model was created. If corrected data are also learning data used to train the model, they are included as well. However, data that overlap with the corrected learning data information 506 may be excluded.

Finally, the information to be displayed in the other input data information area 505 is extracted. From the inference result management table 133, the data from the reference time onward for which no correction result 404 is registered are extracted, and their count is registered as the new data information 509 (S607).
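
A minimal sketch of S607 follows. The record fields `inferred_at` and `corrected_label` are hypothetical stand-ins for the time and correction result columns of the inference result management table, and treating "from the reference time onward" as an inclusive comparison is an assumption.

```python
def extract_new_data(records, reference_time):
    """Return IDs of uncorrected data recorded at or after the reference time."""
    return [
        r["data_id"]
        for r in records
        if r["inferred_at"] >= reference_time and r["corrected_label"] is None
    ]


# Illustrative records only; the values are not taken from the figures.
inference_records = [
    {"data_id": 51, "inferred_at": 90, "corrected_label": None},
    {"data_id": 52, "inferred_at": 110, "corrected_label": None},
    {"data_id": 53, "inferred_at": 120, "corrected_label": "scratch"},
]
new_data_ids = extract_new_data(inference_records, reference_time=100)
print(len(new_data_ids))
```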

FIG. 7 is an example of a flowchart showing the details of S606. The processing of S606 is executed for the model of each generation included in the dependency list. First, for the model being processed (for example, the latest model), the corrected data set ID 303 column of the model management table 134 is cleared (S702).

A new data set is then created with a newly assigned ID, and that ID is registered in the cleared corrected data set ID 303 column of the model management table 134. The data set that had the cleared data set ID may be deleted or left as it is. In actual operation, it is desirable not to delete data sets for a certain period so that past records are preserved.

Next, for every datum in the inference result management table 133 (S703), the correction result 404 column is examined to determine whether the label has been corrected (S704) and whether the correction time is later than the reference time (S705). If both conditions are satisfied, the ID of the current datum is added to the data list 202 of the data set indicated by the corrected data set ID 303 column of the model management table 134.
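
The loop of S703 to S705 can be sketched as below. The record layout (`model_id`, `corrected_label`, `corrected_at`) is hypothetical, and the filter on the model that performed the inference is an assumption drawn from the description of S605-S606 rather than from FIG. 7 itself.

```python
def extract_corrected_after(records, model_id, reference_time):
    """Collect IDs of data inferred by model_id whose labels were corrected
    later than reference_time."""
    data_list = []
    for r in records:                           # S703: scan every record
        if r["model_id"] != model_id:
            continue                            # inferred by another generation
        if r["corrected_label"] is None:
            continue                            # S704: label never corrected
        if r["corrected_at"] > reference_time:  # S705: strictly later
            data_list.append(r["data_id"])
    return data_list


# Illustrative records only; the values are not taken from the figures.
table_133 = [
    {"data_id": 101, "model_id": 1, "corrected_label": None, "corrected_at": None},
    {"data_id": 102, "model_id": 1, "corrected_label": "scratch", "corrected_at": 150},
    {"data_id": 103, "model_id": 1, "corrected_label": "dent", "corrected_at": 90},
    {"data_id": 201, "model_id": 2, "corrected_label": "dent", "corrected_at": 160},
]
corrected_ids = extract_corrected_after(table_133, model_id=1, reference_time=100)
print(corrected_ids)
```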

For example, when model 1 shown in row 306 of FIG. 3 is selected in S602, the learning data information display section 503 shows 0 for the corrected learning data information 506, and the initially registered learning data (1 to 50) are extracted for the other learning data information 507 (see FIG. 5).

For the corrected data information 508, two data, 53 and 54, are extracted, as shown in row 205 of FIG. 2. Accordingly, the data set shown in row 205 of FIG. 2 is created, and its ID, 2, is recorded in the corrected data set ID column 303 of FIG. 3.

Since there is no model from which model 1 was derived and no model derived from model 1, only the information for model 1 is displayed. Finally, in the other input data information area 505, the data shown in rows 406, 407, and so on of FIG. 4 are extracted as the new data information 509 (see FIG. 5).

In the above, all data registered in the inference result management table 133 are subject to extraction. Depending on the environment in which the inference system 101 is applied, however, it may not be practical for users to check all of the data. This case can be handled by providing the inference result management table 133 with a flag indicating that the user has confirmed the datum. When the user has confirmed a label, for example by double-checking, the confirmed flag is set. In the data extraction process (FIG. 7), only data for which the confirmed flag is set are then subject to extraction.

In the flowchart shown in FIG. 6, the dependencies created in S604 are represented as a list, but multiple models may be created from a single model. Such a case can be handled by recording the IDs of the multiple models in the next model ID of the model management table 134 and tracing all dependent models.
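
For the branching case, one possible reading is sketched below: the ancestors of the selected model are followed upward and every descendant is visited breadth-first. The `next_model_ids` list field, the assumption that each model has a single source model, and the newest-first ordering by creation time are all illustrative choices, not details given in the specification.

```python
from collections import deque


def related_models(models, selected_id):
    """Return the selected model, its ancestors, and all of its descendants,
    ordered newest first."""
    # Map each derived model back to the model it was created from.
    parent_of = {
        child: model_id
        for model_id, m in models.items()
        for child in m["next_model_ids"]
    }
    related = {selected_id}
    current = selected_id
    while current in parent_of:            # ancestors
        current = parent_of[current]
        related.add(current)
    queue = deque([selected_id])
    while queue:                           # descendants, breadth-first
        model_id = queue.popleft()
        for child in models[model_id]["next_model_ids"]:
            if child not in related:
                related.add(child)
                queue.append(child)
    return sorted(related, key=lambda mid: models[mid]["created"], reverse=True)


# Illustrative table: models 2 and 4 are both derived from model 1.
branching_table = {
    1: {"next_model_ids": [2, 4], "created": 10},
    2: {"next_model_ids": [3], "created": 20},
    3: {"next_model_ids": [], "created": 30},
    4: {"next_model_ids": [], "created": 25},
}
print(related_models(branching_table, 2))
print(related_models(branching_table, 1))
```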

Next, the user uses the data extracted by the data set management unit 121 to create a new data set to be used as the next learning data. Screen 511 in FIG. 5 is an example of the data set output screen. Referring to the extracted learning data information display section 503 and corrected data information display section 504, the user specifies, for each data group, the number of data to include in the new data set, and also the total number of data in the data set to be created (512). When the new creation button 513 is pressed, a data set consisting of the specified data, with any shortfall made up from the new data 509, is created and registered in the data set management table 201.

FIG. 8 is an example of a flowchart of the processing of the learning data output unit 122. First, the specified number of data are added to the output data set from the data extracted in S603 (learning data whose inference results were corrected) (S802). If the specified number is smaller than the number of available data, the data are chosen at random. Using these data as learning data can be expected to be effective against degradation.

Next, for each model in the dependency list created in S604, the specified number of data are added to the output data set from the data extracted and recorded in S606 (data corrected after the reference time) (S805). The number may be specified for each model. These data can be expected to help the model learn to handle new input data.

Finally, if the number of data in the output data set falls short of the specified total, the shortfall is drawn from the data recorded in S607 (S807).
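
The three steps S802, S805, and S807 can be sketched together as follows. The function and parameter names are hypothetical, and skipping data already placed in the output set is an added assumption that the text does not state.

```python
import random


def build_output_dataset(corrected_learning, n_corrected_learning,
                         corrected_by_model, n_per_model,
                         new_data, total, rng=None):
    """Assemble the next learning data set from the three extracted pools."""
    rng = rng or random.Random()
    output = []

    def take(pool, count):
        # Skip data already selected (an assumption), then sample at random.
        candidates = [d for d in pool if d not in output]
        output.extend(rng.sample(candidates, min(count, len(candidates))))

    take(corrected_learning, n_corrected_learning)        # S802
    for model_id, pool in corrected_by_model.items():     # S805, per model
        take(pool, n_per_model.get(model_id, 0))
    shortfall = total - len(output)                       # S807
    if shortfall > 0:
        take(new_data, shortfall)
    return output


# Illustrative pools only.
next_dataset = build_output_dataset(
    corrected_learning=[1, 2, 3], n_corrected_learning=2,
    corrected_by_model={3: [10, 11], 2: [12]}, n_per_model={3: 2, 2: 1},
    new_data=list(range(100, 200)), total=10,
    rng=random.Random(0),
)
print(len(next_dataset))
```

Passing a seeded `random.Random` makes the selection reproducible, which can help when comparing evaluation results across re-learning attempts.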

The data extracted and output as described above can be used as the learning data for the next re-learning. This embodiment can generate learning data that both counter degradation and allow learning that handles new input data. The characteristics of the learning data can also be adjusted by changing the specified numbers of data.

In S802, S805, and S807, instead of selecting data at random, newer data may be selected or data with higher priority may be extracted. High-priority data are, for example, data whose label errors were hard to detect, such as errors found only in a later process or errors that only a skilled worker could find. This can be realized, for example, by adding information on the process and on the person who corrected the label to the inference result management table 133 and assigning priorities based on that information.
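
A priority-based alternative to random selection might look like the sketch below. The fields `found_at_stage` and `corrected_by`, the additive scoring, and the tie-break on the newer correction time are assumptions made for illustration; the specification only says that process and corrector information can be prioritized.

```python
def select_by_priority(records, count, stage_priority, corrector_priority):
    """Pick the `count` highest-priority corrected data."""
    def score(r):
        # Later process stages and more skilled correctors score higher.
        return (stage_priority.get(r["found_at_stage"], 0)
                + corrector_priority.get(r["corrected_by"], 0))

    # Ties are broken in favour of the more recent correction.
    ranked = sorted(records, key=lambda r: (score(r), r["corrected_at"]),
                    reverse=True)
    return [r["data_id"] for r in ranked[:count]]


# Illustrative records and priority tables only.
corrected_records = [
    {"data_id": 1, "found_at_stage": "inline", "corrected_by": "operator", "corrected_at": 5},
    {"data_id": 2, "found_at_stage": "shipment", "corrected_by": "expert", "corrected_at": 7},
    {"data_id": 3, "found_at_stage": "inline", "corrected_by": "expert", "corrected_at": 6},
]
chosen = select_by_priority(
    corrected_records, 2,
    stage_priority={"inline": 0, "shipment": 2},
    corrector_priority={"operator": 0, "expert": 1},
)
print(chosen)
```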

Next, the user re-trains the model in the re-learning unit 114, using the data set created by the learning data management unit 113 and a model managed in the model management table 134. When re-learning yields a model that satisfies the evaluation criteria, such as recognition accuracy, the newly created model and data set are registered in the model management table 134. The ID of the newly registered model is also recorded in the next model ID 304 column of the source model. This makes it possible to manage the learning history of models and to associate each model with its learning data.
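
The registration step can be sketched as below. The table layout, the function name `register_retrained_model`, and the use of a single accuracy threshold as the evaluation criterion are assumptions for the example.

```python
def register_retrained_model(model_table, source_id, new_id, dataset_id,
                             created, accuracy, threshold):
    """Register a re-trained model only if it meets the evaluation criterion."""
    if accuracy < threshold:
        return False                       # not adopted; try another data set
    model_table[new_id] = {
        "dataset_id": dataset_id,          # learning data used for this model
        "corrected_dataset_id": None,
        "next_model_id": None,
        "created": created,
    }
    # Link the source model to its successor to keep the learning history.
    model_table[source_id]["next_model_id"] = new_id
    return True


# Illustrative table with a single initial model.
history = {
    1: {"dataset_id": 1, "corrected_dataset_id": None,
        "next_model_id": None, "created": 0},
}
adopted = register_retrained_model(history, source_id=1, new_id=2, dataset_id=2,
                                   created=100, accuracy=0.95, threshold=0.9)
print(adopted, history[1]["next_model_id"])
```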

Re-learning may fail to produce a model that satisfies the evaluation criteria. In that case, a new data set is created from a new combination of data and re-learning is repeated. If the evaluation result drops significantly with the newly created data set, the newly adopted data may be having an adverse effect, and these data must be excluded from subsequent processing by the learning data management unit 113. This is handled by adding to the inference result management table 133 a flag for excluding data at extraction time.

A button for setting exclusion information is provided on the screen displayed by the learning data management unit 113. When the button is pressed, the exclusion flag is set for the data that were newly adopted in the created data set relative to the learning data of the selected model. At data extraction time, the data set management unit 121 excludes data whose exclusion flag is set.
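
The exclusion flag handling can be sketched as follows, assuming a hypothetical `excluded` field on each inference record and treating "newly adopted" as the set difference between the created data set and the learning data of the selected model.

```python
def flag_newly_adopted(records, base_dataset, new_dataset):
    """Set the exclusion flag on data that appear only in the new data set."""
    newly_adopted = set(new_dataset) - set(base_dataset)
    for r in records:
        if r["data_id"] in newly_adopted:
            r["excluded"] = True
    return newly_adopted


def extractable(records):
    """Return the records that remain subject to extraction."""
    return [r for r in records if not r.get("excluded", False)]


# Illustrative records: data 3 and 4 were newly adopted and are flagged.
result_records = [{"data_id": i} for i in range(1, 6)]
flagged = flag_newly_adopted(result_records, base_dataset=[1, 2],
                             new_dataset=[1, 2, 3, 4])
remaining = [r["data_id"] for r in extractable(result_records)]
print(sorted(flagged), remaining)
```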

A specific example is described with reference to FIGS. 9A to 9C and FIGS. 10A to 10C. An example of operation when the cycle of inference, label correction, data set extraction, and re-learning is repeated through the above processing is described using the data set management table 132 (FIG. 9A), the model management table 134 (FIG. 9B), the inference result management table 133 (FIG. 9C), and FIGS. 10A to 10C, which show the information displayed on the screen of FIG. 5.

This example shows a case where model 2 is generated from model 1 by re-learning and model 3 is then created from model 2 (FIG. 9B). In the creation time columns of FIGS. 9B and 9C, smaller values denote earlier times.

First, model 1 is registered as the initial model in the model management table 134 of FIG. 9B, together with data set 1, its learning data (911). In the data set management table 132 of FIG. 9A, data set 1 consists of data 1 to 100 (901).

Next, suppose that the inference unit 110 performs inference using model 1 on input data 101 to 200, and that the label of data 102 is then corrected through the inference result management unit 112 (921 in FIG. 9C).

Here, when the processing of the learning data management unit 113 is executed to create a new model 2, the data set management unit 121 extracts the data shown in FIG. 10A. In this example, the learning data output unit 122 generates data sets of 100 data, always uses the corrected data, and always includes 10 new data. As a result, the data shown in row 902 of FIG. 9A are extracted, which include the 10 new data 101 to 111, among them the corrected data 102. When the model trained on this data set is adopted as model 2, row 912 of the model management table 134 in FIG. 9B is registered.

Next, as with model 1, suppose that the inference unit 110 uses model 2 on input data 201 to 300, and that the labels of data 103 and 203 are then corrected through the inference result management unit 112 (922 and 923 in FIG. 9C). In this case, the data set management unit 121 extracts the data shown in FIG. 10B. The learning data output unit 122 then creates data set 3, shown in row 903 of FIG. 9A, and model 3 trained on that data set is registered.

The correction of data 103 assumes a case where an error in a result obtained with a past model is discovered and corrected later. In FIG. 9C, the inference result of the previous-generation model, model ID 1, is corrected at time 103, which is after the model with model ID 2 went into operation. A concrete example is a factory inspection in which an error goes undetected at the manufacturing site and the product is shipped; later lots are inspected with an updated model, and afterwards the customer reports a defective product from the earlier lot.

Next, as with models 1 and 2, suppose that the inference unit 110 uses model 3 on input data 301 to 400, and that data 104, 202, 203, 302, and 303 are then corrected through the inference result management unit 112. In this case, the data set management unit 121 extracts the data shown in FIG. 10C.

FIG. 11 shows an example of the screen displayed by the learning data management unit 113 at this point. Here, for model 3, three of the learning data have been corrected, and all three are adopted in the next learning data. The remaining 97 learning data were not corrected, and 83 of them are adopted in the next learning data. After the reference time, 2, 1, and 1 data were corrected for model 3, model 2, and model 1, respectively, and these are adopted in the next learning data. In this embodiment, the numbers of corrected data shown in display sections 1108, 1109, and 1110 do not distinguish among learning data, test data, and operational data, but they may be displayed separately.

Of the numbers of data shown on the screen displayed by the learning data management unit 113, the user can freely set how many of each to adopt in the next learning data by adjusting the values in frame 511.

The other learning data 507 include data that were learned in past generations and that the model recognizes correctly. For example, if the next learning data are created with fewer of the other learning data 507 and the resulting model is degraded, it can be inferred that the removed data caused the degradation. Degradation can therefore be addressed by using the other learning data 507. In this way, data that were useful in training models of past generations can be reflected in re-learning.

As described above, this embodiment makes it possible to extract, for the model used in inference, a list of the data in its learning data whose labels have been corrected and a list of the data inferred by related models whose labels have been corrected. Because data sets combining these data can be created easily, the efficiency of the user's data management is improved.

FIG. 12 shows an example of a computer system on which the inference system 101 described in this embodiment runs. The computer system consists of an arithmetic device 1201, a storage device 1202, a memory device 1203, a communication device 1204, and an input/output device 1205, which are connected via a bus 1206. The programs constituting the inference system 101 are held in the storage device 1202 and executed by the arithmetic device 1201.

Although this embodiment has been described using label-based classification of image data as an example, it is not limited to image data and can also be applied to inference processing that classifies audio data, text data, numerical data, and the like.

In the above embodiment, the learning history of models and the correction history of labels for inference result data are managed. Using these, two kinds of data are extracted: data whose labels were corrected among the inference results produced by the model to be re-trained and by the earlier-generation models from which it was derived, and data in the learning data whose labels were corrected.

This makes it possible to extract label-corrected data, which are important for improving model accuracy, based on the model history, and thereby to improve the efficiency of learning data management.

According to the above embodiment, efficient machine learning can be realized, which reduces energy consumption and carbon emissions and can contribute to preventing global warming and realizing a sustainable society.

Inference system 101, system user terminal 102, network 103, inference unit 110, data input unit 111, inference result management unit 112, learning data management unit 113, re-learning unit 114, input/output unit 115, input data group 131, data set management table 132, inference result management table 133, model group 135, model management table 134

Claims

A learning data management method comprising:
a first step of recording, as inference result management information, whether or not an inference result of a first machine learning model trained with first learning data has been corrected.

The learning data management method according to claim 1, further comprising:
a second step of obtaining, as a first corrected data list, data for which the inference result of the first machine learning model has been corrected, based on the inference result management information; and
a third step of including at least part of the first corrected data list in second learning data when the first machine learning model is trained with the second learning data.

The learning data management method according to claim 2, wherein,
in the third step,
the at least part of the first corrected data list included in the second learning data is at least part of the first learning data.

The learning data management method according to claim 2, wherein
the first machine learning model is obtained by re-learning a previous-generation machine learning model, the method further comprising:
a fourth step of recording, as the inference result management information, whether or not an inference result of the previous-generation machine learning model has been corrected;
a fifth step of obtaining, as a second corrected data list, data for which the inference result of the previous-generation machine learning model has been corrected, based on the inference result management information; and
a sixth step of including at least part of the second corrected data list in the second learning data when the first machine learning model is trained with the second learning data.

The learning data management method according to claim 4, wherein
the previous-generation machine learning model is obtained by re-learning a second-previous-generation machine learning model, the method further comprising:
a seventh step of recording, as the inference result management information, whether or not an inference result of the second-previous-generation machine learning model has been corrected;
an eighth step of obtaining, as a third corrected data list, data for which the inference result of the second-previous-generation machine learning model has been corrected, based on the inference result management information; and
a ninth step of including at least part of the third corrected data list in the second learning data when the first machine learning model is trained with the second learning data.

The learning data management method according to claim 5, further comprising
creating model management information that records the generational relationship among the second-previous-generation machine learning model, the previous-generation machine learning model, and the first machine learning model.

The learning data management method according to claim 6, wherein
the model management information includes information specifying the learning data used to train each machine learning model and information on the creation time of each machine learning model.

The learning data management method according to claim 4, wherein,
when at least part of the first corrected data list and at least part of the second corrected data list are included in the second learning data,
the number of data to be selected from the first corrected data list and the number of data to be selected from the second corrected data list can each be specified arbitrarily.

The learning data management method according to claim 1, wherein
the inference result management information includes information specifying input data to be inferred, information specifying the machine learning model used for inference on the input data, the inference result of the machine learning model, a correction result for the inference result, and correction time information.

A learning data generation apparatus comprising an inference result management unit, a data set management unit, and a learning data output unit, wherein
the inference result management unit
records, as inference result management information, whether or not an inference result of a first machine learning model trained with first learning data has been corrected,
the data set management unit
extracts, as a first corrected data list, data for which the inference result of the first machine learning model has been corrected, based on the inference result management information, and
the learning data output unit
outputs, as learning data, data including at least one datum included in the first corrected data list.

The learning data generation apparatus according to claim 10, wherein
the inference result management unit
records, as the inference result management information, whether or not an inference result of a machine learning model of the generation preceding the first machine learning model has been corrected,
the data set management unit
extracts, as a second corrected data list, data for which the inference result of the previous-generation machine learning model has been corrected, based on the inference result management information, and
the learning data output unit
outputs, as the learning data, data including at least one datum included in the second corrected data list.

The learning data generation apparatus according to claim 11, wherein
the learning data output unit
outputs, as the learning data, at least one of: data in the first corrected data list that belong to the first learning data; and data in the second corrected data list that belong to second learning data used to train the previous-generation machine learning model.

The learning data generation apparatus according to claim 12, wherein
the data set management unit
displays the number of data included in the first corrected data list and the number of data included in the second corrected data list, and
displays a GUI that enables specification of how many of the data in the first corrected data list and how many of the data in the second corrected data list are to be included in the learning data output by the learning data output unit.

The learning data generation apparatus according to claim 13, wherein
the data set management unit
further displays the number of the first learning data, and displays a GUI that enables specification of how many of the first learning data are to be included in the learning data output by the learning data output unit.

A machine learning method comprising
training a machine learning model using the learning data output from the learning data output unit of the learning data generation apparatus according to claim 10.