JP7037180B2

JP7037180B2 - Learning data discrimination device and learning data discrimination program

Info

Publication number: JP7037180B2
Application number: JP2018100274A
Authority: JP
Inventors: 亮清水
Original assignee: ギリア株式会社
Priority date: 2018-05-25
Filing date: 2018-05-25
Publication date: 2022-03-16
Anticipated expiration: 2038-05-25
Also published as: JP2019204392A

Description

The present invention relates to a learning data discrimination device and a learning data discrimination program, and more particularly to a technique for determining which learning algorithm a data set prepared for learning is suited to.

In recent years, expectations have been rising for machine learning, a field of artificial intelligence (AI). Machine learning refers to the process of creating a model from a large amount of data, and is broadly divided into supervised learning and unsupervised learning according to the nature of the data used for learning. Semi-supervised learning and reinforcement learning also exist as learning methods intermediate between supervised learning and unsupervised learning.

Machine learning is rapidly being applied to a wide range of products and services, with an appropriate algorithm chosen for each application. For example, Patent Document 1 discloses providing a learning algorithm group in which a plurality of types of learning algorithms are prepared, and selecting the algorithm used for actual learning according to the problem to be solved.

Whichever learning algorithm is used, the data used for learning to build the model must be collected. Just as there is an appropriate learning algorithm for each application, there is appropriate data for building a model with each learning algorithm. Conventionally, developers have manually determined which algorithm a collected data set can be used to train. As a result, there has been a problem that a data set cannot be used appropriately for learning unless the developer is proficient in machine learning with various algorithms.

Patent Document 1: Japanese Unexamined Patent Publication No. H05-298277

The present invention has been made to solve this problem, and its object is to enable even a developer who is not proficient in machine learning with various algorithms to appropriately determine which learning algorithm a data set is suited to.

To solve the above problem, the present invention determines the presence or absence of a folder as the storage format of a data set provided for learning, and presents the learning algorithm to which the data set is suited according to the result of that determination.

According to the present invention configured as described above, the learning algorithm to which a data set is suited is presented according to whether or not the data set provided for learning is stored in a folder, so the developer no longer needs to manually determine which algorithm the collected data set can be used to train. As a result, even a developer who is not proficient in machine learning with various algorithms can appropriately determine which learning algorithm a data set is suited to.

FIG. 1 is a diagram showing a configuration example of a communication system to which the learning data discrimination device according to the present embodiment is applied. FIG. 2 is a block diagram showing a functional configuration example of the learning data discrimination device according to the present embodiment.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a diagram showing a configuration example of a communication system to which the learning data discrimination device according to the present embodiment is applied. As shown in FIG. 1, the communication system of the present embodiment includes a user terminal 10 and a server device 20, which are configured to be connectable to each other via a communication network 30 such as the Internet or a mobile phone network.

The server device 20 corresponds to the learning data discrimination device of the present embodiment, and determines which learning algorithm a data set transmitted from the user terminal 10 is suited to. For example, the user operates the user terminal 10 to store a data set consisting of a plurality of data items together in a compressed file, and uploads the compressed file to the server device 20. The server device 20 decompresses the compressed file transmitted from the user terminal 10 and determines which learning algorithm the provided data set is suited to.

FIG. 2 is a block diagram showing a functional configuration example of the learning data discrimination device according to the present embodiment implemented in the server device 20. As shown in FIG. 2, the learning data discrimination device of the present embodiment includes, as its functional configuration, a data set acquisition unit 21, a storage format determination unit 22, a data size determination unit 23, and an algorithm presentation unit 24. The learning data discrimination device of the present embodiment also includes a correspondence table storage unit 25 as a storage medium.

Each of the functional blocks 21 to 24 can be implemented in hardware, a DSP (Digital Signal Processor), or software. When implemented in software, for example, each of the functional blocks 21 to 24 is in practice built around the CPU, RAM, ROM, and the like of a computer, and is realized by running a program stored in a recording medium such as RAM, ROM, a hard disk, or semiconductor memory.

The data set acquisition unit 21 acquires the data set provided for learning from the user terminal 10. As described above, when the data set is sent as a compressed file, the data set acquisition unit 21 decompresses the compressed file and extracts the data set. In the present embodiment, the data set acquired by the data set acquisition unit 21 is assumed to be a set of a plurality of image data.
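The decompression step performed by the data set acquisition unit 21 might be sketched as follows. This is only an illustration: the text says "compressed file" without naming a format, so the ZIP format, the function name `extract_dataset`, and the use of a temporary directory are all assumptions.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_dataset(archive_path) -> Path:
    """Unpack an uploaded archive into a fresh temporary directory.

    Assumption: the upload is a ZIP archive; the source does not name a format.
    """
    target = Path(tempfile.mkdtemp(prefix="dataset_"))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target
```

The returned directory is what the later determination steps would inspect.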

The storage format determination unit 22 determines the presence or absence of a folder as the storage format of the data set acquired by the data set acquisition unit 21. That is, if the user uploads the data set from the user terminal 10 to the server device 20 without placing the data items in a folder, the storage format determination unit 22 determines that the storage format of the acquired data set is "no folder". If, on the other hand, the user uploads the data set with the data items placed in one or more folders, the storage format determination unit 22 determines that the storage format of the acquired data set is "with folder".
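A minimal sketch of the folder determination, assuming the data set has already been unpacked into a directory. The names `has_folder` and `storage_format` are hypothetical, and checking only the top level of the directory is an assumption about how "presence of a folder" would be tested.

```python
from pathlib import Path


def has_folder(dataset_root: Path) -> bool:
    """Return True if any entry directly under the data set root is a directory."""
    return any(entry.is_dir() for entry in dataset_root.iterdir())


def storage_format(dataset_root: Path) -> str:
    """Map the check onto the two labels used in the description."""
    return "with folder" if has_folder(dataset_root) else "no folder"
```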

When the storage format determination unit 22 determines that the storage format of the data set is "with folder", the data size determination unit 23 determines whether or not the aspect ratios of the plurality of image data constituting the data set are unified at 2:1 or 1:2. The aspect ratio of image data here means the ratio between the vertical size and the horizontal size of a rectangular image.
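The aspect ratio check could look like the sketch below, which works on (width, height) pairs in pixels so that no image library is needed. Two points are assumptions rather than statements from the text: the ratio is tested exactly on integer pixel sizes, and "unified" is read as every image sharing the same orientation (all wide or all tall).

```python
from typing import Iterable, Tuple


def ratios_unified(sizes: Iterable[Tuple[int, int]]) -> bool:
    """True when every (width, height) pair has the same 2:1 or 1:2 shape."""
    sizes = list(sizes)
    if not sizes:
        return False  # an empty data set has nothing to unify
    all_wide = all(w == 2 * h for w, h in sizes)
    all_tall = all(h == 2 * w for w, h in sizes)
    return all_wide or all_tall
```

A more tolerant variant might allow a small rounding error in the ratio; the description does not say either way.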

The algorithm presentation unit 24 determines and presents the learning algorithm to which the data set is suited, according to the results of the determinations by the storage format determination unit 22 and the data size determination unit 23. In making this determination, the algorithm presentation unit 24 refers to the correspondence table storage unit 25. The correspondence table storage unit 25 stores table information indicating the correspondence between the determination elements, namely the presence or absence of a folder and the aspect ratio of the image data, and the learning algorithms suited to the content of each determination element.

The algorithm presentation unit 24 presents the suitable learning algorithm determined using this table information to the user, for example by transmitting it to the user terminal 10. One possible presentation method is to display the name of the learning algorithm as text information on the screen of the user terminal 10. As another example, the algorithm presentation unit 24 may store information indicating the determined learning algorithm in the server device 20 and, when the user terminal 10 accesses the server device 20 and requests the information, provide the learning algorithm information to the user terminal 10 through a predetermined presentation screen.

The specific processing performed by the algorithm presentation unit 24 is described below. When the storage format determination unit 22 determines that the storage format of the data set is "no folder", the algorithm presentation unit 24 presents at least unsupervised learning as a learning algorithm to which the data set is suited.

On the other hand, when the storage format determination unit 22 determines that the storage format of the data set is "with folder", the algorithm presentation unit 24 presents at least supervised learning as a learning algorithm to which the data set is suited. The algorithm presentation unit 24 may present at least one of image classification and GAN as a specific example of supervised learning.

When a user places a data set in a folder, the folder always has a folder name. The folder name is usually a name that expresses a concept common to the data placed in the folder. In that case, the folder name can possibly be used as teacher data for image classification. Therefore, when the storage format determination unit 22 determines that the storage format of the data set is "with folder", the algorithm presentation unit 24 presents image classification as one of the learning algorithms to which a data set with folders is suited.
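To illustrate how folder names could serve as teacher data, the following sketch pairs each file with the name of the top-level folder that contains it. The function name `labeled_samples` is hypothetical, and ignoring files that sit outside any folder is an assumption made for this example.

```python
from pathlib import Path
from typing import List, Tuple


def labeled_samples(dataset_root: Path) -> List[Tuple[Path, str]]:
    """Pair each file with the name of the top-level folder that contains it."""
    samples = []
    for folder in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        # Files nested deeper still take the top-level folder name as label.
        for file in sorted(f for f in folder.rglob("*") if f.is_file()):
            samples.append((file, folder.name))
    return samples
```

Whether the folder names are in fact meaningful class labels is something the user would still need to confirm, which is why the description speaks only of a possibility.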

In addition, when a user places a data set in folders, the image data placed in the same folder are likely to all show images of the same concept. In that case, the image data in the same folder can possibly be used as correct-answer data for a GAN. For example, when the data set is divided into a plurality of folders, one of those folders can be specified and used for GAN learning. Therefore, when the storage format determination unit 22 determines that the storage format of the data set is "with folder", the algorithm presentation unit 24 presents GAN as one of the learning algorithms to which a data set with folders is suited.

Although the example described here presents at least one of image classification and GAN as specific examples of supervised learning to which a data set with folders is suited, other supervised learning algorithms may be presented. For example, when the data set acquired by the data set acquisition unit 21 is stored across a plurality of folders, CycleGAN or DiscoGAN, which are variations of GAN, may be presented.

In this case, learning can be performed by selecting any two of the plurality of folders. That is, it may be possible to train CycleGAN by selecting any two of the folders and performing style conversion between the image data stored in one folder and the image data stored in the other. It may likewise be possible to train DiscoGAN by selecting any two of the folders and learning the relationship (attribute) between the image data stored in one folder and the image data stored in the other.
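The selection of two folders can be illustrated by enumerating every candidate pair. This is a sketch only; the function name `candidate_folder_pairs` is hypothetical, and the description does not say how a pair would actually be chosen.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_folder_pairs(folder_names: List[str]) -> List[Tuple[str, str]]:
    """List every unordered pair of folders that could serve as the two domains."""
    return list(combinations(sorted(folder_names), 2))
```

With a single folder the list is empty, which matches the condition that the data set be divided into a plurality of folders.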

Furthermore, when the storage format determination unit 22 determines that the storage format of the data set is "with folder" and the data size determination unit 23 determines that the aspect ratios of the plurality of image data are unified at 2:1 or 1:2, the algorithm presentation unit 24 may additionally present pix2pix as a learning algorithm to which the data set is suited.

The pix2pix learning algorithm attempts to represent the image transformation hidden between two images with a DNN (deep neural network). When the aspect ratios of the plurality of image data stored in the folders are unified at 2:1 or 1:2, it can be inferred that each image data item may contain two images with an aspect ratio of 1:1 recorded one above the other or side by side. In that case, it may be possible to train pix2pix using the two images recorded together, so the algorithm presentation unit 24 presents pix2pix as one of the learning algorithms to which the data set is suited.
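The inference that a 2:1 or 1:2 image holds two 1:1 images can be made concrete by computing the two square regions. The sketch below returns boxes in (left, upper, right, lower) order, a convention chosen here because common image libraries use it for cropping; the function name `split_boxes` is hypothetical, and which half is the input and which is the target is not something the description specifies.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def split_boxes(width: int, height: int) -> Tuple[Box, Box]:
    """Return crop boxes for the two square halves of a 2:1 or 1:2 image."""
    if width == 2 * height:  # two squares side by side
        return (0, 0, height, height), (height, 0, width, height)
    if height == 2 * width:  # two squares stacked vertically
        return (0, 0, width, width), (0, width, width, height)
    raise ValueError("image is not 2:1 or 1:2")
```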

In contrast to the data sets with folders described above, the image data in a data set without folders do not necessarily share a common concept, and there is no folder name that could serve as teacher data. Therefore, when the storage format determination unit 22 determines that the storage format of the data set is "no folder", the algorithm presentation unit 24 basically presents unsupervised learning as the learning algorithm to which the data set is suited.

However, the algorithm presentation unit 24 may additionally present pix2pix as a learning algorithm to which a data set without folders is suited, because two images may be recorded side by side within one image data item. For a data set without folders as well, the data size determination unit 23 may determine whether or not the aspect ratios of the image data are unified at 2:1 or 1:2, and pix2pix may be presented as a learning algorithm to which the data set is suited when they are determined to be unified.
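The rules described above can be collected into a table of the kind the correspondence table storage unit 25 is said to hold. The actual table contents are not disclosed, so the entries below are only one reading of the embodiment: in particular, pix2pix is offered for a data set without folders only when the aspect ratio check passes, which is one of the two variants mentioned. The names `CORRESPONDENCE_TABLE` and `present_algorithms` are hypothetical.

```python
from typing import Dict, List, Tuple

# Key: (folder present?, aspect ratios unified at 2:1 or 1:2?)
# Values are an illustrative reading of the embodiment, not the actual table.
CORRESPONDENCE_TABLE: Dict[Tuple[bool, bool], List[str]] = {
    (False, False): ["unsupervised learning"],
    (False, True): ["unsupervised learning", "pix2pix"],
    (True, False): ["supervised learning", "image classification", "GAN"],
    (True, True): ["supervised learning", "image classification", "GAN", "pix2pix"],
}


def present_algorithms(has_folder: bool, ratios_unified: bool) -> List[str]:
    """Look up the algorithms associated with the two determination results."""
    return list(CORRESPONDENCE_TABLE[(has_folder, ratios_unified)])
```

Keeping the mapping in data rather than in branching code would make it straightforward to add rows such as CycleGAN or DiscoGAN for data sets divided into a plurality of folders.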

As described in detail above, the present embodiment determines the presence or absence of a folder as the storage format of the data set provided for learning, and presents the learning algorithm to which the data set is suited according to the result of that determination. In addition to the storage format of the data set, the present embodiment also determines the data size of the plurality of image data, and presents the learning algorithm to which the data set is suited according to whether or not the aspect ratios are unified at 2:1 or 1:2.

According to the present embodiment configured in this way, the learning algorithm to which a data set is suited is presented according to factors such as whether or not the data set provided for learning is stored in a folder and whether or not the aspect ratios of the plurality of image data are unified at 2:1 or 1:2, so the user no longer needs to manually determine which algorithm the collected data set can be used to train. As a result, even a developer who is not proficient in machine learning with various algorithms can appropriately determine which learning algorithm a data set is suited to.

The above embodiment describes an example in which, in a system configuration where the user terminal 10 and the server device 20 are connected by the communication network 30, the learning data discrimination device of the present embodiment is implemented in the server device 20 and the server device 20 acquires the data set from the user terminal 10. The present invention is not limited to this. For example, the learning data discrimination device of the present embodiment may be implemented in the user terminal 10, and the user terminal 10 may acquire the data set from an external server device, a removable storage medium, or the like.

The data size determination unit 23 shown in the above embodiment is not an essential component of the present invention and may be omitted.

Although the above embodiment describes an example in which image data is used as the data set, the present invention is not limited to this. For example, the present embodiment can also be applied when audio data, text data, or data in other formats is used.

The above embodiments are merely examples of how the present invention may be embodied, and the technical scope of the present invention must not be interpreted restrictively on their basis. That is, the present invention can be implemented in various forms without departing from its gist or its main features.

10 User terminal
20 Server device (learning data discrimination device)
21 Data set acquisition unit
22 Storage format determination unit
23 Data size determination unit
24 Algorithm presentation unit
25 Correspondence table storage unit

Claims

A learning data discrimination device comprising:
a data set acquisition unit that acquires a data set provided for learning;
a storage format determination unit that determines the presence or absence of a folder as the storage format of the data set acquired by the data set acquisition unit; and
an algorithm presentation unit that presents a learning algorithm to which the data set is suited according to the result of the determination by the storage format determination unit.

The learning data discrimination device according to claim 1, wherein the data set is a set of a plurality of image data, and
the algorithm presentation unit presents at least unsupervised learning as a learning algorithm to which the data set is suited when the storage format determination unit determines that the storage format of the data set is "no folder", and presents at least supervised learning as a learning algorithm to which the data set is suited when the storage format determination unit determines that the storage format of the data set is "with folder".

The learning data discrimination device according to claim 2, wherein the algorithm presentation unit further presents pix2pix as a learning algorithm to which the data set is suited when the storage format determination unit determines that the storage format of the data set is "no folder".

The learning data discrimination device according to claim 2 or 3, wherein the algorithm presentation unit presents at least one of image classification and GAN as a learning algorithm to which the data set is suited when the storage format determination unit determines that the storage format of the data set is "with folder".

The learning data discrimination device according to any one of claims 2 to 4, wherein the algorithm presentation unit presents at least one of CycleGAN and DiscoGAN as a learning algorithm to which the data set is suited when the storage format determination unit determines that the storage format of the data set is "with folder" and the data set is divided into a plurality of folders.

The learning data discrimination device according to any one of claims 2 to 5, further comprising a data size determination unit that determines whether or not the aspect ratios of the plurality of image data constituting the data set are unified at 2:1 or 1:2,
wherein the algorithm presentation unit further presents pix2pix as a learning algorithm to which the data set is suited when the data size determination unit determines that the aspect ratios of the plurality of image data are unified at 2:1 or 1:2.

A learning data discrimination program for causing a computer to function as:
data set acquisition means for acquiring a data set provided for learning;
storage format determination means for determining the presence or absence of a folder as the storage format of the data set acquired by the data set acquisition means; and
algorithm presentation means for presenting a learning algorithm to which the data set is suited according to the result of the determination by the storage format determination means.