JP2020144553A

JP2020144553A - Storage device and data processing method in the same

Info

Publication number: JP2020144553A
Application number: JP2019039970A
Authority: JP
Inventors: 征之兒玉; Masayuki Kodama; 雄樹近藤; Takeki Kondo
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-03-05
Filing date: 2019-03-05
Publication date: 2020-09-10

Abstract

PROBLEM TO BE SOLVED: To reduce consumption of the storage capacity of a storage device, allow a smaller storage capacity, and improve capacity efficiency. SOLUTION: A storage device 104 connected to a server via a network includes: a storage unit that stores original data acquired from outside; a generation unit that, in response to a request for processed data from the server, generates processed data by processing original data selected from the original data stored in the storage unit; and an evaluation unit that, before the generation unit generates the processed data, evaluates the processing performance of the generation unit in generating that data. The generation unit generates the processed data based on the evaluated processing performance and sends the generated data to the server. SELECTED DRAWING: Figure 7

Description

The present invention relates to a storage device and a data processing method in the storage device. It is suitable for application to, for example, an image AI (Artificial Intelligence) information processing system that uses image data acquired by IoT (Internet of Things) sensor devices such as surveillance cameras and accumulated in the storage device for machine learning.

Patent Document 1 discloses a technique in which data is managed by either a file server at one site or a file server at another site, and data is referenced appropriately depending on which file server manages it. Specifically, the file server stores the data of a file received from a user terminal in a storage device, replicates it to a remote file server, and stubs the data of the file stored in the storage device. When the file targeted by an access request received from the user terminal has not been stubbed, the file server reads the data of that file from the storage device and sends it to the user terminal.

Patent Document 1: Japanese Patent No. 6231623

With the recent development of IoT and AI, there are information processing systems in which a server performs machine learning on image data acquired by IoT devices such as cameras and accumulated in a storage device. In such systems, data augmentation is performed to improve the efficiency and accuracy of machine learning: processed images derived from the image data (images obtained by applying processing such as position shifts or rotation to the original images) are added to the data set. During data augmentation in these systems, all image data to be processed is read from the storage device to the server, the server performs the data augmentation, and the data of all processed images is written back to the storage device.

Consequently, in the conventional technology described above, an information processing system in which a server performs machine learning on image data acquired by IoT devices and accumulated in a storage device has the problem that writing all processed image data to the storage device puts pressure on the storage capacity of the storage device.

Furthermore, some information processing systems that perform machine learning handle teacher data exceeding the main memory capacity of the system by storing the machine learning data in a secondary storage area such as an HDD (Hard Disk Drive) and reading it into main memory in parts. In such systems, it is known that access to the HDD or the like can be made more efficient by bundling multiple pieces of learning data into large intermediate data. However, this intermediate data is also written to the storage device. Combined with the data augmentation described above, the pressure on the storage capacity of the storage device therefore grows as the scale of the learning data increases.

The present invention has been made in view of the above points. For an information processing system in which a server performs machine learning on data acquired by IoT devices and accumulated in a storage device, it proposes a storage device, and a data processing method in the storage device, that reduce consumption of the storage capacity of the storage device, allow a smaller storage capacity, and improve capacity efficiency.

To solve this problem, in the present invention the storage device is connected to a server via a network and has: a storage unit that stores original data acquired from outside; a generation unit that, in response to a request for processed data from the server, generates processed data by processing original data selected from the original data stored in the storage unit; and an evaluation unit that, before the generation unit generates the processed data, evaluates the processing performance of the generation unit in generating that data. The storage device is characterized in that the generation unit generates the processed data based on the evaluated processing performance and the generated processed data is transmitted to the server.

According to the present invention, reducing the required storage capacity of the storage device and improving its storage efficiency lowers the equipment cost and operating cost of an AI information processing system used for machine learning.

A diagram showing an example of the logical hardware configuration of the learning system including the storage device of Example 1.
A diagram showing an example of the programs and data stored in the memory of the storage device of Example 1.
A diagram showing an example of the data stored in the storage data area of the storage device of Example 1.
A diagram showing an example of a data generation list included in the data generation list set of Example 1.
A diagram showing an example of the table that records the pre-performance evaluation results of Example 1.
A diagram showing an example of the programs stored in the memory of the learning server of Example 1.
A diagram showing an example of the logical configuration of the storage device of Example 1 in the data registration phase.
A flowchart showing an example of the data registration phase processing in the storage device of Example 1.
A diagram showing an example of the logical configuration of the storage device of Example 1 in the learning phase.
A flowchart showing an example of the learning phase processing in the storage device of Example 1.
A flowchart showing an example of the buffer area release processing in the storage device of Example 1.
A diagram showing an example of the logical hardware configuration of the learning system including the storage device of Example 2.
A diagram showing an example of the programs and data stored in the memory of the storage device of Example 2.
A diagram showing an example of the data stored in the storage data area of the storage device of Example 2.
A diagram showing an example of an extended data generation list included in the data generation list set of Example 2.
A diagram showing an example of the table of pre-performance evaluation results of Example 2.
A diagram showing an example of the logical configuration of the storage device of Example 2 in the data registration phase.
A diagram showing an example of the logical configuration of the storage device of Example 2 in the learning phase.

Examples of the present invention are described in detail below with reference to the drawings. In the drawings used to describe the examples, components having the same or similar functions are denoted by the same reference numbers, and repeated description of them is omitted. The examples and modifications may be combined in part or in whole, provided the combination is consistent and stays within the scope of the technical idea of the present invention.

In the following description, various kinds of information may be described using the expressions "aaa list" or "aaa table", but the information may be expressed in data structures other than lists and tables. To indicate that the information does not depend on the data structure, an "aaa list" or "aaa table" may also be called "aaa information". An "aaa list", "aaa table", or "aaa information" is stored in a storage area allocated in a storage resource (for example, memory).

In the following description, a processing flow may also be described with a "program" as the subject. A program is loaded into a storage resource (for example, memory) and executed by a processor (for example, a CPU (Central Processing Unit)), and thereby performs predetermined processing while using at least one of the storage resource and a communication interface as appropriate. An element described or illustrated as, for example, an "xxx program" is a processing unit in which the processor performs predetermined processing by analyzing and executing the program source using the storage resource or the communication interface. The subject of processing by a program may therefore be taken to be the processor or a device having that processor. An "xxx program" can also be referred to as an "xxx unit".

Part or all of the processing executed by the processor may be performed by a hardware circuit. A program that defines the processing executed by the processor may be acquired, for example, from an external device via a network or via a storage medium, and then executed by the processor.

Example 1 of the present invention is described with reference to FIGS. 1 to 10.

<Configuration of the learning system including the storage device of Example 1>
The logical structure of the hardware of the system on which the present invention is premised is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the logical hardware configuration of the learning system including the storage device of Example 1. The learning system 1S including the storage device 104 of Example 1 is a system in which a plurality of learning servers 101 and 102 and the storage device 104 are connected by a network 103.

The learning servers 101 and 102 of the learning system 1S are servers that perform machine learning on teacher data. They consist of learning servers (sub) 102, which share learning processing such as product-sum operations and error backpropagation, and a learning server (main) 101, which performs initialization and aggregation of learning results in addition to learning processing. Each of the learning servers 101 and 102 includes a CPU 111, a memory 112, and an IF 113. The learning servers 101 and 102 may also be equipped with a storage device such as an HDD (Hard Disk Drive), and with an accelerator for speeding up the learning processing. Learning processing can also be performed by the learning server (main) 101 alone, without the learning servers (sub) 102.

The CPU 111 executes various kinds of processing, including learning processing, using the programs, management information, and other data stored in the memory 112.

The memory 112 stores the programs executed by the CPU 111, the management information used by those programs, and the like. The memory 112 may also be used for other purposes, for example to store management information of the OS (Operating System). The memory 112 is typically composed of SDRAM (Synchronous Dynamic Random Access Memory), but may be composed of other storage elements.

The IF 113 is an interface through which the learning servers 101 and 102 exchange information with one another and read learning data from the storage device 104 via the network 103. The IF 113 can be selected to suit the configuration of the network 103 and can also be separated by purpose. For example, Ethernet (registered trademark; the same applies hereinafter) may be used when the learning servers 101 and 102 communicate with each other, and Fibre Channel when learning data is read from the storage device 104. Redundancy can also be obtained by operating a plurality of IFs 113 together.

The network 103 consists of line concentrators and cables, and carries communication among the learning servers 101 and 102 and between them and the storage device 104. In this example the network 103 is assumed to be entirely Ethernet, but a different communication standard, such as InfiniBand (registered trademark) or Fibre Channel, can be used for all or part of it. When different communication standards are mixed, line concentrators and cables matching each standard can be used together.

The storage device 104 includes an FE (front end) 121, a CPU 122, a BE (back end) 123, a memory 124, and one or more drives 125.

The FE 121 accepts communication from the learning servers 101 and 102 to the storage device 104 via the network 103 and responds to it. Ethernet, Fibre Channel, or the like can be used as the communication standard.

The CPU 122 uses the programs, management information, and other data stored in the memory 124 to respond to data access made through the FE 121 by the learning servers 101 and 102 and by other servers.

The BE 123 mediates access by the CPU 122 to the drives 125, where the actual data is stored, when the learning servers 101 and 102 or other servers access the storage device. SAS (Serial Attached SCSI), NVMe (Non-Volatile Memory Express), or the like can be used as the communication standard.

The memory 124 stores the programs executed by the CPU 122, the management information used by those programs, and the like. The memory 124 may also be used for other purposes, for example to store management information of the storage firmware. The memory 124 is typically composed of SDRAM (Synchronous Dynamic Random Access Memory), but may be composed of other storage elements.

The one or more drives 125 are storage devices composed of HDDs or SSDs (Solid State Drives). A plurality of drives 125 may be grouped into a unit called a parity group and used as a logical storage area by applying a high-reliability technology such as RAID (Redundant Arrays of Independent Disks). The SSDs are not limited to those that use NAND flash memory as the storage area, and may use other storage elements such as phase-change memory.

<Programs and data stored in the memory of the storage device of Example 1>
The programs and information stored in the memory 124 of the storage device 104 are described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the programs and data stored in the memory of the storage device of Example 1.

The memory 124 stores a data generation management program 201, a buffer size suppression program 202, a pre-performance evaluation program 203, a data generation processing program 204, a parameter reception program 205, an IO observation program 206, and a file emulation program 207. The memory 124 also has a generated data buffer 211 as a temporary area for temporarily storing data. As described in detail later, the generated data save area 303 (see FIG. 3) of the storage data area 301 serves, together with the generated data buffer 211, as a temporary area for temporarily storing data. The memory 124 may also hold other control programs, control data, and the like.

The data generation management program 201 starts the data generation processing program 204 and controls the dynamic generation of augmentation data and its storage in the temporary area. Generation runs at a data generation throughput that matches the learning data size requested by the learning servers 101 and 102 and the data read throughput of the learning servers 101 and 102. The generated augmentation data serves as the learning data.
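As a rough illustration of matching generation throughput to read throughput, the sketch below chooses how many generation processes to run in parallel. The function name `workers_needed`, the worker cap, and the ceiling rule are assumptions made for illustration; the text states only that generation throughput follows the requested data size and the read throughput of the learning servers.

```python
import math


def workers_needed(read_bps: float, unit_gen_bps: float, max_workers: int) -> int:
    """Pick a number of parallel generation workers (hypothetical policy).

    read_bps:     observed read throughput of the learning servers, in bytes/s
    unit_gen_bps: pre-evaluated generation throughput of one worker, in bytes/s
    max_workers:  upper bound imposed by the storage device's CPU resources
    """
    if unit_gen_bps <= 0:
        raise ValueError("unit_gen_bps must be positive")
    # Enough workers that aggregate generation keeps up with reads,
    # but at least one worker and never more than the cap.
    needed = math.ceil(read_bps / unit_gen_bps)
    return max(1, min(needed, max_workers))
```

With a read throughput of 250 bytes/s and a unit throughput of 100 bytes/s, this policy would start three workers; generating faster than the servers read would only enlarge the temporary area.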

The buffer size suppression program 202 releases the temporary area used by generated augmentation data that the learning servers 101 and 102 have finished reading, that is, data whose file has been closed, and thereby keeps the temporary area from growing excessively.
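The release-on-close behavior can be sketched as follows. The class `GeneratedDataBuffer` and its method names are hypothetical stand-ins for the temporary area and for the close notification; this is a minimal model, not the device's implementation.

```python
from typing import Dict


class GeneratedDataBuffer:
    """Toy model of a temporary area holding generated augmentation data."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        """Store generated data under its published file name."""
        self._entries[name] = data

    def used_bytes(self) -> int:
        """Total size of the data currently held."""
        return sum(len(data) for data in self._entries.values())

    def on_close(self, name: str) -> None:
        """Release the area used by a file once the reader has closed it."""
        self._entries.pop(name, None)  # ignore names that were never buffered
```

Releasing on close keeps the footprint proportional to the data currently being read rather than to the whole augmented data set.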

The pre-performance evaluation program 203 evaluates, before the actual learning processing, the unit processing performance of data augmentation (the processing performance per operation), that is, the data generation throughput.
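One way to picture such a pre-evaluation is to time an augmentation function on sample data and report generated bytes per second. The helper `measure_unit_throughput`, the `flip` stand-in, and the choice of bytes per second as the unit are assumptions for this sketch; the text does not specify how the measurement is taken.

```python
import time
from typing import Callable


def measure_unit_throughput(augment: Callable[[bytes], bytes],
                            sample: bytes, repeats: int = 3) -> float:
    """Return generated bytes per second for one augmentation operation."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    produced = 0
    start = time.perf_counter()
    for _ in range(repeats):
        produced += len(augment(sample))
    # Guard against a zero interval on very fast runs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return produced / elapsed


def flip(data: bytes) -> bytes:
    """Stand-in augmentation: reverse the byte order."""
    return data[::-1]
```

A call such as `measure_unit_throughput(flip, bytes(4096))` yields a machine-dependent figure that could be recorded for later scheduling decisions.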

The data generation processing program 204 is started by the data generation management program 201 and performs the actual data augmentation.

The parameter reception program 205 receives, from the learning server (main) 101, the dynamic parameters that the user determines dynamically when using the learning servers 101 and 102. Examples are the number of learning servers (sub) 102, the number of processing threads in the learning servers 101 and 102, the number of teacher data items read into the memory 112 at one time, and the number of epochs. After the storage device 104 has received the dynamic parameters and finished preparing internally to serve augmentation data, the program notifies the learning server (main) 101 that the augmentation data can now be read.
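The dynamic parameters listed above could be carried in a small record like the one below. The class name, field names, and both derived quantities are hypothetical; in particular, the reader count assumes that the main server and every sub server run the same number of threads, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicParameters:
    """Hypothetical container for the parameters sent by the main learning server."""
    sub_server_count: int    # number of learning servers (sub)
    threads_per_server: int  # processing threads on each learning server
    batch_size: int          # teacher data items read into memory at one time
    epochs: int              # how many times the teacher data set is reused

    def concurrent_readers(self) -> int:
        # Assumption: one main server plus the sub servers, all with equal thread counts.
        return (1 + self.sub_server_count) * self.threads_per_server

    def total_reads(self, teacher_data_count: int) -> int:
        # Each epoch passes over every teacher data item once.
        return teacher_data_count * self.epochs
```

Figures like these give the storage device a way to estimate how much augmentation data will be requested and how many streams will request it.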

Teacher data refers to the combination of learning data, such as images to be learned, and the correct answer information for that learning data. The number of epochs is a value indicating how many times a given set of teacher data is used repeatedly for learning.

The IO observation program 206 observes events such as the opening, reading, and closing of learning data files by the learning servers 101 and 102, as well as the data transfer rate of learning data to the learning servers 101 and 102, and notifies the data generation management program 201 and the buffer size suppression program 202.

The file emulation program 207 exposes the already generated augmentation data stored in the temporary area to the learning servers 101 and 102 as ordinary files. It also makes augmentation data that has not yet been generated appear to exist. This virtualization keeps the learning servers 101 and 102 unaware that data augmentation is being performed dynamically.
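The idea of listing files that do not exist yet and producing them on first read can be sketched as below. `EmulatedFiles` and its methods are hypothetical names, and generating at read time is a simplification chosen for this sketch; the device described here may generate ahead of the reads.

```python
from typing import Callable, Dict, List


class EmulatedFiles:
    """Toy file view: every output name is listed, data is produced lazily."""

    def __init__(self, generators: Dict[str, Callable[[], bytes]]) -> None:
        self._generators = generators        # output file name -> generator function
        self._buffer: Dict[str, bytes] = {}  # data that has already been generated

    def listdir(self) -> List[str]:
        """List all output files, whether or not they have been generated."""
        return sorted(self._generators)

    def is_generated(self, name: str) -> bool:
        return name in self._buffer

    def read(self, name: str) -> bytes:
        """Return the file contents, generating them on first access."""
        if name not in self._buffer:
            self._buffer[name] = self._generators[name]()
        return self._buffer[name]
```

From the reader's side, `listdir()` and `read()` behave like access to ordinary files, which is what keeps the dynamic generation hidden.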

The generated data buffer 211 is a temporary area that stores the augmentation data generated by the data generation processing program 204. The augmentation data stored in the generated data buffer 211 is the actual content behind the augmentation data that the file emulation program 207 exposes to the learning servers 101 and 102.

<Data stored in the storage data area of the storage device of Example 1>
The information stored in the storage area of the storage device 104 is described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the data stored in the storage data area of the storage device of Example 1.

The storage data area 301 is a storage area logically configured on the drives 125 built into the storage device 104. The storage data area 301 stores a pre-performance evaluation result 302, a generated data save area 303, a data generation list set 311, a data augmentation program 312, and original data 313.

Of these, the pre-performance evaluation result 302 and the generated data save area 303 are stored in the storage data area 301 but are not exposed to the user. The data generation list set 311, the data augmentation program 312, and the original data 313, in contrast, are data, such as image data, that the user stores in the storage data area 301. The storage data area 301 may also store data other than the above.

The pre-performance evaluation result 302 is an area that stores a table recording the results of the pre-evaluation performed by the pre-performance evaluation program 203. Details of the pre-performance evaluation result 302 are described later.

The generated data save area 303 is an area for temporarily saving augmentation data that was generated by the data generation processing program 204 for storage in the generated data buffer 211 but does not fit there because of the buffer's capacity limit.
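A minimal model of this two-tier temporary area is shown below. The class `TwoTierTemporaryArea` and the rule that newly generated data overflows to the save area when the buffer is full are assumptions for illustration; the text does not say which data is chosen for saving.

```python
from typing import Dict


class TwoTierTemporaryArea:
    """Toy model: an in-memory buffer with overflow to a drive-backed save area."""

    def __init__(self, buffer_capacity: int) -> None:
        self.buffer_capacity = buffer_capacity
        self.buffer: Dict[str, bytes] = {}     # stands in for the generated data buffer
        self.save_area: Dict[str, bytes] = {}  # stands in for the generated data save area

    def buffer_used(self) -> int:
        return sum(len(data) for data in self.buffer.values())

    def put(self, name: str, data: bytes) -> None:
        """Keep data in the buffer if it fits, otherwise place it in the save area."""
        if self.buffer_used() + len(data) <= self.buffer_capacity:
            self.buffer[name] = data
        else:
            self.save_area[name] = data

    def get(self, name: str) -> bytes:
        """Look in the buffer first, then in the save area."""
        if name in self.buffer:
            return self.buffer[name]
        return self.save_area[name]
```

Readers call `get()` without needing to know which tier holds the data, matching the statement that both areas together form the temporary area.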

The data generation list set 311 is a collection of data generation lists 401 (see FIG. 4). Each list describes the relationship between the data augmentation that the user wants applied to the original data 313 and the resulting data that the learning servers 101 and 102 will use for learning. Details of the data generation list 401 are described later.

The data augmentation program 312 is a program or script used for data augmentation and is executed by the CPU 122 of the storage device 104.

The original data 313 is the data that the user designates as the target of processing (data augmentation). It consists of multiple files, lists, and the like, for example image data for learning and correct answer data.

<Data generation list of Example 1>
The data generation list 401 included in the data generation list set 311 is described with reference to FIG. 4. FIG. 4 is a diagram showing an example of a data generation list included in the data generation list set of Example 1. Each of the data generation lists 401 included in the data generation list set 311 is identified by an ID that is unique within the data generation list set 311.

The data generation list 401 has the columns input data 411, data augmentation program 412, option 413, and output data 414. The list consists of multiple rows, each of which records one combination of input data 411, data augmentation program 412, option 413, and output data 414. One row corresponds to one output file.

The input data 411 column contains the file name of the original data 313 to which data augmentation is applied. The file name includes the path information of that original data 313.

The data augmentation program 412 column contains the name of the data augmentation program to be executed on the original data 313 named in the input data 411 column. The option 413 column contains the options, such as arguments, passed to that program when it is executed on that original data 313.

The output data 414 column holds the output file name of the augmentation data, that is, the result of executing the program named in data augmentation program 412, with the options given in option 413, on the original data 313 named in input data 411. This output file name includes the path information of the output file. It is the file name under which the result of the data augmentation is published to the learning servers 101 and 102.

Note that the keyword NONE may be written in data augmentation program 412 to indicate that no data augmentation is performed.

For example, the first row in FIG. 4 indicates that the data augmentation program "img_augmentation" (data augmentation program 412) is executed with the option "-flip" on the input data "imgs01.jpg" (input data 411). The corresponding output data 414 is "imgs01-f.jpg".

As another example, the second row in FIG. 4 indicates that the data augmentation program "imgs_aug" (data augmentation program 412) is executed with the options "-flip" and "-pack bin" on the multiple input data listed in the list file "imgs02.lst" (input data 411). The corresponding output data 414 is "imgs02-f.bin". Because of the "-pack bin" option, "imgs02-f.bin" is an intermediate file into which the multiple output data, each obtained by augmenting one of the input data, are packed.
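The row layout described above can be sketched as a small data structure. This is only an illustration: the tab-separated on-disk format, the names `GenerationRow`, `parse_row`, and `command_for`, and the argument order of the resulting command line are assumptions, since the text specifies only the four columns and the NONE keyword.

```python
from dataclasses import dataclass
from typing import List, Optional

NONE_KEYWORD = "NONE"  # keyword from the text meaning "no augmentation"


@dataclass(frozen=True)
class GenerationRow:
    input_data: str   # column 411: input file name (path included)
    program: str      # column 412: augmentation program name, or NONE
    options: str      # column 413: arguments passed to the program
    output_data: str  # column 414: published output file name


def parse_row(line: str, sep: str = "\t") -> GenerationRow:
    """Parse one line of a list file (the tab-separated format is an assumption)."""
    input_data, program, options, output_data = line.rstrip("\n").split(sep)
    return GenerationRow(input_data, program, options, output_data)


def command_for(row: GenerationRow) -> Optional[List[str]]:
    """Build the argv for a row, or return None when the row requests no augmentation.

    The order (program, options, input, output) is a hypothetical convention.
    """
    if row.program == NONE_KEYWORD:
        return None
    return [row.program, *row.options.split(), row.input_data, row.output_data]


# The two rows quoted from FIG. 4 in the text.
rows = [
    parse_row("imgs01.jpg\timg_augmentation\t-flip\timgs01-f.jpg"),
    parse_row("imgs02.lst\timgs_aug\t-flip -pack bin\timgs02-f.bin"),
]
```

Keeping each row immutable reflects the one-row-to-one-output-file correspondence: a row fully determines how its output file can be regenerated on demand.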

<Table for recording the pre-performance evaluation results of the first embodiment>
A table for recording the pre-performance evaluation result 302 will be described with reference to FIG. 5. FIG. 5 is a diagram showing an example of a table for recording the pre-performance evaluation results of the first embodiment.

The table of the pre-performance evaluation result 302 includes the columns data augmentation program 321, option 322, and unit processing performance 323. The table consists of multiple rows, and each row describes the correspondence among one set of data augmentation program 321, option 322, and unit processing performance 323.

Each combination of data augmentation program 321 and option 322 corresponds to a combination of data augmentation program 412 and option 413 described in the data generation list 401. That is, the pre-performance evaluation program 203 removes duplicates from all combinations of data augmentation program 412 and option 413 described in the rows of the data generation list 401, executes each remaining combination on one of the corresponding input data 411 to be processed, and measures the performance per unit thread. The pre-performance evaluation program 203 then records the performance per unit thread measured for each combination of data augmentation program 321 and option 322 in unit processing performance 323. The unit processing performance 323 is, for example, the amount of data processed per unit time by a unit thread (throughput).
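The deduplicate-then-measure procedure can be sketched as follows. This is a minimal illustration, not the patented implementation: `unique_combinations`, `evaluate`, and the `run_one` callback are hypothetical names, and bytes per second is assumed as the throughput unit.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

Row = Tuple[str, str, str, str]  # (input 411, program 412, option 413, output 414)
Combo = Tuple[str, str]          # (program 321, option 322)


def unique_combinations(rows: Iterable[Row]) -> Dict[Combo, str]:
    """Map each distinct (program, option) pair to the first input that uses it."""
    sample: Dict[Combo, str] = {}
    for input_data, program, options, _output in rows:
        sample.setdefault((program, options), input_data)
    return sample


def evaluate(rows: Iterable[Row],
             run_one: Callable[[str, str, str], int]) -> Dict[Combo, float]:
    """Run each distinct combination once and record its throughput.

    run_one(program, options, input_data) is a hypothetical callback that
    performs the augmentation on a single thread and returns the number of
    bytes it processed.
    """
    table: Dict[Combo, float] = {}
    for (program, options), input_data in unique_combinations(rows).items():
        start = time.perf_counter()
        processed = run_one(program, options, input_data)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        table[(program, options)] = processed / elapsed   # unit processing performance 323
    return table
```

Measuring one sample input per combination keeps the evaluation cheap, at the cost of assuming that one input is representative of the others processed by the same program and options.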

<Programs stored in the memory of the learning server device>
The programs stored in the memory 112 of the learning servers 101 and 102 will be described with reference to FIG. 6. FIG. 6 is a diagram showing an example of the programs stored in the memory of the learning server device of the first embodiment. The memory 112 stores a learning program 501 and a parameter transmission library 502.

The learning program 501 is a program that describes the learning process that the user actually wants to perform. Processing by the learning program 501 is generally divided into an initialization phase and a learning phase.

In the initialization phase, the learning program 501 determines the parameters used during the learning process. Specific examples of the parameters include the file name of the data generation list 401 corresponding to the learning data used by the learning program 501, the number of learning servers 101 and 102, the number of threads, the number of teacher data read into the memory 112 at one time, and the number of epochs.

To transmit these parameters to the parameter receiving program 205, a call to the parameter transmission API (Application Programming Interface) is written in the initialization phase portion of the learning program 501.

In general, whether a given server is the learning server (main) 101 or the learning server (slave) 102 is determined from the identification information held by the learning servers 101 and 102. The learning program 501 itself is shared between the learning server (main) 101 and the learning server (slave) 102, but control is possible such that processing that needs to be performed only once across all of the learning servers 101 and 102 is performed by the learning server (main) 101 alone.

The parameter transmission library 502 is the implementation that is invoked through the parameter transmission API written in the learning program 501. It has a function of notifying the storage device 104 of the dynamic parameters of the learning program 501 via the parameter receiving program 205, and a function of making the learning program 501 wait from the notification of the dynamic parameters until preparation for data access to the storage device 104 is complete.
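The two functions of the parameter transmission library 502 (notify, then block until the storage side is ready) can be sketched with an in-process stand-in. Everything here is an assumption made for illustration: the real library communicates with the storage device over a network, and the names `DynamicParameters`, `ParameterReceiver`, and `send_parameters` do not come from the text.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DynamicParameters:
    generation_list: str  # file name of the data generation list 401
    num_servers: int      # number of learning servers
    num_threads: int      # threads per learning server
    batch_size: int       # teacher data read into memory at one time
    epochs: int           # number of epochs


class ParameterReceiver:
    """In-process stand-in for the parameter receiving program 205."""

    def __init__(self) -> None:
        self.params: Optional[DynamicParameters] = None
        self.ready = threading.Event()

    def receive(self, params: DynamicParameters) -> None:
        self.params = params  # keep the parameters available to other programs

    def notify_ready(self) -> None:
        self.ready.set()      # storage side has finished its response preparation


def send_parameters(receiver: ParameterReceiver,
                    params: DynamicParameters,
                    timeout: Optional[float] = None) -> bool:
    """Send the parameters, then block the caller until the receiver is ready.

    Returns True once readiness is signalled, or False if the timeout expires.
    """
    receiver.receive(params)
    return receiver.ready.wait(timeout)
```

A learning program would call `send_parameters` at the end of its initialization phase and begin reading data only after it returns True.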

<Learning process in the learning system of the first embodiment>
The procedure for performing the learning process in the learning system 1S of the first embodiment, configured as described above, is shown below. The learning process in the learning system 1S has two main phases: a data registration phase, in which the user's data generation list 401, data augmentation program 312, and original data 313 are registered in the storage device 104, and a learning phase, in which machine learning is performed on the teacher data.

<Data registration phase of the first embodiment>
The operation of the data registration phase of the first embodiment will be described with reference to FIGS. 7 and 8. FIG. 7 is a diagram showing an example of the logical configuration of the storage device of the first embodiment in the data registration phase, and shows the relationship among the programs and data of the learning server (main) 101 and the storage device 104 in that phase. FIG. 8 is a flowchart showing an example of the processing of the data registration phase in the storage device of the first embodiment, and shows the operation flow of the storage device 104 in that phase.

In the processing of the data registration phase of the first embodiment, the learning server (main) 101 first writes, as shown in step S700 of FIG. 7, a data generation list 401 included in the data generation list set 311, the data augmentation program 312, and the original data 313 into a predetermined data area (such as a folder) of the storage device 104. The original data 313 and the data augmentation program 312 that are written at substantially the same time as the data generation list 401 are the data and the program pointed to by input data 411 and data augmentation program 412, respectively, in that data generation list 401.

As shown in FIG. 8, in step S701, the pre-performance evaluation program 203 of the storage device 104 monitors the data generation list set 311 for updates. If a data generation list 401 included in the data generation list set 311 has been newly registered or updated (step S701: YES), the pre-performance evaluation program 203 proceeds to step S702. If none has been newly registered or updated (step S701: NO), it repeats step S701.

Next, in step S702, the pre-performance evaluation program 203 checks whether the data augmentation program 312 and the original data 313 corresponding to the newly registered or updated data generation list 401 have been stored. If they have been stored (step S702: YES), it proceeds to step S703. If they have not (step S702: NO), it repeats step S702 and waits for storage to complete.

Next, in step S703, the pre-performance evaluation program 203 performs a pre-performance evaluation: it causes the data generation processing program 204 to execute one of the data augmentations (combinations of a data augmentation program and options) described in the newly added or updated data generation list 401 on one input data, using, for example, one thread of the CPU 122.

During the pre-performance evaluation, the data generation processing program 204 creates augmentation data and measures the performance per unit thread (throughput or the like). Next, in step S704, after the processing by the data generation processing program 204 is complete, the pre-performance evaluation program 203 records the performance measurement result in the table of the pre-performance evaluation result 302 in the storage data area (see FIG. 5).

Next, in step S705, the pre-performance evaluation program 203 causes the data generation processing program 204 to check whether the generated data buffer 211 has free space. If it does (step S705: YES), the process proceeds to step S712. If it does not (step S705: NO), the process proceeds to step S706.

In step S706, the pre-performance evaluation program 203 causes the data generation processing program 204 to check whether the generated data save area 303 has free space. If it does (step S706: YES), the process proceeds to step S711. If it does not (step S706: NO), the process proceeds to step S707.

In step S707, the pre-performance evaluation program 203 causes the data generation processing program 204 to check whether part of the generated data save area 303 can be released. If it can (step S707: YES), the process proceeds to step S710. If it cannot (step S707: NO), the process proceeds to step S708.

In step S708, the pre-performance evaluation program 203 causes the data generation processing program 204 to discard the augmentation data created during the pre-performance evaluation.

Next, in step S709, the pre-performance evaluation program 203 determines whether the data generation list 401 whose new addition or update was confirmed in step S701 contains a data augmentation whose performance has not yet been measured in step S703. If it does (step S709: YES), the process returns to step S703. If it does not (step S709: NO), the process returns to step S701.

In step S710, on the other hand, because the generated data save area 303 has no free space, the pre-performance evaluation program 203 causes the data generation processing program 204 to secure free space by discarding, from the data stored in the generated data save area 303, data with a low retention priority. When step S710 ends, the process proceeds to step S711.

In step S711, the pre-performance evaluation program 203 causes the data generation processing program 204 to move part of the augmentation data stored in the generated data buffer 211 to the generated data save area 303. When step S711 ends, the process proceeds to step S712.

In step S712, the pre-performance evaluation program 203 causes the data generation processing program 204 to store the augmentation data created during the pre-performance evaluation of step S703 in the generated data buffer 211. When step S712 ends, the process proceeds to step S709.

In step S701 reached from step S709, the pre-performance evaluation program 203 rechecks the data generation list set 311. If a data augmentation program 312 for which the pre-performance evaluation has not yet been executed exists, it performs the pre-performance evaluation of step S703 again. Otherwise, it waits for the data generation list set 311 to be updated. This concludes the operation of the data registration phase.
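The branch structure of steps S705 to S712 can be sketched as one function. This is an illustration under stated assumptions: capacities are counted in items (at least 1) rather than bytes, the oldest buffer entry is the one moved to the save area, and the first releasable entry stands in for "data with a low retention priority". The function name and the `can_release` predicate are hypothetical.

```python
from typing import Callable, List


def store_evaluation_output(item: str,
                            buffer: List[str], buffer_capacity: int,
                            save_area: List[str], save_capacity: int,
                            can_release: Callable[[str], bool]) -> bool:
    """Place augmentation data created during pre-performance evaluation.

    Returns True if the item ends up in the buffer (S712), or False if it
    had to be discarded (S708). Capacities are assumed to be at least 1.
    """
    if len(buffer) >= buffer_capacity:                 # S705: NO, buffer is full
        if len(save_area) >= save_capacity:            # S706: NO, save area is full
            victims = [d for d in save_area if can_release(d)]
            if not victims:                            # S707: NO, nothing releasable
                return False                           # S708: discard the new data
            save_area.remove(victims[0])               # S710: drop one releasable entry
        save_area.append(buffer.pop(0))                # S711: move a buffer entry out
    buffer.append(item)                                # S712: keep the new data
    return True
```

In every path except S708 the newly created data reaches the buffer, so work done during evaluation can be reused later in the learning phase.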

Note that the device that writes the data generation list set 311, the data augmentation program 312, and the original data 313 into the predetermined data area (such as a folder) of the storage device 104 does not have to be the learning server (main) 101. Any device that can access the storage device 104 may do so.

<Learning phase of the first embodiment>
The operation of the learning phase of the first embodiment will be described with reference to FIGS. 9, 10, and 11. The learning phase further includes a data generation operation flow and a release operation flow for the generated data area. FIG. 9 is a diagram showing an example of the logical configuration of the storage device of the first embodiment in the learning phase, and shows the relationship among the programs and data of the learning servers 101 and 102 and the storage device 104 in that phase. FIG. 10 is a flowchart showing an example of the processing of the learning phase in the storage device of the first embodiment, and shows the data generation operation flow of the storage device 104 in that phase. FIG. 11 is a flowchart showing an example of the buffer area release processing in the storage device of the first embodiment, and shows the release operation flow for generated data areas that are no longer needed.

As shown in FIG. 9, the learning phase starts when the learning server (main) 101 starts the learning program 501. The learning server (main) 101 executes the initialization phase of the learning program 501 and, within it, calls the parameter transmission API. In the learning server (main) 101, the parameter transmission library 502 invoked through the parameter transmission API then transmits the dynamic parameters of the learning program 501 to the parameter receiving program 205 of the storage device 104. As described above, the dynamic parameters are, for example, the number of learning servers (slave) 102, the number of processing threads in the learning servers 101 and 102, the number of teacher data read into the memory 112 at one time, and the number of epochs. The parameter transmission library 502 then waits for a response from the parameter receiving program 205 and keeps the learning program 501 waiting.

<Data generation operation flow of the first embodiment>
As shown in FIG. 10, first, in step S901, the data generation management program 201 determines whether the parameter receiving program 205 has received the dynamic parameters from the parameter transmission library 502. If it has (step S901: YES), the data generation management program 201 proceeds to step S902. If it has not (step S901: NO), the data generation management program 201 repeats step S901.

In step S902, the data generation management program 201 causes the parameter receiving program 205 to hold the dynamic parameters received from the parameter transmission library 502 so that each program can refer to them. The parameter receiving program 205 keeps the dynamic parameters available to each program and notifies the data generation management program 201 and the buffer size suppression program 202 that the learning phase has started.

Next, in step S903, on receiving the notification of the start of the learning phase from the parameter receiving program 205, the data generation management program 201 reads the data generation list 401 corresponding to the augmentation data to be read in the learning phase, and checks whether the augmentation data generated during the pre-performance evaluation is held in the generated data save area 303. If it is held there (step S903: YES), the process proceeds to step S904. If it is not (step S903: NO), the process proceeds to step S905.

In step S904, the data generation management program 201 reads the augmentation data that was found in step S903 to be held in the generated data save area 303 from the generated data save area 303 into the generated data buffer 211.

Next, in step S905, the data generation management program 201 causes the data generation processing program 204 to generate the initial data used in the first round of learning, and causes the file emulation program 207 to prepare to respond based on the generated initial data. The size and the number of files of this initial data are determined from the number of learning servers 101 and 102, the number of threads, and the number of teacher data read into the memory 112 at one time, and only the required number of data of the determined size is generated. Of the data corresponding to this initial data, the data read into the generated data buffer 211 in step S904 and the data generated during the pre-performance evaluation and still remaining in the generated data buffer 211 are reused.

Next, in step S906, when the response preparation of the data generation processing program 204 is complete, the data generation management program 201 notifies the parameter receiving program 205 that response preparation is complete and causes it to report this to the parameter transmission library 502. On receiving the report from the parameter receiving program 205, the parameter transmission library 502 resumes the learning program 501 that it had kept waiting, and data access from the learning servers 101 and 102 to the storage device 104 begins.

Next, in step S907, the data generation management program 201 determines whether data access (a file open) from the learning servers 101 and 102 to the storage device 104 has occurred. Here, the IO observation program 206 monitors data access from the learning servers 101 and 102 to the storage device 104, as well as the return of the requested data to the learning servers 101 and 102 by the file emulation program 207 using the augmentation data stored in the generated data buffer 211. The IO observation program 206 notifies the data generation management program 201 of each data access and data return. At the same time, the IO observation program 206 also monitors the speed at which the learning servers 101 and 102 read the augmentation data and the interval until the next data is accessed.

If data access (a file open) from the learning servers 101 and 102 to the storage device 104 has occurred (step S907: YES), the data generation management program 201 proceeds to step S908. If no such data access has occurred (step S907: NO), it repeats step S907.

In step S908, the data generation management program 201 determines whether a subsequent data access will follow the data access of step S907. It determines that a subsequent data access will occur if succeeding data exists for the data access notified by the IO observation program 206 in step S907, or if no succeeding data exists but the data has not yet been read for the specified number of epochs. Succeeding data is the data corresponding to the row that follows, in the data generation list 401, the row of the augmentation data currently being read. It is data for which a subsequent request for processed data can be expected.

If a subsequent data access will occur (step S908: YES), the data generation management program 201 proceeds to step S909. If no subsequent data access will occur (step S908: NO), it returns to step S901. In the latter case, the data generation management program 201 determines that learning with the learning data being accessed has finished, and waits for the next invocation of the parameter transmission API.

In step S909, the data generation management program 201 determines whether the succeeding data needs to be generated, according to whether the data required for the subsequent data access exists in either the generated data buffer 211 or the generated data save area 303. If the required data exists in neither the generated data buffer 211 nor the generated data save area 303, the data generation management program 201 determines that the succeeding data needs to be generated (step S909: YES) and proceeds to step S910. If the required data exists in either of them, it determines that generation is unnecessary (step S909: NO), returns to step S907, and waits for a data access to occur.
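The decisions in steps S908 and S909 can be sketched as two small functions. This is only an illustration: the names are hypothetical, the epoch counter is assumed to be 1-based, and wrapping around to the first row at the start of the next epoch is an assumption, since the text says only that reading is repeated for the number of epochs.

```python
from typing import Optional, Sequence, Set


def next_output(outputs: Sequence[str], index: int,
                epoch: int, epochs: int) -> Optional[str]:
    """S908: return the output expected next, or None when learning is finished.

    outputs holds the output data 414 column in list order, index is the row
    currently being read, and epoch is the 1-based number of the current pass.
    """
    if index + 1 < len(outputs):
        return outputs[index + 1]  # succeeding row exists in this epoch
    if epoch < epochs:
        return outputs[0]          # assumed: the next epoch restarts at the first row
    return None                    # S908: NO, wait for the next parameter transmission


def needs_generation(name: str, buffer: Set[str], save_area: Set[str]) -> bool:
    """S909: generate only if neither the buffer nor the save area holds the data."""
    return name not in buffer and name not in save_area
```

Together these let the storage side generate each file just before it is read instead of materializing the whole augmented data set in advance.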

In step S910, because the succeeding data needs to be generated, the data generation management program 201 calculates the throughput at which the learning servers 101 and 102 read the learning data, based on the dynamic parameters held by the parameter receiving program 205 and the data access status observed by the IO observation program 206. Based on this result and on the unit processing performance (performance per unit thread) of the data augmentation program 312 for the succeeding data, which is stored in the pre-performance evaluation result 302, it adjusts and determines the data generation resources (for example, the number of threads of the CPU 122) so that the data generation throughput exceeds the read throughput of the learning data.
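The resource decision in step S910 can be sketched as a thread-count calculation. This sketch assumes that generation throughput scales linearly with the number of threads, which the text does not state. The function name `threads_needed` and the `max_threads` cap are hypothetical.

```python
import math


def threads_needed(read_throughput: float,
                   unit_performance: float,
                   max_threads: int) -> int:
    """Pick the smallest thread count whose generation throughput exceeds the read rate.

    read_throughput is the rate at which the learning servers read data, and
    unit_performance is the per-thread throughput from the pre-performance
    evaluation. Both must use the same unit (for example, bytes per second).
    """
    if unit_performance <= 0:
        raise ValueError("unit_performance must be positive")
    # floor(ratio) + 1 is the smallest n with n * unit_performance > read_throughput
    n = math.floor(read_throughput / unit_performance) + 1
    return max(1, min(n, max_threads))
```

For instance, with a read rate of 250 and a per-thread rate of 100, three threads give 300, which exceeds 250. If the cap is reached first, generation may not keep up with reads.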

Subsequently, in step S911, the data generation management program 201 causes the data generation processing program 204 to execute, using the data generation resources adjusted in step S910, the data generation processing for the subsequent data access that was determined in step S909 to require generation. The data generation processing program 204 generates the augmentation data to be used in the subsequent data access and stores it in the generated data buffer 211. When step S911 ends, the data generation management program 201 returns to step S907 and waits for a data access from the learning servers 101 and 102.

<Release operation flow of the generated data area in Example 1>
In steps S1001 and S1002 shown in FIG. 11, the buffer size suppression program 202 executes the same processing as steps S901 and S902 performed by the data generation management program 201 shown in FIG. 10.

Subsequently, in step S1003, the buffer size suppression program 202, having been notified by the parameter receiving program 205 that the parameters sent through the parameter transmission API have been received and held, waits for a notification from the IO observation program 206 that a file close (end of access) by the learning servers 101 and 102 has been detected for generated data in the generated data buffer 211, and determines whether an end of access has occurred. If a file close has occurred (step S1003: YES), the buffer size suppression program 202 proceeds to step S1004; if not (step S1003: NO), it repeats step S1003.

In step S1004, the buffer size suppression program 202 counts the file close detection notifications from the IO observation program 206 for each piece of generated augmentation data, and determines whether the file close count has become equal to the number of simultaneous accesses, that is, the number of learning servers 101 and 102. If the file close count has become equal to the number of simultaneous accesses (step S1004: YES), the buffer size suppression program 202 proceeds to step S1005; if the file close count is less than the number of simultaneous accesses (step S1004: NO), it returns to step S1003.

In step S1005, the buffer size suppression program 202 causes the data generation processing program 204 to release the buffer areas of the generated data buffer 211 and the generated data save area 303 that store generated data satisfying the release condition, namely that the file close count became equal to the number of simultaneous accesses in step S1004. Generated data whose file close count equals the number of learning servers 101 and 102 can be regarded as no longer being accessed by the learning servers 101 and 102.
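
The release condition of steps S1003 to S1005 can be sketched as a per-data close counter. The class and method names are hypothetical, and the actual release of buffer areas is represented only by appending the data ID to a list.

```python
from collections import Counter


class CloseTracker:
    """Counts file closes per generated data item (illustrative only)."""

    def __init__(self, num_servers):
        # Number of simultaneous accesses = number of learning servers.
        self.num_servers = num_servers
        self.closes = Counter()
        self.released = []  # stands in for releasing buffer areas

    def on_file_close(self, data_id):
        """Handle one file close notification; return True if released."""
        self.closes[data_id] += 1
        if self.closes[data_id] == self.num_servers:
            # Every learning server has closed the file (S1004: YES),
            # so the area holding this data can be released (S1005).
            del self.closes[data_id]
            self.released.append(data_id)
            return True
        return False  # S1004: NO, keep waiting for further closes


tracker = CloseTracker(num_servers=2)
first = tracker.on_file_close("aug_0001")
second = tracker.on_file_close("aug_0001")
```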

Subsequently, in step S1006, the buffer size suppression program 202 determines whether all buffer areas used by the data generation list set 311 have been released. That is, via the data generation processing program 204, the buffer size suppression program 202 determines whether the buffer areas of the generated data buffer 211 and the generated data save area 303, in which generated data is stored, still contain areas in use by the data generation list 401 that is included in the data generation list set 311 and that lists the augmentation data currently being learned.

If all buffer areas used by the data generation list set 311 have been released (step S1006: YES), the buffer size suppression program 202 proceeds to step S1001; if not all buffer areas have been released (step S1006: NO), it proceeds to step S1003. The buffer size suppression program 202 repeats steps S1003 to S1006 until all areas of the generated data buffer 211 and the generated data save area 303 used by the data generation list 401 included in the data generation list set 311 have been released.

In Example 1 described above, the storage device 104 performs, on the storage device 104 side, the data preparation processing (augmenting the teacher data by processing image data) for the learning data used in the learning system 1S. That is, the storage device 104 performs the data preparation processing after receiving a data request from the learning servers 101 and 102.

Here, when the file size of the teacher data generated in the data preparation processing is large, for example when a large volume of teacher data that cannot be fully expanded in the memory of the learning servers 101 and 102 is packed and handled as intermediate data, the time (latency) from when the storage device 104 receives a data request until it generates the file and starts transmission becomes long, and problems such as a timeout on the learning server side can occur.

In contrast, in Example 1, the storage device 104 prepares the data for transmission in advance, before receiving a data request, determines the buffer size that matches the requests of the learning servers and whether each buffer area can be released, and manages the buffer so that its size is kept to the necessary minimum.

That is, in Example 1, the storage device 104 controls data generation to match the speed at which data is supplied to the learning servers 101 and 102, and monitors data accesses from the plurality of learning servers 101 and 102 to release buffer areas that are no longer needed. In this way, data supply and buffer discarding by the storage device are controlled appropriately.

In this way, using two operation flows, the data generation operation flow and the release operation flow of the generated data area, the storage device 104 performs in parallel the dynamic generation of augmentation data before access starts during learning and the release of the buffer areas storing augmentation data after access ends. As a result, the learning data can be prepared with the minimum necessary storage resources, and the augmentation data can be used for learning processing without delay when a file is opened.

Example 2 of the present invention will be described with reference to FIG. 12 and the subsequent figures. Example 2 differs from Example 1 in that an accelerator is used in addition to the CPU when augmentation data is generated in the storage device.

The logical structure of the hardware of the system on which the present invention is premised will be described with reference to FIG. 12. FIG. 12 is a diagram showing an example of the logical configuration of the hardware of a learning system including the storage device of Example 2. The learning system 2S including the storage device 1104 of Example 2 differs from the learning system 1S including the storage device 104 of Example 1 in that the storage device 1104 further has an accelerator 1126. In other respects, the learning system 2S of Example 2 is the same as the learning system 1S of Example 1, and the description is omitted.

The accelerator 1126 executes processing related to data augmentation at high speed, either in place of the CPU 122 or in cooperation with it. The accelerator 1126 can be configured using an ASIC (Application Specific Integrated Circuit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), or the like.

<Programs and data stored in the memory of the storage device of Example 2>
The programs and information stored in the memory 124 of the storage device 1104 will be described with reference to FIG. 13. FIG. 13 is a diagram showing an example of the programs and data stored in the memory of the storage device of Example 2.

The storage device 1104 of Example 2 differs from the storage device 104 of Example 1 in that an accelerator-compatible data generation processing program 1204 is stored in the memory 124 in place of the data generation processing program 204. The other programs and data stored in the memory 124 of the storage device 1104 of Example 2 are the same as those stored in the memory 124 of the storage device 104 of Example 1, and the description is omitted.

The accelerator-compatible data generation processing program 1204 is started by the data generation management program 201 and performs the actual data augmentation. The accelerator-compatible data generation processing program 1204 performs this processing using the accelerator 1126 in addition to the CPU 122.

<Data stored in the storage data area of the storage device of Example 2>
The information stored in the storage area of the storage device 1104 will be described with reference to FIG. 14. FIG. 14 is a diagram showing an example of the data stored in the storage data area of the storage device of Example 2.

The storage data area 1301 of Example 2 is a storage area logically configured on the drive 125 built into the storage device 1104. The storage data area 1301 of Example 2 differs from the storage data area 301 of Example 1 in that a pre-performance evaluation result 1302 is stored in place of the pre-performance evaluation result 302 and an accelerator program 1314 is additionally stored. The other data stored in the storage data area 1301 of Example 2 is the same as the data stored in the storage data area 301 of Example 1, and the description is omitted.

The pre-performance evaluation result 1302 and the generated data save area 303 are stored in the storage data area 1301 but are not exposed to the user. In contrast, the data generation list set 311, the data augmentation program 312, the accelerator program 1314, and the original data 313 are data that the user stores in the storage data area 1301. Data other than the above may also be stored in the storage data area 1301.

The data generation list set 311 is a set of lists that describe the relationship among the data augmentation that the user wants to apply to the original data 313, the accelerator 1126 to be used, and the data that the learning servers 101 and 102 will use for learning as a result of the data augmentation. In Example 1, the data generation list set 311 includes the data generation list 401; in Example 2, it includes an extended data generation list 1401 in place of the data generation list 401.

<Extended data generation list of Example 2>
FIG. 15 is a diagram showing an example of a data generation list included in the data generation list set of Example 2. Each of the plurality of extended data generation lists 1401 included in the data generation list set 311 is assigned an ID that is unique within the data generation list set 311 and is identified by that ID.

The extended data generation list 1401 differs from the data generation list 401 of Example 1 in that it further includes two columns, accelerator 1411 and accelerator program 1412. In other respects, the extended data generation list 1401 is the same as the data generation list 401 of Example 1, and the description is omitted.

The extended data generation list 1401 consists of a plurality of rows. Each row describes the correspondence among one set of input data 411, data augmentation program 412, option 413, accelerator 1411, accelerator program 1412, and output data 414. One row corresponds to one output file.

In the extended data generation list 1401, the type of accelerator 1126 used for the data augmentation of the input data 411 is described in the accelerator 1411 column, and the name of the program executed by the accelerator 1126 is described in the accelerator program 1412 column. Options may be attached to the accelerator program.

NONE may be written in the accelerator 1411 column as a keyword indicating that the accelerator 1126 is not used. When a plurality of accelerators 1126 are mounted on the storage device 1104, an accelerator 1411 column and an accelerator program 1412 column may be provided for each accelerator 1126, and whether each accelerator 1126 is used, together with the name of the program it executes, may be described for each accelerator 1126.
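
One row of the extended data generation list 1401 could be modeled as follows. This is only a sketch: the field names are derived from the column names above, the on-disk format of the list is not specified in this section, and the example file and program names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtendedListRow:
    """One row of an extended data generation list (illustrative model)."""
    input_data: str                     # column 411
    augmentation_program: str           # column 412
    option: str                         # column 413
    accelerator: str                    # column 1411, or the keyword "NONE"
    accelerator_program: Optional[str]  # column 1412
    output_data: str                    # column 414, one output file per row

    @property
    def uses_accelerator(self) -> bool:
        # "NONE" is the keyword indicating that no accelerator is used.
        return self.accelerator != "NONE"


# Hypothetical rows; the names are placeholders, not taken from the source.
cpu_row = ExtendedListRow("img001.jpg", "rotate", "--deg 90",
                          "NONE", None, "img001_r90.jpg")
gpu_row = ExtendedListRow("img001.jpg", "blur", "--k 3",
                          "GPU", "blur_kernel", "img001_blur.jpg")
```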

<Table for recording the pre-performance evaluation result of Example 2>
The table for recording the pre-performance evaluation result 1302 will be described with reference to FIG. 16. FIG. 16 is a diagram showing an example of the table for recording the pre-performance evaluation result of Example 2.

The table of the pre-performance evaluation result 1302 differs from the table of the pre-performance evaluation result 302 of Example 1 in that it further includes two columns, accelerator 1321 and accelerator program 1322. In other respects, the table of the pre-performance evaluation result 1302 is the same as the table of the pre-performance evaluation result 302 of Example 1, and the description is omitted.

The table of the pre-performance evaluation result 1302 consists of a plurality of rows. Each row describes the correspondence among one set of data augmentation program 321, option 322, accelerator 1321, accelerator program 1322, and unit processing performance 323.

The combination of data augmentation program 321, option 322, accelerator 1321, and accelerator program 1322 corresponds to the combination of data augmentation program 412, option 413, accelerator 1411, and accelerator program 1412 described in the extended data generation list 1401. That is, the pre-performance evaluation program 203 takes all combinations of data augmentation program 412, option 413, accelerator 1411, and accelerator program 1412 described in the rows of the extended data generation list 1401, removes duplicates, executes each remaining combination on one of the corresponding input data 411 to be processed, and measures the performance per unit thread. The pre-performance evaluation program 203 then records the measured performance per unit thread for each combination of data augmentation program 321, option 322, accelerator 1321, and accelerator program 1322 in the unit processing performance 323 column.
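
The deduplication described above can be sketched as follows. Representing each row as a plain tuple and choosing the first matching input as the measurement sample are assumptions; the specification only states that one of the corresponding inputs is used.

```python
def combos_to_evaluate(rows):
    """Map each distinct processing combination to one sample input.

    rows: iterable of tuples
          (input_data, program, option, accelerator, accel_program, output)
    Returns a dict keyed by (program, option, accelerator, accel_program).
    """
    samples = {}
    for input_data, program, option, accelerator, accel_program, _out in rows:
        key = (program, option, accelerator, accel_program)
        # Keep only the first input seen for each combination (assumption).
        samples.setdefault(key, input_data)
    return samples


# Hypothetical list contents for illustration.
rows = [
    ("a.jpg", "rotate", "90", "GPU", "rot_kernel", "a_r90.jpg"),
    ("b.jpg", "rotate", "90", "GPU", "rot_kernel", "b_r90.jpg"),
    ("a.jpg", "flip", "", "NONE", "", "a_flip.jpg"),
]
plan = combos_to_evaluate(rows)
```

Here three rows reduce to two measurements, one per distinct combination, which is what keeps the pre-performance evaluation short when many inputs share the same processing.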

<Learning processing in the learning system of Example 2>
The procedure for performing learning processing in the learning system 2S of Example 2, configured as described above, is as follows. Like the learning processing in the learning system 1S of Example 1, the learning processing in the learning system 2S has a data registration phase and a learning phase. Only the differences from Example 1 are described below.

<Data registration phase of Example 2>
The operation of the data registration phase of Example 2 will be described with reference to FIG. 17. FIG. 17 is a diagram showing an example of the logical configuration of the storage device of Example 2 in the data registration phase, and shows the relationship among the programs and data of the learning server (main) 101 and the storage device 1104 in the data registration phase. The processing flow of the data registration phase in the storage device of Example 2 is described with reference to FIG. 8, reading the storage device 104, the data generation processing program 204, and the data generation list 401 of Example 1 as the storage device 1104, the accelerator-compatible data generation processing program 1204, and the extended data generation list 1401 of Example 2, respectively.

In the processing of the data registration phase of Example 2, first, as shown in step S1700 of FIG. 17, the learning server (main) 101 writes an extended data generation list 1401 included in the data generation list set 311, the data augmentation program 312, the accelerator program 1214, and the original data 313 to a predetermined data area (such as a folder) of the storage device 1104. The original data 313, the data augmentation program 312, and the accelerator program 1214 that are written at almost the same time as the extended data generation list 1401 are the data and programs pointed to by the input data 411, the data augmentation program 412, and the accelerator program 1412 described in that list.

In Example 2, in step S701 of FIG. 8, the pre-performance evaluation program 203 of the storage device 1104 monitors the data generation list set 311 for updates. If an extended data generation list 1401 included in the data generation list set 311 has been newly registered or updated (step S701: YES), the pre-performance evaluation program 203 proceeds to step S702; if none has been newly registered or updated (step S701: NO), it repeats step S701.

In Example 2, in step S702 of FIG. 8, the pre-performance evaluation program 203 checks whether the data augmentation program 312, the accelerator program 1214, and the original data 313 corresponding to the newly registered or updated extended data generation list 1401 have been stored. If they have been stored (step S702: YES), the pre-performance evaluation program 203 proceeds to step S703; if not (step S702: NO), it repeats step S702 and waits for storage to complete.

In Example 2, in step S703 of FIG. 8, the pre-performance evaluation program 203 performs a pre-performance evaluation in which it causes the accelerator-compatible data generation processing program 1204 to execute one of the data augmentations (a combination of data augmentation program, option, accelerator, and accelerator program) described in the extended data generation list 1401 whose new addition or update was confirmed, on one piece of input data using, for example, one thread of the CPU 122.

If the extended data generation list 1401 specifies that the accelerator 1126 is to be used, the accelerator-compatible data generation processing program 1204 also uses the accelerator 1126 to create the augmentation data. During the pre-performance evaluation, the accelerator-compatible data generation processing program 1204 creates the augmentation data and measures the performance per unit thread (throughput and the like).

In Example 2, in step S704 of FIG. 8, after the processing by the accelerator-compatible data generation processing program 1204 is completed, the pre-performance evaluation program 203 records the performance measurement result in the table of the pre-performance evaluation result 1302 in the storage data area (see FIG. 16).

In Example 2, steps S705 to S708 and steps S710 to S712 of FIG. 8 are performed in the same manner as in Example 1.

In Example 2, in step S709 of FIG. 8, the pre-performance evaluation program 203 determines whether the extended data generation list 1401 whose new addition or update was confirmed in step S701 contains any data augmentation for which the performance measurement of step S703 has not yet been performed. If the extended data generation list 1401 contains such a data augmentation (step S709: YES), the pre-performance evaluation program 203 proceeds to step S703. If the extended data generation list 1401 contains no data augmentation whose performance has not been measured (step S709: NO), the pre-performance evaluation program 203 proceeds to step S701. This concludes the operation of the data registration phase of Example 2.

The data generation list set 311, the data augmentation program 312, the accelerator program 1214, and the original data 313 need not be written to the predetermined data area (such as a folder) of the storage device 1104 by the learning server (main) 101; any device that can access the storage device 1104 may write them.

<Learning phase of Example 2>
The operation of the learning phase of Example 2 will be described with reference to FIG. 18. In Example 2, as in Example 1, the learning phase further includes a data generation operation flow and a release operation flow of the generated data area. FIG. 18 is a diagram showing an example of the logical configuration of the storage device of Example 2 in the learning phase, and shows the relationship among the programs and data of the learning servers 101 and 102 and the storage device 1104 in the learning phase.

The data generation operation flow of the learning phase in the storage device of Example 2 is described with reference to FIG. 10, reading the storage device 104, the data generation processing program 204, and the data generation list 401 of Example 1 as the storage device 1104, the accelerator-compatible data generation processing program 1204, and the extended data generation list 1401 of Example 2, respectively. Only the steps that differ from Example 1 are described below.

As shown in FIG. 18, the learning phase of Example 2 begins when the learning server (main) 101 starts the learning program 501. The learning server (main) 101 executes the initialization phase of the learning program 501 and, within it, calls the parameter transmission API. Then, in the learning server (main) 101, the parameter transmission library 502 started by the parameter transmission API sends the dynamic parameters of the learning program 501 to the parameter receiving program 205 of the storage device 1104.

<Data generation operation flow of Example 2>
In Example 2, in step S903 of FIG. 10, the data generation management program 201 receives a notification of the start of the learning phase from the parameter receiving program 205, reads the extended data generation list 1401 corresponding to the augmentation data to be read in the learning phase, and checks whether the augmentation data generated during the pre-performance evaluation is held in the generated data save area 303. If that augmentation data is held in the generated data save area 303 (step S903: YES), the data generation management program 201 proceeds to step S904; if it is not held (step S903: NO), it proceeds to step S905.

In the second embodiment, in step S905 of FIG. 10, the data generation management program 201 causes the accelerator-compatible data generation processing program 1204 to generate the initial data used in the first round of learning, and causes the file emulation program 207 to prepare a response based on the generated initial data. If the extended data generation list 1401 specifies use of the accelerator 1126, the accelerator 1126 and the accelerator program 1214 are used as well. The size and number of files of the initial data are determined from the number of learning servers 101 and 102, the number of threads, and the number of teacher data items read into the memory 112 at one time, and the required number of data items is generated at the determined size. Of the data corresponding to the initial data, data read into the generated data buffer 211 in step S904 and data that was generated during the preliminary performance evaluation and remains in the generated data buffer 211 are reused.
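
The handling of step S905 can be illustrated with a minimal Python sketch. It is not taken from the specification: the `ListRow` fields, the `prepare_initial_data` function, and the two runner callbacks are hypothetical names introduced only for illustration, and the sketch assumes that each row of the extended data generation list names at most one accelerator. Data already present in the buffer is reused, and the remaining rows are dispatched to the accelerator only when the row specifies one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ListRow:
    """One row of a hypothetical extended data generation list."""
    input_name: str
    program: str                        # augmentation program to apply
    output_name: str
    accelerator: Optional[str] = None   # None: no accelerator specified


Runner = Callable[[ListRow], bytes]


def prepare_initial_data(rows: List[ListRow],
                         buffer: Dict[str, bytes],
                         run_on_cpu: Runner,
                         run_on_accelerator: Runner) -> List[str]:
    """Fill `buffer` with initial data; return the names newly generated."""
    newly_generated = []
    for row in rows:
        if row.output_name in buffer:
            # Already generated (for example during the preliminary
            # performance evaluation): reuse it instead of regenerating.
            continue
        # Assumption: a row that names an accelerator is generated there,
        # and every other row is generated without the accelerator.
        runner = run_on_accelerator if row.accelerator is not None else run_on_cpu
        buffer[row.output_name] = runner(row)
        newly_generated.append(row.output_name)
    return newly_generated
```

In this sketch the buffer is a plain dictionary keyed by output name; how the actual generated data buffer 211 is organized is not stated in this passage.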

In the second embodiment, in step S906 of FIG. 10, when the accelerator-compatible data generation processing program 1204 has completed response preparation, the data generation management program 201 notifies the parameter reception program 205 that response preparation is complete and causes it to return that completion to the parameter transmission library 502. On receiving the completion of response preparation from the parameter reception program 205, the parameter transmission library 502 resumes the learning program 501, which it had kept waiting, and data access from the learning servers 101 and 102 to the storage device 1104 begins.

In the second embodiment, in step S908 of FIG. 10, the data generation management program 201 determines whether a subsequent data access will follow the data access of step S907. The data generation management program 201 determines that a subsequent data access will occur if subsequent data exists for the data access notified by the IO observation program 206 in step S907, or if no subsequent data exists but the data reads have not yet been repeated for the specified number of epochs. Subsequent data is the data corresponding to the row that follows, in the extended data generation list 1401, the row corresponding to the augmentation data currently being read.
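
The determination in step S908 reduces to a two-part condition, shown here as a minimal Python sketch. The function name and its parameters are hypothetical, and the sketch assumes that rows and epochs are tracked as zero-based indices, which the specification does not state.

```python
def subsequent_access_expected(row_index: int, row_count: int,
                               epoch_index: int, epoch_count: int) -> bool:
    """Return True if another data access is expected after the current one.

    row_index / row_count: position of the row currently being read in the
    (hypothetical) extended data generation list, zero-based.
    epoch_index / epoch_count: current pass over the list, zero-based.
    """
    has_next_row = row_index + 1 < row_count
    epochs_remaining = epoch_index + 1 < epoch_count
    # A subsequent access occurs if a next row exists, or if the list is
    # exhausted but further epochs still have to read it again.
    return has_next_row or epochs_remaining
```

For example, at the last row of the first of two epochs the function returns True because the list will be read again, while at the last row of the final epoch it returns False.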

In the second embodiment, in step S911 of FIG. 10, the data generation management program 201 causes the accelerator-compatible data generation processing program 1204 to execute data generation processing for the subsequent data that was determined in step S909 to require generation. When step S911 ends, the data generation management program 201 returns to step S907, has the data to be used in the next access prepared in the generated data buffer 211, and waits for data access from the learning servers 101 and 102.

<Release operation flow of the generated data area of Example 2>
The release operation flow of the generated data area in the storage device of the second embodiment is described with reference to FIG. 11, with the storage device 104, the data generation processing program 204, and the data generation list 401 of the first embodiment read as the storage device 1104, the accelerator-compatible data generation processing program 1204, and the extended data generation list 1401 of the second embodiment, respectively. Only the steps that differ from the first embodiment are described below.

In the second embodiment, in step S1005 of FIG. 11, the buffer size suppression program 202 causes the accelerator-compatible data generation processing program 1204 to release the buffer areas of the generated data buffer 211 and the generated data save area 303 that store generated data whose file close count became equal to the number of simultaneous accesses in step S1004.

In the second embodiment, in step S1006 of FIG. 11, the buffer size suppression program 202 determines whether all the buffer areas used by the data generation list set 311 have been released. That is, the buffer size suppression program 202 determines, via the accelerator-compatible data generation processing program 1204, whether any area in use by the data generation list set 311 remains in the buffer areas of the generated data buffer 211 and the generated data save area 303 in which generated data is stored.

In the second embodiment, the buffer size suppression program 202 proceeds to step S1001 if all the buffer areas used by the data generation list set 311 have been released (step S1006: YES), and proceeds to step S1003 if not all the buffer areas have been released (step S1006: NO). The buffer size suppression program 202 repeats steps S1003 to S1006 until all the generated data buffers 211 and generated data save areas 303 used by the extended data generation lists 1401 included in the data generation list set 311 have been released.
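
The release condition of steps S1004 to S1006 can be sketched in Python as follows. This is an illustrative model only: the `GeneratedDataAreas` class, its method names, and the use of dictionaries for the generated data buffer and the generated data save area are assumptions, not the implementation described in the specification.

```python
from typing import Dict


class GeneratedDataAreas:
    """Hypothetical model of the buffer and save area for generated data."""

    def __init__(self, simultaneous_accesses: int) -> None:
        if simultaneous_accesses < 1:
            raise ValueError("simultaneous_accesses must be at least 1")
        self.simultaneous_accesses = simultaneous_accesses
        self.buffer: Dict[str, bytes] = {}      # generated data buffer
        self.save_area: Dict[str, bytes] = {}   # generated data save area
        self.close_counts: Dict[str, int] = {}  # file closes seen per item

    def store(self, name: str, data: bytes, in_save_area: bool = False) -> None:
        target = self.save_area if in_save_area else self.buffer
        target[name] = data
        self.close_counts.setdefault(name, 0)

    def on_file_close(self, name: str) -> bool:
        """Count one file close; release the item once every accessor closed it."""
        if name not in self.close_counts:
            return False  # unknown or already released
        self.close_counts[name] += 1
        if self.close_counts[name] < self.simultaneous_accesses:
            return False
        # Close count equals the number of simultaneous accesses: release
        # the item from both areas (the S1005 condition).
        self.buffer.pop(name, None)
        self.save_area.pop(name, None)
        del self.close_counts[name]
        return True

    def all_released(self) -> bool:
        """S1006-style check: does any in-use area remain?"""
        return not self.buffer and not self.save_area
```

With two simultaneous accesses, an item stays in place after the first close and is released on the second, so the areas never hold data that every accessor has already finished reading.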

With the two operation flows described above, the data generation operation flow and the release operation flow of the generated data area, dynamic generation of augmentation data using the accelerator 1126 before access begins during learning and release of the buffer areas storing augmentation data after access ends can be performed in parallel. As a result, learning data can be prepared quickly using the accelerator with the minimum necessary storage resources, and the augmentation data can be used for learning processing without delay at file open.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail in order to explain the present invention clearly, and the invention is not necessarily limited to configurations having all the elements described. To the extent that no contradiction arises, part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. For part of the configuration of each embodiment, other elements may also be added, deleted, replaced, integrated, or divided.

1S, 2S: Learning system, 101: Learning server (main), 102: Learning server (subordinate), 103: Network, 104: Storage device, 112: Memory, 122: CPU, 124: Memory, 125: Drive, 201: Data generation management program, 202: Buffer size suppression program, 203: Preliminary performance evaluation program, 204: Data generation processing program, 205: Parameter reception program, 206: IO observation program, 207: File emulation program, 211: Generated data buffer, 301: Storage data area, 302: Preliminary performance evaluation result, 303: Generated data save area, 311: Data generation list set, 312: Data augmentation program, 313: Original data, 321: Data augmentation program, 322: Option, 323: Unit processing performance, 401: Data generation list, 411: Input data, 412: Data augmentation program, 413: Option, 414: Output data, 501: Learning program, 502: Parameter transmission library, 1104: Storage device, 1126: Accelerator, 1204: Accelerator-compatible data generation processing program, 1214: Accelerator program, 1301: Storage data area, 1302: Preliminary performance evaluation result, 1314: Accelerator program, 1321: Accelerator, 1322: Accelerator program, 1401: Extended data generation list, 1411: Accelerator, 1412: Accelerator program

Claims

A storage device connected to a server via a network, comprising:
a storage unit that stores original data acquired from outside;
a generation unit that, in response to a request for processed data from the server, generates the processed data by processing original data selected from the original data stored in the storage unit; and
an evaluation unit that, prior to generation of the processed data by the generation unit, evaluates the processing performance of the generation unit in generating the processed data,
wherein the generation unit generates the processed data based on the evaluated processing performance and transmits the generated processed data to the server.

The storage device according to claim 1, further comprising a generation management unit that manages generation of the processed data by the generation unit, wherein
the generation management unit calculates a data read speed from the storage device to the server from parameters, notified in advance, relating to data reads from the storage device to the server and from the status of access from the server to the storage device,
and adjusts the generation speed at which the generation unit generates the processed data based on the generation speed per unit resource of the processed data evaluated by the evaluation unit and on the calculated data read speed, and
the generation unit generates the processed data to be transmitted to the server at the generation speed adjusted by the generation management unit.

The storage device according to claim 2, wherein the evaluation unit evaluates the generation speed of the processed data per unit resource for each program that performs the processing.

The storage device according to claim 3, wherein, when the original data is registered in the storage unit, the evaluation unit evaluates the generation speed of the processed data per unit resource by causing the generation unit to perform the processing on one item of original data for each program that performs the processing.

The storage device according to claim 1, further comprising a storage area that stores the processed data generated by the generation unit,
wherein the processed data stored in the storage area is transmitted to the server in response to a request for processed data from the server.

The storage device according to claim 5, wherein the processed data generated by the generation unit when the evaluation unit evaluates the processing performance is stored in the storage area, and
when processed data is requested by the server and the processed data generated by the generation unit at the time of the evaluation is stored in the storage area, that processed data is read from the storage area and transmitted to the server.

The storage device according to claim 5, wherein
the generation management unit determines, after the processed data is transmitted to the server in response to the request from the server, whether a request for processed data subsequent to that request is expected from the server, and
when the generation management unit determines that a request for subsequent processed data is expected,
the generation unit generates in advance the subsequent processed data to be transmitted to the server in response to the request from the server and stores it in the storage area.

The storage device according to claim 5, further comprising a storage area capacity control unit that releases the storage area in which the processed data is temporarily stored, based on the number of servers that simultaneously access the storage device, which is included in the parameters, and on the number of file closes performed on the processed data by the servers.

The storage device according to claim 1, wherein the original data is image data.

The storage device according to claim 1, wherein the server performs machine learning using teacher data, and
the generation unit performs data augmentation on data selected from the original data to generate the processed data that serves as the teacher data.

The storage device according to claim 1, further comprising an accelerator,
wherein the generation unit generates the processed data using the accelerator.

The storage device according to claim 11, wherein the evaluation unit evaluates the processing performance for each accelerator that performs the processing and for each accelerator program executed by the accelerator.

A data processing method in a storage device connected to a server via a network,
the method comprising, by the storage device:
storing original data acquired from outside in a storage unit;
in response to a request for processed data from the server, generating the processed data by processing original data selected from the original data stored in the storage unit;
prior to generation of the processed data, evaluating the processing performance in generating the processed data;
generating the processed data based on the evaluated processing performance; and
transmitting the generated processed data to the server.