JP2012190206A

JP2012190206A - Data processing scheduling device, method and program

Info

Publication number: JP2012190206A
Application number: JP2011052360A
Authority: JP
Inventors: Junpei Kamimura; 純平上村; Takehiko Kashiwagi; 岳彦柏木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-03-10
Filing date: 2011-03-10
Publication date: 2012-10-04
Anticipated expiration: 2031-03-10
Also published as: JP5831679B2

Abstract

PROBLEM TO BE SOLVED: To improve the processing efficiency of a system in which data is transferred to and processed by a co-processor, by reducing the waiting time for data transfer and data processing. SOLUTION: A host computer divides the data to be processed into multiple chunks, obtains the time needed to transfer and to process each chunk, determines the order in which the co-processor processes the chunks based on the data transfer time and data processing time of each chunk, transfers the chunks to the co-processor in the determined order, and directs the co-processor to process them.

Description

The present invention relates to a data processing scheduling apparatus for a system in which data is transferred to a co-processor for processing.

In recent computers, GPUs (Graphics Processing Units) have become commoditized and are widely available. GPUs were originally used mainly for graphics processing, but a usage model called GPGPU (General Purpose computing on Graphics Processing Units), in which the GPU serves as a co-processor of the CPU (Central Processing Unit) for general-purpose computation, is attracting attention. Programmable devices such as FPGAs (Field Programmable Gate Arrays), which can likewise be used as co-processors, are also becoming cheaper. These devices exchange data with the host computer over a data bus, and because the bus bandwidth is limited, the data transfer time affects the overall processing time.

In general, a system that processes a large amount of data using a co-processor follows this procedure: 1) transfer of data from the host computer to the co-processor, 2) data processing in the co-processor, and 3) transfer of the result data from the co-processor to the host computer. The data transfer time generally increases monotonically with the amount of data. Because the host and the co-processor are connected by a bus such as PCI Express x16, the data transfer time tends to become a bottleneck.

There are generally two ways to reduce the data transfer time and speed up the overall processing. The first is to divide the data string into multiple pieces and overlap transfer with processing (pipeline processing), which hides the data transfer time. The second is to compress the data to be transferred.

For example, Patent Document 1 discloses a technique in which the data size and transmission order of each piece of compressed data to be transferred are sent to the receiving device, and the receiving device schedules the decompression of each piece of compressed data based on this information.

Patent Document 1: JP 2006-338409 A

Consider a system that processes data using a co-processor and applies both of the methods described above: the data to be processed is divided, and each piece of divided data (chunk) is compressed, transferred to the co-processor, and processed there. In this case, the chunks may differ in size after compression, so the data transfer time and data processing time may differ from chunk to chunk. This happens, for example, when chunks are created with non-uniform sizes, or when chunks are created with a fixed size but the compression ratio differs from chunk to chunk. If pipeline processing is performed between the host computer and the co-processor under these conditions and the chunks are transferred and processed in their original order (in-order), waits for data transfer and waits for data processing occur, and the processing efficiency decreases.

The transfer method of Patent Document 1 also transfers the data strings in their original order, so the receiving side has to wait for data transfer and efficiency decreases.

The present invention has been made in view of the above problems. Its object is to provide a data processing scheduling apparatus, method, and program that can reduce the waiting time for data transfer and data processing and improve processing efficiency in a system that transfers data to a co-processor for processing.

The present invention is a data processing scheduling apparatus that transfers data to a co-processor for processing, comprising: dividing means for dividing the data to be processed into a plurality of chunks; time acquisition means for acquiring the data transfer time and data processing time of each chunk; and order determination means for determining the order in which the co-processor processes the chunks based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

The present invention is also a data processing scheduling method for transferring data from a host to a co-processor for processing, comprising: dividing the data to be processed into a plurality of chunks; acquiring the data transfer time and data processing time of each chunk; and determining the order in which the co-processor processes the chunks based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

The present invention is also a program that causes a computer to execute: a division process of dividing the data to be processed into a plurality of chunks; a time acquisition process of acquiring the data transfer time and data processing time of each chunk; and an order determination process of determining the order in which the co-processor processes the chunks based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

According to the present invention, in a system that transfers data to a co-processor for processing, the waiting time for data transfer and data processing can be shortened and the processing efficiency can be improved.

FIG. 1 is a block diagram schematically showing a data processing scheduling apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the operation of the data processing scheduling apparatus according to the embodiment.
FIG. 3 is a flowchart for explaining the data registration process.
FIG. 4 is a flowchart for explaining the process of determining the processing order.
FIG. 5 is a flowchart for explaining the execution of the processing in the determined order.
FIG. 6 is a diagram showing a specific example of the execution times of data transfer and data processing for each chunk.
FIG. 7 is a flowchart for explaining the processing when the processing result is passed to the host side for each chunk.
FIG. 8 is a flowchart for explaining the data registration process when the data transfer time and the data processing time are obtained by measurement.

Embodiments of the present invention will be described below with reference to the drawings. The present invention relates to a data processing scheduling apparatus that transfers data from a host computer to a co-processor for processing. The data processing handled by the apparatus includes processing whose result is not affected by the order in which the data strings (hereinafter, chunks) obtained by dividing the target data string are processed. Examples include aggregation such as computing the sum or average of a numeric array, creation of a bitmap index that produces a flag array whose bits are set only for specific values in a numeric array, and processing performed inside a DWH (Data Warehouse) that handles huge data sets of millions to hundreds of millions of records.

FIG. 1 is a block diagram schematically showing a data processing scheduling apparatus according to an embodiment of the present invention. The apparatus includes a host computer 10, a co-processor 20, and a data bus 7 that connects the host computer 10 and the co-processor 20.

The host computer 10 includes a data registration unit 1, a data storage unit 2, an order determination unit 3, and a processing instruction unit 4.

The data registration unit 1 applies predetermined processing to the input data to be processed and registers it in the data storage unit 2. The data registration unit 1 includes a data division unit 11, a data compression unit 12, and a time acquisition unit 13.

The data division unit 11 divides the input data to be processed into a plurality of chunks.

The data compression unit 12 compresses each chunk using a predetermined compression algorithm.

The time acquisition unit 13 acquires the data transfer time and the data processing time based on the input data and the input processing content. Any acquisition method may be used. For example, measured values keyed by data size or the like may be stored in advance and looked up, or the times may be computed with a preset function that takes the data size or the like as an argument.
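The two acquisition options mentioned above might look like the following minimal sketch. The table contents, the constant `ALPHA`, and all function names are illustrative assumptions and do not come from the patent.

```python
from bisect import bisect_left

# Hypothetical pre-recorded measurements: (chunk size, seconds), sorted by size.
MEASURED_TRANSFER = [(1, 0.002), (2, 0.004), (4, 0.008)]

def lookup_time(table, size):
    """Option 1: return the stored measurement for the smallest recorded size >= size.

    Sizes beyond the table fall back to the last entry.
    """
    sizes = [s for s, _ in table]
    i = min(bisect_left(sizes, size), len(table) - 1)
    return table[i][1]

ALPHA = 0.001  # assumed cost per unit of compressed data

def transfer_time(size):
    """Option 2: a preset function of the data size (linear here by assumption)."""
    return 2 * ALPHA * size

def processing_time(size):
    """Preset processing-time function, also assumed linear."""
    return ALPHA * size
```

A lookup table suits hardware whose timing is not well described by a simple formula, while a function is cheaper to maintain when the cost scales predictably with size.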

The data registration unit 1 registers in the data storage unit 2, in association with one another, the chunks divided and compressed by the data division unit 11 and the data compression unit 12, the data transfer time and data processing time acquired by the time acquisition unit 13, and the input processing content.

The data storage unit 2 stores the compressed chunks, the processing content for the chunks, the data transfer time and data processing time acquired for the chunks, and the processing results for the chunks. The data storage unit 2 includes DRAM (Dynamic Random Access Memory), an SSD (Solid State Drive), or the like.

The order determination unit 3 determines the order in which the chunks are processed based on the information stored in the data storage unit 2. The details of this order determination process are described later.

The processing instruction unit 4 instructs the data transfer and the data processing of each chunk in the order determined by the order determination unit 3.

The co-processor 20 includes a data storage unit 5 and an operation execution unit 6. The co-processor 20 is, for example, a GPU or an FPGA.

The data storage unit 5 stores the chunks transferred from the host side, the processing to be applied to the chunks, and the processing results for the chunks. The data storage unit 5 includes DRAM, an SSD, or the like.

The operation execution unit 6 performs data processing on each chunk whose transfer has finished and stores the processing result in the data storage unit 5.

The data bus 7 connects the host computer 10 and the co-processor 20 and transfers data from the data storage unit 2 to the data storage unit 5. The data bus 7 is, for example, PCI Express or Ethernet (registered trademark).

Next, the operation of the data processing scheduling apparatus according to this embodiment will be described. As shown in FIG. 2, the apparatus operates in the following sequence: registration of the various data (step A21), determination of the processing order (step A22), and execution of the processing (step A23). Each of these steps is described below.

First, the data registration process in step A21 will be described with reference to FIG. 3. The data registration unit 1 receives the data to be processed, the processing content, a function for obtaining the transfer time, and a function for obtaining the processing time (step A31). The data division unit 11 divides the input data into a plurality of chunks (step A32). The data compression unit 12 compresses each chunk (step A33). The data registration unit 1 stores each compressed chunk and the input processing content in the data storage unit 2 (step A34).
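As an illustration of steps A31 to A34, the following sketch splits a list of integers into chunks, compresses each chunk, and records it together with the processing content and the estimated times. The record layout, the use of `zlib`, the 64-bit packing, and the name `register` are assumptions made for this example; the patent does not prescribe a compression algorithm or a storage format.

```python
import struct
import zlib

def register(values, num_chunks, operation, transfer_fn, processing_fn):
    """Split, compress, and record chunks (steps A32 to A34).

    Assumes `values` is a non-empty list of integers that fit in 64 bits.
    """
    step = -(-len(values) // num_chunks)  # ceiling division
    store = []                            # stands in for data storage unit 2
    for start in range(0, len(values), step):
        part = values[start:start + step]
        raw = struct.pack(f"<{len(part)}q", *part)  # little-endian 64-bit ints
        packed = zlib.compress(raw)
        store.append({
            "data": packed,
            "operation": operation,
            # The time functions take the compressed size as their argument.
            "transfer_time": transfer_fn(len(packed)),
            "processing_time": processing_fn(len(packed)),
        })
    return store
```

Because the time functions are evaluated on the compressed size, two chunks with the same number of elements can still end up with different estimated times.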

Next, the process of determining the processing order in step A22 will be described with reference to FIG. 4. The order determination unit 3 performs the following processing for every chunk stored in the data storage unit 2 (step A41). It acquires the transfer time and the processing time of one chunk stored in the data storage unit 2 (step A42), then compares the two to determine their magnitude relationship, that is, whether the transfer time is equal to or longer than the processing time. If the transfer time is shorter than the processing time (step A43: NO), the chunk is added to list A (step A44). If the transfer time is equal to or longer than the processing time (step A43: YES), the chunk is added to list B (step A45).

When steps A42 to A45 have been completed for all chunks (step A41: YES), the order determination unit 3 sorts the chunks in list A in ascending order of transfer time (step A46) and sorts the chunks in list B in descending order of processing time (step A47). It then concatenates list B after list A and stores the result in the data storage unit 2 as the overall execution schedule (step A48).

This determines the processing order of the chunks. The group of chunks executed first, whose transfer time is shorter than their processing time (list A), is sorted in ascending order of transfer time in order to shorten the wait before the first processing can start. The group executed next, whose transfer time is equal to or longer than their processing time (list B), is sorted in descending order of processing time in order to shorten the processing time remaining at the end.
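A minimal sketch of steps A41 to A48 follows. It assumes that list A holds the chunks whose transfer time is shorter than their processing time, which is the reading that agrees with the worked example later in this description (there, chunks with T > X are placed in list B). The tuple layout and the name `decide_order` are illustrative.

```python
def decide_order(chunks):
    """Return the execution schedule for a list of chunks.

    chunks: list of (name, transfer_time, processing_time) tuples.
    """
    # Steps A42 to A45: classify each chunk by comparing its two times.
    list_a = [c for c in chunks if c[1] < c[2]]   # transfer shorter than processing
    list_b = [c for c in chunks if c[1] >= c[2]]  # transfer at least as long
    list_a.sort(key=lambda c: c[1])                # step A46: shortest transfer first
    list_b.sort(key=lambda c: c[2], reverse=True)  # step A47: longest processing first
    return list_a + list_b                         # step A48: list A, then list B

schedule = decide_order([("P", 1, 4), ("Q", 3, 5), ("R", 6, 2), ("S", 4, 3)])
print([name for name, _, _ in schedule])  # prints: ['P', 'Q', 'S', 'R']
```

This classification and sorting has the same shape as Johnson's rule for two-stage flow-shop scheduling, with transfer as the first stage and processing as the second.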

Next, the execution of the processing in step A23 will be described with reference to FIG. 5.

The processing instruction unit 4 first instructs the co-processor 20 to transfer a chunk, reads the chunk from the data storage unit 2 in the order determined by the order determination unit 3, and transfers it to the co-processor 20 via the data bus 7 (step A51). The co-processor 20 stores the transferred chunk in the data storage unit 5.

The processing instruction unit 4 reads the processing content stored in the data storage unit 2 and instructs the co-processor 20 to process the chunk (step A52). The operation execution unit 6 of the co-processor 20 performs the data processing instructed by the host on each chunk whose transfer has finished.

The processing instruction unit 4 determines whether processing has been instructed for all chunks (step A53). If not (step A53: NO), it returns to step A51 and handles the next chunk. If processing has been instructed for all chunks (step A53: YES), it waits for the operation execution unit 6 of the co-processor 20 to finish processing the last chunk (step A54).

When the co-processor 20 has finished processing the last chunk, the processing instruction unit 4 instructs the co-processor 20 to transfer the processing result (step A55). The co-processor 20 transfers the processing result stored in the data storage unit 5 to the host computer 10.
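The control flow of steps A51 to A55 can be sketched with a single worker thread standing in for the co-processor, so that the host can start the next transfer while the previous chunk is still being processed. The callables `transfer`, `process`, and `fetch_result` are hypothetical placeholders for whatever the real device interface provides.

```python
from concurrent.futures import ThreadPoolExecutor

def run_schedule(schedule, transfer, process, fetch_result):
    """Issue transfers and processing requests in schedule order."""
    # One worker models a co-processor that handles one chunk at a time.
    with ThreadPoolExecutor(max_workers=1) as coprocessor:
        last = None
        for chunk in schedule:                       # loop ends at step A53: YES
            on_device = transfer(chunk)              # step A51: blocking copy
            last = coprocessor.submit(process, on_device)  # step A52: asynchronous
        if last is not None:
            last.result()                            # step A54: wait for the last chunk
    return fetch_result()                            # step A55: bring the result back

# Usage with a running total kept on the simulated device.
device = {"total": 0}

def add_to_total(values):
    device["total"] += sum(values)

total = run_schedule([[1, 2], [3], [4, 5, 6]], transfer=list,
                     process=add_to_total, fetch_result=lambda: device["total"])
print(total)  # prints: 21
```

Submitting work to a single-worker executor keeps the processing requests in schedule order, which mirrors the one-chunk-at-a-time behavior assumed for the operation execution unit.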

Next, the operation of the data processing scheduling apparatus will be described with a concrete example. In fields such as DWH, processing over large data sets of millions to hundreds of millions of records may be repeated many times in order to analyze trends and relationships in the data. Here, computing the sum of a numeric data array with 100 million elements is used as the example.

First, the data registration process is executed (step A21). Specifically, the data registration unit 1 receives a numeric data array of 100 million elements, the processing content (compute the sum of the compressed numeric sequence), a function T(y) that takes a variable y as its argument and returns the data transfer time, and a function X(y) that takes a variable y as its argument and returns the data processing time (step A31). In this embodiment, the argument of the functions T and X is the data size of the compressed chunk, and the functions are, for example, T(size) = 2α × size and X(size) = α × size.

The data division unit 11 divides the input array of 100 million elements into four chunks of 25 million elements each (chunks C0 to C3) (step A32). Since there are four chunks, the co-processor 20 repeats the following four times: compute the sum of the data in a chunk, and add that value to the result value stored in the data storage unit 5.
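The division and accumulation described above can be illustrated on a small array. The helper names are made up for this sketch, and a list of 100 elements stands in for the 100 million of the example.

```python
def split_evenly(values, num_chunks):
    """Divide a non-empty list into num_chunks pieces of (nearly) equal length."""
    size = -(-len(values) // num_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def accumulate(chunks):
    """Add each chunk's sum to a running result, one chunk at a time."""
    result = 0                # plays the role of the result value in data storage unit 5
    for chunk in chunks:
        result += sum(chunk)
    return result

data = list(range(100))
chunks = split_evenly(data, 4)                      # C0 to C3, 25 elements each
forward = accumulate(chunks)
backward = accumulate(list(reversed(chunks)))       # C3, C2, C1, C0
print(forward == backward == sum(data))             # prints: True
```

The total is the same for any chunk order, which is the property that lets the scheduler reorder the chunks freely.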

The data compression unit 12 compresses each chunk (step A33). In this example, the sizes of the chunks after compression are assumed to be chunk[0].size = 1, chunk[1].size = 2, chunk[2].size = 2, and chunk[3].size = 3.

The data registration unit 1 stores each compressed chunk, the processing content for the compressed chunks, the function T, and the function X in the data storage unit 2 (step A34).

Next, the process of determining the processing order is executed (step A22). Specifically, the order determination unit 3 determines the processing order of the chunks from the data stored in the data storage unit 2, following the algorithm shown in FIG. 4. It acquires the transfer time and processing time of each chunk (step A42) and determines their magnitude relationship (step A43). In this example, T(size) > X(size) holds for every chunk, so every chunk is registered in list B (step A45). When the magnitude relationship has been determined for all chunks (step A41: YES), the order determination unit 3 sorts the chunks in descending order of data processing time X(size) and creates the schedule list from the sorted result. In this example, the chunk processing order is C3, C2, C1, C0.

Next, the processing is executed (step A23). Specifically, the processing instruction unit 4 instructs execution in the order of chunks C3, C2, C1, C0, following the algorithm shown in FIG. 5.

The processing instruction unit 4 instructs the co-processor 20 to transfer data and transfers the data from the data storage unit 2 to the data storage unit 5 on the co-processor side (step A51).

The processing instruction unit 4 instructs the co-processor 20 to perform data processing. The operation execution unit 6 of the co-processor 20 computes the sum for each chunk whose transfer has finished (step A52).

The processing instruction unit 4 repeats steps A51 and A52 until processing has been instructed for all chunks (step A53: NO). When processing has been instructed for all chunks (step A53: YES), it waits for the co-processor 20 to finish processing the last chunk (step A54). On detecting that the computation for the last chunk has finished, the processing instruction unit 4 instructs the co-processor 20 to transfer the processing result. The co-processor 20 reads the final result data from the data storage unit 5 and transfers it to the host computer 10 (step A55).

FIG. 6 shows the execution times of data transfer and data processing for each chunk in the above example (optimized). Here, the transfer time of the final result is assumed to be α. For comparison, FIG. 6 also shows the execution times of data transfer and data processing when the chunks are processed in their original order C0, C1, C2, C3 (in-order). As FIG. 6 shows, processing the chunks in the order determined by the order determination unit 3 shortens the waits for transfer and for processing compared with in-order processing, and thus speeds up the overall processing.
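The effect of reordering can be checked with a small timing model. The sketch below assumes that transfers run back to back on the bus, that the co-processor handles one chunk at a time and starts a chunk only after its transfer has finished, that buffering on the co-processor is unlimited, and that α = 1. The resulting figures come from this model and are not read off FIG. 6.

```python
def makespan(schedule, result_transfer):
    """Total time for a schedule of (transfer_time, processing_time) pairs."""
    bus_free = 0   # time at which the data bus finishes the current transfer
    proc_free = 0  # time at which the co-processor finishes the current chunk
    for transfer, processing in schedule:
        bus_free += transfer
        proc_free = max(bus_free, proc_free) + processing
    return proc_free + result_transfer

ALPHA = 1
sizes = {"C0": 1, "C1": 2, "C2": 2, "C3": 3}
# T(size) = 2 * alpha * size, X(size) = alpha * size
times = {name: (2 * ALPHA * s, ALPHA * s) for name, s in sizes.items()}

in_order = makespan([times[c] for c in ("C0", "C1", "C2", "C3")], ALPHA)
optimized = makespan([times[c] for c in ("C3", "C2", "C1", "C0")], ALPHA)
print(in_order, optimized)  # prints: 20 18
```

Under these assumptions the in-order schedule takes 20α and the reordered schedule takes 18α. The gain comes from leaving the shortest processing job for the end, after the bus has delivered its last chunk.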

As described above, according to this embodiment, when data is transferred to a co-processor for processing, the data is divided into a plurality of chunks and the processing order of the chunks is determined based on the transfer time and processing time of each chunk. This improves the efficiency of data transfer and the utilization of the co-processor, and shortens the overall processing time.

In the processing described above, the co-processor 20 passes the result to the host side after all chunks have been processed. The invention is not limited to this, and the processing result may instead be passed to the host side for each chunk. The processing in that case will be described with reference to FIG. 7.

The processing instruction unit 4 instructs the co-processor 20 to transfer a chunk, reads the chunk from the data storage unit 2 in the order determined by the order determination unit 3, and transfers it to the co-processor 20 (step A71). The co-processor 20 stores the transferred chunk in the data storage unit 5.

The processing instruction unit 4 reads the processing content stored in the data storage unit 2 and instructs the co-processor 20 to process the chunk (step A72). The operation execution unit 6 of the co-processor 20 performs the data processing instructed by the host on each chunk whose transfer has finished.

The processing instruction unit 4 instructs the co-processor 20 to transfer the processing result of the chunk (step A73). The co-processor 20 transfers the processing result of the chunk to the host, and the processing instruction unit 4 stores the received processing result in the data storage unit 2.

The processing instruction unit 4 determines whether processing has been instructed for all chunks (step A74). If not (step A74: NO), it returns to step A71 and handles the next chunk. If processing has been instructed for all chunks (step A74: YES), it waits for the operation execution unit 6 of the co-processor 20 to finish the transfer for the last chunk (step A75).

When the transfer for the last chunk from the co-processor 20 has finished, the processing instruction unit 4 combines the per-chunk processing results stored in the data storage unit 2 (step A76).

When processing is performed as shown in FIG. 7, determining the chunk processing order becomes an optimization problem whose parameters include, in addition to the chunk transfer time and chunk processing time, the size of the results that can be held on the co-processor side and the transfer time of the results. In this case, the order determination unit 3 creates the schedule by, for example, obtaining an optimal schedule through exhaustive search or obtaining a near-optimal schedule through a heuristic method.
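An exhaustive search of the kind mentioned above could be sketched as follows. The cost model is a simplifying assumption made for this example: each chunk passes through input transfer, processing, and result transfer in that order, result transfers do not compete with input transfers for the bus, and the limit on how much result data the co-processor can hold is ignored.

```python
from itertools import permutations

def completion_time(order):
    """Finish time for a sequence of (transfer_in, processing, transfer_out) chunks."""
    t_in = t_proc = t_out = 0
    for transfer_in, processing, transfer_out in order:
        t_in += transfer_in                           # inputs go back to back
        t_proc = max(t_in, t_proc) + processing       # needs the input and a free processor
        t_out = max(t_proc, t_out) + transfer_out     # needs the result and a free return path
    return t_out

def best_order(chunks):
    """Try every permutation and keep the one that finishes earliest."""
    return min(permutations(chunks), key=completion_time)

chunks = [(2, 1, 1), (1, 2, 1)]
print(completion_time(chunks), completion_time(best_order(chunks)))  # prints: 6 5
```

The number of permutations grows factorially with the number of chunks, so this approach is practical only for a handful of chunks. Larger inputs would call for a heuristic.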

In the above embodiment, the functions for obtaining the data transfer time and data processing time of each chunk are input in the data registration process (FIG. 3) performed by the data registration unit 1 of the host computer 10. As a modification of the above embodiment, the time acquisition unit 13 of the data registration unit 1 may acquire at least one of the data transfer time and the data processing time of each chunk by actual measurement, and the order determination unit 3 may determine the processing order of the chunks based on the measured values. The data registration process performed by the data registration unit 1 in this case will be described with reference to FIG. 8.

The data registration unit 1 receives the data to be processed and the processing contents as input (step A81). The data dividing unit 11 divides the input data into a plurality of chunks (step A82). The data compression unit 12 compresses each chunk (step A83). The time acquisition unit 13 measures the data transfer time and the data processing time of each compressed chunk (step A84). Specifically, for each chunk, it measures the time from the point at which the data transfer is instructed to the co-processor 20 until the transfer completes. Likewise, for each chunk, it measures the time from the point at which processing is instructed until the processing completes. The data registration unit 1 stores the compressed chunks, the input processing contents, and the measured transfer time and processing time of each chunk in the data storage unit 2 (step A85).
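The measurement in step A84 can be sketched as follows. This is a minimal illustration under stated assumptions: `transfer` and `process` are hypothetical callables supplied by the caller that block until the co-processor reports completion, and the record layout is invented for the example. The source does not specify any of these interfaces.

```python
import time


def measure(action, *args):
    """Run a blocking call and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action(*args)
    return result, time.perf_counter() - start


def register_chunks(chunks, transfer, process):
    """Measure transfer and processing time for each chunk (cf. step A84).

    transfer(chunk) and process(handle) are hypothetical blocking calls;
    the returned records stand in for what is stored in step A85.
    """
    records = []
    for chunk in chunks:
        handle, transfer_time = measure(transfer, chunk)
        _, processing_time = measure(process, handle)
        records.append({
            "chunk": chunk,
            "transfer_time": transfer_time,
            "processing_time": processing_time,
        })
    return records
```

The timer starts at the moment the call is issued and stops when it returns, which mirrors the description of measuring from the instruction until completion.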

Although both the data transfer time and the data processing time are measured here, only one of them may be measured, with the other obtained from a function or the like.

The data registration unit 1, the order determination unit 3, the processing instruction unit 4, and the operation execution unit 6 according to the embodiment of the present invention described above may be implemented by a processing device such as a CPU reading and executing an operation program or the like stored in a storage unit, or may be configured as hardware. It is also possible to implement only some of the functions of the above embodiment as a computer program.

Although the present invention has been described with reference to a preferred embodiment, the present invention is not necessarily limited to that embodiment and can be modified and carried out in various ways within the scope of its technical idea.

Part or all of the above embodiment can also be described as in the following supplementary notes, but is not limited to them.

(Supplementary Note 1)
A data processing scheduling apparatus that transfers data to a co-processor for processing, comprising:
dividing means for dividing data to be processed into a plurality of chunks;
time acquisition means for acquiring a data transfer time and a data processing time of each chunk; and
order determining means for determining an order in which the co-processor processes the chunks, based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

(Supplementary Note 2)
The data processing scheduling apparatus according to Supplementary Note 1, wherein the order determining means classifies the chunks based on the magnitude relationship between the data transfer time and the data processing time, and determines the order of the chunks so that the group of chunks whose data transfer time is the smaller of the two is processed in ascending order of data transfer time, and then the group of chunks whose data transfer time is the larger of the two is processed in descending order of data processing time.

(Supplementary Note 3)
The data processing scheduling apparatus according to Supplementary Note 2, wherein the order determining means sorts the group of chunks whose data transfer time is the smaller of the two in ascending order of data transfer time, sorts the group of chunks whose data transfer time is the larger of the two in descending order of data processing time, and concatenates the two sorted results so that the group with the smaller data transfer time comes first, thereby generating a schedule list of the chunk processing order.
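The schedule list generation described in Supplementary Notes 2 and 3 can be sketched as follows. The function name `build_schedule` and the tuple layout are hypothetical, and the handling of chunks whose two times are equal is an assumption (they are placed in the second group here), since the text does not address ties. The ordering appears to correspond to what scheduling literature calls Johnson's rule for a two-stage flow shop.

```python
def build_schedule(chunks):
    """Return chunk names in processing order.

    chunks: list of (name, transfer_time, processing_time) tuples.
    Assumption: ties (transfer_time == processing_time) go to the second group.
    """
    transfer_smaller = [c for c in chunks if c[1] < c[2]]
    transfer_larger = [c for c in chunks if c[1] >= c[2]]

    # First group: ascending order of data transfer time.
    transfer_smaller.sort(key=lambda c: c[1])
    # Second group: descending order of data processing time.
    transfer_larger.sort(key=lambda c: c[2], reverse=True)

    # Concatenate with the transfer-smaller group first.
    return [c[0] for c in transfer_smaller + transfer_larger]


schedule = build_schedule([
    ("a", 3, 1),
    ("b", 1, 3),
    ("c", 2, 5),
    ("d", 4, 2),
])
# schedule == ["b", "c", "d", "a"]
```

Chunks that transfer quickly go first so the co-processor starts working early, and chunks with short processing times go last so that little work remains after the final transfer.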

(Supplementary Note 4)
The data processing scheduling apparatus according to any one of Supplementary Notes 1 to 3, wherein the time acquisition means acquires the data transfer time and the data processing time of each chunk by measurement, and
the order determining means determines the order in which the co-processor processes the chunks based on the measured data transfer time and data processing time of each chunk.

(Supplementary Note 5)
A data processing scheduling method for transferring data from a host to a co-processor for processing, the method comprising:
dividing data to be processed into a plurality of chunks;
acquiring a data transfer time and a data processing time of each chunk; and
determining an order in which the co-processor processes the chunks, based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

(Supplementary Note 6)
The data processing scheduling method according to Supplementary Note 5, wherein, when determining the order in which the co-processor processes the chunks, the chunks are classified based on the magnitude relationship between the data transfer time and the data processing time, and the order of the chunks is determined so that the group of chunks whose data transfer time is the smaller of the two is processed in ascending order of data transfer time, and then the group of chunks whose data transfer time is the larger of the two is processed in descending order of data processing time.

(Supplementary Note 7)
The data processing scheduling method according to Supplementary Note 6, wherein, when determining the order in which the co-processor processes the chunks, the group of chunks whose data transfer time is the smaller of the two is sorted in ascending order of data transfer time, the group of chunks whose data transfer time is the larger of the two is sorted in descending order of data processing time, and the two sorted results are concatenated so that the group with the smaller data transfer time comes first, thereby generating a schedule list of the chunk processing order.

(Supplementary Note 8)
The data processing scheduling method according to any one of Supplementary Notes 5 to 7, wherein the data transfer time and the data processing time of each chunk are acquired by measurement, and
when determining the order in which the co-processor processes the chunks, the order is determined based on the measured data transfer time and data processing time of each chunk.

(Supplementary Note 9)
A program that causes a computer to execute:
a dividing process of dividing data to be processed into a plurality of chunks;
a time acquisition process of acquiring a data transfer time and a data processing time of each chunk; and
an order determination process of determining an order in which a co-processor processes the chunks, based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

(Supplementary Note 10)
The program according to Supplementary Note 9, wherein the order determination process classifies the chunks based on the magnitude relationship between the data transfer time and the data processing time, and determines the order of the chunks so that the group of chunks whose data transfer time is the smaller of the two is processed in ascending order of data transfer time, and then the group of chunks whose data transfer time is the larger of the two is processed in descending order of data processing time.

(Supplementary Note 11)
The program according to Supplementary Note 10, wherein the order determination process sorts the group of chunks whose data transfer time is the smaller of the two in ascending order of data transfer time, sorts the group of chunks whose data transfer time is the larger of the two in descending order of data processing time, and concatenates the two sorted results so that the group with the smaller data transfer time comes first, thereby generating a schedule list of the chunk processing order.

(Supplementary Note 12)
The program according to any one of Supplementary Notes 9 to 11, wherein the time acquisition process acquires the data transfer time and the data processing time of each chunk by measurement, and
the order determination process determines the order in which the co-processor processes the chunks based on the measured data transfer time and data processing time of each chunk.

DESCRIPTION OF SYMBOLS
1 Data registration unit
2 Data storage unit
3 Order determination unit
4 Processing instruction unit
5 Data storage unit
6 Operation execution unit
7 Data bus
10 Host computer
11 Data dividing unit
12 Data compression unit
13 Time acquisition unit
20 Co-processor

Claims

A data processing scheduling apparatus that transfers data to a co-processor for processing, comprising:
dividing means for dividing data to be processed into a plurality of chunks;
time acquisition means for acquiring a data transfer time and a data processing time of each chunk; and
order determining means for determining an order in which the co-processor processes the chunks, based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

The data processing scheduling apparatus according to claim 1, wherein the order determining means classifies the chunks based on the magnitude relationship between the data transfer time and the data processing time, and determines the order of the chunks so that the group of chunks whose data transfer time is the smaller of the two is processed in ascending order of data transfer time, and then the group of chunks whose data transfer time is the larger of the two is processed in descending order of data processing time.

The data processing scheduling apparatus according to claim 2, wherein the order determining means sorts the group of chunks whose data transfer time is the smaller of the two in ascending order of data transfer time, sorts the group of chunks whose data transfer time is the larger of the two in descending order of data processing time, and concatenates the two sorted results so that the group with the smaller data transfer time comes first, thereby generating a schedule list of the chunk processing order.

The data processing scheduling apparatus according to any one of claims 1 to 3, wherein the time acquisition means acquires the data transfer time and the data processing time of each chunk by measurement, and
the order determining means determines the order in which the co-processor processes the chunks based on the measured data transfer time and data processing time of each chunk.

A data processing scheduling method for transferring data from a host to a co-processor for processing, the method comprising:
dividing data to be processed into a plurality of chunks;
acquiring a data transfer time and a data processing time of each chunk; and
determining an order in which the co-processor processes the chunks, based on the magnitude relationship between the data transfer time and the data processing time of each chunk.

A program that causes a computer to execute:
a dividing process of dividing data to be processed into a plurality of chunks;
a time acquisition process of acquiring a data transfer time and a data processing time of each chunk; and
an order determination process of determining an order in which a co-processor processes the chunks, based on the magnitude relationship between the data transfer time and the data processing time of each chunk.