JP4408078B2

JP4408078B2 - Sort processing apparatus, sort processing method and program

Info

Publication number: JP4408078B2
Application number: JP2004350983A
Authority: JP
Inventors: 哲也武尾; 浩司西川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2004-12-03
Filing date: 2004-12-03
Publication date: 2010-02-03
Anticipated expiration: 2024-12-03
Also published as: JP2006163565A

Description

The present invention relates to a sort processing technique for sorting a large volume of sort target data held on a main storage device.

In the conventional sort processing method, two areas are provided in main memory. Each record of the sort target data is split into a sort key part and a remaining part, and the two parts are stored separately in the two areas. Only the sort key parts are sorted, and each sorted key is then recombined with its corresponding non-key part and output. This keeps memory accesses from spreading over a wide range and thereby improves the cache hit rate (Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 2000-10760

The conventional sort processing method described above has the problem that, when the sort key is large, memory accesses spread over a wide range and cache misses occur frequently. For example, when the sort key size is large and close to the record size, memory accesses still spread over a wide range even if the records are split.
One object of the present invention is to solve the above problem, that is, to speed up sort processing and to improve the cache hit rate.

The sort processing device according to the present embodiment is a sort processing device that sorts sort target data using a cache memory, and is characterized by having:
a block dividing unit that calculates a block size based on the cache size of the cache memory and divides the sort target data into a plurality of blocks based on the calculated block size;
an intra-block sort processing unit that uses the cache memory to sort within each block divided by the block dividing unit; and
an inter-block merge processing unit that uses the cache memory to merge the blocks sorted by the intra-block sort processing unit, thereby sorting the sort target data.

According to the present invention, dividing the data into blocks whose size is matched to the cache size makes effective use of the cache memory, which realizes high-speed sort processing and improves the cache hit rate.

Embodiment 1.
FIG. 1 is a diagram showing the configuration of an embodiment of the present invention. In FIG. 1, the sort processing device 100 sorts the sort target data 2 on the main storage device 1 and outputs the resulting sort result data 3 to the main storage device 1. Although not shown in FIG. 1, the sort processing device 100 executes the sort processing using a CPU (Central Processing Unit) that has a cache memory (primary cache and secondary cache). The sort processing device 100 includes an intra-block sort processing unit 4, an inter-block merge processing unit 5, a block information holding unit 6, a block dividing unit 10, a CPU ID detection unit 101, and a CPU information holding unit 102.

The sort processing device 100 may be configured in hardware, or all or some of its elements may be configured as a program that runs on the CPU. Alternatively, it may be realized by firmware stored in a ROM, or by a combination of software and hardware, or of software, hardware, and firmware.

The CPU ID detection unit 101 detects the ID (identification information) of the CPU that is used for the sort processing (the CPU mounted on the computer). The CPU ID detection unit 101 is an example of an identification information detection unit.
The CPU information holding unit 102 holds information on the primary cache size and secondary cache size of each CPU (cache size information). FIG. 2 shows an example of the cache size information held by the CPU information holding unit. For example, if the CPU ID detected by the CPU ID detection unit is “A02”, the information extracted is that the primary cache size is 16 KB and the secondary cache size is 1 MB. The CPU information holding unit 102 is an example of a cache size information holding unit.
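The lookup performed against the CPU information holding unit can be pictured as a simple keyed table. The following is a minimal sketch only: the names `CACHE_SIZE_TABLE` and `lookup_cache_sizes` are hypothetical, only the “A02” row is taken from the text, and treating 16 KB and 1 MB as powers of two is an assumption.

```python
# Hypothetical stand-in for the CPU information holding unit (FIG. 2).
# Only the "A02" entry comes from the text; KB/MB are assumed to be binary units.
CACHE_SIZE_TABLE = {
    "A02": {"l1_bytes": 16 * 1024, "l2_bytes": 1024 * 1024},
}


def lookup_cache_sizes(cpu_id, table=CACHE_SIZE_TABLE):
    """Return (primary, secondary) cache sizes in bytes, or None if the ID is unknown."""
    entry = table.get(cpu_id)
    if entry is None:
        return None  # no cache size information held for this CPU ID
    return entry["l1_bytes"], entry["l2_bytes"]
```

Returning `None` for an unknown ID is a design choice of this sketch; it mirrors the case in which no cache size information is available for the detected CPU.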

The block dividing unit 10 determines the block size to use when dividing the sort target data 2 into blocks, based on the CPU ID detected by the CPU ID detection unit 101 and the cache size information obtained from the CPU information holding unit 102, and divides the sort target data 2 into a plurality of blocks. The block size calculation procedure is described in detail later. The block dividing unit 10 also stores the address information of the divided blocks in the block information holding unit 6.

The intra-block sort processing unit 4 performs an intra-block sort of the sort target data for every block registered in the block information holding unit.
The inter-block merge processing unit 5 merges the sorted blocks and outputs the result to the main storage device as the sort result data 3.

Next, the operation of the sort processing device 100 is described with reference to FIGS. 3 to 6.
First, in step S301, the CPU ID detection unit 101 detects the ID of the CPU by means of a micro instruction. For example, on the Xeon (registered trademark) microprocessor manufactured by Intel (registered trademark), the CPU ID can be read directly from the CPU with the instruction "cpuid".

Next, in step S302, the block dividing unit 10 uses the CPU ID obtained by the CPU ID detection unit 101 to acquire the cache memory information of the corresponding CPU from the CPU information holding unit 102, and determines the cache size of the cache memory from the acquired information.

Next, in step S303, the block dividing unit 10 calculates the block size for dividing the sort target data 2 (block dividing step).
The criterion for the block size is that the partial data string made up of the minimum values of the blocks is small enough to hit in the CPU's primary cache, and that one block is small enough to hit in the CPU's secondary cache. As shown in FIG. 6, after the sort target data 2 is divided into blocks, the minimum value of each block is extracted and a partial data string consisting of these minimum values is generated. The block size is calculated so that this partial data string fits within the primary cache size and one block fits within the secondary cache size.

Let C1 be the primary cache size of the CPU, C2 the secondary cache size of the CPU, RL the record length of one record of the sort target data, AR the data size of all the sort target data (record length per record × number of records), BN the number of blocks used when dividing the sort target data, and BS the block size used when dividing the sort target data. Then BN = C1 / RL and BS = AR / BN are set.
Here, when BS < C2 is satisfied, the intra-block sort can be performed within the secondary cache of the CPU.
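The two relations and the BS < C2 check can be written out as a short calculation. This is a minimal sketch under assumptions: the function name `block_parameters` is hypothetical, and the text does not say how fractional results are rounded, so flooring BN and rounding BS up are choices made here for illustration.

```python
def block_parameters(c1, c2, rl, ar):
    """Compute (BN, BS, fits) from BN = C1 / RL and BS = AR / BN.

    c1, c2: primary and secondary cache sizes in bytes
    rl:     record length in bytes
    ar:     total size of the sort target data in bytes
    Rounding (floor for BN, ceiling for BS) is an assumption of this sketch.
    """
    bn = c1 // rl  # one minimum-value record per block must fit in C1
    if bn < 1:
        raise ValueError("record length exceeds the primary cache size")
    bs = -(-ar // bn)  # ceiling division so that BN blocks cover all of AR
    fits = bs < c2     # condition for sorting a block inside the secondary cache
    return bn, bs, fits
```

For instance, with a 16 KiB primary cache, a 1 MiB secondary cache, 100-byte records, and 100,000,000 bytes of data, this sketch gives 163 blocks of 613,497 bytes each, which satisfies BS < C2.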

Next, in step S304, the sort target data 2 is divided into a plurality of blocks using the block size calculated in S303 (block dividing step).
For example, if the CPU ID detected by the CPU ID detection unit 101 is “A02”, the CPU information holding unit shown in FIG. 2 indicates that the primary cache size is 16 KB and the secondary cache size is 1 MB. Next, suppose that, as in FIG. 4, the record length is 100 B and there are 1,000,000 records, so that the sort target data is 100,000,000 B. This data is divided using a block size of 1,000,000 B - 100 B (one record) (401). The 100 B (one record) is subtracted so that a record for data exchange during sorting can be added to each block. Dividing the sort target data in units of 1,000,000 B - 100 B yields 100 blocks of 9,999 records each and one block of 100 records. Adding one data exchange record to each block gives 100 blocks of size 1,000,000 B (9,999 records (9,999 × 100 B = 999,900 B) + one data exchange record (100 B)) and one block of size 10,100 B (100 records (100 × 100 B = 10,000 B) + one data exchange record (100 B)). In this case a minimum value data string of 101 records is generated. Its size is about 10 KB, which, given a primary cache size of 16 KB, fits in the primary cache memory even when one record for data exchange during sorting is included.
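The arithmetic of this worked example can be checked with a few lines of code. The sketch below is illustrative only: `split_counts` is a hypothetical helper, and it follows the example in treating 1 MB as 1,000,000 B.

```python
def split_counts(total_records, record_len, block_bytes):
    """Return (records per full block, number of full blocks, records in the last block).

    One record's worth of space is reserved in every block for data exchange
    during sorting, as in the worked example.
    """
    payload_records = (block_bytes - record_len) // record_len
    full_blocks, remainder = divmod(total_records, payload_records)
    return payload_records, full_blocks, remainder


per_block, full_blocks, last_block = split_counts(1_000_000, 100, 1_000_000)
block_count = full_blocks + (1 if last_block else 0)
min_string_bytes = block_count * 100  # one minimum-value record per block
```

With these inputs the helper reports 9,999 records per full block, 100 full blocks, and a final block of 100 records, so the minimum value data string holds 101 records (10,100 B), below the 16 KB primary cache size.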

When the block dividing unit 10 divides the data into blocks in step S304, it notifies the block information holding unit 6 of the address information of each block, and the block information holding unit 6 holds that address information. The intra-block sort processing unit 4 can therefore obtain the address information of each block from the block information holding unit 6. In step S305, the intra-block sort processing unit 4 performs an intra-block sort, using the secondary cache, for all the blocks divided by the block dividing unit 10 (intra-block sort processing step). If quicksort is used, for example, both data references and data exchanges can be executed at high speed in the cache memory.
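The division and intra-block sort steps can be sketched as follows. This is only an illustration under assumptions: `split_and_sort_blocks` is a hypothetical name, records are modeled as plain comparable values, the block length is given in records rather than bytes, and Python's built-in `list.sort` (Timsort) stands in for the quicksort mentioned in the text. Python gives no control over cache placement, so the sketch shows only the logical structure.

```python
def split_and_sort_blocks(records, block_len):
    """Split records into consecutive blocks of block_len items and sort each block."""
    if block_len < 1:
        raise ValueError("block_len must be at least 1")
    # Block division: consecutive slices of the input (the last one may be shorter).
    blocks = [records[i:i + block_len] for i in range(0, len(records), block_len)]
    # Intra-block sort: each block is sorted independently of the others.
    for block in blocks:
        block.sort()
    return blocks
```

For example, `split_and_sort_blocks([5, 3, 9, 1, 7], 2)` yields the blocks `[3, 5]`, `[1, 9]`, and `[7]`.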

When the intra-block sort has been completed for all blocks, the inter-block merge processing unit 5 merges the blocks in step S306 (inter-block merge processing step). The merge processing between blocks is described with reference to the flowchart of FIG. 5. First, a data string consisting of the minimum values of the N intra-block-sorted blocks is generated (S3061). Next, the generated data string is sorted (S3062). For each sorted data item, the block information holding unit 6 makes it possible to determine which block the item belongs to. The minimum value of the sorted data string is output to main memory as the minimum value of the sort result data (S3063). It is then determined whether the block to which the output data belonged contains a next minimum value (S3064). If it does, that value is inserted into the minimum value data string (S3065), and processing repeats from the step of sorting the minimum value data string (S3062). In the case of FIG. 6, “8”, which belonged to block 2, is output as the minimum value in the minimum value data string, so “22”, the next minimum value of block 2, is inserted into the minimum value data string.

By repeatedly inserting into the minimum value data string the next minimum value from the block whose data was just output, the whole of the sort target data is sorted.
When a block becomes empty, it is removed from the merge and the merge processing continues with the remaining blocks only. If the block of the output minimum value is empty, the next smallest data after the output minimum value is output. The next smallest data from the block to which that output data belongs is then inserted into the minimum value data string.

The insertion of the next smallest value from the block of the output data into the minimum value data string, and the output of the minimum value of that data string to the main storage device, are repeated. The sort of the sort target data is complete when all blocks have become empty (S3066).
Through the above processing, the sort result data 3 is obtained in the main storage device 1.
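The merge procedure of steps S3061 to S3066 can be sketched as below. This is a minimal illustration, not the patented implementation: `merge_sorted_blocks` is a hypothetical name, records are modeled as plain comparable values, and each entry in the minimum value data string carries its block index in place of the block information holding unit.

```python
def merge_sorted_blocks(blocks):
    """Merge individually sorted blocks through a minimum value data string."""
    positions = [0] * len(blocks)  # index of the next unread item in each block
    # S3061: one (value, block index) entry per non-empty block.
    mins = [(block[0], i) for i, block in enumerate(blocks) if block]
    result = []
    while mins:                    # S3066: finished once every block is exhausted
        mins.sort()                # S3062: sort the minimum value data string
        value, i = mins.pop(0)     # S3063: output the overall minimum
        result.append(value)
        positions[i] += 1
        if positions[i] < len(blocks[i]):              # S3064: block has more data?
            mins.append((blocks[i][positions[i]], i))  # S3065: insert next minimum
    return result
```

As a usage example with made-up values (only “8” and “22” echo FIG. 6), `merge_sorted_blocks([[15, 30], [8, 22], [12]])` outputs 8 first, then inserts 22 from the same block into the data string before continuing. Re-sorting the short data string on every pass follows the flowchart literally; a priority queue would be a common alternative, but that is a substitution not described in the text.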

In the above processing, the minimum value in each block is extracted to generate a minimum value data string, and the minimum values in that data string are output in order to the main storage device. Alternatively, the maximum value in each block may be extracted to generate a maximum value data string, and the maximum values in that data string may be output in order to the main storage device.

As described above, in order to sort a large volume of sort target data on the main storage device, the sort processing device according to the present embodiment includes: a CPU ID detection unit that detects the ID of the CPU; a CPU information holding unit that holds CPU IDs and the cache memory information of the CPUs; a block dividing unit that divides the sort target data into a plurality of blocks using the CPU cache memory information obtained from the CPU information holding unit; a block information holding unit that holds information on the divided blocks; an intra-block sort processing unit that sorts the sort target data within each divided block; and an inter-block merge processing unit that merges the sorted blocks to obtain the sort result for the whole of the sort target data.
The device is characterized in that the block dividing unit 10 acquires, from the CPU information holding unit 102, the cache sizes of the primary cache and secondary cache of the CPU used for the sort processing, calculates from those cache sizes the block size for dividing the sort target data 2, and divides the sort target data into a plurality of blocks based on the calculated block size; the intra-block sort processing unit 4 uses the secondary cache of the CPU to sort within each block divided by the block dividing unit 10; and the inter-block merge processing unit 5 uses the primary cache to merge the blocks sorted by the intra-block sort processing unit 4, thereby sorting the sort target data, and outputs the sort result data 3 to the main storage device 1.

By keeping the block size for dividing the sort target data within the secondary cache size, the sort processing device according to the present embodiment can execute the intra-block sort at high speed in the secondary cache memory. In addition, by choosing the block size so that the data size of the minimum value data string is within the primary cache size, the sort of the minimum value data string after the intra-block sort can be executed at high speed in the primary cache memory. As a result, large-scale data in main memory can be sorted at high speed and the cache hit rate can be improved.

Embodiment 2.
In Embodiment 1 above, the CPU ID is read out directly. This embodiment describes the case in which the cache size information is instead input from outside.

FIG. 7 is a diagram showing the configuration of the present embodiment, in which a CPU information reading unit 103 replaces the CPU ID detection unit 101 and the CPU information holding unit 102 of Embodiment 1. The other components of the sort processing device 100, namely the intra-block sort processing unit 4, the inter-block merge processing unit 5, the block information holding unit 6, and the block dividing unit 10, are the same as in Embodiment 1.

The CPU information reading unit 103 receives the CPU's cache size information as input from the user. The CPU information reading unit 103 is an example of a cache size information input unit.
In the present embodiment as well, the sort processing device 100 may be configured in hardware, or all or some of its elements may be configured as a program that runs on the CPU. Alternatively, it may be realized by firmware stored in a ROM, or by a combination of software and hardware, or of software, hardware, and firmware.

Next, the operation of the sort processing device 100 according to the present embodiment is described with reference to FIG. 8.
In step S801, the CPU information reading unit 103 receives the CPU's cache size information as input from the user. The input cache size information indicates the size of the primary cache memory and the size of the secondary cache memory of the CPU. Next, in step S802, the block dividing unit 10 calculates the block size for dividing the sort target data 2 from the cache size information read by the CPU information reading unit 103. The block size is calculated in the same way as in Embodiment 1. Thereafter, as in Embodiment 1, the sort target data is divided (S803), the sort within each divided block is executed (S804), and the blocks are merged after the intra-block sort (S805).

As described above, in order to sort a large volume of sort target data on the main storage device, the sort processing device according to the present embodiment is characterized by including: a CPU information reading unit that reads the cache memory information of the CPU; a block dividing unit that divides the sort target data into a plurality of blocks using the CPU cache memory information obtained by the CPU information reading unit; a block information holding unit that holds information on the divided blocks; an intra-block sort processing unit that sorts the sort target data within each divided block; and an inter-block merge processing unit that merges the sorted blocks to obtain the sort result for the whole of the sort target data.

In Embodiment 1, if the CPU information holding unit holds no cache memory information for the CPU ID that was read, the information needed for block division cannot be obtained. In the present embodiment, the user inputs the CPU's cache memory information and the CPU information reading unit reads it, so the block size for division can still be determined.

Embodiment 3.
In Embodiments 1 and 2 above, the block size is calculated on the assumption that 100% of the cache memory capacity is available, regardless of how the CPU's cache memory is actually being used. This embodiment describes the case in which the block size is calculated with the usage state of the cache memory taken into account.

FIG. 9 is a diagram showing the configuration of the present embodiment, in which a cache usage rate information holding unit 104 is added to the sort processing device shown in Embodiment 1.
The cache usage rate information holding unit 104 holds statistical information on the usage rates of the primary cache and secondary cache for each CPU. For example, it holds information indicating the average usage rates of the primary cache and secondary cache for each time period.

The block dividing unit 10 of the present embodiment calculates the block size for dividing the sort target data using both the CPU cache size information in the CPU information holding unit 102 and the cache usage rate information in the cache usage rate information holding unit 104. For example, suppose that the CPU “A02” of FIG. 2 is detected by the CPU ID detection unit 101. If the cache usage rate information in the cache usage rate information holding unit 104 gives the usage rates of CPU “A02” as, for example, primary cache: 50% and secondary cache: 50%, the block dividing unit 10 uses 50% of each cache size, that is, primary cache: 8 KB and secondary cache: 0.5 MB, as the reference values for the block size calculation. It calculates the block size so that the minimum value data string fits within 8 KB of primary cache and one block fits within 0.5 MB of secondary cache.
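The adjustment of the reference cache sizes can be sketched as below. This is an illustration under assumptions: `usable_cache_sizes` is a hypothetical name, the usage rates are given as integer percentages, and the reference size is taken to be the unused share (100% minus the usage rate). The example in the text uses 50% for both caches, so it does not by itself distinguish the unused share from the used share.

```python
def usable_cache_sizes(l1_bytes, l2_bytes, l1_usage_pct, l2_usage_pct):
    """Scale each cache size by its assumed unused share (100 - usage rate, in percent)."""
    for pct in (l1_usage_pct, l2_usage_pct):
        if not 0 <= pct <= 100:
            raise ValueError("usage rate must be between 0 and 100 percent")
    usable_l1 = l1_bytes * (100 - l1_usage_pct) // 100
    usable_l2 = l2_bytes * (100 - l2_usage_pct) // 100
    return usable_l1, usable_l2
```

With a 16 KiB primary cache and a 1 MiB secondary cache at 50% usage each, the sketch returns 8 KiB and 0.5 MiB, matching the reference values in the example; these would then be used in place of the full cache sizes when calculating the block size.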

Apart from referring to the cache usage rate information in the cache usage rate information holding unit 104 when calculating the block size, this embodiment is the same as Embodiment 1.
In FIG. 9, the cache usage rate information holding unit 104 is added to the configuration of FIG. 1, but it may instead be added to the configuration of FIG. 7. In that case, the block dividing unit 10 calculates the block size for dividing the sort target data using the CPU cache size information read by the CPU information reading unit 103 and the cache usage rate information in the cache usage rate information holding unit 104.

In the present embodiment as well, the sort processing device 100 may be configured in hardware, or all or some of its elements may be configured as a program that runs on the CPU. Alternatively, it may be realized by firmware stored in a ROM, or by a combination of software and hardware, or of software, hardware, and firmware.

According to the present embodiment, the block size can be calculated with the usage state of the cache memory taken into account.

FIG. 1 is a block diagram showing the configuration of Embodiment 1.
FIG. 2 is a diagram showing an example of the cache size information held by the CPU information holding unit.
FIG. 3 is a flowchart showing an operation example of the sort processing device according to Embodiment 1.
FIG. 4 is a schematic diagram showing the processing of dividing the sort target data, sorting within blocks, and merging.
FIG. 5 is a flowchart showing an example of the merge processing performed after the intra-block sort is completed.
FIG. 6 is a schematic diagram showing the merge method used after the sort target data has been divided and sorted within blocks.
FIG. 7 is a block diagram showing the configuration of Embodiment 2.
FIG. 8 is a flowchart showing an operation example of the sort processing device according to Embodiment 2.
FIG. 9 is a block diagram showing the configuration of Embodiment 3.

Explanation of symbols

1: main storage device, 2: sort target data, 3: sort result data, 4: intra-block sort processing unit, 5: inter-block merge processing unit, 6: block information holding unit, 10: block dividing unit, 100: sort processing device, 101: CPU ID detection unit, 102: CPU information holding unit, 103: CPU information reading unit, 104: cache usage rate information holding unit.

Claims

A sort processing device that sorts data to be sorted using a cache memory having a primary cache and a secondary cache, comprising:
a block dividing unit that calculates a block size based on the cache size of the cache memory and divides the data to be sorted into a plurality of blocks based on the calculated block size;
an intra-block sort processing unit that uses the cache memory to sort within each block divided by the block dividing unit; and
an inter-block merge processing unit that uses the cache memory to extract, from each block sorted by the intra-block sort processing unit, the maximum value data or the minimum value data in that block, generates a data string from the extracted maximum value data or minimum value data of the blocks, and extracts data in order starting from the maximum value data or the minimum value data in the generated data string, thereby merging the blocks sorted by the intra-block sort processing unit,
wherein the block dividing unit
sets the block size for dividing the data to be sorted to a size that is within the cache size of the secondary cache and for which the data size of the data string generated by the inter-block merge processing unit is within the cache size of the primary cache.

The sort processing device according to claim 1, wherein
the intra-block sort processing unit uses the secondary cache to sort within each block divided by the block dividing unit, and
the inter-block merge processing unit uses the primary cache to merge the blocks sorted by the intra-block sort processing unit.

The sort processing apparatus according to claim 1, further comprising:
An identification information detection unit that detects identification information of the CPU (Central Processing Unit) whose cache memory is used for sorting the data to be sorted; and
A cache size information holding unit that holds cache size information indicating the cache size of the cache memory of each of a plurality of types of CPUs,
wherein the block dividing unit
determines the cache size of the cache memory used for sorting the data to be sorted based on the CPU identification information detected by the identification information detection unit and the cache size information held by the cache size information holding unit, and calculates the block size based on the determined cache size.
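
The lookup described in this claim can be pictured as a small table keyed by CPU identification information. The sketch below is illustrative only: the identifiers `cpu-model-a` and `cpu-model-b`, the cache sizes, and the names `CacheSizes`, `CACHE_TABLE` and `lookup_cache_sizes` are hypothetical placeholders, not values or names from the patent or from any real CPU.

```python
from typing import Dict, NamedTuple


class CacheSizes(NamedTuple):
    l1_bytes: int  # primary cache size
    l2_bytes: int  # secondary cache size


# Hypothetical cache size information for several CPU types.
# Identifiers and sizes are placeholders chosen for illustration.
CACHE_TABLE: Dict[str, CacheSizes] = {
    "cpu-model-a": CacheSizes(l1_bytes=16 * 1024, l2_bytes=512 * 1024),
    "cpu-model-b": CacheSizes(l1_bytes=32 * 1024, l2_bytes=1024 * 1024),
}


def lookup_cache_sizes(cpu_id: str,
                       table: Dict[str, CacheSizes] = CACHE_TABLE) -> CacheSizes:
    """Return the cache sizes held for the detected CPU identifier."""
    try:
        return table[cpu_id]
    except KeyError:
        raise KeyError(f"no cache size information held for CPU {cpu_id!r}") from None
```

How the identifier is detected is outside this sketch; here it is simply passed in as a string, and the returned sizes would then feed the block size calculation.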

The sort processing apparatus according to claim 1, further comprising:
A cache size information input unit that inputs cache size information indicating the cache size of the cache memory used for sorting the data to be sorted,
wherein the block dividing unit
determines the cache size of the cache memory used for sorting the data to be sorted based on the cache size information input by the cache size information input unit, and calculates the block size based on the determined cache size.

A sort processing method for sorting data to be sorted using a cache memory having a primary cache and a secondary cache, comprising:
A block dividing step of calculating a block size based on the cache size of the cache memory and dividing the sort target data into a plurality of blocks based on the calculated block size;
An intra-block sort processing step of performing, using the cache memory, intra-block sorting for each block divided by the block dividing step; and
An inter-block merge processing step of, using the cache memory, extracting the maximum value data or the minimum value data of each block from each block sorted by the intra-block sort processing step, generating a data string from the extracted maximum value data or minimum value data of the blocks, extracting data in order starting from the maximum value data or the minimum value data in the generated data string, and thereby merging the blocks sorted by the intra-block sort processing step,
wherein, in the block dividing step,
a size is calculated, as the block size for dividing the data to be sorted, that is within the cache size of the secondary cache and for which the data size of the data string generated by the inter-block merge processing step is within the cache size of the primary cache.

A program for sorting data to be sorted using a cache memory having a primary cache and a secondary cache, the program causing a computer to execute:
A block division process of calculating a block size based on the cache size of the cache memory and dividing the data to be sorted into a plurality of blocks based on the calculated block size;
An intra-block sort process of performing, using the cache memory, intra-block sorting for each block divided by the block division process; and
An inter-block merge process of, using the cache memory, extracting the maximum value data or the minimum value data of each block from each block sorted by the intra-block sort process, generating a data string from the extracted maximum value data or minimum value data of the blocks, extracting data in order starting from the maximum value data or the minimum value data in the generated data string, and thereby merging the blocks sorted by the intra-block sort process,
wherein, in the block division process,
the program causes the computer to
calculate, as the block size for dividing the data to be sorted, a size that is within the cache size of the secondary cache and for which the data size of the data string generated by the inter-block merge process is within the cache size of the primary cache.
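
The block size condition recited in the independent claims (each block within the secondary cache, and the data string of per-block maximum or minimum values within the primary cache) can be sketched as a simple feasibility check. The formula below is an assumption made for illustration: it treats the data string as holding one full record per block and takes the largest block that fits the secondary cache. The function name `choose_block_len` and its parameters are hypothetical and are not taken from the patent.

```python
def choose_block_len(total_records: int, record_bytes: int,
                     l1_bytes: int, l2_bytes: int) -> int:
    """Return a block length (in records) satisfying both cache constraints."""
    if total_records <= 0 or record_bytes <= 0:
        raise ValueError("total_records and record_bytes must be positive")

    # Largest block, in records, that fits in the secondary cache.
    block_len = l2_bytes // record_bytes
    # Largest number of data-string entries that fits in the primary cache
    # (assumption: one record-sized entry per block).
    max_blocks = l1_bytes // record_bytes
    if block_len == 0 or max_blocks == 0:
        raise ValueError("a single record does not fit in the cache")

    # Never use a block longer than the data itself.
    block_len = min(block_len, total_records)

    # Number of blocks produced by this block length (ceiling division).
    n_blocks = -(-total_records // block_len)
    if n_blocks > max_blocks:
        raise ValueError("no block size satisfies both cache constraints")
    return block_len
```

With one million 16-byte records, a 16 KiB primary cache and a 512 KiB secondary cache, the sketch yields blocks of 32768 records, giving 31 blocks and a data string of 496 bytes. For much larger inputs the two constraints can conflict, in which case the sketch raises an error rather than guessing a fallback.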