JP2014126931A

JP2014126931A - Logic circuit device for data processing

Info

Publication number: JP2014126931A
Application number: JP2012281410A
Authority: JP
Inventors: Hironari Hayashizaki; 弘成林▲崎▼; Ai Ito; 愛伊藤
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2012-12-25
Filing date: 2012-12-25
Publication date: 2014-07-07

Abstract

PROBLEM TO BE SOLVED: To provide a pull-scheduling sorter with a single comparator. SOLUTION: Provided is a device that has buffer queues in a plurality of stages and processes data by propagating it from a lower stage to an upper stage while reducing it to 1/N at each stage (N is preferably 2^m, where m is an integer equal to or larger than 1). The device includes a single comparator that is arranged between the buffer queues of adjacent stages, compares the entries of a pair of buffer queues, and moves a value up to a buffer of the upper stage according to the comparison result. The positions of the pair of buffer-queue entries that each comparator should compare are received as a designation from the upper stage, and a designation is in turn issued to the lower stage. A typical application of this configuration is a sorter. The device is further provided with index queues in the plurality of stages, which receive and store the designation of the positions of the pair of buffer-queue entries that each comparator should compare.

Description

The present invention relates to a logic circuit device for data processing that has buffer queues in a plurality of stages and processes data while sending it from a lower stage to an upper stage. One specific example is a sorter implemented on an FPGA or the like.

A sorter having buffer queues in a plurality of stages uses comparators to compare data as the data is sent from a lower stage to an upper stage. Such a sorter typically compares data all at once with a plurality of comparators, but providing a plurality of comparators increases the hardware size.

To save hardware, a sorter with a single comparator between the lower stage and the upper stage has been proposed, for example in Kermin Fleming et al., "High-throughput Pipelined Mergesort", MIT CSAIL, 2008. With a single comparator, however, scheduling becomes necessary, that is, deciding in what order the comparator is applied.

Scheduling is either push scheduling, in which a lower stage sends data up to the upper stage, or pull scheduling, in which the upper stage requests data from the lower stage and pulls it up.

However, implementing push scheduling still requires relatively large-scale hardware and incurs high latency.

Replacing push scheduling with pull scheduling saves further hardware. A known algorithm of this kind processes and moves data whenever the inputs are available and the output buffer is not full.

As prior art, Japanese Patent Application Laid-Open No. 5-151166 discloses a sorter in which a comparison unit judges whether the head data stored in a receive buffer is in the proper sort position relative to the memory data indicated by an index register of a data memory. If the sort position is proper, a control unit inserts the head data of the receive buffer before the memory data indicated by the index register. If it is not, the control unit advances the index register to the next data position, so that the judgment is always performed on memory data at or after the latest sort position.

Japanese Patent Application Laid-Open No. 7-160475 discloses a configuration in which, while the group of lists to be merged is generated in a merge sort, auxiliary information consisting of a block identifier and a representative key value of the block is acquired for each block written out, and an auxiliary information list is created. The auxiliary information list is then sorted by representative key value and divided into sub auxiliary information lists, each of which is assigned to a processing device. The processing devices perform merge processing in parallel according to their assigned sub auxiliary information lists to complete the sort.

Japanese Patent Application Laid-Open No. 10-336216 discloses a sorter for sorting data elements that each include a sort key. The storage means is organized as a binary tree of 2^n - 1 nodes, each able to hold one element, distributed over n consecutive stages numbered 0 to n - 1, with stage q containing nodes 2^q to 2^(q+1) - 1. The elements are distributed in the tree so that the element held in node i has a sort key smaller than the sort keys of the elements held in nodes 2i and 2i + 1. The tree is managed by m consecutive controllers 21q (2 ≤ m ≤ n), each associated with a stage 20q or with several consecutive stages of the tree, and n - 1 interface registers 26q are provided between consecutive stages.

However, none of these prior-art documents suggests a sorter configuration that uses pull scheduling with a single comparator and can thereby save hardware size.

Japanese Patent Application Laid-Open No. 5-151166; Japanese Patent Application Laid-Open No. 7-160475; Japanese Patent Application Laid-Open No. 10-336216

Kermin Fleming et al., "High-throughput Pipelined Mergesort", MIT CSAIL, 2008

An object of the present invention is to provide a pull-scheduling sorter with a single comparator.

Another object of the present invention is to provide a technique for avoiding deadlock in a pull-scheduling sorter with a single comparator by guaranteeing that the output buffer never becomes full during operation.

According to the present invention, there is provided a device that has buffer queues in a plurality of stages and processes data by propagating it from a lower stage to an upper stage while reducing it to 1/N at each stage (N is preferably 2^m, where m is an integer of 1 or more). The device has a single comparator arranged between the buffer queues of adjacent stages; the comparator compares the entries of a pair of buffer queues and, based on the comparison result, moves a value up to the upper-stage buffer. The positions of the pair of buffer-queue entries that each comparator should compare are received as a designation from the upper stage, and a designation is in turn issued to the lower stage. A typical application of this configuration is a sorter.

According to one aspect of the present invention, index queues are provided in a plurality of stages; each receives from the upper stage, and stores, the designation of the positions of the pair of buffer-queue entries that the comparator should compare.

According to another aspect of the present invention, a logic circuit is provided that, in response to neither entry of the pair being empty and the lower-stage index queue not being full, operates the comparator to send the content of one entry of the pair to the upper stage, removes that content, sends a request for the index of that entry to the lower stage, and pops the first element of the index queue of its own stage.

According to yet another aspect of the present invention, the logic circuit, in response to either entry of the pair being empty, sends a request for the index of the empty entry to the lower stage.

According to yet another aspect of the present invention, to avoid deadlock, the device is arranged so that no attempt is made to add an element to a buffer queue that is already full, which would cause an overflow. In addition, to achieve high performance, the buffer queues are designed to be full at the end of startup. To this end, the size of each buffer queue is made equal to the size of the index queue.

According to yet another aspect of the present invention, to achieve high performance, the size of each buffer queue is set to L + 1 or more, where L is the latency, in cycles, of receiving a request from the upper stage, so that the buffer queue never becomes empty in the steady state.

As described above, the pull-scheduling configuration with a single comparator makes it possible to realize a sorter that, when implemented in hardware, operates efficiently while consuming little hardware.

FIGS. 1 and 2 are block diagrams of the hardware configuration of the device of the present invention.
FIG. 3 is a diagram for explaining the functions of the hardware configuration of the device of the present invention.
FIG. 4 is a block diagram of the functional configuration of the device of the present invention.
FIG. 5 is a flowchart of the processing of the device of the present invention.
FIGS. 6 to 13 are diagrams showing the operation of the device of the present invention in startup cycles.
FIGS. 14 to 19 are diagrams showing the operation of the device of the present invention in steady-state cycles.

Embodiments of the present invention are described below with reference to the drawings. These embodiments are intended to illustrate preferred modes of the invention and are not intended to limit the scope of the invention to what is shown here. Throughout the drawings, the same reference numerals denote the same objects unless otherwise noted.

FIG. 1 is a block diagram of the hardware configuration of a sorter according to one embodiment of the present invention. This configuration is built on an FPGA (Field Programmable Gate Array).

A typical use of this sorter is in ETL (Extract/Transformation/Load) and database products: it receives a sequence of database records from an ETL/database engine, sorts them by a specific key, and returns the sorted record sequence to the ETL/database engine. The sorter is not limited to this use, however, and can serve as a general-purpose sorter.

This embodiment is implemented with buffer queue arrays in four stages, but the number of stages and the number of buffer queues in each buffer queue array are set as appropriate for the intended use, provided that the number of buffer queues is reduced to 1/N of that of the stage below at each step upward (N is preferably 2^m, where m is an integer of 1 or more). In this embodiment, m = 1.
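The stage-by-stage reduction can be illustrated with a short Python sketch. The function name `queues_per_stage` and its parameters are hypothetical and do not appear in the disclosure; the sketch only computes how many buffer queues each stage would hold for a given bottom-stage width and reduction factor N.

```python
from __future__ import annotations


def queues_per_stage(bottom_queues: int, n: int = 2) -> list[int]:
    """Return the number of buffer queues per stage, bottom stage first.

    Each stage holds 1/n of the queues of the stage below it, and the
    top stage holds a single queue.
    """
    if n < 2 or bottom_queues < 1:
        raise ValueError("need n >= 2 and at least one bottom queue")
    counts = [bottom_queues]
    while counts[-1] > 1:
        if counts[-1] % n != 0:
            raise ValueError("queue count must be divisible by n at every stage")
        counts.append(counts[-1] // n)
    return counts


if __name__ == "__main__":
    # The embodiment uses m = 1 (N = 2) and eight queues in the bottom stage.
    print(queues_per_stage(8))
```

With eight bottom-stage queues and N = 2, this yields four stages of 8, 4, 2, and 1 queues, matching the arrays 102, 104, 106, and 108 of the embodiment.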

The configuration of FIG. 1 has buffer queue arrays 102, 104, 106, and 108. Data is fed from a data input logic circuit 110, connected for example to a memory bus, through the buffer queue arrays 102, 104, 106, and 108 in that order while being sorted, and the sorted data is output to a data output logic circuit 112, likewise connected for example to a memory bus.

The buffer queue array 102, which is the bottom stage, has eight buffer queues 102a, 102b, 102c, 102d, 102e, 102f, 102g, and 102h. The buffer queue array 104 above it has four buffer queues 104a, 104b, 104c, and 104d. The buffer queue array 106 above that has two buffer queues 106a and 106b, and the buffer queue array 108 at the top stage has a single buffer queue 108a.

Between the buffer queue array 102 and the buffer queue array 104 are arranged a single comparator 114, which compares the values of two buffer queues of the buffer queue array 102 and feeds the result into the buffer queue array 104, and a scheduler 116, which receives requests from the scheduler 120 and designates the positions of the buffer queues on which the comparator 114 performs its comparison.

Between the buffer queue array 104 and the buffer queue array 106 are arranged a single comparator 118, which compares the values of two buffer queues of the buffer queue array 104 and feeds the result into the buffer queue array 106, and a scheduler 120, which receives requests from the scheduler 124 and designates the positions of the buffer queues on which the comparator 118 performs its comparison.

Between the buffer queue array 106 and the buffer queue array 108 are arranged a single comparator 122, which compares the values of two buffer queues of the buffer queue array 106 and feeds the result into the buffer queue array 108, and a scheduler 124, which receives requests from the data output logic circuit 112 and designates the positions of the buffer queues on which the comparator 122 performs its comparison.
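As a loose software analogy of this 8-to-4-to-2-to-1 structure, the following Python sketch builds a tree of lazy two-way merges. It assumes that each bottom-stage queue is fed an already sorted run, which this passage does not state, and the name `merge_tree` is hypothetical. Because `heapq.merge` from the standard library returns a lazy iterator, a value is taken from a lower level only when the level above asks for it, which resembles pull scheduling. The sketch does not model the single time-shared comparator per stage, the index queues, or cycle timing.

```python
import heapq
from typing import Iterable, Iterator, List


def merge_tree(runs: List[Iterable[int]]) -> Iterator[int]:
    """Merge sorted runs pairwise, halving the stream count per level.

    Assumption: every run is already sorted in ascending order and the
    number of runs is a power of two (8 in the embodiment).
    """
    if not runs or len(runs) & (len(runs) - 1):
        raise ValueError("number of runs must be a power of two")
    streams = [iter(r) for r in runs]
    while len(streams) > 1:
        # Each pass corresponds to one comparator level of the tree.
        streams = [
            heapq.merge(streams[k], streams[k + 1])
            for k in range(0, len(streams), 2)
        ]
    return streams[0]


if __name__ == "__main__":
    runs = [[1, 9], [2], [5, 6], [0], [3], [4], [7], [8]]
    print(list(merge_tree(runs)))
```

Consuming the returned iterator one value at a time pulls exactly as much data from the lower levels as is needed to produce that value.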

As shown in FIG. 4, each of the schedulers 116, 120, and 124 has an index queue for storing indexes received from the upper buffer queue array. In the following, the index queue of the scheduler 116 is denoted 116a, the index queue of the scheduler 120 is denoted 120a, and so on. The data input logic circuit 110 also has a scheduler function for designating the position at which data from the memory bus is fed into the buffer queue array 102.

FIG. 2 shows part of a configuration that is logically equivalent to that of FIG. 1. In FIG. 1 the comparator and the scheduler are shown as separate components, whereas in FIG. 2 the comparator 122 is shown as also including the function of the scheduler 124 of FIG. 1. This is merely a matter of how the functional elements are named and is not essential.

FIG. 3 schematically illustrates the functions of the comparator and the scheduler. As shown in FIG. 3, the scheduler 120 of the upper stage sends a request, in the form of a 2-bit index, to the scheduler 116 of the stage immediately below.

The scheduler 116 processes this request and sends control signals to the following four destinations.
(1) A write-position signal for the buffer queue array 104. As shown in FIG. 4, FIG. 6, and other figures, each buffer queue array has a write index. That is, the instruction is to write to the i-th buffer queue, counting from 0.
(2) A read-position signal for the buffer queue array 102, which is below the buffer queue array 104. As shown in FIG. 4, FIG. 6, and other figures, each buffer queue array has a read index. That is, using the read index, the instruction is to read data from the (i*2)-th and (i*2+1)-th buffer queues and compare them with the comparator 114.
(3) A control signal to the comparator 114, which starts the processing of the comparator 114. The scheduler also receives a processing-complete control signal from the comparator 114.
(4) A request to the lower scheduler. If the lower buffer queue array is the bottom stage, the data input logic circuit 110 plays the role of the scheduler.
The schedulers 120 and 124 have the same functions.

Next, the logical function of the sorter shown in FIG. 1 is described with reference to FIG. 4. As shown in FIG. 1, each buffer queue array has one or more buffer queues, and, as shown in FIG. 4, each buffer queue has a plurality of entries (elements). As also shown in FIG. 4, FIG. 6, and other figures, each index queue likewise has a plurality of entries. According to one aspect of the present invention, to avoid deadlock, the number of entries in each buffer queue and the size of each index queue are chosen subject to predetermined constraints, which are described later.

FIG. 4 also shows that the comparator 114 compares pairs consisting of an even-numbered and an odd-numbered buffer queue of the buffer queue array 102. As shown in FIG. 4, these buffer-queue pairs are numbered 0, 1, 2, 3 as read indexes, and each entry of the index queue 116a points to one of these read indexes.
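The mapping from a read index to the pair of buffer queues it selects can be written down directly. The following Python sketch uses the hypothetical names `queue_pair` and `read_indexes`; it only restates the i*2 and i*2+1 numbering and is not taken from the disclosure.

```python
from __future__ import annotations


def queue_pair(read_index: int) -> tuple[int, int]:
    """Return the (even, odd) buffer-queue positions selected by a read index."""
    if read_index < 0:
        raise ValueError("read index must be non-negative")
    return 2 * read_index, 2 * read_index + 1


def read_indexes(num_queues: int) -> list[int]:
    """Return the valid read indexes for a lower array of num_queues queues."""
    if num_queues < 2 or num_queues % 2 != 0:
        raise ValueError("the lower array must hold an even number of queues")
    return list(range(num_queues // 2))


if __name__ == "__main__":
    # Array 102 holds eight queues, so its read indexes are 0 to 3.
    for i in read_indexes(8):
        print(i, queue_pair(i))
```

For the eight queues of array 102, read index 0 selects positions 0 and 1 (102a and 102b) and read index 3 selects positions 6 and 7 (102g and 102h).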

Next, the function of the sorter of FIG. 1, in particular the processing of the scheduler, is described with reference to the flowchart of FIG. 5. FIG. 5 represents the processing of the scheduler at each stage; each scheduler therefore executes the processing shown in FIG. 5.

In step 502, the scheduler determines whether its own index queue is empty. If so, it does nothing in step 524 and, in step 526, appends any request from the stage above to the tail of the index queue, and the processing ends.

If the scheduler determines in step 502 that its index queue is not empty, it stores the head element of the index queue in a variable i in step 504 and determines in step 506 whether the index queue of the stage below is full. If so, it does nothing in step 524 and, in step 526, appends any request from the stage above to the tail of the index queue, and the processing ends.

If the index queue of the stage below is not full, the scheduler determines in step 508 whether buffer queue i*2 of the stage below is empty. If so, the scheduler sends the request (i*2) to the stage below in step 510 and, in step 526, appends any request from the stage above to the tail of the index queue, and the processing ends.

If buffer queue i*2 of the stage below is not empty, the scheduler determines in step 512 whether buffer queue i*2+1 of the stage below is empty. If so, the scheduler sends the request (i*2+1) to the stage below in step 514 and, in step 526, appends any request from the stage above to the tail of the index queue, and the processing ends.

If the scheduler determines that neither buffer queue i*2 nor buffer queue i*2+1 of the stage below (the input side) is empty, then in step 516 it sets a := the head value of buffer queue i*2 of the stage below and b := the head value of buffer queue i*2+1 of the stage below.

In step 518, the comparator compares the values of a and b. If a > b, then in step 520 b is appended to the tail of buffer queue i on the output side, that is, in the upper buffer queue array; the request (i*2+1) is sent to the stage below; the head value of input buffer queue i*2+1 is removed; and the head value (i) of the index queue is removed. In step 526, any request from the stage above is appended to the tail of the index queue, and the processing ends.

If the comparator determines in step 518 that a > b does not hold, then in step 522 a is appended to the tail of buffer queue i on the output side, that is, in the upper buffer queue array; the request (i*2) is sent to the stage below; the head value of input buffer queue i*2 is removed; and the head value (i) of the index queue is removed. In step 526, any request from the stage above is appended to the tail of the index queue, and the processing ends.
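The flow of steps 502 to 526 can be summarized as one scheduler step in software. The following Python sketch is a behavioral model under stated assumptions, not the circuit of the disclosure: the function name `scheduler_step`, its parameters, and the use of `collections.deque` for the queues are all hypothetical, and "sending a request to the stage below" is modeled by returning the requested queue position to the caller.

```python
from collections import deque
from typing import Deque, List, Optional


def scheduler_step(
    index_queue: Deque[int],
    lower_index_queue_full: bool,
    inputs: List[Deque[int]],
    outputs: List[Deque[int]],
    incoming: Optional[int] = None,
) -> Optional[int]:
    """Run one pass of the flowchart and return the request sent downward.

    inputs are the buffer queues of the stage below, outputs those of the
    stage above; incoming is a request arriving from the stage above.
    """
    request_down = None
    if index_queue and not lower_index_queue_full:   # steps 502 and 506
        i = index_queue[0]                           # step 504
        left, right = inputs[2 * i], inputs[2 * i + 1]
        if not left:                                 # step 508
            request_down = 2 * i                     # step 510
        elif not right:                              # step 512
            request_down = 2 * i + 1                 # step 514
        else:
            a, b = left[0], right[0]                 # step 516
            if a > b:                                # step 518
                outputs[i].append(right.popleft())   # step 520: move b up
                request_down = 2 * i + 1
            else:
                outputs[i].append(left.popleft())    # step 522: move a up
                request_down = 2 * i
            index_queue.popleft()                    # drop the served index
    # Otherwise step 524: nothing is done in this pass.
    if incoming is not None:                         # step 526
        index_queue.append(incoming)
    return request_down


if __name__ == "__main__":
    iq = deque([0])
    lower = [deque([3]), deque([1])]
    upper = [deque()]
    print(scheduler_step(iq, False, lower, upper), list(upper[0]))
```

Note that the index at the head of the index queue is removed only when a value is actually moved up; when one input queue of the pair is empty, the index stays in place and the request is simply forwarded downward.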

In the above processing, a request is sent to the stage below in steps 510, 514, 520, and 522. In this configuration, the following conditions guarantee that the output-side buffer queue is not full at the time a request is issued.

First, so that the buffer queues become full during startup yet never overflow, the size of each buffer queue is made equal to the size of the index queue. The index queue meant here is the one directly below the buffer queue in question.

Furthermore, the size of each buffer queue is set to L + 1 or more, where L is the latency, in cycles, of receiving a request from the upper stage, so that the buffer queue never becomes empty in the steady state. More precisely, this latency is the time from when the stage directly above the buffer queue issues a request until the requested data is written to that buffer queue.

With these settings, the processing shown in the flowchart of FIG. 5 can avoid deadlock without explicitly checking whether the output buffer queue has free space.
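The two sizing conditions can be expressed as a simple configuration check. The following Python sketch uses the hypothetical names `min_buffer_size` and `sizes_are_valid`; it encodes only the two stated conditions (equal sizes, and at least L + 1 entries) and is not a proof that deadlock is avoided.

```python
def min_buffer_size(latency_cycles: int) -> int:
    """Smallest buffer-queue size that satisfies the L + 1 condition."""
    if latency_cycles < 0:
        raise ValueError("latency must be non-negative")
    return latency_cycles + 1


def sizes_are_valid(buffer_size: int, index_queue_size: int, latency_cycles: int) -> bool:
    """Check a buffer queue against the index queue directly below it.

    Condition 1: the buffer queue and the index queue have equal size.
    Condition 2: the buffer queue holds at least L + 1 entries.
    """
    return (
        buffer_size == index_queue_size
        and buffer_size >= min_buffer_size(latency_cycles)
    )


if __name__ == "__main__":
    # Example values only; the latency of 1 cycle is an assumption.
    print(sizes_are_valid(buffer_size=2, index_queue_size=2, latency_cycles=1))
```

A check of this kind could run when the stage sizes are chosen, before the design is synthesized.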

Next, an example of the operation of the configuration of FIG. 1 is described with reference to FIGS. 6 to 19. In this embodiment, the buffer queues of the bottom buffer queue array 102 have a size of 3, whereas the buffer queues of the two buffer queue arrays 104 and 106 above it have a size of 2. Thus the buffer queue size need not be the same for every buffer queue array and may differ between arrays. Under the constraint described above, however, the size of a buffer queue and the size of the index queue directly below it are made the same. In the example of FIG. 6, the index queue directly below the buffer queue array 104 is the index queue 116a.

The operation of the sorter consists of startup cycles and steady-state cycles. FIG. 6 shows startup cycle 1. In FIG. 6, the head entries of the buffer queue 104a and the buffer queue 104b of the buffer queue array 104 are empty, so the scheduler 120 sends a request with index = 0 to the scheduler 116 of the lower buffer queue array 102.

In the next startup cycle 2, shown in FIG. 7, the head entries of the buffer queue 104a and the buffer queue 104b of the buffer queue array 104 are still empty, so the scheduler 120 again sends a request with index = 0 to the scheduler 116 of the lower buffer queue array 102. The scheduler 116 has now received a request, but because the head entries of the buffer queue 102a and the buffer queue 102b of the buffer queue array 102 are empty, the scheduler 116 sends a request to the input logic circuit 110 in the stage below.

In the next startup cycle 3, shown in FIG. 8, the head entries of the buffer queue 104a and the buffer queue 104b of the buffer queue array 104 are still empty, but the lower-stage index queue 116a is full, so the scheduler 120 does nothing. Meanwhile, because the head entries of the buffer queue 102a and the buffer queue 102b of the buffer queue array 102 are empty, the scheduler 116 sends a request to the input logic circuit 110 in the stage below.

In the next startup cycle 4, shown in FIG. 9, the head entries of the buffer queue 104a and the buffer queue 104b of the buffer queue array 104 are still empty, but the lower-stage index queue 116a is full, so the scheduler 120 does nothing. Meanwhile, the scheduler 116 waits for entries to be stored at the heads of both the buffer queue 102a and the buffer queue 102b of the buffer queue array 102.

FIG. 10 shows startup cycle 7. At this stage too, the head entries of the buffer queue 104a and the buffer queue 104b of the buffer queue array 104 are still empty, but the scheduler 120 does nothing because the lower-stage index queue 116a is full. Meanwhile, an entry has been stored in the buffer queue 102a of the buffer queue array 102, but the buffer queue 102b is still empty, so the scheduler 116 continues to wait for entries to be stored at the heads of both.

FIG. 11 shows startup cycle 8. The head entries of the buffer queue 104a and the buffer queue 104b of the buffer queue array 104 are still empty, but the scheduler 120 does nothing because the lower-stage index queue 116a is full. Meanwhile, the scheduler 116 starts operating because values have been stored at the heads of both the buffer queue 102a and the buffer queue 102b of the buffer queue array 102, and as a result the head entry of the buffer queue 102a is popped to the buffer queue 104a of the buffer queue array 104.

In startup cycle 9 of FIG. 12, the buffer queue 104a is no longer empty while the buffer queue 104b is still empty, so the scheduler 120 sends a request with index = 1. Meanwhile, in response to the entry in the index queue 116a, the buffer queue 102a and the buffer queue 102b of the buffer queue array 102 are compared by the comparator 114, and the result is popped to the buffer queue 104a of the buffer queue array 104.

In startup cycle 10 of FIG. 13, two entries have been pushed to the buffer queue 104a of the buffer queue array 104, corresponding to the two entries in the index queue.
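The per-cycle decision that each scheduler makes in the startup cycles above can be summarized in a short sketch. This is an illustrative software model, not the circuit itself: the names `Action` and `scheduler_action` are hypothetical, and the index mapping (pair `p` covers queues `2p` and `2p + 1`, with the left queue requested first when both are empty) is an assumption inferred from the requests with index = 0 and index = 1 described for the scheduler 120.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    COMPARE_AND_FORWARD = "compare the pair and send one entry to the upper stage"
    REQUEST_FROM_LOWER = "send a request for the empty entry to the lower stage"
    WAIT = "do nothing this cycle"


def scheduler_action(pair_index: int, left_empty: bool, right_empty: bool,
                     lower_index_queue_full: bool) -> Tuple[Action, Optional[int]]:
    """Return the action for one cycle and, for a request, the requested index.

    Assumption: pair `pair_index` consists of queues 2*pair_index (left)
    and 2*pair_index + 1 (right) of this stage's buffer queue array.
    """
    if lower_index_queue_full:
        # The lower stage cannot accept another index, so nothing is issued.
        return Action.WAIT, None
    if left_empty:
        return Action.REQUEST_FROM_LOWER, 2 * pair_index
    if right_empty:
        return Action.REQUEST_FROM_LOWER, 2 * pair_index + 1
    # Both heads hold values: the comparator can run. Which queue gets
    # refilled depends on the comparison result, so no index is fixed here.
    return Action.COMPARE_AND_FORWARD, None
```

With these assumptions, `scheduler_action(0, True, True, False)` corresponds to the scheduler 120 in startup cycle 2 (request with index 0), `scheduler_action(0, True, True, True)` to startup cycles 3 to 8 (index queue 116a full, do nothing), and `scheduler_action(0, False, True, False)` to startup cycle 9 (request with index 1).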

FIGS. 14 to 19 show operation in the steady cycles. In FIGS. 14 to 19, the index queues 116a, 120a, and 124a are each drawn as a single element for convenience, but each actually has two elements. In steady cycle 1 of FIG. 14, the index queue 124a of the scheduler 124 points to read index = 0 of the buffer queue array 106; as a result, the head entries of the buffer queue 106a and the buffer queue 106b are compared by the comparator 122, and the head entry of the buffer queue 104a is selected to be moved to the upper stage. The write index of the buffer queue 104a selected for the move is used in the next cycle.

Also in FIG. 14, the index queue 120a points to read index = 1 of the buffer queue array 104; as a result, the head entries of the buffer queue 104c and the buffer queue 104d are compared by the comparator 118, and the head entry of the buffer queue 104c is selected to be moved to the upper stage. The write index of the buffer queue 104c selected for the move is used in the next cycle.

Also in FIG. 14, the index queue 116a points to read index = 1 of the buffer queue array 102; as a result, the head entries of the buffer queue 102c and the buffer queue 102d are compared by the comparator 114, and the head entry of the buffer queue 104c is selected to be moved to the upper stage.

FIG. 15 shows the continuation of steady cycle 1. That is, in FIG. 15, the head entry of the buffer queue 106a of the buffer queue array 106 is sent to the data output logic circuit 112, the head entry of the buffer queue 104c of the buffer queue array 104 is sent to the buffer queue 106b of the buffer queue array 106, and the head entry of the buffer queue 102d of the buffer queue array 102 is sent to the buffer queue 104b of the buffer queue array 104.

In steady cycle 2 of FIG. 16, the index queue 124a of the scheduler 124 points to read index = 0 of the buffer queue array 106; as a result, the head entries of the buffer queue 106a and the buffer queue 106b are compared by the comparator 122, and the head entry of the buffer queue 104a is selected to be moved to the upper stage. The write index of the buffer queue 104a selected for the move is used in the next cycle.

Also in FIG. 16, the index queue 120a points to read index = 0 of the buffer queue array 104; as a result, the head entries of the buffer queue 104a and the buffer queue 104b are compared by the comparator 118, and the head entry of the buffer queue 104a is selected to be moved to the upper stage. The write index of the buffer queue 104a selected for the move is used in the next cycle.

Also in FIG. 16, the index queue 116a points to read index = 2 of the buffer queue array 102; as a result, the head entries of the buffer queue 102e and the buffer queue 102f are compared by the comparator 114, and the head entry of the buffer queue 104f is selected to be moved to the upper stage.

FIG. 17 shows the continuation of steady cycle 2. That is, in FIG. 17, the head entry of the buffer queue 106a of the buffer queue array 106 is sent to the data output logic circuit 112, the head entry of the buffer queue 104a of the buffer queue array 104 is sent to the buffer queue 106a of the buffer queue array 106, and the head entry of the buffer queue 102f of the buffer queue array 102 is sent to the buffer queue 104c of the buffer queue array 104.

In steady cycle 3 of FIG. 18, the index queue 124a of the scheduler 124 points to read index = 0 of the buffer queue array 106; as a result, the head entries of the buffer queue 106a and the buffer queue 106b are compared by the comparator 122, and the head entry of the buffer queue 104b is selected to be moved to the upper stage. The write index of the buffer queue 104b selected for the move is used in the next cycle.

Also in FIG. 18, the index queue 120a points to read index = 0 of the buffer queue array 104; as a result, the head entries of the buffer queue 104a and the buffer queue 104b are compared by the comparator 118, and the head entry of the buffer queue 104b is selected to be moved to the upper stage. The write index of the buffer queue 104b selected for the move is used in the next cycle.

Also in FIG. 18, the index queue 116a points to read index = 0 of the buffer queue array 102; as a result, the head entries of the buffer queue 102a and the buffer queue 102b are compared by the comparator 114, and the head entry of the buffer queue 104a is selected to be moved to the upper stage.

FIG. 19 shows the continuation of steady cycle 3. That is, in FIG. 19, the head entry of the buffer queue 106a of the buffer queue array 106 is sent to the data output logic circuit 112, the head entry of the buffer queue 104b of the buffer queue array 104 is sent to the buffer queue 106a of the buffer queue array 106, and the head entry of the buffer queue 102a of the buffer queue array 102 is sent to the buffer queue 104a of the buffer queue array 104.

An embodiment with a specific number of buffer queue array stages, a specific buffer queue size, and a specific index queue size (number of elements) has been described above. However, as long as the constraints described above are observed, any number of buffer queue array stages, any buffer queue size, and any number of index queues can be used, according to the application.
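As a rough illustration of how such parameters might be checked, the following sketch validates a configuration against the two sizing conditions stated in the claims (buffer queue size of L + 1 or more for a request latency of L cycles, and buffer queue size equal to the size of the index queue immediately below). The class `SorterConfig` and its field names are hypothetical. The per-stage queue counts assume that the top stage holds N queues and that each lower stage holds N times as many, which matches the two queues 106a and 106b and the four queues 104a to 104d of the example but is otherwise an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SorterConfig:
    stages: int             # number of buffer queue array stages (3 in the example)
    fan_in: int             # N: data is reduced by 1/N from one stage to the next
    buffer_queue_size: int  # entries per buffer queue
    index_queue_size: int   # elements per index queue (2 in the example)
    request_latency: int    # L: cycles needed to receive a request from the upper stage

    def validate(self) -> List[str]:
        """Return a list of violated conditions (empty if the configuration is acceptable)."""
        problems = []
        if self.stages < 1:
            problems.append("at least one stage is required")
        if self.fan_in < 2:
            problems.append("N must be an integer of 2 or more")
        if self.buffer_queue_size < self.request_latency + 1:
            problems.append("buffer queue size must be L + 1 or more")
        if self.buffer_queue_size != self.index_queue_size:
            problems.append("buffer queue size must equal the index queue size")
        return problems

    def queues_per_stage(self) -> List[int]:
        """Queue count per stage, lowest stage first (assumed layout, see above)."""
        return [self.fan_in ** k for k in range(self.stages, 0, -1)]
```

For example, `SorterConfig(stages=3, fan_in=2, buffer_queue_size=2, index_queue_size=2, request_latency=1)` passes validation and yields 8, 4, and 2 queues from the lowest stage to the top stage. The latency value of 1 is purely illustrative.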

A preferred implementation of the configuration of the present invention is a hardware implementation using rewritable logic devices such as an FPGA (Field Programmable Gate Array) or a CPLD (Complex Programmable Logic Device). However, it may also be a logic circuit fabricated on a silicon wafer by ordinary photolithography, or it may be implemented in software. A software implementation generally runs slower than a hardware implementation, but even so, coding the functions of the present invention can streamline the code, because the functions are simplified.
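A software analogue of the pull-driven structure can be sketched with lazy generators. This is a simplified model under stated assumptions, not the claimed circuit: it has no index queues, buffer queues, or cycle timing, it assumes N = 2, it assumes each input stream is already sorted in ascending order, and it assumes the comparator forwards the smaller value. The names `merge_two` and `merge_tree` are hypothetical.

```python
from typing import Iterable, Iterator, List

_EMPTY = object()  # sentinel marking an exhausted stream


def merge_two(a: Iterator, b: Iterator) -> Iterator:
    """Merge two ascending streams, doing one comparison per forwarded item."""
    x, y = next(a, _EMPTY), next(b, _EMPTY)
    while x is not _EMPTY and y is not _EMPTY:
        if x <= y:
            yield x
            x = next(a, _EMPTY)  # pull a replacement from the side just consumed
        else:
            yield y
            y = next(b, _EMPTY)
    # One side is exhausted: drain the other without further comparisons.
    while x is not _EMPTY:
        yield x
        x = next(a, _EMPTY)
    while y is not _EMPTY:
        yield y
        y = next(b, _EMPTY)


def merge_tree(streams: Iterable[Iterable]) -> Iterator:
    """Stack merge stages until one output stream remains (halving per stage)."""
    layer: List[Iterator] = [iter(s) for s in streams]
    if not layer:
        return iter(())
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(iter(()))  # pad an odd layer with an empty stream
        layer = [merge_two(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


if __name__ == "__main__":
    print(list(merge_tree([[1, 5], [2, 6], [0, 9], [3, 4]])))
```

Because generators compute nothing until a value is requested, each request at the top of the tree propagates downward only along the path whose entry was consumed, which loosely mirrors the way requests travel from an upper stage to a lower stage in the hardware description.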

102, 104, 106 ... Buffer queue array
114, 118, 122 ... Comparator
116, 120, 124 ... Scheduler
116a, 120a, 124a ... Index queue

Claims

An apparatus that has buffer queues in a plurality of stages and processes data by propagating the data from a lower stage to an upper stage while reducing the data by a factor of 1/N (N is an integer of 2 or more), the apparatus comprising:
a single comparator that is arranged between the buffer queues of adjacent stages, compares a pair of buffer queue entries, and carries a value to the buffer of the upper stage according to the comparison result,
wherein the apparatus receives, from the upper stage, a designation of the positions of the pair of buffer queue entries to be compared by each comparator, and issues a designation to the lower stage.

The apparatus of claim 1, further comprising a multi-stage index queue that receives and stores designations of buffer queue pair entries to be compared by each comparator from the upper stage.

The apparatus of claim 2, further comprising: means for, in response to neither of the pair of entries being empty and the lower-stage index queue not being full, operating the comparator to send the contents of one of the pair of entries to the upper stage and sending a request for the index of that entry to the lower stage; and
means for popping the first element of the index queue of the stage.

The apparatus of claim 1, further comprising means for sending a request for an index of an empty entry to a lower stage in response to either of the paired entries being empty.

The apparatus according to claim 2, wherein a size of the buffer queue is equal to a size of the index queue immediately below the buffer queue.

The apparatus, wherein the size of the buffer queue is set to L + 1 or more so that the buffer queue never becomes empty, where L is the latency, in number of cycles, of receiving a request from the upper stage.

The apparatus of claim 1, wherein N is 2 or 4.

The apparatus of claim 1, wherein the apparatus is a sorter.

The apparatus of claim 1, wherein the apparatus is implemented by hardware.

The apparatus of claim 9, wherein the apparatus is implemented by an FPGA.