JPWO2008123406A1 - Data-driven processing device and its sequential confluence control device

Info

Publication number: JPWO2008123406A1
Application number: JP2009509192A
Authority: JP
Inventors: 佐藤　哲朗; 哲朗佐藤; 文法河口
Original assignee: NODC INCORPORATED
Current assignee: NODC INCORPORATED
Priority date: 2007-03-28
Filing date: 2008-03-28
Publication date: 2010-07-15
Anticipated expiration: 2028-03-28
Also published as: WO2008123406A1; JP5256193B2

Abstract

[Problem] To enable parallel operation among a plurality of merge nodes while maintaining packet order. [Solution] If OD = '1', the bit DAi of the destination address DA that indicates the branch direction from node 111 is supplied to the data input terminal of the 1-bit latch in input stage 113a of queue 113. When DAi = '1', node 111 branches the packet to node 117 and transfers DAi = '1' to input stage 113a of queue 113. After N stages, the corresponding packet is held at node 115, and SEL = '1', corresponding to DAi = '1', is supplied to the selection control input terminal of multiplexer 110M, so node 110 selects the node 115 side. When the concatenation bit CN is '1', transfer control circuit 110C of node 110 exceptionally keeps the ACK to output stage 113b of queue 113 inactive. [Selected drawing] FIG. 26

Description

The present invention relates to a data-driven processing device and its sequential merging control device for use in data-driven semiconductor devices. More particularly, it relates to a sequential merging control device that controls the packet merging direction at a merge node based on the packet branching direction at the corresponding branch node, and to a data-driven processing device using the same.

In a data-driven semiconductor device, local synchronization control is performed autonomously and in a distributed manner by each of a plurality of elements. Compared with a synchronous semiconductor device that centrally controls each element in synchronization with a system clock, the degree of processing parallelism can therefore be raised easily and power consumption can be reduced.

At a merge node of such a device, packets are selectively latched on a first-come, first-served basis, so a packet that has passed through congested nodes and a packet that has not can arrive in swapped order.

In a conventional data-driven semiconductor device, as disclosed in the first patent document listed below, each packet therefore carries not only a destination address but also, for example, a 9-bit pair-identification color. A matching memory comprising a FIFO memory, a hash memory, an associative memory, an arithmetic unit, a control unit, and so on, as shown in FIG. 8 of that document, waits for packets of the same color, and a processing element then processes them.

This makes the packet-matching configuration complicated.

In general, in asynchronous systems, whether micro or macro, it has been taken for granted that packet order changes under autonomous distributed processing, and micro asynchronous systems have restored the packet order by post-processing using colors as described above.

A macro asynchronous system can restore packet order using abundant hardware and software resources. In a micro asynchronous system, however, such post-processing is a burden that complicates the configuration and lowers throughput.

If packet order can be maintained, the packet order becomes predictable. For example, a loop such as that shown in FIG. 22(A) can be divided into loops such as those shown in FIG. 22(B) or FIG. 23(A) for parallelization and componentization, and hierarchization as shown in FIG. 23(B) also becomes possible. In other words, maintaining packet order in an asynchronous system amounts to order synchronization between components, and provides an effect similar to using a system clock in a synchronous system.

Maintaining packet order is therefore an important technical problem for micro asynchronous systems.

The second patent document listed below discloses the following configuration for a data-flow device having a plurality of memory units (memory access circuit + memory body) between a distribution control circuit and a collection control circuit. One packet queue circuit is provided for the plurality of memory units. Pairs of a packet X containing memory access information and a packet Y containing transfer identification information are supplied sequentially to a packet input line. Packet X accesses one of the memory units, from which a processed packet Z is read, while packet Y is supplied sequentially to the packet queue circuit. These accesses are performed in parallel in the plurality of memory units, and each packet Z is held waiting in its memory unit. The collection control circuit selects and reads one of the waiting packets Z based on the packet Y read from the packet queue circuit, and outputs it from a packet output line.
Japanese Patent Laid-Open No. 5-28291
Japanese Patent Laid-Open No. 3-233740

However, because one of the plurality of memory units is selected and its data read out to the collection control circuit, the data read-out delays operation even though the memory units can process in parallel, and the advantages of data-driven parallel processing cannot be fully exploited.

Moreover, order is maintained only between the stages immediately before and after the memory units. When the collection control circuit has multiple stages of nodes, unevenly distributed data congestion can disturb the data order.

Furthermore, when there are multiple packet input lines and packets are supplied in parallel, the data order cannot be maintained.

In view of these problems, an object of the present invention is to provide a sequential merging control device that enables parallel operation among a plurality of merge nodes while maintaining packet order, and a data-driven processing device provided with the same.

A first aspect of the present invention is a sequential merging control device for a data-driven processing device in which, when a first packet passes through a branch node in a branch channel, a second packet corresponding to the first packet passes through a merge node in a merge channel, the branch direction of the first packet at the branch node corresponds to the merge direction of the second packet at the merge node, and the number of pipeline stages between the branch node and the merge node is N (N ≥ 1). The device comprises, for each of a plurality of merge nodes in the merge channel, a sequential merging control circuit that controls input selection at that merge node. Each sequential merging control circuit has:
a queue of N or more stages;
a first transfer control circuit that transfers branch direction information contained in a packet at the branch node to the input stage of the queue; and
a second transfer control circuit that transfers the output of the output stage of the queue as the input selection control signal of the merge node.
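The queue-based selection of the first aspect can be illustrated with a small behavioral sketch. It is not taken from the patent: the function name `ordered_merge`, the encoding 1 = upper path and 0 = lower path, and the use of a software FIFO in place of the hardware queue are all assumptions made for illustration, and handshake timing is ignored.

```python
from collections import deque


def ordered_merge(branch_bits, upper_packets, lower_packets):
    """Merge two inputs in the order recorded at the branch node.

    branch_bits   -- direction bits in the order packets left the branch node
                     (assumed encoding: 1 = upper path, 0 = lower path)
    upper_packets -- packets arriving at the merge node from the upper path
    lower_packets -- packets arriving at the merge node from the lower path
    """
    direction_queue = deque(branch_bits)  # stands in for the queue of N or more stages
    inputs = {1: deque(upper_packets), 0: deque(lower_packets)}
    merged = []
    while direction_queue:
        sel = direction_queue.popleft()   # queue output acts as the input selection signal
        # Hardware would stall here until the selected input holds a packet;
        # this sketch assumes every packet has already arrived.
        merged.append(inputs[sel].popleft())
    return merged


# Packets p0..p3 left the branch node in this order: up, down, down, up.
print(ordered_merge([1, 0, 0, 1], ["p0", "p3"], ["p1", "p2"]))
```

Because the merge node follows the recorded branch directions rather than arrival order, the output order matches the order at the branch node even if the lower-path packets happen to arrive first.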

In a second aspect of the sequential merging control device for a data-driven processing device according to the present invention, in the first aspect, the branch direction information is the bit of the destination address that indicates the destination from the branch node to the next stage.

In a third aspect of the sequential merging control device for a data-driven processing device according to the present invention, in the first aspect, the first transfer control circuit enables transfer to the input stage when order information contained in the packet indicates that the packet is ordered.

In a fourth aspect of the sequential merging control device for a data-driven processing device according to the present invention, in the third aspect, the order information indicates that the packet is unordered when the packet disappears at a node on the way to the merge node.

In a fifth aspect of the sequential merging control device for a data-driven processing device according to the present invention, in the first aspect, the first transfer control circuit activates the SEND signal to the input stage of the queue when it activates the SEND signal from the branch node to the next-stage node.

In a sixth aspect of the sequential merging control device for a data-driven processing device according to the present invention, in the fifth aspect, when the SEND signal from the stage upstream of the branch node is active and the ACK signals from both the next stage of the branch node and the input stage of the queue are active, the first transfer control circuit causes the branch node to capture and hold data and activates the SEND signals to the next stage and to the input stage.

In a seventh aspect of the sequential merging control device for a data-driven processing device according to the present invention, in the first aspect, the second transfer control circuit activates the ACK to the output stage of the queue when it activates the ACK signal to the stage upstream of the merge node.

In an eighth aspect of the sequential merging control device for a data-driven processing device according to the present invention, in the sixth aspect, when the SEND signals from the output stage of the queue and from one of the stages upstream of the merge node are both active and the ACK signal from the next stage of the merge node is active, the second transfer control circuit causes the merge node to capture and hold data and activates the ACK to the output stage of the queue.

In a ninth aspect of the present invention, in the eighth aspect, when the second packet corresponding to the first packet consists of a plurality of packets and the concatenation bit of each packet other than the last indicates that a packet follows, the second transfer control circuit exceptionally keeps the ACK to the output stage of the queue inactive.
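The exception in the ninth aspect can be sketched behaviorally as follows. This is an illustration under assumptions, not the patent's circuit: each packet is modeled as a hypothetical `(payload, cn)` tuple in which `cn = 1` means that another packet of the same group follows, and withholding the ACK is modeled by leaving the head entry in a software FIFO.

```python
from collections import deque


def merge_with_concatenation(branch_bits, upper_packets, lower_packets):
    """Ordered merge in which one queue entry may cover several packets.

    Each packet is a (payload, cn) tuple; cn = 1 is assumed to mean
    "another packet of the same group follows", cn = 0 marks the last one.
    """
    direction_queue = deque(branch_bits)
    inputs = {1: deque(upper_packets), 0: deque(lower_packets)}
    merged = []
    while direction_queue:
        sel = direction_queue[0]          # output stage of the queue drives the selection
        payload, cn = inputs[sel].popleft()
        merged.append(payload)
        if cn == 0:
            direction_queue.popleft()     # ACK to the queue output stage: next entry
        # cn == 1: the ACK is withheld, so the same selection is reused
    return merged


upper = [("a0", 1), ("a1", 1), ("a2", 0)]   # one branch-side packet became three packets
lower = [("b0", 0)]
print(merge_with_concatenation([1, 0], upper, lower))
```

A single direction bit thus keeps the merge node pointed at the same input until the last packet of the group has passed.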

A tenth aspect of the present invention is a data-driven processing device comprising a plurality of pairs of a branch node and a merge node, in which, for each pair, when a first packet passes through the branch node, a second packet corresponding to the first packet passes through the merge node, and the branch direction of the first packet at the branch node corresponds to the merge direction of the second packet at the merge node. The device further comprises the sequential merging control device for a data-driven processing device according to any one of the first to ninth aspects for each pair, among the plurality of pairs, in which the number of pipeline stages between the branch node and the merge node is one or more.

In an eleventh aspect of the present invention, the device of the tenth aspect has:
a tree-shaped branch channel that selectively and successively diverts a packet supplied to a first entry node to downstream nodes according to the value of the destination address in the packet, so that the packet reaches the one of a plurality of first exit nodes corresponding to that value;
a tree-shaped merge channel that has second entry nodes corresponding respectively to the plurality of first exit nodes, and that selectively and successively merges packets into downstream nodes so that they reach a second exit node; and
first functional elements, each coupled between a first exit node and the corresponding second entry node, that supply to the second entry node a packet according to the content of the packet from the first exit node.
Each node of the tree-shaped branch channel and the tree-shaped merge channel constitutes a pipeline stage, and each of the two channels has three or more stages.
The plurality of branch nodes constituting the tree-shaped branch channel and the plurality of merge nodes constituting the tree-shaped merge channel constitute the plurality of pairs.

Packet order is disturbed when packets overtake one another at nodes that merge selectively, that is, when the packet order at a branch node differs from the packet order at the corresponding merge node.

According to the configuration of the first aspect, for each of a plurality of branch nodes, the packet merging direction at the corresponding merge node is controlled based on the packet branching direction. This enables parallel operation among a plurality of merge nodes and allows packet order to be maintained even when the degree of packet congestion differs from place to place, which contributes greatly as a fundamental technology for data-driven processing devices.

According to the configuration of the second aspect, the branch direction information is one bit of the destination address, so the configuration is simple.

According to the configuration of the third aspect, the packet merging direction is controlled only when the order information contained in the packet indicates that the packet is ordered, so packets for which order is irrelevant can be mixed on the channel.

According to the configuration of the fourth aspect, the present invention can be applied even when some packets disappear at a node on the way to the merge node.

According to the configuration of the fifth or sixth aspect, the branch direction information can be transferred to the input stage of the queue using a general handshake protocol.

According to the configuration of the seventh or eighth aspect, the output stage of the queue can be controlled using a general handshake protocol.

According to the configuration of the ninth aspect, packet order can be maintained even when one packet at the branch node corresponds to a plurality of packets at the merge node.

According to the configuration of the tenth aspect, packet order can be maintained even when the channel has a plurality of pairs of branch and merge nodes, which simplifies the packet-matching configuration.

According to the configuration of the eleventh aspect, the above effects are obtained even with a tree-shaped branch channel and a tree-shaped merge channel each having three or more pipeline stages and a functional element array between them. In addition, the tree-shaped branch channel can selectively transfer a packet to one of many functional elements, and packet congestion is avoided on the exit side of the branch channel and the entry side of the merge channel, where the channel is relatively wide. Processing delays in the functional elements are therefore absorbed by distributed parallel processing across the functional elements, and the throughput of random access or random processing is relatively high.

Other objects, configurations, and effects of the present invention will become apparent from the following description.

FIG. 1 is a schematic block diagram showing an asynchronous (self-timed) data-driven memory 10.

In the memory 10, a merge channel 40 is connected downstream of a branch channel 20 via a memory row array 30 serving as a functional element array.

FIG. 2(A) shows a specific example of the arrangement of the memory row array 30.

The rows and columns of the memory row array 30 are identified by set number and page number, respectively. For simplicity, assume that the memory row array 30 has 64 rows, one page has 8 words, and one word has 32 bits. The following describes the case where reads from and writes to the memory row array 30 are performed in units of pages and words, respectively.

Returning to FIG. 1, the branch channel 20 selectively and successively branches a packet supplied to the entry node 211 according to its destination address, and functions as an address decoder.

FIG. 2(B) shows the packet format in the branch channel 20.

A packet 50 consists of a 1-bit command field, an 11-bit address field, and a 32-bit data field. The command CMD indicates a read when '0' and a write when '1'. The address ADR is divided into a destination address DA in the upper 6 bits, a page address PA in the middle 2 bits, and a word address WA in the lower 3 bits.
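The field widths of the packet 50 can be made concrete with a short packing sketch. The field widths and the DA/PA/WA split come from the description; the placement of CMD in the most significant bit, followed by ADR and then DATA, is an assumption for illustration, since the ordering of the fields within the 44-bit word is defined by FIG. 2(B) rather than by the text. The names `Packet`, `pack`, and `unpack` are hypothetical.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    cmd: int   # 1 bit: 0 = read, 1 = write
    da: int    # 6 bits: destination address (memory row)
    pa: int    # 2 bits: page address
    wa: int    # 3 bits: word address within the page
    data: int  # 32 bits: write data (dummy for a read)


def pack(p):
    """Pack the fields into one 44-bit integer (assumed order: CMD | ADR | DATA)."""
    limits = (1, 6, 2, 3, 32)
    for value, width in zip(p, limits):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit its width")
    adr = (p.da << 5) | (p.pa << 3) | p.wa      # 11-bit ADR: DA upper, PA middle, WA lower
    return (p.cmd << 43) | (adr << 32) | p.data


def unpack(word):
    """Split a 44-bit integer back into its fields."""
    data = word & 0xFFFFFFFF
    adr = (word >> 32) & 0x7FF
    cmd = (word >> 43) & 0x1
    return Packet(cmd, adr >> 5, (adr >> 3) & 0x3, adr & 0x7, data)


write_packet = Packet(cmd=1, da=0b111111, pa=0b10, wa=0b101, data=0xDEADBEEF)
assert unpack(pack(write_packet)) == write_packet
```

With 6 + 2 + 3 address bits, ADR addresses 64 rows, 4 pages per row, and 8 words per page.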

The destination address DA indicates the destination in the branch channel 20, that is, the row (set number) of the memory row array 30. After the packet 50 reaches the destination, the page address PA is used to identify the read target within that memory row. The pair of the page address PA and the in-page word address WA is used, after the packet 50 reaches the destination, to identify the write target within the memory row. The data DATA is the write data, and is a dummy in the case of a read.

In the following, when the command CMD is a read, the packets in the branch channel 20 and the merge channel 40 are called a read packet and a read data packet, respectively; when it is a write, the packet in the branch channel 20 is called a write data packet.

A read data packet in the merge channel 40 has 43 bits, namely the packet 50 without the 1-bit command CMD. The address ADR is used to identify the data in a packet that has reached the exit node of the merge channel 40.

Returning to FIG. 1, the branch channel 20 and the merge channel 40 are each a six-stage pipeline, and the node in each pipeline stage has a latch and a transfer control circuit.

FIG. 3 is a schematic block diagram showing a branching circuit composed of the first-stage and second-stage nodes when the branch channel 20 is implemented with the bundled-data scheme.

The first-stage entry node 211 has a latch 211L and a transfer control circuit 211C. The second-stage node 221 has a latch 221L and a transfer control circuit 221C, and the second-stage node 222 has a latch 222L, a transfer control circuit 222C, and an inverter 222G. The transfer control circuits 211C, 221C, and 222C open and close the input-stage gates in the latches 211L, 221L, and 222L, respectively, under a handshake protocol, and are cascaded between stages.

Every transfer control circuit operates as follows. When the SEND-IN (transfer request input) signal from the upstream stage is active, that is, the data from the upstream stage is valid, and the ACK-IN (transfer permission input) signal from the downstream stage is active, that is, the downstream stage is empty, the circuit supplies a pulse to the clock input terminal CK of the latch so that the latch captures and holds the data from the upstream stage. Unless a special restriction applies, it then activates the ACK-OUT signal to the upstream stage and, after a predetermined time by which the data is considered to have reached the downstream stage, activates the SEND-OUT signal to the downstream stage.

Each transfer control circuit has a control input terminal for enabling or disabling its output. The control input terminals of the transfer control circuits 221C and 222C receive, respectively, the most significant bit DA5 of the destination address DA (DA5 to DA0) of the packet held in the latch 211L, and the inverse of that bit produced by the inverter 222G. Therefore, when bit DA5 is '1', the latches 221L and 222L are enabled and disabled, respectively, and the contents of the latch 211L are held in the latch 221L; when bit DA5 is '0', the latches 221L and 222L are disabled and enabled, respectively, and the contents of the latch 211L are held in the latch 222L.

Each transfer control circuit also has a reset input terminal (not shown). A reset pulse supplied to it at system reset makes ACK-IN and ACK-OUT active and SEND-IN and SEND-OUT inactive.

Various transfer control circuits are known, so a description of their configuration is omitted.

Returning to FIG. 1, a packet held at the node 221, for example, is next held at the node 231 or the node 232 according to the second bit of the destination address DA, and a packet held at the node 232, for example, is next held at the node 243 or the node 244 according to the third bit of the destination address DA. In the same way, the packet reaches one of the 32 exit nodes arranged in the sixth stage according to the contents of the destination address DA. Each exit node has two branch outputs.

Assume that at each node, data branches to the upper or lower side in FIG. 1 when the corresponding bit of the destination address DA is '1' or '0', respectively. For example, when the destination address DA is '111111', the packet reaches the exit node 261. At the node 261, the memory row 31 of the memory row array 30 is enabled when the least significant bit DA0 of the destination address DA is '1', and the memory row 32 is enabled when bit DA0 is '0'.
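The bit-by-bit decoding performed by the branch channel 20 can be traced with a minimal sketch. The rule that DA5 is examined first and DA0 last, and that '1' selects the upper side, follows the description; the function name `route` and the indices `exit_node` and `row`, counted from the top of FIG. 1 with 0 as the topmost, are hypothetical and do not correspond to the reference numerals used in the figures.

```python
def route(da, stages=6):
    """Trace a packet through the tree-shaped branch channel.

    Returns (path, exit_node, row), where path lists the branch taken after
    each stage, and exit_node / row are hypothetical indices counted from the
    top of the figure (0 = topmost).
    """
    if not 0 <= da < (1 << stages):
        raise ValueError("destination address out of range")
    path = []
    for bit_index in range(stages - 1, -1, -1):   # DA5 is examined first, DA0 last
        bit = (da >> bit_index) & 1
        path.append("up" if bit else "down")      # '1' = upper side, '0' = lower side
    rows = 1 << stages
    row = rows - 1 - da               # the all-ones address lands on the topmost row
    exit_node = row >> 1              # two memory rows hang off each last-stage node
    return path, exit_node, row


print(route(0b111111))   # topmost exit node, upper of its two rows
print(route(0b111110))   # same exit node, lower of its two rows
```

In this indexing, the exit node 261 corresponds to `exit_node` 0, and the memory rows 31 and 32 correspond to `row` 0 and `row` 1.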

メモリ行アレイ３０を構成する６４個のメモリ行は、互いに同一構成である。各メモリ行は、その入力端及び出力端がそれぞれ分流路２０及び合流路４０の対応する出力端及入力端に結合されている。分流路２０の出力端及び合流路４０の入力端のそれぞれにラッチを接続することもできるが、段数を少なくしてターンアランドタイムを短縮するために、図１ではこれらのラッチが省略された構成となっている。 The 64 memory rows constituting the memory row array 30 have the same configuration. Each memory row has its input end and output end coupled to the corresponding output end and input end of the shunt channel 20 and the combined channel 40, respectively. A latch can be connected to each of the output end of the branch flow path 20 and the input end of the combined flow path 40, but in order to reduce the number of stages and shorten the turnaround time, these latches are omitted in FIG. It has become.

図５は、図１の分流路２０の出力ノード２６１と合流路４０の入口ノード４１１との間に接続されたメモリ行３１及び３２を示す概略ブロック図である。 FIG. 5 is a schematic block diagram showing the memory rows 31 and 32 connected between the output node 261 of the branch channel 20 and the inlet node 411 of the combined channel 40 in FIG.

メモリ行３１及び３２は、ノード２６１と入口ノード４１１との間に接続されている。ノード２６１は、ラッチ２６１Ｌと、この入力ゲートを開閉する転送制御回路２６１Ｃとからなり、入口ノード４１１は、ラッチ４１１Ｌと、この入力ゲートを開閉する転送制御回路４１１Ｃとからなる。 Memory rows 31 and 32 are connected between node 261 and entry node 411. The node 261 includes a latch 261L and a transfer control circuit 261C that opens and closes the input gate. The entry node 411 includes a latch 411L and a transfer control circuit 411C that opens and closes the input gate.

メモリ行３１及び３２には、ループ状の３２ビットのデータバスとアドレスＡＤＲの上位８ビットのアドレスバスからなるループ配線３１０が配設され、これがラッチ２６１Ｌのデータ出力端及びラッチ４１１Ｌのデータ入力端に接続されている。ループ配線３１０のデータバスには、メモリ行３１の構成要素である３２個のワードメモリ３１０Ｗ〜３１３１Ｗのそれぞれのデータ入力端及びデータ出力端が接続され、同様にメモリ行３２の構成要素である３２個のワードメモリ３２０Ｗ〜３２３１Ｗのそれぞれのデータ入力端及びデータ出力端が接続されている。 The memory rows 31 and 32 are provided with a loop wiring 310 comprising a looped 32-bit data bus and an upper 8-bit address bus of the address ADR, which are arranged as a data output terminal of the latch 261L and a data input terminal of the latch 411L. It is connected to the. The data input terminal and the data output terminal of each of the 32 word memories 310W to 3131W that are constituent elements of the memory row 31 are connected to the data bus of the loop wiring 310. Similarly, the data bus 32 is a constituent element of the memory row 32. Data input terminals and data output terminals of the word memories 320W to 3231W are connected.

これらワードメモリ３１０Ｗ〜３１３１Ｗ及び３２０Ｗ〜３２３１Ｗのそれぞれのクロック入力端ＣＫ及び出力イネーブル制御入力端ＯＥを制御するために、転送制御回路２６１Ｃと転送制御回路４１１Ｃとの間に制御回路３１１が接続されている。 A control circuit 311 is connected between the transfer control circuit 261C and the transfer control circuit 411C in order to control the clock input terminal CK and the output enable control input terminal OE of each of the word memories 310W to 3131W and 320W to 3231W. Yes.

制御回路３１１には、ラッチ２６１Ｌに保持されたコマンドＣＭＤ、ページアドレスＰＡ、ワードアドレスＷＡ及びにラッチ４１１Ｌのクロック入力端ＣＫに供給されるクロックパルスＣＫ１が供給される。制御回路３１１は、このクロックパルスＣＫ１をカウントするカウンタ３１１ａを備え、リードの場合、そのカウントをワードアドレスＷＸとして、ラッチ４１１Ｌのデータ入力端のワードアドレスＷＡ部に供給する。 The control circuit 311 is supplied with the command CMD, page address PA, word address WA, and clock pulse CK1 supplied to the clock input terminal CK of the latch 411L held in the latch 261L. The control circuit 311 includes a counter 311a that counts the clock pulse CK1. In the case of reading, the control circuit 311 supplies the count as the word address WX to the word address WA section at the data input end of the latch 411L.

When either or both of SEND1 from transfer control circuit 261C and ACK2 from transfer control circuit 411C are inactive, control circuit 311 keeps the clock input terminal CK and the output enable control input terminal OE of every word memory inactive, closing their input and output gates (that is, disabling access to the word memories).

When SEND1 from transfer control circuit 261C and ACK2 from transfer control circuit 411C both become active, control circuit 311 clears counter 311a to zero and, if bit DA0 of the address ADR is '1', disables access to word memories 320W to 3231W. It then performs the control described below.

If the command CMD indicates a read, control circuit 311 performs the following control while keeping ACK1 to transfer control circuit 261C inactive.

(1) Of word memories 3131W to 310W, the output enable control input terminal OE of the word memory designated by page address PA and word address WX is activated so that its contents are read out onto loop wiring 310. After a predetermined time, by which this data is considered to have settled at the data input terminal of latch 411L, SEND2 is activated. In response, if ACK from the next stage is active, transfer control circuit 411C supplies clock pulse CK1 to the clock input terminal CK of latch 411L so that the data on loop wiring 310 (DATA, DA, and PA) and the word address WX from control circuit 311 are captured and held in latch 411L, and then activates ACK2. Control circuit 311 counts clock pulse CK1 with counter 311a, thereby incrementing word address WX, and deactivates SEND2 in response to ACK2 becoming active.

(2) When the data transfer from entry node 411 to the next stage is complete, ACK2 becomes active. In response, control circuit 311 returns to (1) if the value of counter 311a is less than 8.

When the value of counter 311a reaches 8, ACK1 to transfer control circuit 261C is activated so that latch 261L is able to take in the next data.

Through this processing, the stored contents of the 8 words indicated by the page address PA of the address ADR held in node 261 are transferred in sequence from memory row 31 to latch 411L.

If the command CMD indicates a write, control circuit 311 keeps SEND2 inactive and supplies a pulse to the clock input terminal CK of the word memory designated by the page address PA and word address WA of the address ADR, so that the data on loop wiring 310 is captured and held in that word memory. It then activates ACK1.
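The read and write sequences above can be summarized in a small behavioral model. The following Python sketch is illustrative only: the class name `MemoryRow`, the generator `read_page`, and the assumption that a 32-word row holds four 8-word pages are not taken from the specification, and the SEND/ACK handshakes are reduced to comments.

```python
from dataclasses import dataclass, field

WORDS_PER_PAGE = 8  # the text transfers 8 words per page read
PAGES_PER_ROW = 4   # assumption: 32 word memories / 8 words per page


@dataclass
class MemoryRow:
    """Toy model of one memory row made of 32 word memories."""

    words: list = field(
        default_factory=lambda: [0] * (WORDS_PER_PAGE * PAGES_PER_ROW)
    )

    def write(self, pa: int, wa: int, data: int) -> None:
        # Write: the word selected by (PA, WA) captures the data on the
        # loop wiring; ACK1 would then be activated.
        self.words[pa * WORDS_PER_PAGE + wa] = data

    def read_page(self, pa: int):
        # Read: counter 311a is cleared to zero and supplies word address WX.
        wx = 0
        while wx < WORDS_PER_PAGE:
            # One SEND2/ACK2 handshake moves (WX, DATA) into latch 411L.
            yield wx, self.words[pa * WORDS_PER_PAGE + wx]
            wx += 1  # counter 311a counts clock pulse CK1
        # Counter has reached 8: ACK1 would now be activated upstream.
```

For example, writing eight words to page 1 and then iterating `read_page(1)` yields the pairs `(0, data0)` through `(7, data7)` in order, mirroring the word-by-word transfer to latch 411L.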

Such memory access can be performed simultaneously on up to 32 memory rows of memory row array 30.

For a read packet, returning to FIG. 1, the packet reaches exit node 461 from whichever entry node of merge path 40 it enters. In other words, merge path 40 does not need the destination address for route selection. Each node of merge path 40 selectively holds whichever of its two inputs arrives first.

FIG. 4 is a schematic block diagram showing a merge circuit that forms part of the second and third stages when merge path 40 is configured with the bundled-data scheme.

Second-stage node 421 includes a latch 421L and a transfer control circuit 421C; second-stage node 422 includes a latch 422L, a transfer control circuit 422C, and an inverter 422G; and third-stage node 431 includes a latch 431L and a transfer control circuit 431C. Transfer control circuits 421C, 422C, and 431C open and close the input-stage gates in latches 421L, 422L, and 431L, respectively, using a handshake protocol, and are cascaded between stages.

The circuit of FIG. 4 is the circuit of FIG. 3 with the signal directions reversed, except that no control based on destination address bits is performed. In addition, to prevent the output of latch 421L from colliding with the output of latch 422L, each latch has an output enable control input terminal OE. A control signal is supplied from transfer control circuit 431C directly to the OE terminal of latch 421L, and through inverter 422G to the OE terminal of latch 422L. Transfer control circuit 431C sets to '1' the OE terminal of the latch corresponding to whichever of SEND-IN from transfer control circuit 421C and SEND-IN from transfer control circuit 422C becomes active first, and sets the other to '0'.

This control achieves selective (exclusive) merging.
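The first-come selection can be sketched as a simple ordering of two timed input streams. This is a minimal model, not the circuit itself: the function name `selective_merge`, the use of numeric arrival times, and the tie rule favouring node 421 are assumptions, and handshake back-pressure from later stages is ignored.

```python
def selective_merge(input_421, input_422):
    """Merge two streams of (send_in_time, packet) in first-come order.

    Returns (packet, oe_421, oe_422) tuples. Ties go to input 421; the
    text does not specify how simultaneous arrivals are resolved, so this
    tie rule is an assumption.
    """
    tagged = [(t, 0, pkt) for t, pkt in input_421]
    tagged += [(t, 1, pkt) for t, pkt in input_422]
    merged = []
    for _, source, pkt in sorted(tagged, key=lambda item: item[:2]):
        # Exactly one latch drives the shared input at a time: the selected
        # latch gets OE = 1 and the other gets OE = 0 (via inverter 422G).
        oe_421, oe_422 = (1, 0) if source == 0 else (0, 1)
        merged.append((pkt, oe_421, oe_422))
    return merged
```

Because the two OE values are always complementary, the outputs of latches 421L and 422L never drive the input of latch 431L at the same time.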

In memory 10 configured as described above, when a write data packet is supplied to entry node 211 and the SEND-IN signal to entry node 211 is activated, the packet flows through the pipeline stages of branch path 20 in sequence according to its destination address and reaches memory row array 30, where the data DATA in the write data packet is written to the word designated by the address ADR in the packet.

Similarly, when a read packet is supplied to entry node 211 and the SEND-IN signal to entry node 211 is activated, the packet flows through the pipeline stages of branch path 20 in sequence according to its destination address DA and reaches memory row array 30. The data of the page designated by the page address PA in the read packet is then read out word by word and, regardless of the value of destination address DA, 8 words of data pass through the pipeline stages of merge path 40 in sequence and arrive at exit node 461.
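Address-driven routing through the branch tree can be illustrated with a short sketch. It assumes one destination address bit is consumed per branching stage, most significant bit first, with '1' selecting the upper branch and '0' the lower branch; the function names and the leaf numbering in `leaf_index` are hypothetical.

```python
def route_branch(da: str) -> list[str]:
    """Return the turn taken at each branching stage for address bits `da`.

    `da` is a string of '0'/'1' characters, most significant bit first.
    """
    if not da or set(da) - {"0", "1"}:
        raise ValueError("DA must be a non-empty string of '0' and '1'")
    return ["upper" if bit == "1" else "lower" for bit in da]


def leaf_index(da: str) -> int:
    """Hypothetical numbering of the leaf reached: DA read as binary."""
    return int(da, 2)
```

For instance, `route_branch("101")` gives `["upper", "lower", "upper"]`. Nothing in the merge direction depends on these bits, which is why a read result reaches the exit node regardless of DA.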

Once the packet in entry node 211 has been transferred to node 221 or node 222 and the ACK-OUT signal becomes active, the next packet can be held in entry node 211. The type of the next packet is arbitrary, regardless of whether the previous packet was a read packet or a write data packet.

According to memory 10 of Embodiment 1, the simple configuration of arranging tree-shaped branch path 20 and tree-shaped merge path 40 on either side of memory row array 30 makes it possible to transfer a packet containing a destination address to any one row of the integrated memory row array 30 and to take the corresponding packet out of exit node 461 of tree-shaped merge path 40.

Furthermore, because packet congestion is avoided on the exit side of branch path 20 and the entry side of merge path 40, where the path width is comparatively wide, processing delay in a memory row is absorbed by distributed parallel processing across multiple memory rows, so random-access throughput is comparatively high.

In addition, when a processor is built from data-driven circuits, using a single data-driven memory reduces power consumption substantially compared with raising parallelism by using many non-data-driven memories. The memory is therefore particularly suitable for mobile devices that require long battery life.

Although Embodiment 1 has been described in terms of page-unit reads, access may of course be in units of rows, words, bytes, or the like. The same applies to the embodiments below.

In memory 10 of FIG. 1, the bottleneck is that there is only one entry node and one exit node despite the high degree of parallelism. FIG. 6 shows memory 10A of Embodiment 2 of the present invention, which improves on this point.

In memory 10A, an entry node 212 is added to branch path 20A, and the output of entry node 212 is supplied to nodes 221A and 222A, so that second-stage nodes 221A and 222A are two-merge, two-branch circuits. The merging is of the selective type described above; for example, node 221A captures and holds the data corresponding to whichever SEND-IN, from entry node 211 or 212, becomes active first. In branch path 20A, as in branch path 20 of FIG. 1, a packet reaches the exit node determined solely by its destination address DA. No new rule is therefore needed for write data packets.

In merge path 40A, an exit node 462 is added to the output stage so that data can be transferred from node 451A or node 452A to exit node 462. Nodes 451A and 452A are both two-merge, two-branch circuits.

A rule is now needed to decide whether node 451A transfers data to exit node 461 or to exit node 462. For example, exit nodes 461 and 462 can be given priorities: when both are empty (ACK-IN active), node 451A transfers to the one with higher priority, and when only one is free, it transfers to that one.
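The priority rule given as an example can be written as a small decision function. The sketch below assumes ACK-IN is available as a boolean per exit node; the function name `choose_exit` and the default priority order are hypothetical.

```python
def choose_exit(ack_in_461: bool, ack_in_462: bool,
                priority=("461", "462")):
    """Pick the exit node for a packet waiting in node 451A.

    An active ACK-IN means the exit node is empty. Returns the chosen exit
    node name, or None when both are occupied and the packet must wait.
    """
    free = {"461": ack_in_461, "462": ack_in_462}
    for name in priority:  # highest priority first
        if free[name]:
            return name
    return None
```

With this rule the exit node that a result appears on depends on momentary occupancy, which is the disorder that the channel CH described next is introduced to avoid.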

In this embodiment, to keep the data flow orderly, a 1-bit channel CH is added to packet 50A as shown in FIG. 7(A). When its value is '0', the packet is transferred from node 451A or node 452A to exit node 462; when it is '1', the packet is transferred from node 451A or node 452A to exit node 461. The value of channel CH is determined by which of entry nodes 211 and 212 the read packet is supplied to. For example, CH is set to '1' when the packet is supplied to entry node 211 and to '0' when it is supplied to entry node 212.

When a read packet is supplied to entry node 211 in this way, the data read from memory row array 30 always reaches exit node 461; when it is supplied to entry node 212, the data always reaches exit node 462. The packet routes have logical symmetry. That is, with respect to the column of memory row array 30, the packet routes in branch path 20A and merge path 40A are logically symmetric (first symmetry). In addition, the routes of packets with mutually complementary destination addresses, for example destination address 011011 and destination address 100100, are logically symmetric to each other about the axis in the flow direction (second symmetry). The present invention needs to have at least the second symmetry.
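The second symmetry can be checked on the example addresses with a few lines of Python. The sketch assumes that each address bit selects an upper ('1') or lower ('0') branch and that mirroring about the flow axis swaps upper and lower; the helper names are hypothetical.

```python
def turns(da: str) -> list[str]:
    """Turn taken at each stage, most significant address bit first."""
    return ["upper" if bit == "1" else "lower" for bit in da]


def mirror(path: list[str]) -> list[str]:
    """Reflect a route about the axis in the flow direction."""
    return ["lower" if turn == "upper" else "upper" for turn in path]


def complement(da: str) -> str:
    """Bitwise complement of an address given as a '0'/'1' string."""
    return "".join("1" if bit == "0" else "0" for bit in da)
```

Under these assumptions `mirror(turns("011011"))` equals `turns("100100")`, and more generally `mirror(turns(da))` equals `turns(complement(da))` for any address.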

FIG. 7(B) shows, for the case where channel CH is '0', the routes that a read packet can take through the first and second stages of branch path 20A and the routes that the resulting read data packet can take through the fifth and sixth stages of merge path 40A. Dotted lines show the case where channel CH is '0', and solid lines show the case where it is '1'.

The destination of a read packet is determined solely by the value of destination address DA, independent of the value of channel CH. For example, suppose channel CH is '1' and the most significant bit of destination address DA is '1'. With the convention described above, in which '1' branches to the upper side of FIG. 7(B) and '0' branches to the lower side, the packet supplied to entry node 211 proceeds to node 221A.

In merge path 40A there is merging but no branching up to the fifth stage, so the route is uniquely determined regardless of the values of channel CH and destination address DA. In the case above, the read data packet reaches node 451A.

Because channel CH is '1', the packet proceeds from node 451A to node 461A. When the most significant bit of destination address DA is '0', the read data packet likewise reaches exit node 461. In other words, when the route through the fifth and sixth stages of merge path 40A is determined by the value of channel CH, the routes in branch path 20A and merge path 40A become symmetric with respect to memory row array 30, and when CH is '1' the packet always reaches node 461A of merge path 40A, which corresponds to entry node 211 of branch path 20A.
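The walk-through above can be condensed into a lookup-based trace. This sketch follows the pairing stated in the text, in which entry node 211 corresponds to exit node 461 and channel value '1'; the function name `trace_read`, the omission of the intermediate stages, and the choice of node 452A for a most significant bit of '0' (inferred by symmetry) are assumptions.

```python
ENTRY_TO_CH = {"211": 1, "212": 0}  # channel assigned per entry node
CH_TO_EXIT = {1: "461", 0: "462"}   # exit node selected by channel


def trace_read(entry: str, da_msb: int) -> list[str]:
    """Key nodes visited by a read packet and its read data packet."""
    ch = ENTRY_TO_CH[entry]
    # Branch path: the second-stage node depends only on the DA bit.
    stage2 = "221A" if da_msb == 1 else "222A"
    # Merge path: the fifth-stage node is fixed by where the data enters.
    stage5 = "451A" if da_msb == 1 else "452A"
    # The last hop depends only on the channel.
    return [entry, stage2, stage5, CH_TO_EXIT[ch]]
```

Calling `trace_read("211", 1)` and `trace_read("211", 0)` gives routes that differ in the middle but both end at exit node 461, which is the property the channel bit is meant to guarantee.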

In all other respects this embodiment is the same as Embodiment 1.

According to Embodiment 2, the node additions and changes described above double the number of input ports and output ports of memory 10A, so throughput can be greatly improved.

In addition, because the two channels share the later stages of branch path 20A and the earlier stages of merge path 40A, where the path width is comparatively wide, the degree of parallelism relative to the scale of the communication path can be raised while limiting any loss of performance.

Furthermore, by adding channel CH to packet 50A and branching according to its value in the merge/branch circuits on the exit side of merge path 40A, the exit node of merge path 40A at which a read data packet appears is determined by the entry node of branch path 20A to which the read packet was supplied. This makes the data taken out of merge path 40A easier to process.

FIG. 8 shows memory 10B of Embodiment 3 of the present invention, in which the number of input ports and output ports is twice that of Embodiment 2.

In memory 10B, nodes are added to the configuration of FIG. 6 so that the configuration is symmetric about the axis in the packet flow direction.

Specifically, entry nodes 213 and 214 are added to the input stage of branch path 20B and nodes 223A and 224A are added to the second stage, and they are connected to each other in the same way as nodes 211 and 212 are connected to nodes 221A and 222A. Each node in the third stage of branch path 20B is also a two-merge, two-branch circuit like those of the second stage, and the second and third stages are connected so as to give the symmetry described above.

As in Embodiment 2, the route of a packet flowing through branch path 20B is determined solely by its destination address DA. No new rule is therefore needed for write data packets.

In merge path 40B, as in branch path 20B, nodes 463A and 464A are added to the output stage and nodes 453A and 454A are added to the stage upstream of it, and they are connected to each other in the same way as nodes 461A and 462A are connected to nodes 451A and 452A. Each node in the stage further upstream (the fourth stage) of merge path 40B is also a two-merge, two-branch circuit like those of the fifth stage, and the fourth and fifth stages are connected so as to give the symmetry described above.

FIG. 9(A) shows the format of packet 50B. Packet 50B has a 2-bit channel CH and is otherwise the same as in FIG. 7(A). For a read packet, channel CH is set to 0, 1, 2, or 3 when the packet is supplied to entry node 214, 213, 212, or 211, respectively. As a result, a read data packet read from memory row array 30 follows a route that is symmetric, with respect to memory row array 30, to its route through branch path 20B.

FIG. 9(B) uses dotted lines to show the routes that a read packet can take through the first to third stages of branch path 20B, and the routes that the resulting read data packet can take, when channel CH is '01'.

As noted above, the destination of a read packet is determined solely by the value of destination address DA, independent of the value of channel CH. For example, suppose the upper 2 bits of destination address DA are '11'. With the convention described above, in which '1' branches to the upper side of FIG. 9(B) and '0' branches to the lower side, the packet proceeds from entry node 213 to node 223A because the most significant bit is '1', and then from node 223A to node 231A because the next bit is '1'.

In merge path 40B there is merging but no branching up to the fourth stage, so the route is uniquely determined regardless of the values of channel CH and destination address DA. In the case above, the read data packet reaches node 441A.

Channel CH is '01' and its second bit is '0', so the packet proceeds from node 441A to node 453A. Its first bit is '1', so the packet then proceeds from node 453A to node 463A. For any other value of the upper 2 bits of destination address DA, the read data packet likewise reaches node 463A. In other words, when the route through the fourth to sixth stages of merge path 40B is determined by the value of channel CH, the routes in branch path 20B and merge path 40B become symmetric with respect to memory row array 30, and when CH is '01' the packet always reaches node 463A of merge path 40B, which corresponds to entry node 213 of branch path 20B.
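The 2-bit channel scheme can be expressed as a pair of lookups plus the bit order used at the two branching hops of the merge path. In the sketch below, the entry-to-channel table follows the text; the exit node names for channels other than '01' are inferred from the stated symmetry and should be read as assumptions, as should all function names.

```python
ENTRY_NODES = (214, 213, 212, 211)  # CH = 0, 1, 2, 3 in this order


def channel_for_entry(entry: int) -> int:
    """Channel value set in a read packet supplied to `entry`."""
    return ENTRY_NODES.index(entry)


def merge_decisions(ch: int) -> tuple[int, int]:
    """Channel bits consulted at the stage 4-to-5 and 5-to-6 hops."""
    second_bit = (ch >> 1) & 1  # consulted first, leaving the fourth stage
    first_bit = ch & 1          # consulted next, leaving the fifth stage
    return second_bit, first_bit


def exit_for_channel(ch: int) -> str:
    # Assumption: exit node 46xA mirrors entry node 21x, as the text states
    # for entry node 213 and node 463A.
    return f"46{ENTRY_NODES[ch] - 210}A"
```

For the worked example, `channel_for_entry(213)` is 1 (binary '01'), `merge_decisions(1)` is `(0, 1)`, and `exit_for_channel(1)` is `"463A"`.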

FIG. 9(C) uses dotted lines to show the routes that a read packet can take through the first to third stages of branch path 20B, and the routes that the resulting read data packet can take, when channel CH is '11'.

According to Embodiment 3, the effects described for Embodiment 2 are further enhanced with only a slight change to the configuration of Embodiment 2.

In addition, because the four channels share the nodes of the fourth to sixth stages of branch path 20B and of the first to fourth stages of merge path 40B, where the path width is comparatively wide, the degree of parallelism relative to the scale of the communication path can be raised while limiting any loss of performance.

FIG. 10 shows memory 10C of Embodiment 4 of the present invention, in which the number of pipeline stages is reduced.

Branch path 20C is identical to branch path 20B of FIG. 8 up to the inputs of the third stage. It differs from branch path 20B in that the outputs of each third-stage node and each fourth-stage node branch four ways. As a result, branch path 20C is a four-stage pipeline, whereas branch path 20B is a six-stage pipeline. Merge path 40C is symmetric to branch path 20C with respect to memory row array 30, with the direction of data flow reversed, and is also a four-stage pipeline.

As in Embodiment 3, once the entry node is fixed, the route of a packet through branch path 20C is determined solely by its destination address, and in merge path 40C the route of a packet from a node with selective branch outputs is determined by its channel.

Because merging is selective on a first-come, first-served basis, increasing the number of inputs merged at a node means that transfer waits occur when many packets are directed to the same merge node. When packets are not congested, however, the smaller number of pipeline stages shortens latency.

This is effective when a write to memory row array 30 completes in one word, as with a write packet, because the waiting time at the exit nodes of branch path 20C is then comparatively short. For read data packets, in contrast, 8 words of data are read out in sequence from memory row 31, so at an input node of merge path 40C the waiting time for other memory rows feeding the same entry node becomes comparatively long. This can be avoided by using merge path 40B in place of merge path 40C, that is, by combining branch path 20C with merge path 40B.

FIG. 11 shows memory 10D of Embodiment 5 of the present invention, a two-port-input, two-port-output memory in which the wait for transfer to a selective merge node is shortened.

In branch path 20A of FIG. 6, selective merging takes place at second-stage nodes 221A and 222A, so waits occur in the first stage.

In branch path 20D, therefore, nodes 223 and 224 are added to the second stage so that no selective merging occurs there. Node 223 branches to node 231A or 233A, where merging takes place, and node 224 branches to node 235A or 237A, where merging takes place.

As a result, each third-stage node performs selective merging, but because there are four such nodes, the average packet transfer waiting time in the second stage of branch path 20D is about half that in the first stage of branch path 20A of FIG. 6. Packet stalls are reduced and throughput is improved.

In all other respects this embodiment is the same as Embodiment 2.

FIG. 12 shows memory 10E of Embodiment 6 of the present invention, in which the number of input ports and output ports is twice that of Embodiment 5.

In memory 10E, nodes are added to the configuration of FIG. 11 so that the configuration is symmetric about the axis in the packet flow direction.

Specifically, entry nodes 213 and 214 are added to the input stage of branch path 20E, nodes 225 to 228 are added to the second stage, and nodes 232A, 234A, 236A, and 238A are added to the third stage at alternate positions. The connections between these third-stage nodes and nodes 225 to 228 have the same form as the connections between the second and third stages of branch path 20D of FIG. 11. Each node in the fourth stage of branch path 20E is also a two-merge, two-branch circuit like those of the third stage, and the third and fourth stages are connected so as to give the symmetry described above.

Merge path 40E is symmetric to branch path 20E with respect to memory row array 30, with the direction of data flow reversed.

As in Embodiment 5, once the entry node is fixed, the route of a packet through branch path 20E is determined solely by its destination address, and in merge path 40E the route of a packet from a node with selective branch outputs is determined by its channel.

According to Embodiment 6, the effects described for Embodiment 5 are further enhanced with only a slight change to the configuration of Embodiment 5.

CPUs generally have many instructions that operate on two operands. A larger number of pipeline stages lengthens latency. However, when a single CPU core performs n-way time-division parallel processing, a synchronous design runs each process at 1/n speed even if the switching time is assumed to be zero, so an asynchronous design is more advantageous where the degree of parallelism is high, as in a server computer.

FIG. 13 is a schematic block diagram showing data processing unit 10AP of Embodiment 7, to which the present invention is applied. The unit is part of a processor suited to such applications.

As the bold lines in this figure show, the exit node of merge path 40AP at which a packet arrives is determined solely by the channel value. By supplying several packets of the same channel in succession to an entry node of branch path 20A, the corresponding data packets can therefore be gathered at the same exit node of merge path 40AP. In other words, giving packets the same channel value makes them rendezvous automatically at the exit node.

Accordingly, in the merging channel 40AP, each of the exit nodes 461AP to 464AP is provided with a processing element. The processing performed by the processing elements may be identical or may depend on the system value. A processing element may be high-function or low-function. 30R is used as a register file. The register file 30R can be freely partitioned into an area shared by these processing elements and areas dedicated to individual elements.

FIG. 14(A) shows the case where a first operand packet P1 containing a command CMD and a second operand packet P2 are fed in sequence to the entry node 211, the corresponding packets P1A and P2A reach the node 461P, and its processing element produces a result packet P3. The first operand packet P1 and/or the second operand packet P2 may each be a plurality of packets supplied in sequence.

When this packet P3 corresponds to the packets P1 and P2 of the next step, like the packets P1N and P2N shown in FIG. 14(B), feeding them back to the entry node 211 via the node 47 allows processing to proceed continuously at high speed. The node 47 determines, based on the packet P1N output by the node 461P, whether processing is complete. If so, it outputs the result and, based on the processing mode (or CMD) contained in the packet, either terminates or continues processing.

The branch channel 20A functions as a decoder and also as a queue. The nodes of the merging channel 40AP other than the exit nodes collect packets and deliver them to the processing element of the same system, and also function as a queue. Therefore, when packets are supplied to the entry nodes 211 to 214 at irregular intervals and the average supply time is roughly equal to the processing time of the processing elements at the exit nodes 461AP to 464AP, processing proceeds efficiently without a queue outside the data processing unit 10AP. This average time can be adjusted to a suitable value by changing the degree of parallelism of the circuit or device that supplies packets to the entry nodes 211 to 214.

Likewise, when a single read packet causes a plurality of packets to be read from the register file 30R, the nodes of the merging channel 40AP other than the exit nodes function as a queue for them, so processing proceeds efficiently without adding a queue.

Therefore, a relatively large number of stages in the data processing unit 10AP can actually be an advantage in some cases.

A high degree of parallelism means that many data items are used at once. With the data processing unit 10AP of Embodiment 7, a relatively large number of registers can be used selectively by the plurality of processing elements and, as described in Embodiment 1, can be accessed randomly at high throughput, so processing is efficient.

Conventionally, packets of the same color were made to wait for each other in a matching memory comprising a FIFO memory, a hash memory, an associative memory, an arithmetic unit, a control unit, and so on, before being processed by a processing element. This made the configuration complex and delayed processing, which lowered throughput. In Embodiment 7, the packets of a pair merge consecutively, so no matching memory is needed; the configuration is simpler and throughput is higher.

In FIG. 14(A), when the data read from the register file 30R for the packet P1 is, for example, 8 words, as with the read packet described above, the communication path becomes congested. In that case, if the operation result consists of fewer packets, data congestion downstream of the operation is reduced.

Accordingly, in the data processing unit 10BP of the processor of the present invention, shown in FIG. 15, every node of the merging channel 40BP is provided with a processing element. The operation is performed at the node where the packets P1 and P2 (FIG. 14) merge on the merging channel 40BP, and the result is forwarded downstream.

The thick lines in FIG. 15 show the paths of packet pairs in the first and fourth systems. These paths are determined by the destination address DA and the system CH. If the system CH is chosen so that the path on the branch channel 20A and the path on the merging channel 40BP are logically symmetric about the register file 30R, the merging point on the merging channel 40AP is determined by the destination address DA alone. The branch point on the branch channel 20A corresponding to this merging point is determined by the number of matching bits, counted from the most significant bit, in the destination addresses DA of the packet pair.

For example, the destination address bits DA1 and DA2 of the upper packet-pair paths T1 and T2 in FIG. 15 agree in their upper 2 bits, '10', as shown in FIG. 16(A). Letting i denote the number of matching bits, the packet pair branches at stage (i+1) of the branch channel 20A and merges at stage (6−i) of the merging channel 40AP.
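The relation between the matching address bits and the branch and merging stages can be sketched as follows. This is a minimal Python illustration, not the circuit of the embodiment: the function names, the representation of addresses as bit strings, and the 5-bit example addresses are assumptions. Only the rules "branch at stage i+1" and "merge at stage 6−i" come from the description.

```python
def matching_prefix_bits(da1: str, da2: str) -> int:
    """Count the bits that agree, starting from the most significant bit."""
    if len(da1) != len(da2):
        raise ValueError("destination addresses must have the same width")
    count = 0
    for bit1, bit2 in zip(da1, da2):
        if bit1 != bit2:
            break
        count += 1
    return count


def branch_and_merge_stage(da1: str, da2: str, merge_base: int = 6):
    """Return (branch stage, merging stage) for a packet pair.

    merge_base = 6 follows the (6 - i) rule in the description; a network
    of a different depth would need a different value (assumption).
    """
    i = matching_prefix_bits(da1, da2)
    return i + 1, merge_base - i


# Hypothetical 5-bit addresses that share the upper two bits '10'.
print(branch_and_merge_stage("10110", "10011"))  # (3, 4)
```

With two matching bits, the pair splits at the third stage of the branch channel and meets again at the fourth stage of the merging channel.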

The packet merging stage identifier is expressed by the value i above, as shown in FIG. 15. Each node on the merging channel 40AP holds a fixed merging stage identifier value; when it matches the merging stage identifier MA of a packet pair, the processing element of that node processes the pair. FIG. 16(B) shows the format of a packet containing the merging stage identifier MA. As shown in FIG. 15, the branch stage ID on the branch channel 20A, which is used in a later embodiment, is also expressed by the value i.

The merging stage identifier MA is determined in the nodes 201 to 204 on the rear-stage side of the branch channel 20A. When a loop is formed as shown in FIG. 14(B), the nodes 201 to 204 may instead be placed downstream of the merging channel 40AP.

図１７は、ノード２０１の構成を示す概略ブロック図である。 FIG. 17 is a schematic block diagram illustrating the configuration of the node 201.

このノード２０１は、ラッチ２０１Ｌと転送制御回路２０１Ｃとの基本構成のほかに、パケットペア判定部２０１Ｐと合流段ＩＤ決定部２０１Ｆとを備えている。 In addition to the basic configuration of the latch 201L and the transfer control circuit 201C, the node 201 includes a packet pair determination unit 201P and a merging stage ID determination unit 201F.

The packet pair determination unit 201P operates when the output of the latch 201L and the data output of the stage behind it are both settled, that is, when SEND2 output from the node 201 and SEND1 supplied to the transfer control circuit 201C are active at the same time. If the packet type PT (FIG. 16(B)) of the packet on the downstream side of the latch 201L indicates a first operand packet and that of the packet on the upstream side indicates a second operand packet, the unit determines that the two form a packet pair.

In response to this determination, the merging stage ID determination unit 201F determines the merging stage identifier MA from the destination addresses DA of both packets as described above and supplies it to the downstream latch, so that the latch of the node 211 in FIG. 15 captures and holds the merging stage identifier MA.
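The pair check and the derivation of MA described for the node 201 can be sketched in software as follows. This is a hedged illustration only: the `Packet` class, the boolean handshake arguments, and the bit-string address encoding are assumptions made for the example. The PT values ('0' for a first operand packet, '1' for a second operand packet) and the rule that MA equals the number of matching upper address bits follow the description.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_OPERAND = 0   # PT value of a first operand packet
SECOND_OPERAND = 1  # PT value of a second operand packet


@dataclass
class Packet:
    pt: int  # packet type PT
    da: str  # destination address DA as a bit string (assumed encoding)


def is_packet_pair(send1: bool, send2: bool,
                   downstream: Packet, upstream: Packet) -> bool:
    """Pair check: both handshake signals active and PT in the expected order."""
    if not (send1 and send2):
        return False
    return downstream.pt == FIRST_OPERAND and upstream.pt == SECOND_OPERAND


def merge_stage_id(send1: bool, send2: bool,
                   downstream: Packet, upstream: Packet) -> Optional[int]:
    """Return MA (number of matching upper DA bits), or None if not a pair."""
    if not is_packet_pair(send1, send2, downstream, upstream):
        return None
    ma = 0
    for bit1, bit2 in zip(downstream.da, upstream.da):
        if bit1 != bit2:
            break
        ma += 1
    return ma


p1 = Packet(pt=FIRST_OPERAND, da="10110")
p2 = Packet(pt=SECOND_OPERAND, da="10011")
print(merge_stage_id(True, True, p1, p2))   # 2
print(merge_stage_id(True, False, p1, p2))  # None
```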

When a pair of read packets (a first operand packet and a second operand packet) is to be processed according to the command contained in the first packet, the data field DATA shown in FIG. 16(B) is empty in each read packet while it is in the branch channel 20A of FIG. 15. Meanwhile, the branch channel 20B is relatively narrow on the entry node side, so it congests easily when there are many packets. Moreover, even if the packets of a pair are supplied to the branch channel 20B in sequence, merging nodes operate on a first-come-first-served basis, so another packet may cut in between the two packets of the pair.

Accordingly, the pair of read packets is compressed into a single packet as shown in FIG. 18(A). In the figure, the addresses ADR1 and ADR2 are the first operand address and the second operand address, respectively. The bits on the upper side of these, apart from the destination addresses DA1 and DA2, are fields common to the first and second operand packets before compression, which raises the compression ratio.

In FIG. 18(A), the destination addresses DA1 and DA2, which are the upper bits of the addresses ADR1 and ADR2, are located apart from the page addresses PA1 and PA2 because the packet is divided into communication-path-layer control data at its head and function-module-layer data in the remainder. The communication-path-layer control data is used only by the communication path. The function-module-layer data is used by the register file 30R as a function module and by the processing elements in the nodes on the merging channel 40BP. The destination address of the compressed packet consists of the matching upper bits of DA1 and DA2, so one of the two would suffice; however, because both are used by the nodes 201 to 204 described above, both are included in the communication-path-layer control data.

Because the addresses ADR1 and ADR2 generally have different destination addresses DA, the compressed packet must be expanded into a packet pair at the point in FIG. 15 where the packet paths T1 and T2 diverge on the branch channel 20A. The stage at which they diverge is given by the value of the merging stage identifier MA (= branch stage identifier) determined in the nodes 201 to 204 as described above.

Accordingly, each node of the branch channel 20A is given the ability to expand a compressed packet into a packet pair and is assigned a fixed branch stage identifier. When the value of the merging stage identifier MA (= branch stage identifier) in a packet matches the branch stage identifier of the node, the node expands the compressed packet into a packet pair.

FIGS. 18(B) and 18(C) show the packets 50E and 50F obtained by expanding the compressed packet 50D of FIG. 18(A) into a packet pair. The packet 50D can serve as the packet 50E unchanged, so a copy of the packet 50D is first forwarded to the next stage as the packet 50E. Next, the destination address bits DA1, the page address PA1, and the word address WA1 in the packet 50D are overwritten with the destination address bits DA2, the page address PA2, and the word address WA2, respectively, and the result is forwarded to the next stage as the packet 50F.
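The expansion step can be sketched as follows. This is an illustrative model under stated assumptions: the `ReadPacket` class, its field types, and the `forward` helper are hypothetical, and the real field widths of FIG. 18 are not modelled. What follows the description is that 50E is a copy of 50D, that 50F is 50D with DA1, PA1, and WA1 overwritten by DA2, PA2, and WA2, and that expansion happens only at the node whose branch stage identifier equals MA.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReadPacket:
    ma: int   # merging stage identifier (= branch stage identifier)
    da1: str  # destination address bits of the first operand
    pa1: int  # page address of the first operand
    wa1: int  # word address of the first operand
    da2: str  # destination address bits of the second operand
    pa2: int  # page address of the second operand
    wa2: int  # word address of the second operand


def expand(packet_50d: ReadPacket):
    """Expand a compressed packet into the pair (50E, 50F)."""
    packet_50e = replace(packet_50d)  # plain copy of 50D
    packet_50f = replace(packet_50d, da1=packet_50d.da2,
                         pa1=packet_50d.pa2, wa1=packet_50d.wa2)
    return packet_50e, packet_50f


def forward(packet: ReadPacket, node_branch_stage_id: int):
    """Packets a branch-channel node sends on: a pair if MA matches, else one."""
    if packet.ma == node_branch_stage_id:
        return list(expand(packet))
    return [packet]


compressed = ReadPacket(ma=2, da1="10110", pa1=3, wa1=0,
                        da2="10011", pa2=7, wa2=0)
print(len(forward(compressed, node_branch_stage_id=1)))  # 1
print(len(forward(compressed, node_branch_stage_id=2)))  # 2
```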

Next, for read data packets carrying one page read from the register file 30R, and for write packets carrying one page to be written to the register file 30R, the head packet has the same format as the packet 50E and is followed by eight concatenated packets 50G in the format shown in FIG. 18(D). The packets 50G are forwarded along the path taken by the packet 50E, and no node switches its output direction in between, so no other packet cuts in during transfer and the nine packets stay consecutive.

To make such transfers possible, each packet is, on the one hand, provided with a 1-bit concatenated bit CN. CN = '1' indicates that another packet follows, and CN = '0' indicates that none does. FIGS. 19(A) and 19(B) show, respectively, the first operand packet (the first operand packet as copied in the register file 30R), which is a head packet with packet type PT = '0', and the 8-word read data packets that follow it. FIGS. 19(C) and 19(D) show, respectively, the second operand packet (the second operand packet as copied in the register file 30R), which is a head packet with packet type PT = '1', and the 8-word read data packets that follow it.
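How the concatenated bit frames a head packet and its eight data packets can be shown with a short sketch. The function name and the `(payload, cn)` tuple representation are assumptions for illustration; the CN semantics ('1' when another packet follows, '0' on the last packet) are taken from the description.

```python
def build_train(head, words):
    """Return [(payload, cn), ...]; CN is 1 on every packet except the last."""
    payloads = [head, *words]
    last = len(payloads) - 1
    return [(payload, 1 if index < last else 0)
            for index, payload in enumerate(payloads)]


# One head packet followed by 8 read data words gives 9 concatenated packets.
train = build_train("head", [f"word{k}" for k in range(8)])
print(len(train))                 # 9
print([cn for _, cn in train])    # [1, 1, 1, 1, 1, 1, 1, 1, 0]
```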

なお、順序ビットＯＤの値は、分流路４０ＢＰを出た後に順序維持を必要とするか否かにより、機能エレメントアレイ３０Ｒにおいて決定される。 The value of the order bit OD is determined in the functional element array 30R depending on whether or not the order needs to be maintained after leaving the branch flow path 40BP.

On the other hand, each node on the merging channel 40BP is provided with a flip-flop corresponding to the concatenated bit CN (the node-side concatenated bit), and the state of this flip-flop is controlled as follows.

FIG. 21 is a block diagram showing the state control circuit 47 for the node-side concatenated bit F1 of an arbitrary merging node N1 on the merging channel 40BP, other than an entry node or an exit node, together with related elements. The flip-flops of the nodes N01 and N02 on the rear-stage side of the merging node N1 and of the node N2 on its front-stage side are denoted F01, F02, and F03, respectively.

As an exception to first-come-first-served selection, the merging node N1 preferentially selects and latches the packet from the node N01 if the flip-flop F01 is '1', and preferentially selects and latches the packet from the node N02 if the flip-flop F02 is '1'.

The state control circuit 47 controls the state of the node-side concatenated bit F1 as follows, so that when one of the flip-flops F01 and F02 is already '1', the other does not subsequently become '1'.

（１）状態制御回路４７は、フリップフロップＦ２が'０'であり、ノードＮ１がラッチしたパケットの連接ビットＣＮが'１'である場合、フリップフロップＦ１を'１'にする。 (1) The state control circuit 47 sets the flip-flop F1 to “1” when the flip-flop F2 is “0” and the concatenated bit CN of the packet latched by the node N1 is “1”.

（２）状態制御回路４７は、ノードＮ１がラッチしたパケットの連接ビットＣＮが'０'であれば、フリップフロップＦ０１及びＦ０２を'０'にする。 (2) If the concatenated bit CN of the packet latched by the node N1 is “0”, the state control circuit 47 sets the flip-flops F01 and F02 to “0”.

(3) If the concatenated bit CN of the packet latched by the node N1 is '0' and the merging stage identifier of the node N1 matches the merging stage identifier MA of the packet held in the node N1, the state control circuit 47 sets the flip-flop F1 to '0'.

The state control circuit 47 for the flip-flop F1 of an entry node on the merging channel 40BP performs only (1) and (3) above. The state control circuit 47 for the flip-flop F1 of an exit node on the merging channel 40BP performs (2) and (3) above, and performs (1) with the flip-flop F2 regarded as '0'.
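Rules (1) to (3), together with the entry-node and exit-node variants, can be written as a pure update function. This is one possible reading expressed in software, not the circuit of FIG. 21: the `NodeFlags` class, the `role` argument, and the order in which the rules are applied are assumptions, and timing and handshake behaviour are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeFlags:
    f1: int   # node-side concatenated bit of the merging node N1
    f01: int  # flip-flop of node N01
    f02: int  # flip-flop of node N02


def next_flags(flags: NodeFlags, f2: int, cn: int,
               node_stage_id: int, packet_ma: int,
               role: str = "middle") -> NodeFlags:
    """Apply rules (1)-(3) once for the packet just latched by N1.

    role is "entry", "middle", or "exit" (hypothetical labels).
    """
    f1, f01, f02 = flags.f1, flags.f01, flags.f02
    if role == "exit":
        f2 = 0  # an exit node treats F2 as '0' for rule (1)
    # Rule (1): a following packet exists and F2 is '0' -> set F1.
    if f2 == 0 and cn == 1:
        f1 = 1
    # Rule (2): last packet of a sequence -> clear F01 and F02.
    # Entry nodes perform only rules (1) and (3).
    if role != "entry" and cn == 0:
        f01 = 0
        f02 = 0
    # Rule (3): last packet and the stage identifiers match -> clear F1.
    if cn == 0 and node_stage_id == packet_ma:
        f1 = 0
    return NodeFlags(f1, f01, f02)


state = NodeFlags(f1=0, f01=1, f02=0)
state = next_flags(state, f2=0, cn=1, node_stage_id=3, packet_ma=3)
print(state)  # NodeFlags(f1=1, f01=1, f02=0)
state = next_flags(state, f2=0, cn=0, node_stage_id=3, packet_ma=3)
print(state)  # NodeFlags(f1=0, f01=0, f02=0)
```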

図２０は、このようにしてセットされたフリップフロップをノード上の'１'で示す。 FIG. 20 shows the flip-flop set in this way by “1” on the node.

Each processing element has a queue that holds the 9 words × 2 of packets to be processed. With the control described above, at the node where two sets of concatenated packets merge, the 9 packets from whichever node's flip-flop became '1' first (by first-come-first-served selection) are taken in and held consecutively; then the flip-flop of the other node becomes '1', and the 9 packets from that node are taken in and held consecutively. This prevents one set of concatenated packets from being mixed into the other and prevents other packets from being mixed into a set of concatenated packets.
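The effect of this control, namely that a set of concatenated packets passes a merging node without interleaving, can be illustrated with a small simulation. It is a simplified sketch under assumptions: arrival timing is replaced by simple alternation between the two inputs when no sequence is in progress, packets are `(label, cn)` tuples, and every sequence is assumed to end with a CN = '0' packet.

```python
from collections import deque


def merge_without_interleaving(input_a, input_b):
    """Merge two packet streams; a started sequence is never interrupted."""
    queues = [deque(input_a), deque(input_b)]
    merged = []
    locked = None  # index of the input whose sequence is in progress
    turn = 0       # stand-in for first-come-first-served arbitration
    while queues[0] or queues[1]:
        if locked is not None:
            source = locked
            if not queues[source]:
                raise ValueError("sequence ended without a CN = 0 packet")
        else:
            source = turn if queues[turn] else 1 - turn
            turn = 1 - source
        label, cn = queues[source].popleft()
        merged.append(label)
        # Stay on this input while CN = 1; release it after a CN = 0 packet.
        locked = source if cn == 1 else None
    return merged


a = [("a0", 1), ("a1", 1), ("a2", 0), ("s", 0)]
b = [("b0", 1), ("b1", 0)]
print(merge_without_interleaving(a, b))
# ['a0', 'a1', 'a2', 'b0', 'b1', 's']
```

The single packet `s` is served only after both sequences have passed intact, which is the property the node-side concatenated bits are meant to guarantee.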

That is, the 9 packets of the first operand and the 9 packets of the second operand are each concatenated, the two sets are themselves concatenated, and they are held and processed in the processing element. If this processing element would produce two result packets, a first operand packet and a second operand packet, compressing the result into a single packet as described above avoids packet congestion downstream and prevents other packets from cutting in.

For the 9 concatenated write packets on the branch channel 20A as well, the data packets are forwarded through the nodes whose flip-flops are '1', in the same way as for read data packets. Because no packet can cut into a set of concatenated packets at a branch node, the state control circuit there is simpler than that of a merging node. Since no inter-packet operation is performed on write packets, the concatenated bit may be dispensed with: the lower 11 bits in FIG. 18(B) may be changed to 32 bits and write processing performed packet by packet.

連接パケットに関しては、上記構成により連接パケット内でその順序が保たれる。 With respect to the concatenated packet, the order is maintained in the concatenated packet by the above configuration.

However, among single packets, among sets of concatenated packets, and between single packets and concatenated packets, selection is first-come-first-served. Even within one system, therefore, local differences in packet congestion mean that the packet order at the output node of the merging channel is not necessarily the same as the packet order at the input node of the branch channel. Between different systems, packet order is not an issue, because at the entry nodes of the branch channel and the exit nodes of the merging channel the system value of a packet is fixed by the node position.

Next, the case where packet order is preserved within a system is described as Embodiment 10 of the present invention.

In FIGS. 22(A) to 22(C) and FIGS. 23(A) and 23(B), circles denote packets, the symbol inside a circle denotes the packet ID, and arrows denote the direction of packet travel. Packets bearing the same symbol are not the same packet but correspond to each other. The packet ID is, for example, the ID of the stream being processed. For simplicity, these figures show only one system.

データ駆動型処理回路では、一般に上述のように、互いに異なる処理対象のパケットを同一ループ内の各パイプラインステージで分散並列処理することができる。 In the data driven type processing circuit, as described above, different packets to be processed can be distributed and processed in parallel in each pipeline stage in the same loop.

Consider, as shown in FIG. 22(A), the case where process PR1 is performed in the portion 101 of the loop 100 and process PR2 is then performed in the portion 102 of the loop 100. The loop 101 may include, for example, one system's worth of the configuration shown in FIG. 24.

When the result of process PR1 is used in process PR2, or the result of process PR2 is used in process PR1, the loop 100 is split, as shown in FIG. 22(B), into a loop 101A for process PR1 and a loop 102B for process PR2, and the two are joined at a coupling node 103. At the coupling node 103, corresponding packets wait for each other and information is passed from at least one to the other, so that at least one of the processes PR1 and PR2 can use the result of the other.

As a result, the single serial process of FIG. 22(A) becomes two parallel processes and the number of pipeline stages per loop is reduced, so throughput improves provided the waiting time at the coupling node 103 is short.

For example, when packet 6 on the loop 101A is latched at the coupling node 103, if the corresponding packet 6 on the loop 102A reaches the coupling node 103 promptly, packet 6 on the loop 101A can receive its result and move on to the next node right away.

However, suppose packet 5 on the loop 101A overtakes packet 6, waits for the corresponding packet 5 on the loop 102A, obtains its result, and leaves the coupling node 103. If packet 6 on the loop 101A is latched at the coupling node 103 only after that, packet 6 on the loop 102A has already passed through the coupling node 103, and its result can no longer be used.

If, to avoid this, packets were stored temporarily and their contents retrieved from storage, processing would be delayed, and a packet with a matching ID would have to be found by sequential comparison. The configuration would become complex and processing time longer, which would defeat the purpose of running the two loops in parallel.

If packet order is preserved, each packet can use the processing result of its corresponding packet at the coupling node 103 without checking the ID of the partner packet. Throughput improves, the packet data width can be shortened to reduce circuit scale, and the design can be broken into components, which makes systems easier to build.

The packets on the loop 102A may be constants that are never modified. That is, the loop 102A may be a ring queue (circular queue).

For example, a first packet is placed on the loop 101A and a second packet related to it is placed on the loop 102A. When the command or a specific bit contained in the packet on the loop 101A indicates output from the coupling node 103 (that is, the branch direction is the output side), the coupling node 103 correspondingly takes a packet off the loop 102A, so that the second packet is taken out together with the result packet corresponding to the first packet on the loop 101A. The second packet then need not always accompany the first on the loop 101A, which simplifies the configuration.

When the loop 102A is a ring queue, the loop 101A can use the loop 102A as a stack. Even if several packets on the loop 102A correspond to one packet on the loop 101A, the correspondence can be maintained by including their number n in the packet on the loop 101A. At the coupling node 103, each time SEND-IN on the loop 102A side becomes active, ACK-OUT on the loop 102A side is made active while ACK-OUT on the loop 101A side is held inactive, and this is repeated n times.

That is, the packet on the loop 101A carries the number n, and forwarding that one packet together with n packets on the loop 102A associates one packet of the loop 101A with n packets of the loop 102A. When the command or a specific bit of a packet on the loop 101A indicates that all or part of the packet (a processing result) is to be copied and placed on the loop 102A, the coupling node 103 does so and increments the number n contained in that packet on the loop 101A (the increment may be performed at another node).
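The one-to-n correspondence can be sketched as follows, assuming that packet order is preserved on both loops. The function name and the list and tuple representations are assumptions for illustration; the handshake signals SEND-IN and ACK-OUT are abstracted into taking n packets from a FIFO.

```python
from collections import deque


def pair_one_to_n(loop_101a, loop_102a):
    """Associate each (label, n) packet of loop 101A with the next n packets
    of loop 102A, relying only on arrival order (no ID comparison)."""
    pending = deque(loop_102a)
    result = []
    for label, n in loop_101a:
        if n > len(pending):
            raise ValueError(f"packet {label!r} expects {n} partner packets")
        partners = [pending.popleft() for _ in range(n)]
        result.append((label, partners))
    return result


loop_a = [("p1", 2), ("p2", 0), ("p3", 1)]
loop_b = ["x1", "x2", "y1"]
print(pair_one_to_n(loop_a, loop_b))
# [('p1', ['x1', 'x2']), ('p2', []), ('p3', ['y1'])]
```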

If order synchronization were not possible in this case, the loop 102A could not be provided, and each packet on the loop 101A would have to carry its corresponding packets along with it, concatenated on the loop 101A. Throughput would drop, and the configuration and processing of the loop 101A would become complex.

Depending on conditions, order synchronization need not be applied to every packet. In that case, as shown in FIG. 18, each packet is provided with an order bit OD for order control: '1' means that order control applies and '0' means that it does not. While the coupling node 103 is waiting for a corresponding packet on the loop 102A, a packet whose order bit OD is '0' is let through by making ACK-OUT on the loop 102A side active when SEND-IN on the loop 102A side is active. Packets that need no order control can thus be mixed onto the loop 102A. The same applies to the loop 101A.
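The role of the order bit while the coupling node waits for a partner can be sketched as follows. This is a simplified model under assumptions: packets are `(label, od)` tuples, the loop is a FIFO, and the SEND-IN and ACK-OUT handshake is reduced to popping from the queue.

```python
from collections import deque


def wait_for_partner(loop_102a: deque):
    """Let OD = 0 packets pass; stop at the first OD = 1 packet.

    Returns (labels that passed through, partner label or None).
    """
    passed = []
    while loop_102a:
        label, od = loop_102a.popleft()
        if od == 0:
            passed.append(label)   # no order control: let it through
        else:
            return passed, label   # next ordered packet is the partner
    return passed, None            # no ordered packet has arrived yet


stream = deque([("A", 0), ("B", 0), ("1", 1), ("C", 0)])
print(wait_for_partner(stream))  # (['A', 'B'], '1')
print(list(stream))              # [('C', 0)]
```

Because ordered packets arrive in a known order, the first OD = '1' packet is taken as the partner without any ID comparison.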

In FIGS. 22 and 26, the packets A to D have the order bit OD = '0', and the other packets 1 to 6 have the order bit OD = '1'.

Since all that matters is that packets correspond between the loops, the initial packet may be placed on the loop 101A and on the loop 101B at different nodes.

The loops 101A and 102A each include a conditional branch node, through which the information of a packet is taken out of the loop according to the command or the value of a specific bit contained in the packet.

図２２（Ｃ）は、より複雑な関係のループを示す。 FIG. 22C shows a more complicated relationship loop.

In this example, the loops 101A and 101B are joined at a coupling node 103A, and depending on conditions a packet on the loop 101A moves onto the loop 101B or vice versa. Likewise, the loops 102A and 102B are joined at the coupling node 103A, and depending on conditions a packet on the loop 102A moves onto the loop 102B or vice versa. Packets bearing the same symbol cannot exist simultaneously; at any given time such a packet is on one loop or the other. Packets 1 to 3 correspond to packets 4 to 6, respectively.

Even in such a complex case, the advantages described above are obtained. For example, when packet 3 on loop 101A moves through joining node 103A onto loop 101B, the corresponding packet 6 is moved at joining node 103A from loop 102A onto loop 102B under the same control as above, so that the packet order stays synchronized.

To shorten the waiting time and speed up order synchronization, loops 102A and 102B may be coupled through queues 104 and 105, as shown in FIG. 23(A), so that result packets are stored in a queue one after another and the other side can take them out immediately. Because the order is predictable under order synchronization, queuing the results in advance makes them available for immediate use.

上述のようにループを分割することは、ハードウェアのコンポーネント化のみならず、階層構造化をも可能にする。すなわち、図２３（Ｂ）に示すように、上述のキュー１０４及び１０５を上階層のループ１０６で処理すれば、階層構造となる。この例では、上階層のループ１０６での処理結果がキュー１０７及び１０８を介してそれぞれ下階層のループ１０１Ａ及び１０２Ａにフィードバックされている。 Dividing the loop as described above enables not only hardware componentization but also hierarchical structure. That is, as shown in FIG. 23B, when the above-described queues 104 and 105 are processed by the loop 106 of the upper hierarchy, a hierarchical structure is obtained. In this example, the processing result in the upper layer loop 106 is fed back to the lower layer loops 101A and 102A via the queues 107 and 108, respectively.

以上のことは、各系統について成立するので、複数系統のそれぞれについて適用することができる。 Since the above is true for each system, it can be applied to each of a plurality of systems.

Loop processing is efficient, but order synchronization also applies when a packet traverses the path only once, so the processing need not form a loop.

A conventional data-driven processing device can perform autonomous distributed processing with local synchronization, but it had nothing corresponding to the system clock of a synchronous circuit. Consequently, despite excelling at autonomous distributed processing, it lacked coordination and remained a niche technology. In an asynchronous circuit, preserving packet order and synchronizing order between loops plays the role that synchronizing to the system clock plays in a synchronous circuit.

In asynchronous communication over a macro-scale network, the packet order need not be preserved on the communication path: higher-level functions, implemented by software in combination with a synchronous CPU and storage device, restore the order at the TCP layer and carry out autonomous distributed cooperative control without depending directly on packet order. By contrast, in a data-driven processing device, which contains a micro-scale network internally, preserving packet order is the foundation of cooperative control.

本発明の順序同期は、自律分散による並列処理を維持しつつ簡単な構成で協調制御を可能にしデータ駆動型処理装置を高機能化するのに寄与するところが大きい。 The order synchronization of the present invention greatly contributes to the enhancement of the functionality of a data driven processor by enabling cooperative control with a simple configuration while maintaining parallel processing by autonomous distribution.

順序同期を実現するには、ループ状通信路でパケットの順序を同一系統内で維持する必要がある。パケットの順序を維持させるために順序合流を行わせる構成例を、本発明の実施例１１として説明する。 In order to realize the order synchronization, it is necessary to maintain the order of the packets in the same system on the loop communication path. A configuration example in which order merging is performed in order to maintain the packet order will be described as an embodiment 11 of the present invention.

Even if a packet at a branch node branches toward an uncongested direction and gets ahead, packets of the same system subsequently merge. Packet order within a system is disturbed by overtaking at a selectively merging node, that is, when the packet order at a branch node differs from the packet order at the corresponding merge node.

To see what this difference corresponds to, consider the direction of packet travel at a branch node and at its corresponding merge node. For example, a packet passing through node 433P on the merge path 40BP in FIG. 24 has previously passed through the corresponding node 243 on the branch path 20A. There is a regularity: which of the next stages the packet takes from node 243 determines from which of the upstream stages of node 433P the packet enters node 433P. In FIG. 24 this relationship is symmetric about the register file 30R, but symmetry is not required; any logical correspondence suffices.

簡単化のため、リードデータパケットが１ワードの場合のリードパケットとこれに対応するリードデータパケットを考える。パケットの順序が保たれていれば、ノード２４３を順次通過するパケットのノード２４３での分岐方向の順序と、ノード４３３Ｐを順次通過するパケットのノード４３３Ｐでの分岐方向の順序とが対応する。 For the sake of simplicity, consider a read packet and a corresponding read data packet when the read data packet is one word. If the order of the packets is maintained, the order of the branching direction at the node 243 of the packet that sequentially passes through the node 243 corresponds to the order of the branching direction at the node 433P of the packet that sequentially passes through the node 433P.

もし、全ての系統について、パケット順序が維持されていれば、合流路４０ＢＰ上の任意の合流ノードとこれに対応する分流路２０Ａ上の分岐ノード（ノードペア）とについて、この対応関係が成立する。もし、２つのパケット間の順序に乱れがあれば、いずれかのノードペアで該対応関係が不成立となる。 If the packet order is maintained for all the systems, this correspondence relationship is established between an arbitrary joining node on the joint channel 40BP and a branch node (node pair) on the branch channel 20A corresponding thereto. If there is a disorder in the order between two packets, the corresponding relationship is not established in any node pair.

Accordingly, packet order is preserved by maintaining this correspondence for every node pair: the switching of each node on the merge path 40BP is controlled by the switching information of the corresponding node on the branch path 20A, specifically the switching information from N stages earlier. For the exit node of branch path 20A and the entry node of merge path 40BP, however, N = 0 and the correspondence already holds. In FIG. 24, N takes the values 2, 4, 6, 8, and 10.

In FIG. 24, let T1 and T2 denote the trajectories of packets that leave node 243 toward the upper and lower sides, respectively. Suppose the T1 packet is held at node 243 first and the T2 packet is held there next, and that T1 is congested while T2 is lightly loaded, so the T2 packet reaches the stage upstream of node 433P first. In this case, the information that node 243 switched to the upper side is conveyed to node 433P, which waits for the packet from the upper side. After that packet has been held at node 433P, the information that node 243 switched to the lower side is conveyed to node 433P, which then waits for the packet from the lower side. Packet order is thereby preserved. Applying this control to every node pair preserves packet order at least within each system.
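
The T1/T2 scenario can be reproduced with a small behavioral model. The sketch below is not the circuit itself: it assumes that branch directions are recorded in a FIFO in branch order and that the merge node serves only the input named by the head of that FIFO. The function and variable names are illustrative.

```python
from collections import deque

def ordered_merge(branch_dirs, arrivals):
    """Forward packets at a merge node in the order they left the branch node.

    branch_dirs -- directions chosen at the branch node, in branch order
                   (1 = upper side, 0 = lower side)
    arrivals    -- (direction, packet) pairs in the order the packets reach
                   the inputs of the merge node
    Returns the packets in the order the merge node forwards them.
    """
    sel_queue = deque(branch_dirs)          # switching information, oldest first
    waiting = {0: deque(), 1: deque()}      # packets held at each merge input
    forwarded = []
    for direction, packet in arrivals:
        waiting[direction].append(packet)
        # Serve only the side named by the oldest switching information.
        while sel_queue and waiting[sel_queue[0]]:
            side = sel_queue.popleft()
            forwarded.append(waiting[side].popleft())
    return forwarded

# T1 branched first (upper), T2 second (lower), but T2 arrives first.
print(ordered_merge([1, 0], [(0, "T2"), (1, "T1")]))  # ['T1', 'T2']
```

The T2 packet is held until the T1 packet has been forwarded, so overtaking on the uncongested side does not change the order seen after the merge node.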

FIG. 25(A) shows the configuration provided, in order to preserve this order, between an arbitrary node 110 on the merge path 40BP (other than its entry node) and the corresponding node 111 on the branch path 20A. FIG. 26 is a detailed block diagram of FIG. 25(A).

図２５（Ａ）ではノード１１０とノード１１１との間でパケットが流れ得る流路を分岐合流ノード１１２と表す。 In FIG. 25A, a flow path through which a packet can flow between the node 110 and the node 111 is represented as a branch / merging node 112.

In this configuration, a queue 113 is provided between nodes 110 and 111. When OD = '1', the bit DAi of the destination address DA that indicates the branch direction from node 111 ('1' for the upper branch and '0' for the lower branch in the figure) is supplied to the data input of the 1-bit latch in the input stage 113a of queue 113. The data-driven queue 113 has a buffering action: the handshake protocol used in its transfer control circuits automatically closes up any empty stages along the way. Its number of stages therefore need only be at least N, the number of pipeline stages between node 111 and node 110. Data are simply taken in order from the output stage of queue 113, and N need not be considered when taking them out.

The configuration of node 110 and its upstream stages 115 and 116 is substantially the same as the corresponding configuration in FIG. 4. In node 110 a multiplexer 110M is connected to the input side of the latch 110L; this corresponds to the output-side gates and output-enable control inputs OE within latches 421L and 422L of FIG. 4. The difference from FIG. 4 is that in FIG. 26 the multiplexer 110M is selected by the latch output SEL of the output stage 113b of queue 113.

ノード１１１とその次段１１７及び１１８との構成は、図３の対応する構成と、ノード１１１の転送制御回路１１１Ｃを除き同一である。ノード１１１の転送制御回路１１１Ｃはキュー１１３の入力段１１３ａの転送制御回路との間についても信号授受を行っている点で、転送制御回路２１１Ｃと異なる。 The configuration of the node 111 and the subsequent stages 117 and 118 is the same as the corresponding configuration of FIG. 3 except for the transfer control circuit 111C of the node 111. The transfer control circuit 111C of the node 111 is different from the transfer control circuit 211C in that signals are also exchanged with the transfer control circuit of the input stage 113a of the queue 113.

The branch/merge circuit 112A in FIG. 26 is the branch/merge circuit 112 of FIG. 25(A) with nodes 115 to 118 removed.

When node 111 activates SEND to the next-stage node 117 or 118, it simultaneously activates SEND to the input stage 113a of queue 113. That is, the transfer control circuit 111C of node 111 activates SEND to the next stage 117 or 118 and to the input stage 113a when the ACKs from the next stage 117 or 118 and from the input stage 113a are both active and SEND from the preceding stage is active.

Node 110 selects one of its two inputs, the latch outputs of its upstream stages 115 and 116, according to the output of the output stage 113b of queue 113. If the output SEL of output stage 113b is '1', node 110 selects the data from the upper upstream node 115; if it is '0', it selects the data from the lower upstream node 116. The selection is made by driving the multiplexer 110M of node 110 with the latch output SEL of the output stage 113b of queue 113.

When node 110 activates ACK to its upstream stage 115 or 116, it also activates ACK to the output stage 113b of queue 113. That is, when SEND from the upstream stage 115 or 116 and SEND from the output stage 113b of queue 113 are both active and ACK from the next stage of node 110 is active, node 110 latches the data into its latch 110L and activates ACK to both the output stage 113b of queue 113 and the upstream stage 115 or 116.
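
The two handshake conditions for packets with OD = '1' can be written as Boolean functions. This is a logic-level sketch only; the function and parameter names are illustrative, and timing, pulse widths, and reset behavior of the actual transfer control circuits are not modeled.

```python
def node111_send(send_from_prev: bool,
                 ack_from_next: bool,
                 ack_from_queue_in: bool) -> bool:
    """Node 111 raises SEND to the chosen next stage (117 or 118) and,
    at the same time, to the input stage 113a of queue 113."""
    return send_from_prev and ack_from_next and ack_from_queue_in

def node110_latch(send_from_selected: bool,
                  send_from_queue_out: bool,
                  ack_from_next: bool) -> bool:
    """Node 110 latches data into 110L and raises ACK to both the selected
    upstream stage (115 or 116) and the output stage 113b of queue 113."""
    return send_from_selected and send_from_queue_out and ack_from_next
```

Both functions are plain three-input AND conditions: the packet and its direction bit always advance together, which is what keeps the queue contents aligned with the packets in flight.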

In FIG. 25(B) and FIG. 26, when DAi = '1', node 111 branches the packet to node 117 (this branch being the first stage) and transfers DAi = '1' to the input stage 113a of queue 113. After N stages, the corresponding packet is held at node 115, while SEL = '1', corresponding to that DAi = '1', is supplied to the selection control input of the multiplexer 110M, so node 110 selects the node 115 side. The same applies to FIG. 25(C).

For a write packet, processing ends with the write to the register file 30R and no corresponding packet is transferred to the merge path 40A side, so the order bit OD of the packet is set to '0'. When the order bit OD held in the latch 111L of node 111 is '0', the transfer control circuit 111C keeps SEND to the input stage 113a of the queue inactive. The bit DAi is therefore not transferred to the latch of the input stage 113a of queue 113, and the packet plays no part in the order-preserving switching.

On the other hand, when one packet on the branch path 20A side corresponds to multiple packets on the merge path 40BP side, as with a read packet, this correspondence must also be maintained in queue 113. To maintain it, the transfer control circuit 110C of node 110 makes an exception when the concatenation bit CN is '1' and keeps ACK to the output stage 113b of queue 113 inactive. The switching correspondence between node 110 and node 111 is thus preserved for concatenated packets as well. The last packet of a concatenated sequence has CN = '0', but because the packet immediately before it has CN = '1', the selection direction of node 110 does not change for the last packet either, as shown in FIG. 27(J).

FIGS. 27(A) to 27(J) show, in time order, the flow of one read packet on the branch path 20A side and the flow of the corresponding multiple read data packets (concatenated packets) on the merge path 40BP side. A '1' in the figures indicates the value of the node-side flip-flop described above. FIG. 27(A) summarizes four stages of data transfer.

（１）図２７（Ａ）で、ＤＡｉ＝'１'であればノード１１１から次段上側（'１'側）１１７へデータが転送されると共に、ＯＤ＝'１'であれば行先アドレスＤＡｉの値がキュー１１３の入力段１１３ａに転送される。 (1) In FIG. 27A, if DAi = '1', data is transferred from the node 111 to the next upper stage ('1' side) 117, and if OD = '1', the destination address DAi Is transferred to the input stage 113 a of the queue 113.

(2) In FIG. 27(E), the first concatenated packet is latched and held in the upper upstream stage 115 of node 110, and DAi = '1' held in (1) is latched and held in the output stage 113b of queue 113. In response to the '1' at its selection control input, the multiplexer 110M of node 110 selects the data from the upper upstream node 115.

(3) As a result, in FIG. 27(F), node 110 latches and holds this data.

（４）その後、ノード１１０が保持しているパケットの連接ビットＣＮの値が'１'の間、ノード１１０の転送制御回路１１０Ｃからキュー１１３へのＡＣＫがインアクティブに維持されて、キュー１１３の出力段１１３ｂの出力ＳＥＬ＝'１'が維持され、ノード１１５（図２５）から連接パケットが順次ノード１１０へ到達する。 (4) Thereafter, while the value of the concatenated bit CN of the packet held by the node 110 is “1”, the ACK from the transfer control circuit 110C of the node 110 to the queue 113 is maintained inactive, and the queue 113 The output SEL of the output stage 113b is maintained at “1”, and the concatenated packets sequentially reach the node 110 from the node 115 (FIG. 25).
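
Steps (1) to (4) can be summarized in a behavioral sketch in which one queued SEL value serves a whole run of concatenated packets. The model below assumes that each packet is a `(payload, cn)` pair and that the queue advances only after a packet with CN = '0' has been latched; the names are illustrative and the handshake signals themselves are abstracted away.

```python
from collections import deque

def merge_with_concatenation(sel_bits, upper, lower):
    """Merge two inputs under queue control, holding SEL while CN = 1.

    sel_bits -- SEL values leaving queue 113, oldest first (1 = upper, 0 = lower)
    upper    -- (payload, cn) packets waiting on the upper input, in order
    lower    -- (payload, cn) packets waiting on the lower input, in order
    Returns the payloads in the order node 110 forwards them.
    """
    sel = deque(sel_bits)
    inputs = {1: deque(upper), 0: deque(lower)}
    forwarded = []
    while sel:
        side = sel[0]                  # current SEL at the multiplexer
        if not inputs[side]:
            break                      # selected side is empty: node 110 waits
        payload, cn = inputs[side].popleft()
        forwarded.append(payload)
        if cn == 0:
            sel.popleft()              # ACK to 113b: next SEL becomes visible
        # cn == 1: ACK to 113b stays inactive, so SEL is unchanged
    return forwarded
```

For example, with `sel_bits = [1, 0]`, three concatenated read data packets on the upper input (CN = 1, 1, 0) and one packet on the lower input, the three upper packets are forwarded back to back before the lower packet.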

このようにして、分流路２０Ａの任意のノードから、合流路４０ＢＰの対応するノードへ、順序制御情報ＤＡｉ→ＳＥＬが伝達され、これに応じ合流ノードでの選択制御が行われ、これにより全ての系統についてパケットの順序が維持される。 In this way, the order control information DAi → SEL is transmitted from an arbitrary node of the branch flow path 20A to the corresponding node of the combined flow path 40BP, and selection control is performed at the merge node accordingly, thereby The order of the packets is maintained for the system.

したがって、この構成によれば、図２２及び図２３で述べた構成を実現して、その効果を達成することができる。 Therefore, according to this configuration, the configuration described in FIGS. 22 and 23 can be realized, and the effect can be achieved.

なお、順序ビットＯＤは、図２２及び図２３について説明した順序ビットＯＤとしても使用できる。 The order bit OD can also be used as the order bit OD described with reference to FIGS.

また、本発明の順序合流制御が行われるノード１１０とノード１１１との対は、ツリー形分流路とツリー形合流路の対応するノード対に限定されず、第１パケットが分岐ノードを通れば、該第１パケットに対応した第２パケットが合流ノードを通り、且つ、該分岐ノードでの該第１パケットの分岐方向と該合流ノードでの該第２パケットの合流方向とが対応しており、該分岐ノードと該合流ノードとの間のパイプライン段数がＮ（Ｎ≧１）であるという条件を満たす分岐ノードと合流ノードの対であればよい。 In addition, the pair of the node 110 and the node 111 on which the order merge control of the present invention is performed is not limited to the corresponding node pair of the tree-shaped branch channel and the tree-shaped merge channel, and if the first packet passes through the branch node, The second packet corresponding to the first packet passes through the junction node, and the branch direction of the first packet at the branch node corresponds to the junction direction of the second packet at the junction node; A pair of a branch node and a merge node that satisfy the condition that the number of pipeline stages between the branch node and the merge node is N (N ≧ 1) may be used.

Furthermore, a packet passing through node 110 corresponds to a packet passing through node 111, and the two may be one and the same packet.

次に、本発明のデータ駆動型処理装置の適用例として、有限オートマトン動作を行うＣＰＵアクセラレータについて説明する。 Next, a CPU accelerator that performs a finite automaton operation will be described as an application example of the data driven processing apparatus of the present invention.

有限オートマトンは、言語学、情報工学、生物学、数学、論理学など様々な領域で利用されている。有限オートマトンでは、現在状態と入力とにより、次状態が定まり、この状態遷移が繰り返し行われてパターン一致有無が判定される。 Finite automata are used in various fields such as linguistics, information engineering, biology, mathematics, and logic. In the finite automaton, the next state is determined based on the current state and the input, and this state transition is repeatedly performed to determine whether or not the pattern matches.

図３３は、簡単な有限オートマトンの例を示す状態遷移図である。 FIG. 33 is a state transition diagram showing an example of a simple finite automaton.

In this example, the task is to determine whether the data stream DS = "CAABABABCCCCBBABACC" contains the pattern "ABA" or "ABC" of the search data set RD. When an element "A", "B", or "C" of the data stream DS is input to the current state, the next state is determined, and the next element is then input to that state. This is repeated, and the state at the time of output corresponds to the detected pattern. Elements are not limited to character codes and may be any data of a predetermined width.
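
A minimal software sketch of such an automaton for the two patterns follows. It does not reproduce the state numbering of FIG. 33: as an assumption of this sketch, each state is represented by the longest suffix of the input read so far that is a prefix of some pattern, and scanning continues after a match so that every occurrence is reported.

```python
PATTERNS = ("ABA", "ABC")   # search data set RD from the example

def next_state(state: str, element: str) -> str:
    """Return the next state for the current state and one input element."""
    candidate = state + element
    # Drop leading elements until the remainder is a prefix of some pattern.
    while candidate and not any(p.startswith(candidate) for p in PATTERNS):
        candidate = candidate[1:]
    return candidate

def scan(stream: str):
    """Return (position, pattern) for each pattern ending at that position."""
    state = ""
    hits = []
    for pos, element in enumerate(stream):
        state = next_state(state, element)
        if state in PATTERNS:
            hits.append((pos, state))
    return hits

print(scan("CAABABABCCCCBBABACC"))
# [(4, 'ABA'), (6, 'ABA'), (8, 'ABC'), (16, 'ABA')]
```

This simple construction is adequate for the two patterns here; a pattern set in which one pattern occurs inside another would need the output links of a full Aho-Corasick automaton.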

In the virus detection example, each pattern in the search data set RD corresponds to a virus. The pattern matching that determines which of many viruses infects the input data stream DS can be expressed in a single state transition diagram (multi-pattern matching).

以下では、有限オートマトンをウイルス検出に適用した場合について説明するが、本発明のＣＰＵアクセラレータはこれに限定されるものではなく、全ての有限オートマトンに適用可能である。 Hereinafter, a case where the finite automaton is applied to virus detection will be described. However, the CPU accelerator of the present invention is not limited to this, and can be applied to all finite automata.

本発明の装置では、並列度が高いので、同時に多数の入力データストリームＤＳを取り扱うことができる。 In the apparatus of the present invention, since the degree of parallelism is high, a large number of input data streams DS can be handled simultaneously.

図２８は、行を状態Ｓ、入力である列を、データストリームを構成する１バイトのストリームエレメントＳＥとした状態遷移テーブルを示す。但し、この状態遷移テーブルには、１ビットの結果ビットＲが含まれている。 FIG. 28 shows a state transition table in which a row is a state S and an input column is a 1-byte stream element SE constituting a data stream. However, this state transition table includes a 1-bit result bit R.

状態Ｓを上位ビット、ストリームエレメントＳＥを下位ビットとするアドレスに、次の状態が格納されたメモリを用いる。１６進数表記で、例えば状態Ｓの初期値を"００００"とし、ストリームエレメントＳＥが"０１"であった場合、次の状態Ｓは"０００２"となる。これと次のストリームエレメントＳＥとで、次の状態Ｓが定まる。 A memory in which the next state is stored at an address having the state S as the upper bit and the stream element SE as the lower bit is used. In hexadecimal notation, for example, when the initial value of the state S is “0000” and the stream element SE is “01”, the next state S is “0002”. The next state S is determined by this and the next stream element SE.

結果ビットＲは１ビットであり、ウイルスパターンが検出されたとき、Ｒ＝'１'となる。このときの状態Ｓで指定されるアドレスには、次の状態はなく、ウイルスコードＶＣが格納されている。ウイルスコードＶＣに対応したウイルス名は、ＣＰＵに管理させる。結果ビットＲは、パケット内のコマンドの役割を果たす。 The result bit R is 1 bit, and R = '1' when a virus pattern is detected. The address specified in the state S at this time does not have the next state, and stores the virus code VC. The CPU manages the virus name corresponding to the virus code VC. The result bit R serves as a command in the packet.
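
The table lookup can be sketched as follows. The address layout (state S in the upper bits, one-byte stream element SE in the lower 8 bits) follows the description; representing the memory as a Python dictionary, returning to the initial state for unlisted entries, and the sample virus code value are assumptions made only for this illustration.

```python
SE_BITS = 8  # the stream element SE is one byte

def table_address(state: int, se: int) -> int:
    """Form the memory address: state S in the upper bits, SE in the lower bits."""
    return (state << SE_BITS) | (se & 0xFF)

def run_stream(table, elements, initial_state=0x0000):
    """Follow state transitions; return the virus code VC if R = 1 is read.

    table maps an address to (R, value), where value is the next state when
    R = 0 and the virus code VC when R = 1. Unlisted addresses fall back to
    (0, initial_state), which is an assumption of this sketch.
    """
    state = initial_state
    for se in elements:
        r, value = table.get(table_address(state, se), (0, initial_state))
        if r == 1:
            return value      # pattern detected: value holds the virus code
        state = value
    return None               # stream ended without a detection

# State 0x0000 with SE 0x01 leads to state 0x0002, as in the example above;
# the second entry and its code 0x00A5 are hypothetical.
table = {
    table_address(0x0000, 0x01): (0, 0x0002),
    table_address(0x0002, 0x02): (1, 0x00A5),
}
```

Calling `run_stream(table, [0x01, 0x02])` follows the transition to state 0x0002 and then returns the hypothetical code 0x00A5.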

図２９は、本発明が適用された、実施例１２のデータ駆動型ＣＰＵアクセラレータ６０Ｑを示す概略ブロック図である。 FIG. 29 is a schematic block diagram showing a data driven CPU accelerator 60Q according to the twelfth embodiment to which the present invention is applied.

状態テーブルメモリ１２０は、例えば図８のメモリ１０Ｂの記憶容量を大きくしたものであり、その分流路１２１、メモリ行アレイ１２２及び合流路１２３はそれぞれ、図８の分流路２０Ｂ、メモリ行アレイ３０及び合流路４０Ｂに対応している。リードパケットに対するリードデータパケットは後述のように１ワードであり、これらのフォーマットは上述のものと異なる。メモリ行アレイ１２２には、図２８のテーブルが格納されている。 The state table memory 120 has, for example, a larger storage capacity of the memory 10B in FIG. 8, and the branching channel 121, the memory row array 122, and the combined channel 123 are respectively connected to the branching channel 20B, the memory row array 30, and the like in FIG. This corresponds to the combined flow path 40B. The read data packet corresponding to the read packet is one word as will be described later, and these formats are different from those described above. The memory row array 122 stores the table of FIG.

図３１は、図２９の装置における１系統に関するデータフローをデータフォーマットとともに示す図である。 FIG. 31 is a diagram showing a data flow regarding one system in the apparatus of FIG. 29 together with a data format.

The system CH is, as described above, a constant used in the merge path 123. The result bit R and the state S are data read from the state table memory 120; this read data, with the stream element SE as the low-order bits, specifies an address in the state table memory 120. Each system can process multiple data streams, and the stream identifier SID is 3 bits in this example. The stream identifier SID and the system CH remain unchanged for a given stream throughout the loop that includes the state table memory 120.

図２９に戻って、複数のデータストリームは、ＤＭＡＣにより、インターフェイス１２４及びメモリコントローラ１２５を介し、バッファとしてのＲＡＭ１２６に一時格納された後、ＣＰＵ１２７によりインターフェイス１２４及びメモリコントローラ１２５を介してＲＡＭ１２６の内容が読み出され、メモリコントローラ１２５、インターフェイス１２４及び１２８並びにストリームバッファ１３０の分流路１３１を介しキューアレイ１３２に供給され保持される。ＣＰＵ１２７、ＲＡＭ１２６、メモリコントローラ１２５及びインターフェイス１２８は同期型であり、インターフェイス１２８は、同期型と非同期型との相互変換部を備えている。 Returning to FIG. 29, the plurality of data streams are temporarily stored in the RAM 126 as a buffer by the DMAC via the interface 124 and the memory controller 125, and then the contents of the RAM 126 are changed by the CPU 127 via the interface 124 and the memory controller 125. The data is read and supplied to the queue array 132 via the memory controller 125, the interfaces 124 and 128, and the diversion channel 131 of the stream buffer 130, and is held. The CPU 127, the RAM 126, the memory controller 125, and the interface 128 are synchronous, and the interface 128 includes a mutual conversion unit between a synchronous type and an asynchronous type.

FIG. 30 is a schematic block diagram of the stream buffer 130 in FIG. 29.

The branch path 131 and the merge paths (multiplexers) 1330 to 1333 of the stream buffer 130 are identical, respectively, to the third through fifth stages extracted from the branch path 20 of FIG. 1 and the second through fourth stages extracted from the merge path 40. The branch path 131 serves to reduce the number of terminals of the interface 124; four sets are used in this example, but any number of one or more will do.

分流路１３１に供給されるパケットのフォーマットは、図３１に示す如く、３ビットのストリーム識別子ＳＩＤのフィールドと、８ビットのストリームエレメントＳＥのフィールドとからなる。 As shown in FIG. 31, the format of the packet supplied to the diversion channel 131 is composed of a 3-bit stream identifier SID field and an 8-bit stream element SE field.

The stream identifier SID is associated with the eight queue IDs of the queue array 132, which comprises 4 systems × 8 queues. Because the queue ID is tied to the stream identifier SID in this way, the SID, which is used in the branch path 131 as the low-order 3 bits of the 5-bit destination address, is no longer needed once a packet leaves the branch path 131, and the queue array 132 holds only the 8-bit stream element SE. The multiplexers 1330 to 1333 each select one of the eight queues of systems 0 to 3, respectively, and supply it to nodes 140 to 143, respectively. The selection is made by control signals obtained by decoding, in decoders 145 to 148, the stream identifiers SID0 to SID3 contained in the packets about to be transferred to the branch path 121.
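
The addressing can be illustrated with a few lines of bit packing. Two points are assumptions of this sketch rather than statements of the description: that the upper 2 bits of the 5-bit destination address select one of the 4 systems, and that the SID field sits above the SE field in the packet.

```python
SID_BITS = 3   # stream identifier SID
SE_BITS = 8    # stream element SE

def pack_stream_packet(sid: int, se: int) -> int:
    """Pack a 3-bit SID and an 8-bit SE into one 11-bit packet (SID on top)."""
    if not (0 <= sid < 1 << SID_BITS and 0 <= se < 1 << SE_BITS):
        raise ValueError("field out of range")
    return (sid << SE_BITS) | se

def destination_address(ch: int, sid: int) -> int:
    """5-bit destination address: assumed system CH (2 bits) above SID (3 bits)."""
    if not (0 <= ch < 4 and 0 <= sid < 1 << SID_BITS):
        raise ValueError("field out of range")
    return (ch << SID_BITS) | sid

def enqueue(queue_array, ch: int, sid: int, se: int) -> None:
    """Store only SE; the queue position itself encodes CH and SID."""
    queue_array[destination_address(ch, sid)].append(se)

queue_array = [[] for _ in range(4 * 8)]   # 4 systems x 8 queues
enqueue(queue_array, ch=2, sid=5, se=0x41)
```

After the call, the element 0x41 sits in queue 21 (system 2, stream 5), and no SID needs to be stored alongside it.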

マルチプレクサ１３３０〜１３３３は、通常の構成を用いることができるが、図３０に示すように８入力１出力の合流路を用いてもよい。この場合、ストリーム識別子ＳＩＤをデコードして、８本のキューのうちの対応する１つのキューの出力段に対してのみＡＣＫをアクティブにすればよい。この場合のデコーダは、１入力８出力の分流路を用いることができる。 The multiplexers 1330 to 1333 can use a normal configuration, but as shown in FIG. 30, a combined flow path with 8 inputs and 1 output may be used. In this case, it is only necessary to decode the stream identifier SID and activate ACK only for the output stage of one corresponding queue among the eight queues. The decoder in this case can use a diversion channel with 1 input and 8 outputs.

When any queue in the queue array 132 becomes half empty, the half-empty detection circuit 134 detects this and activates the interrupt request IRQ2 to the CPU 127 via interfaces 128 and 124, accompanied by the system CH and stream identifier SID of that queue. The CPU 127 then reads data from the RAM 126 and replenishes the queue of the corresponding system CH and stream identifier SID. Data can be transferred from the RAM 126 to the interface 124 by DMA.

Half-empty detection can be performed, for example, by detecting that the difference between the number of SEND-OUT pulses at the head of a queue in the queue array 132 and the number at its midpoint within a set time has reached a predetermined value. Alternatively, for each queue of the queue array 132, the number of output packets (latch pulses) and the number of supplied packets (latch pulses) may be counted, and the interrupt request IRQ2 activated in the same way when the difference reaches a set value. Needless to say, emptiness of some other predetermined fraction of the queue may be detected instead of half-emptiness.
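
The counter-based variant can be modeled in software as below. The class and method names are illustrative, and the sketch assumes that counting starts when the queue is full, so that the difference between output and supplied packets equals the number of empty stages.

```python
class RefillMonitor:
    """Counts packets supplied to and output from one queue and signals
    when the queue should be replenished (model of IRQ2 for one queue)."""

    def __init__(self, threshold: int):
        self.threshold = threshold    # e.g. half the queue depth
        self.supplied = 0
        self.output = 0

    def packet_supplied(self) -> None:
        self.supplied += 1            # one latch pulse at the queue input

    def packet_output(self) -> None:
        self.output += 1              # one latch pulse at the queue output

    def irq2_active(self) -> bool:
        """True when output exceeds supply by at least the threshold."""
        return self.output - self.supplied >= self.threshold
```

Setting the threshold to a different fraction of the queue depth gives the variation mentioned above, in which some predetermined fraction other than half is detected.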

The packets at nodes 150 to 153 need not contain the result bit R. Each of the synthesis nodes 140 to 143 receives, on one side, a packet from the corresponding node 150 to 153, whose stream identifier SID is supplied as SID0 to SID3 to the multiplexers 1330 to 1333 as a selection control signal. On the other side, the stream element SE from the corresponding multiplexer 1330 to 1333 is appended to that packet, and the combined packet is latched and held in the synthesis node 140 to 143.

The outputs of the synthesis nodes 140 to 143 are transferred to the branch channel 121. The synthesis nodes 140 to 143 may be omitted and the inlet nodes of the branch channel 121 used in their place.

At the nodes 150 to 153, an initial packet supplied from the CPU 127 via the interfaces 124 and 128 and a packet from the merge channel 123 are selectively merged. The initial packet starts the CPU accelerator 60Q; in FIG. 31 it has, for example, S = 0 and R = 0, with a channel CH and a stream identifier SID assigned for each channel.

The stream identifier SID may take any value for which the CPU 127 has supplied a data stream to the queue array 132 via the interfaces 124 and 128 and the branch channel 131, and the CPU 127 can choose it.

The CPU 127 sequentially supplies one or more initial packets to each of the nodes 150 to 153. While doing so, the CPU 127 holds the ACK to the corresponding exit node of the merge channel 123 inactive so that no packets flow from that exit node. When this stop is released, packets are pipeline-processed in the loop that includes the state table memory 120. In this embodiment there are at most eight initial packets per channel. In practice, because packets can be distributed around the loop, the maximum can be the total number of stages of the state table memory 120 plus two.

The output of the exit node of each channel of the merge channel 123 of the state table memory 120 is supplied to the output circuit 160.

When the result bit R of any of the four channels becomes '1', the output circuit 160 latches and holds the channel CH, the stream identifier SID, and the virus code VC of that channel, supplies them to the CPU 127, and activates the interrupt request IRQ1. For each data stream, processing can be terminated once a single virus has been detected. In that case, the stream in which the virus was detected is flushed from the stream buffer 130 and/or further additions to that stream are stopped. If unprocessed streams remain, another stream is supplied to the stream buffer 130, the corresponding initial packet is supplied as described above, and processing of that stream begins.

According to Embodiment 12, the CPU accelerator 60Q is data driven and the state table memory 120 and the stream buffer 130 operate in parallel. The degree of parallelism and the throughput are therefore high while power consumption is low, which makes the device suitable for a variety of mobile devices.

FIG. 32 is a schematic block diagram of an order-synchronized, data-driven CPU accelerator 60QA according to Embodiment 13, to which the present invention is applied.

In the CPU accelerator 60QA, first, the branch channel 131A of the stream buffer 130A is formed by reducing the six-stage branch channel 20 of FIG. 1 to five stages. In addition, the interface 128A transfers data streams to the branch channel 131A via the branch node 163 on the one hand, and supplies initial packets to the nodes 150 to 153 via the branch node 161 and the demultiplexer (branch channel) 162 on the other. This reduces the number of output terminals of the interface 128A.

A packet at the node 161 has a bit indicating whether it is an initial packet and a channel CH; the former determines the branch destination of the packet at the node 161. The demultiplexer 162 uses the channel CH as the destination address, which is no longer needed at its output nodes.

Next, because each queue of the stream buffer 130A preserves order, the CPU accelerator 60QA uses a state table memory 120A whose branch channel 121A and merge channel 123A perform the sequential merging control described above. Preserving the packet order within the state table memory 120A keeps the state table memory 120A and the stream buffer 130A synchronized in order.

Because the order of the packets output from the state table memory 120A is preserved, the selection control for the multiplexers 1330 to 1333 can be predicted reliably. This order is determined by the order in which the CPU 127 supplies initial packets to the nodes 150 to 153 via the interfaces 124 and 128, the node 161, and the demultiplexer 162. In other words, the CPU 127 decides the order.

The stream ID prediction circuit 163 has a ring queue (not shown) for each channel CH. For each channel CH, it stores the stream identifiers SID of the packets from the node 161 in this ring queue in order and, based on the queue output, supplies the stream identifiers SID0 to SID3 to the multiplexers 1330 to 1333, respectively, while advancing the contents of the ring queue by one stage. By repeating this each time the ACK from the head of the queues 170 to 173 becomes active, a plurality of stream elements SE are fetched into and held in the queues 170 to 173 in advance.
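The prediction can be illustrated with a small software model. This is a sketch under stated assumptions, not the circuit itself: the class `SidPredictor` and its method names are hypothetical, and the ring queue is modeled with `collections.deque`, rotating by one entry per ACK.

```python
from collections import deque


class SidPredictor:
    """Hypothetical model: one ring of stream IDs per channel CH."""

    def __init__(self, channels: int):
        self.rings = [deque() for _ in range(channels)]

    def on_initial_packet(self, ch: int, sid: int) -> None:
        # SIDs are recorded in the order the initial packets were supplied.
        self.rings[ch].append(sid)

    def on_ack(self, ch: int) -> int:
        """Return the SID to drive the multiplexer, then advance the ring."""
        ring = self.rings[ch]
        sid = ring[0]
        ring.rotate(-1)  # head entry moves to the tail
        return sid
```

Because packet order is preserved around the loop, the same SID sequence repeats on every lap, so the ring output can select the queue before the packet itself arrives.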

When the synthesis nodes 140 to 143 activate the ACK to the nodes 150 to 153, they simultaneously activate the ACK to the output stages of the corresponding queues 170 to 173.

In this way, when a packet from the state table memory 120A reaches one of the synthesis nodes 140 to 143 via the corresponding one of the nodes 150 to 153, the packet from the corresponding one of the queues 170 to 173 can arrive at the same time. This eliminates the waiting time lag at the synthesis nodes 140 to 143, so processing is faster than in Embodiment 12.

Note that, because of the order synchronization, the stream IDs of the packets combined at the synthesis nodes 140 to 143 always match, so packets in the loop that includes the state table memory 120 need not carry the stream ID.

In this case, the output circuit 160 still has to output the stream identifier SID when a virus is detected. The output circuit 160 is therefore provided, like the SID prediction circuit 163, with a ring queue for each channel CH that holds the stream IDs. A SEND pulse from the exit node of the merge channel 123A activates the ACK to a predetermined stage of the corresponding ring queue and advances that stage by one packet, which identifies the stream ID of the packet being output from the exit node of the merge channel 123A.

The stream ID of the packet being output from the exit node can also be identified from the ring queue in the SID prediction circuit 163, the number of pipeline stages from the multiplexer 1330 to the exit node of the merge channel 123A, and the number of SEND pulses from the exit node.

Furthermore, the device may be used as a finite automaton device other than a CPU accelerator.

Also, given that one feature of the present invention is the use of a prediction circuit, the present invention may have a configuration in which the loop including the state table memory 120A is replaced with a loop having another function.

The present invention also includes various other modifications.

For example, configurations in which the components of the above embodiments or their modifications are combined differently are also included in the present invention, provided that they achieve the same function.

Also, the destination address of the branch channel may serve as a processor instruction code, with processing means corresponding to that instruction code arranged on the output side of the branch channel. In this case, each row of the register file 30R can be used as a register group corresponding to the instruction code.

Furthermore, the stream buffer 130 or 130A can be used with its data flow reversed, to sort the processing results of other loop components and output them to a CPU or the like.

As is apparent from the above description, the present invention includes the following configurations in addition to those recited in the claims.

[Item 1-1]
A data-driven processing device comprising: a branch channel; a multi-stage merge channel disposed downstream of the branch channel, each node of which has a processing element and a merge stage identifier; and a merge stage identifier generation node disposed at the inlet node of the branch channel or upstream of the branch channel, wherein
the merge stage identifier generation node determines a merge stage identifier based on the respective destination addresses of a first packet pair on the branch channel and attaches it to each packet of the first packet pair or to a compressed packet obtained by compressing the first packet pair into one packet, and
at each node of the merge channel, when the merge stage identifier of the node corresponds to the merge stage identifier of each packet of a second packet pair corresponding to the first packet pair, the processing element of the node processes the second packet pair.

With this configuration, the mechanism by which the packets of a pair wait for each other is simple. The waiting time at the merge stage can also be shortened. Furthermore, the number of packets flowing downstream of the merge stage is reduced, which avoids congestion and thereby improves throughput.

[Item 1-2]
The data-driven processing device according to Item 1-1, wherein the respective destination addresses of the first packet pair are contained in the respective packets of the first packet pair.

With this configuration, the decompression that would be needed with a compressed packet becomes unnecessary.

[Item 1-3]
The data-driven processing device according to Item 1-1, wherein the respective destination addresses of the first packet pair are contained in the compressed packet, and the branch channel has a node that decompresses the compressed packet into the respective packets of the first packet pair.

With this configuration, the number of packets is reduced and congestion is avoided; other packets are prevented from cutting in between the packets of the first packet pair; and the merge stage identifier can be determined from a single packet, which simplifies the configuration.

[Item 1-4]
The data-driven processing device according to Item 1-1, further comprising a functional element coupled between each outlet node of the branch channel and the corresponding inlet node of the merge channel, wherein the functional element supplies the inlet node with a packet corresponding to the packet at the outlet node.

With this configuration, more sophisticated processing can be performed.

[Item 2-1]
A data-driven processing device comprising a branch channel having three or more pipeline stages, to which a compressed packet containing common data, a first destination address, a second destination address, and a packet-side branch stage identifier is supplied, wherein
each node of the branch channel has a node-side branch stage identifier and a packet decompression unit, and
when the node-side branch stage identifier corresponds to the packet-side branch stage identifier of the compressed packet, the packet decompression unit decompresses the compressed packet into a first packet containing the common data and the first destination address and a second packet containing the common data and the second destination address, and supplies them downstream.

With this configuration, compressed packets flow on the inlet side of the branch channel, where the channel is relatively narrow, which eases packet congestion and thereby improves throughput. It also reduces the chance that, because of congestion on the inlet side of the branch channel, other packets cut in between the packets of a pair, which makes processing easier and improves throughput.
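The decompression rule of Item 2-1 can be sketched as a per-node function. The type names `CompressedPacket` and `Packet`, the field names, and the use of simple equality for "corresponds to" are assumptions made for this illustration only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Packet:
    data: int   # common data
    da: int     # destination address


@dataclass(frozen=True)
class CompressedPacket:
    data: int   # common data shared by both packets of the pair
    da1: int    # first destination address
    da2: int    # second destination address
    stage: int  # packet-side branch stage identifier


def node_step(node_stage: int,
              pkt: Union[Packet, CompressedPacket]) -> List[Union[Packet, CompressedPacket]]:
    """Packets that leave a branch-channel node for one incoming packet."""
    if isinstance(pkt, CompressedPacket) and pkt.stage == node_stage:
        # Identifiers match: expand into the two ordinary packets.
        return [Packet(pkt.data, pkt.da1), Packet(pkt.data, pkt.da2)]
    # Otherwise forward unchanged (still compressed, or already ordinary).
    return [pkt]
```

Upstream of the matching stage only one packet occupies the pipeline, which is the congestion benefit described above.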

[Item 2-2]
The data-driven processing device according to Item 2-1, further comprising a branch stage identifier generation node disposed upstream of the branch channel, wherein
the branch stage identifier generation node determines, based on the first destination address and the second destination address contained in a compressed packet, the identifier of the branch stage on the branch channel at which the compressed packet is to be decompressed, and attaches it to the compressed packet.

With this configuration, the individual nodes on the branch channel need not have a branch stage identifier generation function, which simplifies the configuration.

[Item 2-3]
The data-driven processing device according to Item 2-1 or 2-2, wherein the branch channel is formed such that the paths of packets having mutually complementary destination addresses are logically symmetric with respect to the axis along the flow direction.

With this configuration, the symmetry of the branch channel simplifies the logic for determining the branch stage identifier.

[Item 2-4]
The data-driven processing device according to Item 2-3, wherein the branch stage identifier generation node determines the branch stage identifier based on the number of matching bits of the first destination address and the second destination address counted from the head side or the tail side.

With this configuration, the branch stage identifier can be determined easily from the number of matching bits between the first destination address and the second destination address.
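Counting matching bits from the head side amounts to finding the common prefix length of the two addresses. The sketch below assumes, hypothetically, that one address bit is consumed per branch stage with the head bit first, so the count directly gives the stage at which the two paths diverge; the function name and the `width` parameter are not from the source.

```python
def matching_head_bits(da1: int, da2: int, width: int) -> int:
    """Number of leading bits shared by two width-bit destination addresses."""
    diff = (da1 ^ da2) & ((1 << width) - 1)  # 1 wherever the addresses differ
    # The highest set bit of diff marks the first mismatch from the head side.
    return width - diff.bit_length()
```

For example, the 5-bit addresses 10110 and 10011 share the prefix 10, so the function returns 2; identical addresses return the full width.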

[Item 3-1]
A data-driven processing method comprising:
(1) each time a head packet containing a concatenation bit indicating that a packet follows, a destination address, and a command passes through a merge node on a merge channel having three or more pipeline stages, changing the flip-flop of that node from a second state to a first state;
(2) transferring (n−1) data packets (n ≥ 1) that follow the head packet, each containing a concatenation bit indicating that a packet follows and data, in order through the merge nodes whose flip-flops are in the first state; and
(3) transferring a tail data packet that follows the (n−1) data packets and contains a concatenation bit indicating that no packet follows and data, through the merge nodes whose flip-flops are in the first state, while returning those flip-flops to the second state.

With this configuration, data packets need no destination address, so the packet data width, and with it the circuit scale, can be reduced.
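Steps (1) to (3) can be illustrated with a simplified model of a two-input merge node. This sketch is an assumption-laden simplification: it folds the one-bit flip-flop and the choice of input into a single `locked_input` field, and the names `MergeNode` and `accept` are hypothetical.

```python
from typing import Optional


class MergeNode:
    """Simplified merge node; locked_input is None in the 'second state'."""

    def __init__(self) -> None:
        self.locked_input: Optional[int] = None

    def accept(self, input_id: int, cn: int) -> bool:
        """Return True if a packet on input_id with concatenation bit cn may pass."""
        if self.locked_input is None:
            if cn == 1:
                # Step (1): a head packet puts the node in the first state.
                self.locked_input = input_id
            return True
        if input_id != self.locked_input:
            # Another packet sequence is in progress; this input must wait.
            return False
        if cn == 0:
            # Step (3): the tail packet returns the node to the second state.
            self.locked_input = None
        # Step (2): data packets of the locked sequence pass in order.
        return True
```

Only the head packet needs routing information in this model; the following packets simply follow the marked nodes.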

[Item 3-2]
The data-driven processing method according to Item 3-1, wherein, in steps (1) to (3), the head packet, the (n−1) data packets, and the tail data packet, respectively, are supplied in order to one inlet node of the merge channel.

With this configuration, a data storage unit can be disposed upstream of the merge channel.

[Item 3-3]
The data-driven processing method according to Item 3-1, wherein
with n = n1, the head packet, the (n1−1) data packets, and the tail data packet are supplied in order to a first inlet node of the merge channel in steps (1) to (3), respectively,
with n = n2, the head packet, the (n2−1) data packets, and the tail data packet are supplied in order to a second inlet node of the merge channel in steps (1) to (3), respectively, and
at the node where the packet paths from the first and second inlet nodes merge, the group of (n1+1) packets from the first inlet node and the group of (n2+1) packets from the second inlet node are processed to generate a result packet.

With this configuration, the two packet groups can wait for each other at the merge node and be processed without any special ID such as a generation. Moreover, if processing yields fewer packets than it consumes, packet traffic is reduced on the exit side of the merge channel, where the channel is relatively narrow, which avoids congestion and improves throughput.

[Item 3-4]
The data-driven processing method according to Item 2 or 3, wherein steps (1) to (3) are carried out by, at any merge node on the merge channel that has a next stage and a rear stage:
preferentially selecting and latching a packet from a rear-stage node whose flip-flop is in the first state;
when the flip-flop of the next-stage merge node is in the second state and the concatenation bit of the packet latched by this merge node indicates that a packet follows, setting the flip-flop of this merge node to the first state;
when the concatenation bit of the packet latched by this merge node indicates that no packet follows, setting the flip-flop of the rear-stage node whose flip-flop is in the first state to the second state; and
when the processing element of this merge node is to perform processing and the concatenation bit of the packet indicates that no packet follows, setting the flip-flop of this merge node to the second state.

With this configuration, each flip-flop can be returned to the second state after the tail data packet has advanced using the first-state flip-flops as markers. Also, the other packet sequence can be transferred in order to the processing element in the same manner, without being mixed into the first packet sequence.

[Item 4-1]
A data-driven semiconductor memory device comprising:
a storage row array;
a tree-shaped branch channel that selectively branches a first packet supplied to its inlet node to successive downstream nodes according to the destination address contained in the first packet, and transfers it to one storage row in the storage row array; and
a tree-shaped merge channel that selectively merges second packets generated at the storage rows toward successive downstream nodes and transfers them to its exit node,
wherein the storage row,
when the command contained in the first packet indicates a read, reads the part of the data stored in the storage row that is designated by the in-row address contained in the first packet, and generates the second packet corresponding to the first packet, and
when the command of the first packet indicates a write, writes the data contained in the first packet to the part of the storage row designated by the in-row address contained in the first packet, and
the tree-shaped branch channel and the tree-shaped merge channel each have three or more pipeline stages.

With this configuration, the tree-shaped branch channel can selectively transfer packets to a large number of storage rows, and packet congestion is avoided on the outlet side of the branch channel and the inlet side of the merge channel, where the channels are relatively wide. The processing delay at a storage row is therefore absorbed by distributed parallel processing across multiple storage rows, and random-access throughput is relatively high.
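A behavioral sketch of Item 4-1 follows. It is not a timing-accurate model: the binary tree with one head-first address bit per stage, the dictionary packet layout, and the names `route` and `RowMemory` are assumptions made for illustration.

```python
from typing import List, Optional


def route(dest: int, depth: int) -> List[int]:
    """Branch direction (0 or 1) taken at each stage, head bit first."""
    return [(dest >> (depth - 1 - stage)) & 1 for stage in range(depth)]


class RowMemory:
    """Storage row array reached through a depth-stage binary branch tree."""

    def __init__(self, depth: int, words_per_row: int) -> None:
        self.rows = [[0] * words_per_row for _ in range(1 << depth)]

    def handle(self, pkt: dict) -> Optional[dict]:
        """Process a first packet at its storage row; reads yield a second packet."""
        row = self.rows[pkt["da"]]
        if pkt["cmd"] == "read":
            return {"da": pkt["da"], "data": row[pkt["addr"]]}
        row[pkt["addr"]] = pkt["data"]  # write: no second packet is generated
        return None
```

With a depth of 3, `route(0b101, 3)` gives `[1, 0, 1]`, that is, the branch direction chosen at each of the three stages on the way to row 5.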

[Item 4-2]
The data-driven semiconductor memory device according to Item 4-1, wherein the storage row array comprises a plurality of storage row pairs, and each storage row pair
is coupled between one outlet node of the tree-shaped branch channel and the corresponding inlet node of the tree-shaped merge channel, and
has first and second storage rows and a control circuit, the control circuit selecting one of the first and second storage rows as the target of read or write control according to the value of a predetermined bit of the destination address contained in the first packet held at the outlet node of the branch channel.

With this configuration, the two rows of each storage row pair can share a single control circuit.

[Item 4-3]
The data-driven semiconductor memory device according to Item 4-2, wherein, when the command contained in the first packet held at the outlet node of the branch channel indicates a read, the control circuit, while holding the ACK signal to the transfer control circuit of that outlet node inactive, causes data to be read from the storage row under read control and transferred to the inlet node of the merge channel together with part of the first packet.

With this configuration, the storage rows need to store only data, and a single read packet can read out multiple data packets.

[Item 4-4]
The data-driven semiconductor memory device according to Item 4-2, wherein, when the command contained in the first packet held at the outlet node of the branch channel indicates a write, the control circuit, while holding the SEND signal to the transfer control circuit of the inlet node of the merge channel inactive, causes the data contained in the first packet to be written to part of the storage row under write control.

With this configuration, multiple packets can write consecutively to one storage row or to random storage rows.

[Item 5-1]
A data-driven data buffer device that supplies data to a node provided in a loop that performs pipeline processing according to packet contents, the data being supplied according to an identifier contained in the packet, the device comprising:
a queue array in which data is stored in order in the queue corresponding to each identifier; and
a multiplexer that selects a queue of the queue array based on the identifier contained in a packet from the node and couples the output stage of that queue to the node.

With this configuration, queues can be switched at high speed according to the parallel processing target, and data can be taken out and supplied to the node.

[Item 5-2]
The data-driven data buffer device according to Item 6-1, further comprising an empty detection circuit that, for each queue of the queue array, outputs an interrupt request signal together with the identifier corresponding to that queue when the free space in the queue exceeds a predetermined amount.

With this configuration, data can be supplied to the node without interruption and the load on the CPU is reduced.

[Item 5-3]
The data-driven data buffer device according to Item 5-2, wherein the multiplexer has a tree-shaped merge channel whose inputs are the inputs of a plurality of inlet nodes and whose output is the output of an exit node.

With this configuration, processing is pipelined and throughput improves.

[Item 6-1]
A finite automaton device comprising:
a data-driven state table memory in which a next state is stored at each address whose upper bits are a state and whose lower bits are a data stream element, which receives a packet containing an address and a data stream identifier, and which outputs a packet containing the next state and the data stream identifier;
a synthesis node interposed in a channel that feeds the output of the state table memory back to the input of the state table memory, the synthesis node combining data supplied to a first part of its data input with the next state supplied to a second part of its data input;
a queue array in which a data stream is stored in each queue; and
a multiplexer that selects a queue of the queue array based on the data stream identifier contained in the output and couples the output stage of that queue to the first part.

この構成によれば、データ駆動型状態テーブルメモリと合成ノードとキュー列とマルチプレクサとの組み合わせにより、スループットを向上させると共に低消費電力化を図ることができる。 According to this configuration, the combination of the data driven state table memory, the synthesis node, the queue queue, and the multiplexer can improve the throughput and reduce the power consumption.
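As a rough illustration of the device of item 6-1, the following Python sketch models the feedback loop in software. It is not taken from the specification: the names `make_table` and `run`, the one-byte element width, the default next state of 0, and the convention that every stream starts in state 0 are assumptions made for the example.

```python
from collections import deque

SE_BITS = 8  # assumption: one data stream element is one byte


def make_table(transitions, num_states):
    """Build a flat state table addressed by (state << SE_BITS) | element."""
    table = [0] * (num_states << SE_BITS)  # assumption: default next state is 0
    for (state, element), next_state in transitions.items():
        table[(state << SE_BITS) | element] = next_state
    return table


def run(table, streams):
    """Interleave several data streams through one shared state table."""
    queues = {sid: deque(data) for sid, data in streams.items()}  # queue array
    # Initial packets: every non-empty stream starts in state 0.
    loop = deque((0, sid) for sid in queues if queues[sid])
    final = {sid: 0 for sid in queues}
    while loop:
        state, sid = loop.popleft()             # packet fed back from the memory
        element = queues[sid].popleft()         # multiplexer selects a queue by identifier
        address = (state << SE_BITS) | element  # synthesis node combines both parts
        next_state = table[address]             # state table memory lookup
        final[sid] = next_state
        if queues[sid]:
            loop.append((next_state, sid))      # feedback path carries the next state
    return final
```

Because each packet in the loop carries its own data stream identifier, packets of different streams can occupy the loop at the same time, which is the software analogue of the pipelined operation described above.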

[Item 6-2]
The finite automaton device according to item 6-1, further comprising a merge node that is located in the feedback flow path between the state table memory and the synthesis node and that selectively merges an initial packet with the next-state packet.

According to this configuration, processing can be started easily.

[Item 6-3]
The finite automaton device according to item 6-2, wherein the state table memory includes pattern match information, the device further comprising an output circuit that determines whether the pattern match information included in the output of the state table memory indicates a pattern match and, if it does, outputs an interrupt request signal together with the data stream identifier included in the output.

According to this configuration, processing results can be obtained at high speed.

[Item 6-4]
The finite automaton device according to item 6-3, further comprising an empty detection circuit that, for each queue of the queue array, outputs an interrupt request signal together with the data stream identifier corresponding to the queue when the free space in the queue exceeds a predetermined amount.

According to this configuration, even when many data streams are processed in parallel, each queue can be replenished with its data stream when needed.
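The empty detection of item 6-4 can be sketched as a simple threshold check. The function name `empty_detect`, the fixed per-queue `capacity`, and the use of the queue index as the data stream identifier are assumptions for illustration only.

```python
from collections import deque


def empty_detect(queues, capacity, threshold):
    """Return the identifiers of queues whose free space exceeds the threshold.

    Each returned identifier stands for one interrupt request issued
    together with the data stream identifier of the queue to refill.
    """
    requests = []
    for sid, queue in enumerate(queues):
        free = capacity - len(queue)
        if free > threshold:
            requests.append(sid)
    return requests
```

In such a model the CPU would respond to each returned identifier by pushing more elements of that stream into the corresponding queue.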

[Item 7-1]
A data-driven processing device that performs order synchronization, comprising:
a first loop that performs processing according to packet contents while maintaining packet order;
a second loop that performs processing according to packet contents while maintaining packet order; and
coupling means that couples the first loop and the second loop so that information contained in a packet in one of the loops is given to the other loop.

According to this configuration, what was conventionally a single loop can be divided into two loops. This reduces the number of pipeline stages, which shortens latency and improves throughput. It also shortens the packet data width, which reduces the circuit scale, and it enables componentization, which makes the system easier to build.

[Item 7-2]
The data-driven processing device that performs order synchronization according to item 7-1, wherein the coupling means is a coupling node that associates a packet in the first loop with a packet in the second loop and transfers each packet within its own loop.

According to this configuration, the first loop and the second loop can be synchronized at the coupling node.

[Item 7-3]
The data-driven processing device that performs order synchronization according to item 7-2, wherein the second loop is a ring queue.

According to this configuration, the second loop can be used as a data buffer for the first loop.

[Item 7-4]
The data-driven processing device that performs order synchronization according to item 7-3, wherein a first packet is input to the first loop and a second packet related to the first packet is input to the second loop, and, when a command or a specific bit included in a packet in the first loop indicates output from the coupling node, the coupling node takes a packet out of the second loop in response, so that the second packet is taken out together with the processing-result packet corresponding to the first packet in the first loop.

According to this configuration, the second packet does not have to be carried along with the first packet around the first loop.

[Item 7-5]
The data-driven processing device that performs order synchronization according to item 7-3, wherein, when a packet in the first loop includes information on a count n, the coupling node transfers that one packet and also transfers n packets of the second loop, thereby associating one packet of the first loop with n packets of the second loop.

According to this configuration, the first loop can use the second loop as a stack for storing intermediate results.
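The one-to-n correspondence of item 7-5 can be pictured with a short Python sketch. The packet fields `id` and `n` and the function name `coupling_node` are hypothetical, and the ring queue is modeled as a plain FIFO that already holds enough packets.

```python
from collections import deque


def coupling_node(first_loop_packets, ring_queue):
    """Pair each first-loop packet with the next n packets of the second loop."""
    paired = []
    for packet in first_loop_packets:
        n = packet["n"]  # count carried by the first-loop packet
        group = [ring_queue.popleft() for _ in range(n)]  # transfer n second-loop packets
        paired.append((packet["id"], group))              # transfer the one first-loop packet
    return paired
```

Since both loops preserve packet order, no tag matching is needed in this model: the position in the ring queue alone identifies which second-loop packets belong to which first-loop packet.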

[Item 7-6]
The data-driven processing device that performs order synchronization according to item 7-2, wherein a packet on the first loop or the second loop includes order information indicating whether ordering applies, and, when the order information of a packet on one loop indicates no ordering, the coupling node does not transfer a packet on the other loop when it transfers that packet.

According to this configuration, the present invention can be applied even when a loop contains packets whose order need not be maintained, for example packets that disappear partway, such as write packets to a memory.

[Item 7-7]
The data-driven processing device that performs order synchronization according to item 7-1, wherein the coupling means is a queue having an input stage that receives packets on one of the first loop and the second loop and an output stage that supplies packets onto the other loop.

According to this configuration, waiting time between the loops can be eliminated, so throughput can be improved.

FIG. 1 is a schematic block diagram showing an asynchronous (self-timed) data-driven memory according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing a specific example of the arrangement of the memory row array.
FIG. 3 is a schematic block diagram showing a branch circuit consisting of the first and second stages when the branch channel is built with the bundled-data method.
FIG. 4 is a schematic block diagram showing a merge circuit that is part of the second and third stages when the merge channel is built with the bundled-data method.
FIG. 5 is a schematic block diagram showing memory rows 31 and 32 connected between the output node 261 of the branch channel 20 and the inlet node 411 of the merge channel 40 in FIG. 1.
FIG. 6 is a schematic block diagram showing a data-driven memory according to Embodiment 2 of the present invention.
FIG. 7(A) shows a packet format, and FIG. 7(B) is an explanatory diagram showing the relationship between systems and packet flows.
FIG. 8 is a schematic block diagram showing a memory according to Embodiment 3 of the present invention, in which the number of input ports and output ports is twice that of Embodiment 2.
FIG. 9 is a diagram showing a packet format.
FIG. 10 is a schematic block diagram showing a memory according to Embodiment 4 of the present invention, in which the number of pipeline stages is reduced.
FIG. 11 is a schematic block diagram showing a 2-port input, 2-port output memory according to Embodiment 5 of the present invention, in which the wait for transfer to a selective merging node is shortened.
FIG. 12 is a schematic block diagram showing a memory according to Embodiment 6 of the present invention, in which the number of input ports and output ports is twice that of Embodiment 5.
FIG. 13 is a schematic block diagram showing a data processing unit that is part of a processor according to Embodiment 7 of the present invention.
FIGS. 14(A) and 14(B) are schematic explanatory diagrams showing the flow of processing after a packet pair is input to the inlet node of the branch channel.
FIG. 15 is a schematic block diagram showing a data processing unit that is part of a processor according to Embodiment 8 of the present invention.
FIG. 16(A) is an explanatory diagram of a method of determining the merge stage ID based on the destination addresses of a packet pair, and FIG. 16(B) is an explanatory diagram showing a packet format.
FIG. 17 is a block diagram showing the schematic configuration of a merge stage identification node.
FIGS. 18(A) to 18(D) are explanatory diagrams of packet formats according to Embodiment 9 of the present invention: (A) shows the format of a packet pair compressed into one packet, (B) and (C) show the formats of this packet expanded into two packets, and (C) shows a data packet that follows the head packet of a concatenated packet.
In FIG. 19, (A) and (B) are explanatory diagrams showing the head packet of the concatenated packet of the first operand and the data packet that follows it, respectively, and (B) and (C) are explanatory diagrams showing the head packet of the concatenated packet of the second operand and the data packet that follows it, respectively.
FIG. 20 is an explanatory diagram showing a state in which the concatenation bits provided in the nodes of the merge channel have been set by the concatenation bit of a packet pair.
FIG. 21 is a block diagram showing the state control circuit for the node-side concatenation bit F1 of node N1 and related elements.
FIGS. 22(A) to 22(C) relate to Embodiment 10 of the present invention: (A) shows a data-driven processing loop, (B) shows a circuit in which (A) is divided in two and coupled in parallel, and (C) is a schematic diagram showing a circuit in which complex processing loops are coupled in parallel.
FIGS. 23(A) and 23(B) are diagrams showing, for the same layer and for different layers respectively, the transmission of processing results via a queue between parallel processing loops for which order synchronization is established.
FIG. 24 is an explanatory diagram, according to Embodiment 11 of the present invention, of the disturbance of switching order that arises between a node of the merge channel and the corresponding node of the branch channel.
FIG. 25(A) shows a configuration that performs switching synchronization between an arbitrary node of the merge channel and the corresponding node of the branch channel, and FIGS. 25(B) and 25(C) are diagrams showing the operation of this configuration.
FIG. 26 is a detailed block diagram of FIG. 25(A).
FIGS. 27(A) to 27(J) are explanatory diagrams showing, over time, the flow of one read packet on the branch channel side and the flow of the corresponding plurality of read data packets (concatenated packets) on the merge channel side.
FIG. 28 is a diagram showing a state transition table with output commands, in which the rows are states S and the input columns are one-byte stream elements SE constituting a data stream.
FIG. 29 is a schematic block diagram showing a CPU accelerator according to Embodiment 12 of the present invention.
FIG. 30 is a schematic block diagram of the stream buffer in FIG. 29.
FIG. 31 is a diagram showing the data flow for one system in the device of FIG. 29 together with the data formats.
FIG. 32 is a schematic block diagram showing an order-synchronous CPU accelerator according to Embodiment 13 of the present invention.
FIG. 33 is a state transition diagram showing an example of a simple finite automaton.

Brief description of symbols

10, 10A to 10E, 120, 120A Memory
10AP, 10BP Data processing unit
20, 20A to 20E, 71, 121, 131, 131A Branch channel
201C, 211C, 221C, 222C, 261C, 411C, 311C, 312C, 3131C, 411C, 421C, 422C, 431C, 711C, 731C Transfer control circuit
201L, 211L, 221L, 222L, 261L, 311L, 312L, 3131L, 411L, 421L, 422L, 431L, 711L, 731L Latch
110, 111, 115 to 118, 201 to 204, 221 to 228, 221A, 222A, 223A, 224A, 231 to 234, 231A, 232A, 241, 248, 251, 261, 411, 411A, 411B, 421, 431, 441 to 444, 441A, 451 to 454, 441A, 442A, 443A, 444A, 451A, 452A, 453A, 454A, 461A, 461AP, 462A, 463A, 464A, 47, 711, 731, 77, 771 to 774, 82, N01, N02, N1, N2 Node
201F Merge stage ID determination unit
201P Packet pair determination unit
211 to 214, 411 Inlet node
222G, 261G1, 411G, 422G Inverter
251, 461 to 464 Outlet node
261G2, 3131G OR gate
30, 122 Memory row array
30R Register file
310, 740 Loop wiring
31, 32 Memory row
311, 741 Control circuit
311a, 310C Counter
310W, 311W, 312W, 313W, 320W Word memory
40, 40A to 40E, 40AP, 40BP, 73, 123, 123A Merge channel
47 State control circuit
50, 50A to 50G, P1 to P3, P1A, P2A, P1N, P2N Packet
60, 60A Cache memory
60Q, 60QA CPU accelerator
70, 70A Tag table
72 Tag array
721, 722 Tag row
750 to 753, 75i Page information
760 to 763 Comparator
764 OR gate
765 Encoder
766 Multiplexer
80, 801 to 804 Input/output unit
81, 124, 128, 128A Interface
100, 101, 101A, 101B, 102, 102A, 102B Loop
103, 103A Coupling node
104, 105, 113, 170 to 173 Queue
125 Memory controller
126 RAM
127 CPU
130, 130A Stream buffer
132 Queue array
1330 to 1333 Multiplexer
134 Half-empty detection circuit
140 to 143, 150 to 153 Node
160 Output circuit
163 Stream ID prediction circuit
162 Demultiplexer
CK Clock input terminal
CK1, CK2 Clock pulse
OE Output enable control input terminal
CMD Command
ADR, ADR1, ADR2 Address
DA, DA1, DA2, DAi Destination address
DA0 to DA5, CH0, CH1 Bit
PA, PA1, PA2 Page address
WA, WX, WA1, WA2 Word address
DATA Data
CN Concatenation bit
OD Order bit
HM, HM1, HM2 Hit bit
CH System
PT Packet type
TA, TAG Tag address
CNT Counter
MA Merge stage identifier
PR1, PR2 Processing
V Valid bit
D Dirty bit
L Lock bit
R Result bit
VC Virus code
S State
SE Stream element
SID, SID0 to SID3 Stream identifier
IRQ1, IRQ2 Interrupt request
DS Input data stream
RD Search data set
F01, F02, F1, F2 Flip-flop

Claims

A sequential merging control device for a data-driven processing device in which, when a first packet passes through a branch node in a branch channel, a second packet corresponding to the first packet passes through a merging node in a merge channel, the branch direction of the first packet at the branch node corresponds to the merging direction of the second packet at the merging node, and the number of pipeline stages between the branch node and the merging node is N (N ≥ 1), the sequential merging control device comprising a sequential merging control circuit that controls input selection at each of a plurality of merging nodes in the merge channel, wherein each sequential merging control circuit includes:
a queue of N or more stages;
a first transfer control circuit that transfers branch direction information contained in a packet at the branch node to the input stage of the queue; and
a second transfer control circuit that transfers the output of the output stage of the queue as an input selection control signal of the merging node.
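The mechanism of this claim can be illustrated with a small timing model in Python. It is only a sketch under stated assumptions: the function name `ordered_merge`, the fixed per-path `delay` values, and the one-merge-per-time-step rule are invented for the example and do not describe the actual circuit.

```python
from collections import deque


def ordered_merge(packets, delay):
    """Merge packets in the order in which they passed the branch node.

    packets: list of (name, direction) in branch-node order, direction 0 or 1.
    delay:   delay[d] is the assumed number of time steps spent on path d.
    """
    select_queue = deque()             # queue carrying branch direction bits
    paths = {0: deque(), 1: deque()}   # packets in flight as (arrival_time, name)
    for t, (name, direction) in enumerate(packets):
        select_queue.append(direction)  # role of the first transfer control circuit
        paths[direction].append((t + delay[direction], name))

    merged = []
    t = 0
    while select_queue:
        sel = select_queue[0]           # queue output stage drives input selection
        if paths[sel] and paths[sel][0][0] <= t:
            merged.append(paths[sel].popleft()[1])
            select_queue.popleft()      # role of the second transfer control circuit
        t += 1                          # otherwise wait for the selected input
    return merged
```

In this model a packet on the fast path that arrives early is not merged until the direction bit at the head of the queue selects its input, so the merged order always equals the order at the branch node.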

The sequential merging control device according to claim 1, wherein the branch direction information is the bit, within a destination address, that indicates the destination from the branch node to the next stage.

The sequential merging control device according to claim 1, wherein the first transfer control circuit enables the transfer to the input stage when order information included in the packet indicates that ordering applies.

The sequential merging control device according to claim 3, wherein the order information indicates no ordering when the packet disappears at a node on the way to the merging node.

The sequential merging control device, wherein the first transfer control circuit activates the SEND signal to the input stage of the queue when the SEND signal from the branch node to the next-stage node is activated.

The sequential merging control device according to claim 5, wherein, when the SEND signal from the subsequent stage of the branch node is active and the ACK signals from the next stage of the branch node and from the input stage of the queue are both active, the first transfer control circuit causes the branch node to capture and hold data and activates the SEND signals to the next stage and to the input stage.

The sequential merging control device according to claim 1, wherein the second transfer control circuit activates an ACK to the output stage of the queue when an ACK signal to the subsequent stage of the merging node is activated.

The sequential merging control device according to claim 6, wherein, when the SEND signal from one of the output stage of the queue and the subsequent stage of the merging node becomes active and the ACK signal from the subsequent stage of the merging node becomes active, the second transfer control circuit causes the merging node to capture and hold data and activates the ACK to the output stage of the queue.

The sequential merging control device according to claim 8, wherein, when the second packet corresponding to the first packet consists of a plurality of packets and the concatenation bit of each packet other than the last of the plurality indicates that a subsequent packet follows, the second transfer control circuit, as an exception, keeps the ACK to the output stage inactive.
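The exception for concatenated packets can be sketched as follows. The names are hypothetical, each packet is modeled as a `(data, cn)` tuple in which `cn == 1` means that another packet of the same group follows, and the sketch assumes every selected input already holds its packets.

```python
from collections import deque


def merge_with_concatenation(select_bits, inputs):
    """Reuse one selection bit for a whole run of concatenated packets."""
    select_queue = deque(select_bits)
    merged = []
    while select_queue:
        sel = select_queue[0]
        data, cn = inputs[sel].popleft()
        merged.append(data)
        if cn == 0:
            select_queue.popleft()  # last packet of the group: ACK the queue output stage
        # cn == 1: ACK stays inactive, so the same selection is used again
    return merged
```

One direction bit entered at the branch node therefore suffices for a single read packet whose response comes back as several read data packets.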

A data-driven processing device comprising a plurality of pairs of a branch node and a merging node, in which, for each pair, when a first packet passes through the branch node, a second packet corresponding to the first packet passes through the merging node, and the branch direction of the first packet at the branch node corresponds to the merging direction of the second packet at the merging node,
wherein the sequential merging control device according to any one of claims 1 to 9 is provided for each of the pairs in which the number of pipeline stages between the branch node and the merging node is 1 or more.

The data-driven processing device according to claim 10, comprising:
a tree-shaped branch channel in which a packet supplied to a first inlet node is selectively branched to downstream nodes according to the value of the destination address in the packet and reaches the one of a plurality of first outlet nodes that corresponds to the value;
a tree-shaped merge channel that has a second inlet node corresponding to each of the plurality of first outlet nodes, selectively merges packets successively toward downstream nodes, and delivers them to a second outlet node; and
a first functional element that is coupled between each first outlet node and the corresponding second inlet node and that supplies a packet according to the content of the packet from the first outlet node to the second outlet node,
wherein each node of the tree-shaped branch channel and the tree-shaped merge channel constitutes a pipeline stage, each of the tree-shaped branch channel and the tree-shaped merge channel has three or more stages,
and the plurality of branch nodes constituting the tree-shaped branch channel and the plurality of merging nodes constituting the tree-shaped merge channel constitute the plurality of pairs.
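Routing through a binary tree-shaped branch channel can be sketched in Python as below. The bit ordering (most significant destination address bit used at the first stage), the binary fan-out, and the assumption that the merge channel mirrors the branch path in reverse are choices made for the illustration, not statements from the claims.

```python
def branch_path(da, stages):
    """Branch direction taken at each stage, first stage first (assumed MSB first)."""
    return [(da >> (stages - 1 - i)) & 1 for i in range(stages)]


def outlet_index(da, stages):
    """Index of the first outlet node reached by following the branch path."""
    index = 0
    for direction in branch_path(da, stages):
        index = (index << 1) | direction
    return index


def merge_selections(da, stages):
    """Input selected at each merge stage, assumed to mirror the branch path in reverse."""
    return list(reversed(branch_path(da, stages)))
```

Under these assumptions each branch node and its mirrored merging node see the same direction bit, which is the pairing that the sequential merging control device relies on.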

The data-driven processing device according to claim 11, wherein the tree-shaped branch channel is formed so that the paths of packets whose destination addresses are complementary to each other are logically symmetric with respect to an axis in the channel direction.

The data-driven processing device according to claim 12, wherein the tree-shaped branch channel and the tree-shaped merge channel are formed so that, for an arbitrary destination address, the path of the packet in the tree-shaped branch channel and the path of the packet in the tree-shaped merge channel are logically symmetric with respect to an axis perpendicular to the channel direction.

The data-driven processing device according to claim 12, wherein a plurality of mutually related packets are sequentially supplied to the first inlet node,
and the tree-shaped merge channel includes, at least at the second outlet node, a second functional element that generates a packet according to the contents of the plurality of packets that have reached the second outlet node.

The data-driven processing device according to claim 11, wherein:
the packet has a system value;
the tree-shaped branch channel has a different first inlet node for each system, selectively merges packets from a plurality of systems at intermediate nodes, and has the plurality of first outlet nodes in common for all systems; and
the tree-shaped merge channel has the plurality of second inlet nodes in common for all systems, selectively branches packets from the plurality of systems at intermediate nodes according to the system value, and has a different second outlet node for each system.

The data-driven processing device according to claim 15, wherein the tree-shaped branch channel is formed so that the paths of packets whose destination addresses are complementary to each other are logically symmetric with respect to an axis in the channel direction.

The data-driven processing device according to claim 16, wherein the tree-shaped branch channel and the tree-shaped merge channel are formed so that, for an arbitrary destination address, the path of the packet in the tree-shaped branch channel and the path of the packet in the tree-shaped merge channel are logically symmetric with respect to an axis perpendicular to the channel direction.

The data-driven processing device according to claim 15, wherein a plurality of mutually related packets are sequentially supplied to each first inlet node,
and the tree-shaped merge channel includes, at least at each second outlet node, a second functional element that generates a packet according to the contents of the plurality of packets at that second outlet node.

The data-driven processing device according to claim 18, wherein the tree-shaped merge channel includes the second functional element at each of its nodes.

The data-driven processing device according to claim 19, wherein each of the plurality of mutually related packets includes a packet-side merge stage identifier that identifies the node at which the packets are to merge in the tree-shaped merge channel,
and each node of the tree-shaped merge channel operates the second functional element when the packet-side merge stage identifier corresponds to the node-side merge stage identifier.

The data-driven processing device according to claim 20, further comprising a merge stage identifier generation node upstream of each first inlet node of the tree-shaped branch channel, wherein the merge stage identifier generation node generates the value of the packet-side merge stage identifier based on the values of the destination addresses in the plurality of mutually related packets.
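The claims do not spell out the rule by which the merge stage identifier is generated. One plausible rule for a binary tree, offered purely as an assumption, is that two related packets first meet at the merge stage given by the highest bit in which their destination addresses differ. The function name `merge_stage_id` and the stage numbering (0 for the same destination, 1 for the merge stage nearest the second inlet nodes) are hypothetical.

```python
def merge_stage_id(da1, da2):
    """Stage of a binary merge tree at which two packets first share a node.

    Assumes the first branch stage uses the most significant address bit
    and that the merge tree mirrors the branch tree, so the last merge
    stage corresponds to that same most significant bit.
    """
    return (da1 ^ da2).bit_length()
```

For example, destinations that differ only in the lowest address bit would meet at stage 1 in this model, while destinations that differ in the highest bit of a 3-bit address would meet only at stage 3.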

The data-driven processing device according to any one of claims 15 to 21, further comprising a communication path that couples each of the plurality of second outlet nodes of the tree-shaped merge channel to the corresponding merge stage identifier generation node.

The data-driven processing device as described, wherein the tree-shaped branch channel includes a next-stage node at a position branched from each of the plurality of first inlet nodes without merging between systems.