JP2013021452A

JP2013021452A - Control device and designing method for determining the number of arithmetic circuits

Info

Publication number: JP2013021452A
Application number: JP2011152076A
Authority: JP
Inventors: Yuki Ishii; 友規石井; Takao Yamaguchi; 孝雄山口; Atsushi Yoshida; 篤吉田; Eiji Watanabe; 栄児渡辺
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2011-07-08
Filing date: 2011-07-08
Publication date: 2013-01-31
Anticipated expiration: 2031-07-08
Also published as: JP5838367B2

Abstract

PROBLEM TO BE SOLVED: To provide a transmission interval control technique for a semiconductor bus that asynchronously transmits a plurality of packets having a predetermined processing completion time, whereby the required amount of access is executed in the shortest time within the processing deadline by appropriately adjusting the transmission interval of the packets. SOLUTION: The control device includes: an allowable delay calculation unit that, at a timing determined according to the transmission completion time of a predetermined number of packets, calculates an allowable delay indicating the transmission delay that can be tolerated for packets to be transmitted subsequently, and outputs it as adjustment range information; a transmission interval determination unit that determines the transmission interval of the packets based on at least the adjustment range information and, according to the result, permits or prohibits data transmission by an adjacently connected initiator; and a transmission/reception unit that transmits at least one packet generated from data received from the initiator that has been permitted to transmit.

Description

The present invention relates to a technique for controlling a communication bus in a semiconductor chip having a networked communication bus.

In recent years, the network-on-chip (hereinafter "NoC") has come to be used as the communication bus in the fields of system-on-chip (hereinafter "SoC") and multi-core processors.

FIG. 1 shows an example of a typical connection configuration in which a NoC is applied to the main bus between initiators I1 to I4 and targets T1 to T4. An initiator is, for example, a DSP (Digital Signal Processor), a CPU (Central Processing Unit), or a DMAC (Direct Memory Access Controller). A target is, for example, a memory controller connected to an external DRAM, an SRAM, or a buffer memory for input/output with the outside.

The initiators and targets are connected so that they can communicate with one another over a network bus composed of NoC routers R1 to R4, via NIC1 to NIC8, which are network interface controllers (NICs). In a NoC, all data exchange on the bus is performed in units of packets. For this purpose, each NIC has a function for packetizing and depacketizing the transaction data that the initiators and targets send and receive. In addition, owing to the structure of the NoC, the links L1 to L8 connecting the routers are shared when packets are sent from the plurality of initiators I1 to I4 to the targets T1 to T4.

In a NoC configured as in FIG. 1, the communication bandwidth of each link is designed under two conflicting constraints. The first constraint is to increase the link bandwidth so as to secure the transmission quality, such as latency and throughput, required by each initiator. The second constraint is to reduce the link bandwidth so as to avoid high-frequency circuit design and lower power consumption.

Because the bandwidth required by each initiator generally varies over time, the average bandwidth required per unit time does not coincide with the maximum bandwidth. If multiple initiators that share the same link and transmit packets asynchronously access a target at maximum bandwidth at almost the same time, the links on the path cannot accommodate all of the accesses, and latency increases and throughput falls. In particular, when an initiator that performs burst-type access is present, the links on its transmission path become overloaded while a series of accesses (a burst access) is in progress. This also degrades the communication quality of the other initiators sharing those links and impairs the real-time behavior of processing.

"Burst-type access" or "burst access" here refers, for example, to a form of access in which a fixed number of accesses are performed in succession, internal computation is then carried out, and a fixed number of accesses are again performed in succession. FIG. 2 shows an example of burst access to a memory. The horizontal axis represents time, and the vertical axis represents the amount of data read from the memory (the amount of memory access). It can be seen that a series of accesses continues for a certain period, access then stops for a while, and a further series of accesses continuing for a certain period follows.

Patent Document 1 discloses a technique for avoiding packet transmission contention when a communication bus is shared among a plurality of asynchronously operating initiators. Each initiator detects transmission contention by monitoring the communication bus, waits for a random waiting time set individually for each initiator, and attempts transmission again after the waiting time has elapsed. This operation allows the initiators to share the communication bus while autonomously avoiding contention among transmitted packets.

Non-Patent Document 1 describes a method for calculating the waiting time. According to Non-Patent Document 1, contention among transmission requests can be prevented efficiently by making the waiting time longer the longer contention with other initiators has been detected.

Patent Document 1: JP-A-61-230444
Patent Document 2: Japanese Patent No. 1373810

Non-Patent Document 1: IEEE Std 802.3 (Section 4.2.3.2.5)

However, a DSP is normally implemented on the assumption that it processes signals such as video and audio signals in real time. The required amount of data access must therefore be completed by a predetermined processing deadline. The prior art could not guarantee this real-time behavior. The reason is that, when contention occurs repeatedly, the prior art keeps raising the upper limit of the waiting time and extends the waiting time at random.

If real-time behavior cannot be guaranteed, the user is affected in the form of, for example, dropped display frames, audio skips, or longer service waiting times. Delay therefore should not be tolerated.

Note that Non-Patent Document 1 concerns a transmission protocol technique for general networks. Communication in a general network is best-effort communication, in which delay is tolerated. It is therefore difficult to apply general network techniques uniformly to a NoC.

FIG. 3 shows an example in which, when the prior art is applied, not all accesses are completed by the processing deadline. Initiator 1 and initiator 2 shown in FIG. 3 are both assumed to be initiators that perform burst access. When multiple initiators performing burst access contend with one another, the upper limit of the waiting time is raised because contention occurs repeatedly, as shown in FIG. 3.

Assume now that initiator 1 and initiator 2 both start issuing access requests at time t0, so that the first access contention occurs at time t0. Initiator 1 and initiator 2 each set a random waiting time with upper limit L to avoid the contention. If the waiting time of initiator 1 is shorter than that of initiator 2, initiator 1 resumes its burst access before initiator 2. When the waiting state of initiator 2 ends at time t1, contention occurs again. Because initiator 2 has now detected two consecutive contentions, the upper limit of its waiting time is doubled. As a result, the upper limit of the waiting time of initiator 1 is again shorter than that of initiator 2. Further, when the waiting state of initiator 2 ends at time t2, contention occurs once more with initiator 1, which resumed access first. Initiator 2 has now detected three contentions, so the upper limit of its waiting time is raised to four times the initial value.
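The growth of the waiting-time upper limit described above (L, then 2L, then 4L) can be sketched as follows. This is a minimal illustration of the behavior as this text describes it, not a rendering of the IEEE 802.3 procedure itself; the function names and the choice of a uniform integer draw are assumptions made for the example.

```python
import random


def backoff_upper_bound(base_limit: int, contentions: int) -> int:
    """Upper limit of the random wait after `contentions` consecutive contentions.

    Follows the description in the text: L after the first contention,
    2L after the second, 4L after the third (doubling each time).
    """
    if contentions < 1:
        raise ValueError("contentions must be at least 1")
    return base_limit * 2 ** (contentions - 1)


def draw_wait(base_limit: int, contentions: int, rng: random.Random) -> int:
    """Draw a random wait in [0, upper limit]; the uniform draw is an assumption."""
    return rng.randint(0, backoff_upper_bound(base_limit, contentions))
```

Because the initiator that lost the first round draws from an ever wider range, its expected wait grows with each further contention, which is the disadvantage the text attributes to initiator 2.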

With the prior art, initiator 2, having waited a long time after the first contention, is also at a disadvantage in the subsequent contentions, and the time at which its access finally restarts is time t3. At time t3 all of the accesses of initiator 1 have completed, so the burst access of initiator 2 resumes. However, because the waiting time has been extended repeatedly, there is no longer enough time to complete all of the accesses by the processing deadline shown in the figure. Thus, with the prior art, not all accesses could be completed within the processing deadline, and signal processing was dropped. The real-time behavior of processing could not be guaranteed either.

Consider a transmission interval control scheme for a semiconductor bus that asynchronously communicates a plurality of packets having the same processing completion time, in which the transmission interval of the packets is adjusted appropriately so that the required amount of access is executed in the shortest time while the processing deadline is met. Each initiator must guarantee the real-time behavior of processing by completing Np data accesses within the processing completion deadline Tb (cycles).

Let G cycles be the interval between adjacent data accesses, K the adjustment unit applied in one round of communication interval control, Rc the number of cycles remaining until the processing completion deadline, Rp the number of data accesses remaining, and Nf the number of cycles required to transfer one packet. The communication interval G for each data access is then given by Equation 1.

According to Equation 1, in order to control the communication interval G appropriately while guaranteeing real-time behavior based on the current time t and the processing completion deadline Tb, the upper limit of the adjustment range of the communication interval must be calculated. It is therefore necessary to divide the number of remaining cycles Rc by the number of remaining accesses Rp.
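The division that the text identifies as necessary can be illustrated with a short sketch. Equation 1 itself is not reproduced here, so the code computes only the quotient Rc / Rp, the cycle budget available per remaining access. The function name and the choice to round down (so that the budget is never overstated) are assumptions made for the example.

```python
def cycles_per_remaining_access(remaining_cycles: int, remaining_accesses: int) -> int:
    """Quotient Rc / Rp: cycles available per remaining access before the deadline.

    Rounding down is an assumption; it avoids granting more cycles per
    access than the deadline actually leaves.
    """
    if remaining_accesses <= 0:
        raise ValueError("remaining_accesses must be positive")
    if remaining_cycles < 0:
        raise ValueError("remaining_cycles must not be negative")
    return remaining_cycles // remaining_accesses


# Example: 1000 cycles left and 40 accesses left give 25 cycles per access.
budget = cycles_per_remaining_access(1000, 40)
```

Since both Rc and Rp change as accesses complete, the quotient has to be recomputed during operation, which is why the cost of the division matters.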

In general, division increases the computational cost in hardware. That is, division requires more cycles than operations such as addition and subtraction.

As a result, the number of cycles spent executing the division limits the minimum interval at which the communication interval G can be controlled, and the communication interval G cannot be updated at the optimum timing for the actual packet size. This is the first problem. To shorten the number of cycles required for division, it is necessary, as described in Patent Document 2, to execute a complex high-speed division algorithm or to hold a large data table of division results, which requires a hardware circuit with a very large gate count. This is the second problem. Furthermore, because the timing at which each data access is processed fluctuates with the congestion of the bus and the targets, it is difficult to predict the value of the number of remaining data accesses Rp at the time the communication interval G is controlled, so the division cannot be accelerated by executing it speculatively. This is the third problem.

The present invention has been made in view of the above problems. Its object is to provide a transmission interval control scheme for a semiconductor bus that asynchronously communicates a plurality of packets having a predetermined processing completion time, in which the transmission interval of the packets is adjusted appropriately so that the required amount of access can be executed in the shortest time while the processing deadline is met.

A control device according to the present invention is a control device that, in a semiconductor circuit having a networked bus over which a plurality of packets are transmitted, controls the transmission timing of each packet and transmits each packet to the bus. The control device includes: an allowable delay calculation unit that, at a timing determined according to the transmission completion time of a predetermined number of packets, calculates an allowable delay indicating the allowable transmission delay of packets to be transmitted subsequently, and outputs it as adjustment range information; a transmission interval determination unit that determines the transmission interval of each packet based on at least the adjustment range information and, according to the result, permits or prohibits data transmission by an adjacently connected initiator; and a transmission/reception unit that transmits, to the bus, at least one packet generated from data received from the initiator that has been permitted to transmit the data.

The allowable delay calculation unit may include N parallel calculation execution units, each of which requires time n to execute a predetermined calculation, and a control interval determination unit that calculates a control interval p based on the minimum size a of packets transmitted to the bus, the number N of parallel calculation execution units, and the time n. The predetermined number may be determined by the control interval p.

The control interval determination unit may determine the control interval p according to the following equation.

The transmission interval determination unit may determine the transmission interval of each packet based on information on the load L of the bus in addition to the adjustment range information.

When the load L is greater than a predetermined value, the transmission interval determination unit may widen the transmission interval of each packet, and when the load L is less than or equal to the predetermined value, the transmission interval determination unit may narrow the transmission interval of each packet.
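The load-based rule above can be sketched as a single update step. This is a minimal illustration under stated assumptions: the interval is changed by one adjustment unit per update (the unit K from the earlier definitions), and the result is clamped between zero and an upper limit taken from the adjustment range information. The function and parameter names are hypothetical and the clamping is not stated in the text.

```python
def adjust_interval(current: int, load: int, threshold: int,
                    step: int, upper_limit: int) -> int:
    """One update of the transmission interval, in cycles.

    load > threshold  -> widen the interval by `step`
    load <= threshold -> narrow the interval by `step`
    Clamping to [0, upper_limit] is an assumption for this sketch.
    """
    if load > threshold:
        return min(current + step, upper_limit)
    return max(current - step, 0)
```

Under these assumptions a congested bus pushes the interval toward the deadline-imposed upper limit but never past it, while an idle bus lets the initiator send back to back.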

Another initiator or a target is connected to the bus, and the transmission interval control unit may use the transmission/reception unit to obtain the latency, which is the time from when an access request is sent to the other initiator or the target until the response returns, and use that latency as the load of the bus.
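Measuring latency as a proxy for bus load can be sketched as follows. This is an illustration only: the class name, the use of a request identifier to match responses to requests, and the cycle-count timestamps are all assumptions that the text does not specify.

```python
class LatencyMonitor:
    """Tracks request send times and reports round-trip latency in cycles."""

    def __init__(self) -> None:
        self._sent_at: dict[int, int] = {}   # request id -> cycle at which it was sent
        self.last_latency: int | None = None

    def on_request(self, request_id: int, cycle: int) -> None:
        """Record the cycle at which an access request left the NIC."""
        self._sent_at[request_id] = cycle

    def on_response(self, request_id: int, cycle: int) -> int:
        """Compute the latency when the matching response arrives."""
        self.last_latency = cycle - self._sent_at.pop(request_id)
        return self.last_latency


monitor = LatencyMonitor()
monitor.on_request(request_id=1, cycle=100)
latency = monitor.on_response(request_id=1, cycle=137)  # 37 cycles
```

The most recent latency could then be compared against a threshold to decide whether to widen or narrow the transmission interval.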

A method according to the present invention is a design method for determining the number N of arithmetic circuits provided in parallel in a control device that, in a semiconductor circuit having a networked bus over which a plurality of packets are transmitted, controls the transmission timing of each packet and transmits each packet to the bus. The control device includes: an allowable delay calculation unit that, at a timing determined according to the transmission completion time of a predetermined number of packets, calculates an allowable delay indicating the allowable transmission delay of packets to be transmitted subsequently, and outputs it as adjustment range information; a transmission interval determination unit that determines the transmission interval of each packet based on at least the adjustment range information and, according to the result, permits or prohibits data transmission by an adjacently connected initiator; and a transmission/reception unit that transmits, to the bus, at least one packet generated from data received from the initiator that has been permitted to transmit the data. Where the allowable delay calculation unit includes N parallel arithmetic circuits, each of which requires time n to execute a predetermined calculation, and a control interval determination unit that calculates the control interval p, which is the predetermined number, based on the minimum size a of packets transmitted to the bus, the number N of parallel arithmetic circuits, and the time n, the method includes the steps of: inputting a predetermined value of the control interval p; obtaining a predetermined value of the minimum size a and the value of the time n; and determining the number N of parallel arithmetic circuits by the following formula.

According to the present invention, in a semiconductor bus that asynchronously communicates a plurality of packets having the same processing completion time, appropriately adjusting the transmission interval of the packets suppresses interference between flows and improves the throughput and latency of the system as a whole. At the same time, the required amount of access can be executed in the shortest time while the processing completion time is met, so the real-time behavior of applications such as media processing can be guaranteed. Furthermore, because the transmission interval is adjusted automatically according to the state of the network, the cost of evaluating transmission timing designs against complex use cases in the design phase can be reduced.

FIG. 1 is a diagram showing an example of a typical connection configuration in which a NoC is applied to the main bus between initiators I1 to I4 and targets T1 to T4.
FIG. 2 is a diagram showing an example of burst access to a memory.
FIG. 3 is a diagram showing an example in which, when the prior art is applied, not all accesses are completed by the processing deadline.
FIG. 4 is a diagram showing a configuration example of the SoC 100 having a NoC bus.
FIG. 5 is a diagram showing an example of the data structure of the request transaction 10 generated by the DMAC.
FIG. 6 is a diagram showing the request transaction 12 packetized by NIC1 with the packet header PH added.
FIG. 7 is a diagram showing an example of the data structure of the reply transaction 14 received by NIC5 from MEM1.
FIG. 8 is a diagram showing an example of the data structure of the packetized reply transactions 16 and 18.
FIG. 9 is a diagram showing a typical example of burst access.
FIG. 10 is a diagram showing an example of the access delay of the DMAC when the DMAC, ENC, and DEC issue access requests to MEM1 simultaneously.
FIG. 11 is a diagram showing an example of the access delay of the DMAC when the DMAC, ENC, and DEC issue access requests to MEM1 while adjusting their transmission intervals.
FIG. 12 is a diagram showing the basic configuration of the initiator-side NIC 120.
FIG. 13 is a diagram showing the basic configuration of the transmission interval control unit 125.
FIG. 14 is a flowchart showing the processing procedure of the initiator-side NIC 120.
FIG. 15 is a diagram showing the basic configuration of the allowable delay calculation unit 131.
FIG. 16 is a flowchart showing the processing procedure of the control interval determination unit 144.
FIG. 17 is a diagram showing an example of a register group for recording the parameters managed by the transmission interval control unit 125.
FIG. 18 is a flowchart showing the processing procedure of the transmission interval control unit 125.
FIG. 19 is a diagram showing the types of burst access.
FIG. 20 is a timing diagram for the case where the minimum packet size of data accesses from the initiator is 9 flits.
FIG. 21 is a timing diagram for the case where a data access with a packet size of 7 flits newly occurs from the initiator.
FIG. 22 is a timing diagram for the case where the arithmetic circuit is parallelized using two circuits, circuit A and circuit B, so that the adjustment range can be calculated within the time limit.
FIG. 23 is a timing diagram for the case where the number of parallel circuits N = 3, the minimum packet size a = 2, and the number of cycles required for the division circuit n = 9.
FIG. 24 is a timing diagram for the case where, during operation under the conditions shown in FIG. 23, the minimum packet size transmitted by the initiator is changed from 2 flits (a = 2) to 4 flits (a = 4).
FIG. 25 is a diagram showing packet latency results when transmission interval control is not performed.
FIG. 26 is a diagram showing packet latency results when transmission interval control according to the embodiment of the present invention is performed.

Embodiments of the transmission interval control device according to the present invention are described below with reference to the accompanying drawings. In the following embodiments, the transmission interval control device is described using an example in which a NoC is used as the main bus of an SoC incorporated in a mobile phone or AV equipment.

(Embodiment 1)
First, the configuration of the circuit network is described. FIG. 4 shows a configuration example of the SoC 100 having a NoC bus. The SoC 100 is incorporated into, for example, a mobile phone or AV equipment. In FIG. 4, data is transmitted from the initiators shown in the top row to the targets shown in the bottom row. Alternatively, data is transmitted from a target to an initiator in response to a read request from that initiator.

FIG. 4 shows an example of the initiators and targets on the bus. The bus initiators include a DMAC (Direct Memory Access Controller) that performs display processing for the screen, an ENC (Encoder) that performs MPEG (Moving Picture Experts Group) encoding of video signals, a DEC (Decoder) that performs MPEG decoding of video signals, and a CPU (Central Processing Unit) that performs web browser and user interface processing, among other tasks. The targets include memory controllers MEM1 (Memory Controller 1) to MEM4 (Memory Controller 4), which are connected to external DRAM1 to DRAM4 (not shown).

Each initiator and each target is connected to a network interface controller NICn (n: an integer from 1 to 8). The network interface controllers are in turn connected to one another by routers Rk (k: an integer from 1 to 4). In this way, the initiators and targets are connected so that they can communicate with one another.

The basic processing and flow of data in the SoC 100 are outlined below.

An initiator first generates a request transaction, which is an access request to a target, and outputs it to the NICx (x: an integer from 1 to 4) connected to it.

The NICx that has received the request transaction from the initiator packetizes it and sends it onto the network bus formed by the interconnected routers R1 to R4.

The NIC(y+4) connected to memory controller MEMy (y: an integer from 1 to 4) receives the packet transmitted from the initiator side via the routers R1 to R4 and reconstructs the initiator's request transaction from that packet. The NIC(y+4) then transfers the resulting request transaction to the memory controller MEMy connected to it. In accordance with the request transaction, the memory controller MEMy either writes data to DRAMy (data write) or reads data stored in DRAMy (data read). The result is passed from the memory controller MEMy to NIC(y+4).

The NIC(y+4) connected to the memory controller MEMy packetizes the reply transaction, which is the result obtained when the memory controller MEMy reads from or writes to DRAMy, and sends it onto the network bus composed of the routers Rk (k: an integer from 1 to 4).

The NICx connected to the initiator then receives the reply packet sent from the target, depacketizes it to reconstruct the reply transaction, and notifies the initiator of it as the access result.

Next, specific examples of the request transaction and reply transaction described above are given. The example here is one in which the DMAC sends the request transaction 10 to read data stored in MEM1.

FIG. 5 shows an example of the data structure of the request transaction 10 generated by the DMAC. The request transaction 10 has an R/W field, an ADDR field, and a SIZE field. In FIG. 5 these are labeled "R/W", "ADDR", and "SIZE", respectively. The same labeling is used in FIGS. 6 to 8 below.

R/Wフィールドにはメモリからの読み出し動作を行うことを示す命令が記述される。ＡＤＤＲフィールドにはデータの読み出しを行うＤＲＡＭのアドレスが記述される。またSIZEフィールドで読み出すべきデータのサイズが指定されている。 In the R / W field, an instruction indicating a read operation from the memory is described. In the ADDR field, the address of a DRAM from which data is read is described. The size of data to be read is specified in the SIZE field.

図６は、ＮＩＣ１によってパケットヘッダＰＨが付加されてパケット化された要求トランザクション１２を示している。パケットヘッダＰＨには、要求元（その要求を送信したイニシエータ）としてＮＩＣ１のノードのＩＤが格納される。またパケットヘッダＰＨには、要求の宛先としてＮＩＣ５のノードのＩＤが格納される。ＮＩＣ５は、パケット化された要求トランザクション１２からパケットヘッダを除去する処理（脱パケット化処理）を行って図５に示した要求トランザクション１０を再構成し、その要求トランザクション１０をＭＥＭ１に転送する。 FIG. 6 shows the request transaction 12 packetized by the NIC 1 with the packet header PH added. The packet header PH stores the node ID of the NIC 1 as the request source (the initiator that transmitted the request). The packet header PH stores the ID of the node of the NIC 5 as the request destination. The NIC 5 performs processing (depacketization processing) for removing the packet header from the packetized request transaction 12 to reconfigure the request transaction 10 shown in FIG. 5, and transfers the request transaction 10 to the MEM 1.

ＭＥＭ１は、要求トランザクション１０に記述されたＲ／Ｗフィールドに基づいて読み出し動作であることを特定する。さらにＭＥＭ１は、ＡＤＤＲフィールドおよびＳＩＺＥフィールドを読み取り、ＡＤＤＲフィールドに指定されたＤＲＡＭ上の番地から、ＳＩＺＥフィールドに指定されたサイズのデータを読み出す。その後ＭＥＭ１は、読み出した結果得られたデータを、ＮＩＣ５に出力する。 The MEM 1 specifies the read operation based on the R / W field described in the request transaction 10. Further, the MEM 1 reads the ADDR field and the SIZE field, and reads data of the size specified in the SIZE field from the address on the DRAM specified in the ADDR field. Thereafter, the MEM 1 outputs the data obtained as a result of the reading to the NIC 5.

図７は、ＮＩＣ５がＭＥＭ１から受け取った返信トランザクション１４のデータ構造の一例を示す。ＮＩＣ５は返信トランザクションをパケット化し、ネットワークバスに転送する。 FIG. 7 shows an example of the data structure of the reply transaction 14 received by the NIC 5 from the MEM 1. The NIC 5 packetizes the reply transaction and transfers it to the network bus.

図８は、パケット化された返信トランザクション１６および１８のデータ構造の一例を示す。返信トランザクション１６は、取得したデータにパケットヘッダＰＨを付して生成される。なお、返信するデータのサイズが大きい場合には、複数のパケットに分けて返信されてもよい。１つの返信パケット１６または複数の返信パケット１８を受け取った後、ＮＩＣ１は、返信パケット１６または１８を脱パケット化処理することにより、図７に示す返信トランザクション１４を再構築する。ＮＩＣ１は、そのデータをＤＭＡＣに転送することで、ＤＭＡＣのメモリアクセスが完了する。 FIG. 8 shows an example of the data structure of the packetized reply transactions 16 and 18. The reply transaction 16 is generated by attaching the packet header PH to the acquired data. If the data to be returned is large, it may be returned in a plurality of packets. After receiving the single reply packet 16 or the plurality of reply packets 18, the NIC1 depacketizes the reply packet 16 or 18 to reconstruct the reply transaction 14 shown in FIG. 7. The NIC1 then transfers that data to the DMAC, which completes the DMAC's memory access.
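The packetization and depacketization steps described for FIGS. 5 to 8 can be sketched as follows. This is a minimal illustration and not the embodiment's implementation: the class names, the `packetize`/`depacketize` helpers, and the `max_payload` parameter are all hypothetical, and only the field names R/W, ADDR, and SIZE and the source/destination node IDs in the packet header PH come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTransaction:
    rw: str    # "R" for a read, "W" for a write (R/W field)
    addr: int  # DRAM address to access (ADDR field)
    size: int  # number of bytes to access (SIZE field)


@dataclass(frozen=True)
class Packet:
    src: int         # node ID of the sending NIC (packet header PH)
    dst: int         # node ID of the destination NIC (packet header PH)
    payload: object  # the transaction or a slice of reply data


def packetize(payload, src, dst):
    """Attach a packet header to a transaction (hypothetical helper)."""
    return Packet(src=src, dst=dst, payload=payload)


def depacketize(packet):
    """Remove the packet header and return the original payload."""
    return packet.payload


def split_reply(data, src, dst, max_payload):
    """Split reply data into one or more packets when it is large.

    max_payload is an assumed per-packet limit in bytes.
    """
    return [packetize(data[i:i + max_payload], src, dst)
            for i in range(0, len(data), max_payload)]


def rebuild_reply(packets):
    """Reassemble reply data from packets received in order."""
    return b"".join(depacketize(p) for p in packets)


# DMAC (behind NIC1) reads 8 bytes from MEM1 (behind NIC5).
request = RequestTransaction(rw="R", addr=0x1000, size=8)
request_packet = packetize(request, src=1, dst=5)
reply_packets = split_reply(bytes(range(8)), src=5, dst=1, max_payload=4)
```

Here `depacketize(request_packet)` returns the original request transaction, and `rebuild_reply(reply_packets)` returns the 8 bytes of reply data regardless of how many packets carried them.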

次に、本実施形態によるＳｏＣ１００の動作を説明する。 Next, the operation of the SoC 100 according to the present embodiment will be described.

以下では、イニシエータであるＤＭＡＣが映像を画面（図示せず）表示する処理を行うとする。ただし、ＳｏＣ１００上ではＤＭＡＣのみが動作するのではなく、ＥＮＣ、ＤＥＣ、ＣＰＵも並列的に動作しつつ、データの授受を行う状況を想定する。 In the following, it is assumed that the DMAC, which is an initiator, performs processing to display video on a screen (not shown). However, the DMAC is not the only initiator operating on the SoC 100; the ENC, DEC, and CPU are assumed to operate in parallel and exchange data as well.

本実施形態では、ＤＭＡＣは画面への表示処理を行うため、表示画面のコマ落ちやフリーズの発生により製品のユーザ価値を低下させないためには、メモリアクセス時のレイテンシへの要求が厳しく、リアルタイム性の確保が重要となる。 In this embodiment, the DMAC performs display processing on a screen. To avoid degrading the product's value to the user through dropped frames or freezes on the display, the latency requirements for memory access are strict, and ensuring real-time performance is important.

一方、ＭＰＥＧに代表される映像や音声の符号化と復号化のアルゴリズムを実行するＥＮＣ及びＤＥＣは、その信号処理の過程で、バースト的なアクセスを発生させる。図９は、典型的なバーストアクセスの様子を示した例である。図９中の"Ｔｂ"はバーストが発生する周期である。ＭＰＥＧのマクロブロック単位で信号処理を行うコーデックであれば、Ｔｂは１マクロブロック時間に相当するサイクル数で表される。"Ｎｐ"は１回のバーストで発生するアクセスの回数である。Ｎｐはバーストアクセスの開始に先立って、コーデックの処理アルゴリズムにより決定される。 On the other hand, ENC and DEC that execute video and audio encoding and decoding algorithms typified by MPEG generate bursty access in the process of signal processing. FIG. 9 is an example showing a typical burst access. “Tb” in FIG. 9 is a cycle in which a burst occurs. For a codec that performs signal processing in units of MPEG macroblocks, Tb is represented by the number of cycles corresponding to one macroblock time. “Np” is the number of accesses that occur in one burst. Np is determined by the codec processing algorithm prior to the start of burst access.

またＣＰＵは、ユーザとのインタラクション処理やインターネットのブラウジング処理などに用いられる。アクセスが発生するタイミングやアクセス量を事前に見積もることは困難であるが、他のイニシエータと比較すると、ＣＰＵの処理に関してはリアルタイム性に対する要求は厳しくない。 The CPU is also used for user interaction processing, Internet browsing processing, and the like. Although it is difficult to estimate the timing at which access occurs and the amount of access in advance, compared to other initiators, the demand for real-time processing is not strict regarding CPU processing.

ネットワークバスを構成するルータは、パケットヘッダに格納された宛先に従って出力ポートを決定し、パケットの転送を行っていく。１パケットを転送するために必要なサイクル数Ｔｐは、内部のハードウェア構成とパケットサイズとによって異なる。 The router configuring the network bus determines the output port according to the destination stored in the packet header, and transfers the packet. The number of cycles Tp required to transfer one packet differs depending on the internal hardware configuration and the packet size.

図１０は、ＤＭＡＣ、ＥＮＣ、ＤＥＣが同時にＭＥＭ１に対してアクセス要求を行った場合に、ＤＭＡＣのアクセスが遅延する様子を示す。図面の上からＥＮＣ、ＤＥＣ、ＤＭＡＣ、Ｒ１およびＲ３がパケットを出力していること、および、パケットを出力した時刻（サイクル）が示されている。なお図中で各パケットに振られた番号は、パケットを区別するための番号であり、説明の都合上付加したものに過ぎない。 FIG. 10 shows how DMAC access is delayed when DMAC, ENC, and DEC simultaneously issue access requests to MEM1. From top to bottom, the drawing shows the packets output by ENC, DEC, DMAC, R1, and R3, together with the time (cycle) at which each packet is output. The numbers assigned to the packets in the figure serve only to distinguish the packets and are added purely for convenience of explanation.

図１０の例では、ＥＮＣとＤＥＣが同一のタイミングでパケット１とパケット２のデータを送信し、続くサイクルでパケット３とパケット４を送信し、続くサイクルでパケット５とパケット６を送信している。そして、パケット５および６が出力されたタイミングと同一のタイミングでＤＭＡＣがパケット７を送信している。 In the example of FIG. 10, ENC and DEC transmit packet 1 and packet 2 at the same timing, transmit packet 3 and packet 4 in the next cycle, and transmit packet 5 and packet 6 in the cycle after that. The DMAC transmits packet 7 at the same timing as packets 5 and 6 are output.

以下の説明では、ルータＲ１およびＲ３は、ＥＮＣ、ＤＥＣからの送信パケットとの干渉待合が発生しなければ、１パケットの転送に１サイクルを要すると仮定する。つまり、ＤＭＡＣから出力されたパケット７は２サイクル後に、ＭＥＭ１に到着するものとする。 In the following description, routers R1 and R3 are each assumed to take one cycle to transfer one packet, provided that no waiting due to contention with packets transmitted from ENC or DEC occurs. That is, packet 7 output from the DMAC is assumed to arrive at MEM1 two cycles later.

ＥＮＣとＤＭＡＣが送信したパケットは、１サイクル後にルータＲ１に受信され、後段のルータＲ３へと中継される。一方、ＥＮＣからのパケット５の送信と同時にＤＭＡＣから送信されたパケット７は、ＥＮＣからのパケット５と同時にルータＲ１を通過できない。そこで、たとえばパケット７はパケット５の後に中継される。このときパケット７には１サイクルの遅延が発生することになる。 The packet transmitted by the ENC and DMAC is received by the router R1 after one cycle and relayed to the subsequent router R3. On the other hand, the packet 7 transmitted from the DMAC simultaneously with the transmission of the packet 5 from the ENC cannot pass through the router R1 simultaneously with the packet 5 from the ENC. Thus, for example, the packet 7 is relayed after the packet 5. At this time, a delay of one cycle occurs in the packet 7.

一方、ＤＥＣは、ＥＮＣの送信開始のタイミングと同一のタイミングでパケット２、パケット４、パケット６を順に送信している。パケット２、４、６はルータＲ２を経由してルータＲ３に送信される。ルータＲ３は、ルータＲ１の中継パケットと共にパケット２、４、６を中継する。ルータＲ３は、ルータＲ１からのパケットと、ＤＥＣからのパケットを交互に出力する。 On the other hand, the DEC transmits packet 2, packet 4, and packet 6 in order at the same timing as the transmission start timing of ENC. Packets 2, 4, and 6 are transmitted to router R3 via router R2. Router R3 relays packets 2, 4, and 6 together with the relay packet of router R1. Router R3 alternately outputs packets from router R1 and packets from DEC.

図１０のルータＲ３のパケット出力タイミングによれば、ルータＲ３によって中継され、ＭＥＭ１に到着するパケット７には、最速でＭＥＭ１に到達する２サイクルを基準として、４サイクル分の遅延が発生していることが分かる。この遅延はルータ網上でのパケット競合に起因している。したがって、ネットワークバスの段数が増えると遅延は更に大きくなり、ＤＭＡＣのメモリアクセスのレイテンシが許容時間を超える状況が発生する。 According to the packet output timing of router R3 in FIG. 10, packet 7, which is relayed by router R3 and arrives at MEM1, incurs a delay of 4 cycles relative to the 2 cycles of the fastest possible arrival at MEM1. This delay is caused by packet contention on the router network. Consequently, as the number of stages in the network bus increases, the delay grows further, and the latency of DMAC memory accesses can exceed the allowable time.

一方、図１１は、ＤＭＡＣ、ＥＮＣ、ＤＥＣが送信間隔を調整しながらＭＥＭ１に対してアクセス要求を行った場合の、ＤＭＡＣのアクセスが遅延する様子を示す。図１１に示すように、ＤＭＡＣ、ＥＮＣ、ＤＥＣが送信間隔を調整するとＤＭＡＣのレイテンシは大幅に短縮される。ＥＮＣおよびＤＥＣは送信パケット間隔を空けてパケット１〜６を送信している。一方、ＤＭＡＣは、ＥＮＣおよびＤＥＣからパケットが送信されていないタイミングでパケット７を送信している。ＤＭＡＣが送信したパケット７は、ＥＮＣとＤＥＣからの送信パケットの間の空きサイクルを利用して最短のレイテンシである２サイクルでＭＥＭ１に到着している。 In contrast, FIG. 11 shows how DMAC access is delayed when DMAC, ENC, and DEC issue access requests to MEM1 while adjusting their transmission intervals. As FIG. 11 shows, when DMAC, ENC, and DEC adjust their transmission intervals, the DMAC latency is greatly reduced. ENC and DEC transmit packets 1 to 6 with gaps between successive packets. The DMAC, in turn, transmits packet 7 at a timing when neither ENC nor DEC is transmitting a packet. Packet 7 uses an idle cycle between the packets transmitted from ENC and DEC and arrives at MEM1 in 2 cycles, the shortest possible latency.
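The contrast between FIG. 10 and FIG. 11 can be reproduced with a small cycle-level model. This is a sketch under stated assumptions: each router forwards at most one packet per cycle, a packet can leave a router no earlier than one cycle after it arrives, and a router serves the oldest waiting packet first with ties broken by input port order (which yields the alternating output described for R3). The absolute cycle numbers and the send times in the spaced scenario are hypothetical; only the topology (ENC and DMAC via R1, DEC via R2, both into R3) and the packet numbering follow the description.

```python
def relay(arrivals):
    """Forward packets through one router.

    arrivals: list of (arrival_cycle, port, packet_id).
    Returns a list of (output_cycle, packet_id). Oldest packet first,
    ties broken by port number, one packet output per cycle.
    """
    outputs = []
    last_out = None
    for arrival, _port, packet_id in sorted(arrivals):
        earliest = arrival + 1  # one cycle to transfer a packet
        out = earliest if last_out is None else max(earliest, last_out + 1)
        outputs.append((out, packet_id))
        last_out = out
    return outputs


def simulate(enc, dec, dmac):
    """Each argument is a list of (send_cycle, packet_id).

    Returns {packet_id: cycle at which the packet leaves R3 for MEM1}.
    """
    r1 = relay([(t, 0, p) for t, p in enc] + [(t, 1, p) for t, p in dmac])
    r2 = relay([(t, 0, p) for t, p in dec])
    r3 = relay([(t, 0, p) for t, p in r1] + [(t, 1, p) for t, p in r2])
    return {p: t for t, p in r3}


MIN_LATENCY = 2  # two router hops with no contention

# FIG. 10 style: ENC and DEC send back to back, DMAC sends with packets 5 and 6.
fig10 = simulate(enc=[(0, 1), (1, 3), (2, 5)],
                 dec=[(0, 2), (1, 4), (2, 6)],
                 dmac=[(2, 7)])
extra_delay = (fig10[7] - 2) - MIN_LATENCY  # 4 cycles of added delay

# FIG. 11 style (hypothetical send times): gaps leave an idle slot for packet 7.
spaced = simulate(enc=[(0, 1), (4, 3), (8, 5)],
                  dec=[(2, 2), (6, 4), (10, 6)],
                  dmac=[(5, 7)])
spaced_latency = spaced[7] - 5  # 2 cycles, the minimum
```

Under these assumptions the back-to-back case gives packet 7 one cycle of delay at R1 and three more at R3, matching the 4-cycle figure in the description, while the spaced case lets packet 7 pass through both routers without waiting.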

本実施形態では、ＥＮＣ、ＤＥＣは、アクセス負荷の状態とアクセスの進捗度合いを用いて動的にパケットの送信間隔を決定する。決定のタイミングは送信間隔の算出演算に要するサイクル数と演算ハードウェア資源及び稼働中のアクセスフローによって規定される最小のパケット転送サイクル数によって決定される。 In this embodiment, ENC and DEC dynamically determine the packet transmission interval using the state of the access load and the degree of progress of the access. The timing of this determination is governed by the number of cycles required to compute the transmission interval, by the arithmetic hardware resources, and by the minimum number of packet transfer cycles defined by the access flows currently in operation.

図１２は、イニシエータ側のＮＩＣ１２０の基本的な構成を示す。ＮＩＣ１２０は、たとえば図４のＮＩＣ１〜４である。なお、以下に説明する機能に鑑みて、ＮＩＣ１２０は「送信間隔制御装置」と呼ぶこともある。 FIG. 12 shows a basic configuration of the NIC 120 on the initiator side. The NIC 120 is, for example, the NICs 1 to 4 in FIG. In view of the functions described below, the NIC 120 may be referred to as a “transmission interval control device”.

ＮＩＣ１２０は、パケット化部１２１と、脱パケット化部１２２と、パケットバッファ１２３と、パケット送受信部１２４と、送信間隔制御部１２５とを備えている。 The NIC 120 includes a packetization unit 121, a depacketization unit 122, a packet buffer 123, a packet transmission / reception unit 124, and a transmission interval control unit 125.

パケット化部１２１は、イニシエータから送信されたデータを受け取り、そのデータにパケットヘッダを付加し、パケット化する。たとえば、パケット化部１２１は、イニシエータから送信された要求トランザクション１０（図５）を受け取り、パケットヘッダＰＨを付加してパケット化された要求トランザクション１２（図６）を生成する。この処理はパケット化処理とも呼ばれる。 The packetizing unit 121 receives data transmitted from the initiator, adds a packet header to the data, and packetizes the data. For example, the packetizing unit 121 receives the request transaction 10 (FIG. 5) transmitted from the initiator, and generates a packetized request transaction 12 (FIG. 6) by adding a packet header PH. This processing is also called packetization processing.

脱パケット化部１２２は、パケット化部１２１の処理と逆の処理を行う。すなわち、脱パケット化部１２２は、ルータから受け取ったパケットからパケットヘッダを除去する。この処理は脱パケット化処理とも呼ばれる。 The depacketizing unit 122 performs processing reverse to the processing of the packetizing unit 121. That is, the depacketizer 122 removes the packet header from the packet received from the router. This process is also called a depacketization process.

パケットバッファ１２３は、パケットを一時的に格納するために設けられたバッファである。 The packet buffer 123 is a buffer provided for temporarily storing packets.

パケット送受信部１２４は、パケットの送受信に関連する処理を行う。パケットの送信時には、パケット送受信部１２４は、パケットバッファ１２３に格納されたパケットデータを送信データバス幅ごとに読み出して、パケットの送信処理を行う。パケットの受信時には、パケット送受信部１２４は、受信データバス幅毎に受信されたデータをパケットバッファ１２３に格納していくことでパケットを再構築する。 The packet transmitting / receiving unit 124 performs processing related to packet transmission / reception. At the time of packet transmission, the packet transmitting / receiving unit 124 reads the packet data stored in the packet buffer 123 for each transmission data bus width, and performs packet transmission processing. When receiving a packet, the packet transmitting / receiving unit 124 reconstructs the packet by storing the data received for each received data bus width in the packet buffer 123.

送信間隔制御部１２５の詳細は図１３を参照しながら後に詳細に説明する。 The transmission interval control unit 125 will be described in detail later with reference to FIG. 13.

パケット化部１２１、脱パケット化部１２２、パケットバッファ１２３およびパケット送受信部１２４は、一般的なＮＩＣに存在する機能である。これらの構成要素に基づくＮＩＣ１２０の処理を概説する。 The packetization unit 121, the depacketization unit 122, the packet buffer 123, and the packet transmission / reception unit 124 are functions existing in a general NIC. The processing of the NIC 120 based on these components will be outlined.

パケット送信時の処理は以下のとおりである。パケット化部１２１はイニシエータからデータを受け取り、そのデータにパケットヘッダを付加してパケット化処理を行う。パケットバッファ１２３はそのパケットを一時的に格納する。パケット送受信部１２４はパケットバッファ１２３に保持されたパケットデータを送信データバス幅ごとに読み出してパケットの送信処理を行う。 Processing at the time of packet transmission is as follows. The packetization unit 121 receives data from the initiator, adds a packet header to the data, and performs packetization processing. The packet buffer 123 temporarily stores the packet. The packet transmission / reception unit 124 reads the packet data held in the packet buffer 123 for each transmission data bus width, and performs packet transmission processing.

パケットの受信時の処理は以下のとおりである。パケット送受信部１２４は受信データバス幅毎にデータを受信し、パケットバッファ１２３に格納する、これにより、パケットが再構築される。脱パケット化部１２２は、得られたパケットからパケットヘッダを除去し、残されたデータをイニシエータに出力する。 Processing at the time of packet reception is as follows. The packet transmitting / receiving unit 124 receives data for each received data bus width and stores the data in the packet buffer 123, whereby the packet is reconstructed. The depacketizer 122 removes the packet header from the obtained packet, and outputs the remaining data to the initiator.

次に、送信間隔制御部１２５を説明する。 Next, the transmission interval control unit 125 will be described.

送信間隔制御部１２５は、本実施形態にかかるＮＩＣ１２０に固有の機能要素である。 The transmission interval control unit 125 is a functional element unique to the NIC 120 according to the present embodiment.

図１３は送信間隔制御部１２５の基本構成を示す。送信間隔制御部１２５は、許容遅延演算部１３１と、調整方法選択部１３２と、送信間隔決定部１３３とを有している。 FIG. 13 shows a basic configuration of the transmission interval control unit 125. The transmission interval control unit 125 includes an allowable delay calculation unit 131, an adjustment method selection unit 132, and a transmission interval determination unit 133.

許容遅延演算部１３１は、バーストアクセスの進捗情報に基づいて、リアルタイム性が確保可能な通信間隔の調整範囲（許容遅延量）を演算する。進捗情報は、許容遅延演算部１３１が保持する情報である。具体的には、進捗情報は、バーストアクセスの残数（アクセス残数）である。以下では主として「アクセス残数」という語を用いて説明する。 The allowable delay calculation unit 131 calculates a communication interval adjustment range (allowable delay amount) that can ensure real-time performance based on the progress information of burst access. The progress information is information held by the allowable delay calculation unit 131. Specifically, the progress information is the remaining number of burst accesses (access remaining number). In the following description, the term “remaining access number” will be mainly used.

調整方法選択部１３２は、負荷情報に基づいて通信間隔の調整方法を選択し、間隔制御情報を生成する。 The adjustment method selection unit 132 selects a communication interval adjustment method based on the load information, and generates interval control information.

送信間隔決定部１３３は、調整範囲情報と間隔制御情報に基づいて通信間隔を決定し、データの送信許可情報を生成する。 The transmission interval determination unit 133 determines a communication interval based on the adjustment range information and the interval control information, and generates data transmission permission information.

ここで、送信間隔制御部１２５を含む、イニシエータ側のＮＩＣ１２０の処理の概要を説明する。 Here, an overview of processing of the NIC 120 on the initiator side including the transmission interval control unit 125 will be described.

図１４は、イニシエータ側のＮＩＣ１２０の処理の手順を示すフローチャートである。 FIG. 14 is a flowchart showing a processing procedure of the NIC 120 on the initiator side.

ステップＳ１００において、送信間隔制御部１２５は、最小パケットサイズａを示す情報を取得する。最小パケットサイズとは、データアクセスのためにターゲットに送信されるパケットの最小のサイズをいう。最小パケットサイズａは、データアクセスの種類やターゲットに応じて変動し得る。 In step S100, the transmission interval control unit 125 acquires information indicating the minimum packet size a. The minimum packet size refers to the minimum size of a packet transmitted to the target for data access. The minimum packet size a may vary depending on the type of data access and the target.

ステップＳ１０２において、許容遅延演算部１３１は、最小パケットサイズａ、許容遅延演算部１３１に設けられた除算回路の並列数Ｎおよび各除算回路が演算に要するサイクル数ｎに基づいて、制御間隔ｐを演算する。なお、並列数Ｎおよびサイクル数ｎは、後に図１５を参照しながら説明するように許容遅延演算部１３１内に保持されている。 In step S102, the allowable delay calculation unit 131 sets the control interval p based on the minimum packet size a, the parallel number N of the division circuits provided in the allowable delay calculation unit 131, and the number of cycles n required for each division circuit to calculate. Calculate. Note that the parallel number N and the cycle number n are held in the allowable delay calculation unit 131 as described later with reference to FIG.

ステップＳ１０４において、許容遅延演算部１３１は、制御間隔ｐに基づくタイミングで、許容遅延Ｉを求める。この「制御間隔ｐに基づくタイミング」とは、ｐ個のパケットを送信する毎に、という意味である。求められた許容遅延Ｉはパケット送信の遅延をどの程度許容できるかを示す調整範囲情報として利用される。 In step S104, the allowable delay calculation unit 131 obtains the allowable delay I at a timing based on the control interval p. The “timing based on the control interval p” means that every time p packets are transmitted. The obtained allowable delay I is used as adjustment range information indicating how much packet transmission delay can be tolerated.

調整範囲情報が得られると、送信間隔決定部１３３はステップＳ１０６の処理を実行する。ステップＳ１０６において、送信間隔決定部１３３は、調整範囲情報およびネットワークバスの負荷（レイテンシ）の情報を受け取る。そして、調整範囲および負荷（レイテンシ）に応じてパケットの送信間隔を調整する。パケットの送信間隔に合致する場合には、送信間隔決定部１３３は、イニシエータからのデータ（アクセス要求に対応する要求トランザクション）の送信を許可するか禁止するかを決定し、その結果を示す情報を出力する。 When the adjustment range information is obtained, the transmission interval determination unit 133 executes the process of step S106. In step S106, the transmission interval determination unit 133 receives the adjustment range information and information on the load (latency) of the network bus. It then adjusts the packet transmission interval according to the adjustment range and the load (latency). When the timing matches the packet transmission interval, the transmission interval determination unit 133 determines whether to permit or prohibit transmission of data (the request transaction corresponding to the access request) from the initiator, and outputs information indicating the result.

ステップＳ１０８において、パケット化部１２１は、データの送信を許可されたイニシエータからデータを受け取り、パケット化する。 In step S108, the packetizing unit 121 receives data from the initiator permitted to transmit data and packetizes it.

そしてステップＳ１１０において、パケット送受信部１２４は、生成されたパケットを送信する。 In step S110, the packet transmitting / receiving unit 124 transmits the generated packet.

ステップＳ１０４からＳ１１０までの処理を具体的に説明すると以下のとおりである。 The process from step S104 to S110 will be specifically described as follows.

たとえば、制御間隔ｐに基づくタイミングで求められた許容遅延Ｉが負であるとする。これは、遅延は許容されないことを意味する。このとき、パケットの転送サイクルが完了する度に送信間隔決定部１３３はイニシエータからのデータの送信を許可し（ステップＳ１０８）、パケット送受信部１２４は、遅延なく次のデータ（パケット）を送信する（ステップＳ１１０）。 For example, it is assumed that the allowable delay I obtained at the timing based on the control interval p is negative. This means that no delay is allowed. At this time, every time a packet transfer cycle is completed, the transmission interval determination unit 133 permits transmission of data from the initiator (step S108), and the packet transmission / reception unit 124 transmits the next data (packet) without delay ( Step S110).

一方、許容遅延Ｉが０以上であるときは、調整方法選択部１３２はその時点でのネットワークバスの負荷（レイテンシ）に応じてパケットの通信間隔を決定する。負荷が比較的大きければ、ネットワークバスが混雑しているため、パケットの送信間隔を拡大する。一方、負荷が比較的小さければ、ネットワークバスは混雑していないため、パケットの送信間隔を縮小する。決定された通信間隔に応じて、送信間隔決定部１３３はデータの送信を許可または禁止する情報をイニシエータに出力する。 On the other hand, when the allowable delay I is 0 or more, the adjustment method selection unit 132 determines the packet communication interval according to the load (latency) of the network bus at that time. If the load is relatively large, the network bus is congested, so the packet transmission interval is expanded. On the other hand, if the load is relatively small, the network bus is not congested, so the packet transmission interval is reduced. In accordance with the determined communication interval, the transmission interval determination unit 133 outputs information that permits or prohibits data transmission to the initiator.
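The decision described above (send without delay when the allowable delay I is negative, otherwise widen or narrow the interval according to the bus load) can be sketched as a single function. This is only an illustration: the threshold `busy_latency`, the adjustment `step`, and the clamping of the interval to the range 0 to I are assumptions introduced here, since the description does not state how "relatively large" load is judged or by how much the interval changes.

```python
def next_gap(allowable_delay, latency, current_gap, busy_latency, step=1):
    """Choose the number of idle cycles before the next packet.

    allowable_delay: the allowable delay I (adjustment range), in cycles.
    latency:         most recently observed network bus latency (load).
    current_gap:     gap used for the previous packet, in cycles.
    busy_latency:    hypothetical threshold above which the bus is
                     treated as congested.
    step:            hypothetical amount by which the gap is changed.
    """
    if allowable_delay < 0:
        # No delay can be tolerated: send the next packet immediately.
        return 0
    if latency > busy_latency:
        gap = current_gap + step  # bus is congested: widen the interval
    else:
        gap = current_gap - step  # bus is not congested: narrow the interval
    # Assumed policy: never exceed the adjustment range and never go negative.
    return max(0, min(gap, allowable_delay))
```

For example, with `busy_latency=5`, a gap of 1 grows to 2 when the observed latency is 10 and shrinks to 0 when the observed latency is 2, and a negative allowable delay always yields a gap of 0.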

上述の説明から理解されるように、制御間隔ｐは、許容遅延Ｉを再計算するタイミングを示している。許容遅延Ｉが得られると、パケットの送信間隔を調整する処理が行われる。よって、制御間隔ｐは、パケットの送信間隔を調整するためのパラメータであるといえる。 As can be understood from the above description, the control interval p indicates the timing at which the allowable delay I is recalculated. When the allowable delay I is obtained, processing for adjusting the packet transmission interval is performed. Therefore, it can be said that the control interval p is a parameter for adjusting the packet transmission interval.

図１５は、許容遅延演算部１３１の基本構成を示す。許容遅延演算部１３１は、演算実行部１４１と、演算結果格納部１４２と、許容遅延算出部１４３と、制御間隔決定部１４４とを有している。 FIG. 15 shows a basic configuration of the allowable delay calculation unit 131. The allowable delay calculation unit 131 includes a calculation execution unit 141, a calculation result storage unit 142, an allowable delay calculation unit 143, and a control interval determination unit 144.

演算実行部１４１は、サイクル数を要する除算実行を予測的に並列処理する。本実施形態では、演算実行部１４１が並列的に設けられることを想定している。並列的に設けられる演算実行部１４１の数は、後に「Ｎ」という記号で表される。 The calculation execution unit 141 speculatively executes, in parallel, the division operations, which take multiple cycles. In this embodiment, it is assumed that multiple calculation execution units 141 are provided in parallel. The number of calculation execution units 141 provided in parallel is denoted below by the symbol "N".

演算結果格納部１４２は、演算結果を送信間隔の算出時点まで記憶する。 The calculation result storage unit 142 stores the calculation result until the transmission interval is calculated.

許容遅延算出部１４３は、送信間隔の調整範囲情報を算出する。 The allowable delay calculation unit 143 calculates transmission interval adjustment range information.

制御間隔決定部１４４は、イニシエータからアクセスされる最小のパケットサイズ、演算実行部の演算に費やされる処理サイクル数及びハードウェアの並列処理可能数から最適な制御間隔を決定する。 The control interval determination unit 144 determines the optimal control interval from the minimum packet size of accesses from the initiator, the number of processing cycles the calculation execution unit spends on a calculation, and the number of calculations the hardware can process in parallel.

以下、図１６を参照しながら、制御間隔決定部１４４の処理を説明する。 Hereinafter, the processing of the control interval determination unit 144 will be described with reference to FIG.

図１６は、制御間隔決定部１４４の処理の手順を示すフローチャートである。 FIG. 16 is a flowchart illustrating a processing procedure of the control interval determination unit 144.

ステップＳ２００において、制御間隔決定部１４４は制御間隔ｐを更新するか否かを判断する。最小パケットサイズａが変更された場合は、制御間隔決定部１４４は制御間隔ｐを更新すると判断する。更新すると判断しない場合には、制御間隔決定部１４４は次の処理に進まない。 In step S200, the control interval determination unit 144 determines whether to update the control interval p. When the minimum packet size a is changed, the control interval determination unit 144 determines to update the control interval p. If it is not determined to update, the control interval determination unit 144 does not proceed to the next process.

なお、最小パケットサイズａを示す情報は、ＮＩＣ１２０上のレジスタ（図示せず）に保持されている。最小パケットサイズａが変更されたか否かは、制御間隔決定部１４４が当該レジスタをモニタして検出すればよい。 Information indicating the minimum packet size a is held in a register (not shown) on the NIC 120. Whether or not the minimum packet size a has been changed may be detected by the control interval determination unit 144 monitoring the register.

ステップＳ２０２において、制御間隔決定部１４４は、後述する数８に示す天井関数を用いて、制御間隔ｐを求める。 In step S202, the control interval determination unit 144 obtains the control interval p using a ceiling function shown in Equation 8 described later.

ステップＳ２０４において、制御間隔決定部１４４は、求めた制御間隔ｐを内部メモリまたはレジスタ（図示せず）に保持する。 In step S204, the control interval determination unit 144 holds the obtained control interval p in an internal memory or a register (not shown).

図１７は、送信間隔制御部１２５が管理するパラメータを記録するためのレジスタ群の例を示す。以下、各レジスタに設定されるパラメータの意味を説明する。 FIG. 17 shows an example of a register group for recording parameters managed by the transmission interval control unit 125. Hereinafter, the meaning of the parameters set in each register will be described.

Ｔｂはバーストアクセスが発生する周期であり、Ｔｐはトランザクションの転送に要するサイクル数である。たとえば、１２８バイトのトランザクションを６４ビットバスで送信する場合であれば、サイクル数１２８／６４＊８＝１６サイクルとなる。 Tb is the period at which burst accesses occur, and Tp is the number of cycles required to transfer a transaction. For example, when a 128-byte transaction is sent over a 64-bit bus, Tp = 128 × 8 / 64 = 16 cycles.
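The Tp example above is a plain unit conversion from bytes to bus transfers. A minimal sketch follows; the function name is hypothetical, and rounding up for sizes that are not a multiple of the bus width is an assumption, since the description only gives the exact-multiple case.

```python
def transfer_cycles(transaction_bytes, bus_bits):
    """Cycles needed to move a transaction over a bus of the given width.

    One bus-width word is assumed to be transferred per cycle; a partial
    final word still occupies a full cycle (assumed rounding rule).
    """
    total_bits = transaction_bytes * 8
    return -(-total_bits // bus_bits)  # ceiling division


tp = transfer_cycles(128, 64)  # 128 bytes over a 64-bit bus -> 16 cycles
```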

Ｎｐは単一のバースト周期内に生成されるアクセスの回数である。Ｓｂはバースト周期の開始時刻のサイクル数を格納するためのレジスタであり、Ｒｐはバースト周期内に生成されるべきアクセス数Ｎｐから既に生成されたアクセス数を差し引いたアクセスの残数である。ＲｃはＲｐ回のアクセスを完了するまでに残されたサイクル残数である。またＧは次回のアクセスまでの送信間隔であり、Ｌは直近のレイテンシである。 Np is the number of accesses generated within a single burst period. Sb is a register for storing the number of cycles at the start time of the burst period, and Rp is the remaining number of accesses obtained by subtracting the number of accesses already generated from the number of accesses Np to be generated within the burst period. Rc is the number of remaining cycles until Rp access is completed. G is a transmission interval until the next access, and L is the latest latency.

パラメータＴｂ、Ｔｐ、Ｎｐは、イニシエータから要求されるメモリアクセス特性を決定する。これらは、メモリへのアクセス開始に先立ってイニシエータによって各レジスタ上で初期化される。またパラメータＳｂ、Ｒｐ、Ｒｃ、Ｇ、Ｌは、送信間隔制御部１２５が管理する内部パラメータを記録するために用いられる。チップの電源投入時またはリセット時に各レジスタ上でゼロクリアされる。また以下の説明で登場するＴｃは、サイクルカウンタの現在値であり、システム上での現在時刻を表す。サイクルカウンタは、チップの電源投入時またはリセット時にゼロにクリアされ、１クロックサイクル毎にインクリメントされるｎビットのカウンタで良い。 The parameters Tb, Tp, and Np determine the memory access characteristics requested by the initiator. The initiator initializes them in their respective registers before memory access begins. The parameters Sb, Rp, Rc, G, and L are used to record internal parameters managed by the transmission interval control unit 125; their registers are cleared to zero when the chip is powered on or reset. Tc, which appears in the following description, is the current value of the cycle counter and represents the current time in the system. The cycle counter may be an n-bit counter that is cleared to zero when the chip is powered on or reset and is incremented every clock cycle.
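The register group of FIG. 17 and the cycle counter can be modelled as below. This is an illustrative sketch: the class names and methods are hypothetical, and the wrap-around of the n-bit counter on overflow is an assumption (the description states only that it is cleared at power-on or reset and incremented every clock cycle).

```python
from dataclasses import dataclass


@dataclass
class IntervalRegisters:
    # Set by the initiator before memory access starts.
    Tb: int = 0  # burst period, in cycles
    Tp: int = 0  # cycles needed to transfer one transaction
    Np: int = 0  # number of accesses in one burst period
    # Internal parameters managed by the transmission interval control unit.
    Sb: int = 0  # cycle count at the start of the burst period
    Rp: int = 0  # remaining number of accesses in the burst
    Rc: int = 0  # remaining cycles to complete Rp accesses
    G: int = 0   # transmission interval until the next access
    L: int = 0   # most recent latency

    def reset(self):
        """Zero-clear the internal parameters, as at power-on or reset."""
        self.Sb = self.Rp = self.Rc = self.G = self.L = 0


class CycleCounter:
    """n-bit counter whose value corresponds to Tc."""

    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        self.value = 0

    def tick(self):
        # Assumed behaviour: wrap to zero after reaching 2**bits - 1.
        self.value = (self.value + 1) & self.mask

    def reset(self):
        self.value = 0
```

Keeping the initiator-set fields out of `reset()` mirrors the description, which lists only Sb, Rp, Rc, G, and L as zero-cleared at power-on or reset.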

以下、送信間隔制御部１２５の動作を説明する。 Hereinafter, the operation of the transmission interval control unit 125 will be described.

図１８は、送信間隔制御部１２５の処理の手順を示すフローチャートである。この処理は、バーストアクセス区間中のアクセス要求毎に繰り返される。 FIG. 18 is a flowchart illustrating a processing procedure of the transmission interval control unit 125. This process is repeated for each access request in the burst access period.

まず、イニシエータからターゲットへのアクセス要求が発生し、ＮＩＣ１２０がトランザクションを受け取ったことを、送信間隔制御部１３３が検出すると、処理が開始される。たとえば送信間隔制御部１３３は、パケットバッファ１２３（図１２）をモニタしている。送信間隔制御部１３３は、トランザクションのデータの存在を検出することにより、上述のアクセス要求およびトランザクションの受信を検出する。または、送信間隔制御部１３３がパケット送受信部１２４またはパケット化部１２１から通知を受けることにより、上述の検出を行ってもよい。 First, processing starts when the transmission interval control unit 133 detects that an access request from the initiator to the target has occurred and that the NIC 120 has received a transaction. For example, the transmission interval control unit 133 monitors the packet buffer 123 (FIG. 12) and detects the access request and the receipt of the transaction by detecting the presence of transaction data. Alternatively, the transmission interval control unit 133 may perform this detection by receiving a notification from the packet transmitting/receiving unit 124 or the packetizing unit 121.

上述のようにアクセス要求が検出されると処理が開始される。 As described above, when an access request is detected, processing is started.

ステップＳ２において、送信間隔制御部１３３は、アクセス残数レジスタＲｐを読み出す。アクセス残数レジスタＲｐがＲｐ＝０のときは処理はステップＳ４に進み、Ｒｐ＝１のときは処理はステップＳ８に進み、Ｒｐの値がそれら以外のときは処理はステップＳ６に進む。このように処理をアクセス残数レジスタＲｐの値に応じて変えている理由は、バーストアクセスの種類に応じて適切な処理を行うためである。 In step S2, the transmission interval control unit 133 reads the remaining access number register Rp. If the remaining access number register Rp is Rp = 0, the process proceeds to step S4. If Rp = 1, the process proceeds to step S8. If the value of Rp is other than those, the process proceeds to step S6. The reason why the process is changed according to the value of the remaining access number register Rp is to perform an appropriate process according to the type of burst access.

図１９は、バーストアクセスの種類を示している。図１９に示すように、Ｒｐ＝０の場合には、バースト周期内での先頭アクセスであることを意味する。またＲｐ＝１の場合には、バースト周期内での最後尾（最終の）アクセスであることを意味する。Ｒｐの値がそれら以外の場合には、バースト中のアクセス（中間アクセス）であることを意味する。適切な動作になるよう、本実施形態においてはそれぞれの場合で各部の行う処理内容を異ならせている。 FIG. 19 shows the types of burst access. As shown in FIG. 19, when Rp = 0, it means that the head access is within the burst period. Further, when Rp = 1, it means that it is the last (final) access within the burst period. When the value of Rp is other than those, it means that the access is in a burst (intermediate access). In this embodiment, the contents of processing performed by each unit are different in each case so that an appropriate operation is performed.
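The three-way branch on the remaining access count Rp can be written directly from the description of step S2 and FIG. 19. The function names below are hypothetical; the mapping of Rp values to access types and to steps S4, S8, and S6 is taken from the text.

```python
def classify_access(rp):
    """Return the burst access type for a given remaining access count Rp."""
    if rp == 0:
        return "head"          # first access in the burst period
    if rp == 1:
        return "last"          # final access in the burst period
    return "intermediate"      # access in the middle of the burst


def next_step(rp):
    """Return the flowchart step that step S2 branches to."""
    return {0: "S4", 1: "S8"}.get(rp, "S6")
```

For instance, `next_step(0)` returns `"S4"`, `next_step(1)` returns `"S8"`, and any other value such as `next_step(5)` returns `"S6"`.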

再び図１７を参照する。ステップＳ４は先頭アクセスの場合の処理である。許容遅延演算部１３１は、数２〜数５に従い、関連パラメータの初期化処理を行う。

Refer to FIG. 17 again. Step S4 is the process for a head access. The allowable delay calculation unit 131 initializes the related parameters according to Equations 2 to 5.

数４におけるＴｂ’はバースト周期Ｔｂから時間余裕として確保するべきマージン量を差し引いた値である。イニシエータは、バースト周期内の全てのアクセスを完了した後にデータ処理を行うための時間マージンを必要とする。確保するべき時間マージンを決定する主な要因はその処理に要する最大のサイクル数である。データ格納用のメモリ領域とデータ処理用のメモリ領域が分離されているようなプログラムでは、Ｔｂ’およびＴｂを同じ値にすることができる。 Tb ′ in Equation 4 is a value obtained by subtracting a margin amount to be secured as a time margin from the burst period Tb. The initiator needs a time margin for performing data processing after completing all accesses within the burst period. The main factor that determines the time margin to be secured is the maximum number of cycles required for the processing. In a program in which a memory area for data storage and a memory area for data processing are separated, Tb 'and Tb can be set to the same value.

ステップＳ４の処理が完了すると、処理はステップＳ３２に進む。 When the process of step S4 is completed, the process proceeds to step S32.

次に、ステップＳ６の処理を説明する。ステップＳ６は中間アクセスの場合の処理である。 Next, the process of step S6 will be described. Step S6 is processing in the case of intermediate access.

ステップＳ６において許容遅延演算部１３１は、数６に従い、サイクル残数Ｒｃを更新する。

In step S6, the allowable delay calculation unit 131 updates the remaining cycle number Rc according to Equation 6.

さらにステップＳ１０において許容遅延演算部１３１は、数７に従い、許容遅延（または調整範囲）Ｉを計算する。

In step S10, the allowable delay calculation unit 131 calculates the allowable delay (or adjustment range) I according to Equation 7.

許容遅延演算部１３１は、許容遅延Ｉの値を調整範囲情報として調整方法選択部１３２および送信間隔決定部１３３に通知する。 The allowable delay calculation unit 131 notifies the value of the allowable delay I to the adjustment method selection unit 132 and the transmission interval determination unit 133 as adjustment range information.

許容遅延Ｉの値の計算は、大きく２つのフェーズに分離できる。第一のフェーズは１／Ｒｐを算出するフェーズであり、第二のフェーズは、積和演算により許容遅延Ｉを算出するフェーズである。言うまでもなく、演算コストに関しては除算演算を含む第一のフェーズの方が高い。本実施形態では、第一のフェーズに要するサイクル数を９サイクル、第二のフェーズに要するサイクル数を２サイクルとして説明する。第一のフェーズは演算実行部１４１によって実行され、第二のフェーズは許容遅延算出部１４３によって実行される。 The calculation of the value of the allowable delay I can be roughly divided into two phases. The first phase is a phase for calculating 1 / Rp, and the second phase is a phase for calculating an allowable delay I by a product-sum operation. Needless to say, the calculation cost is higher in the first phase including division. In the present embodiment, the number of cycles required for the first phase is 9 cycles, and the number of cycles required for the second phase is 2 cycles. The first phase is executed by the calculation execution unit 141, and the second phase is executed by the allowable delay calculation unit 143.

続いて、図２０〜図２４を参照する。これらの図面の説明に関連して、以下では「フリット」という概念を用いる。「１フリット」は、１バスクロック毎に転送が可能なデータの伝送単位である。１フリットのサイズは、１パケットのサイズ以下である。本実施形態では、１パケットは、複数のフリットに分割されて伝送されるとする。 Next, refer to FIGS. 20 to 24. In describing these drawings, the concept of a "flit" is used below. One flit is the unit of data that can be transferred in one bus clock. The size of one flit is less than or equal to the size of one packet. In this embodiment, one packet is assumed to be divided into a plurality of flits for transmission.

The packet size is proportional to the number of cycles required to transfer one packet. Once that number of cycles is known, the packet size can be obtained as the product of the data size of one flit and the number of cycles.

FIG. 20 is a timing chart for the case where the minimum packet size of data accesses from the initiator is 9 flits. The rectangular waveform in the top row shows the state of the bus clock. The symbol t shown alongside the top row indicates the elapsed time, with one bus clock (one cycle) as the unit of time.

Each rectangle in the second row represents a flit. Each rectangle is labeled with symbols using P and F, such as “P0” and “F0”. For convenience, this specification joins the P and F of one flit with a hyphen, in the form “P-F”. For example, the first flit is written “P0-F0”.

P0-F0 denotes the 0th flit of the 0th packet (packet 0), that is, the header flit of packet 0, and P0-F8 denotes the 9th flit of the 0th packet, that is, the trailer flit of packet 0. The transmission of the next packet 1 (P1) can therefore start at t = 10 at the earliest, so the update of the allowable delay I must be complete by t = 10.

The bottom row shows the time (in cycles) required for processing such as calculation. The division of the first phase is started predictively at t = -1 so that it completes at t = 8. The product-sum operation of the second phase then runs and yields the allowable delay I. Completion of the second phase means that the calculation (update) of the allowable delay I is complete, which occurs at t = 10. The adjustment range for determining the transmission interval of the following packet 1 (P1) is thus calculated within the time limit.

The calculation for determining the transmission interval of the following packet 2 (P2) then starts. Packet 2 can be transmitted at t = 19 at the earliest. The division of the first phase runs from t = 8 to t = 17, the second phase consumes 2 cycles, and the allowable delay I is updated at t = 19. As long as this calculation cycle can be secured, the allowable delay (adjustment range) for determining the transmission interval of the following packet 3 can also be calculated within the time limit.

FIG. 21 is a timing chart for the case where the initiator newly issues data accesses with a packet size of 7 flits. The first calculation phase for packet 2 (P2) completes at t = 8 and the second phase completes at t = 10. The adjustment range I for determining the transmission interval of the following packet 2 (P2) is therefore calculated within the time limit.

However, because the packet size is 7 flits, packet 3 (P3) can be transmitted at t = 17 at the earliest. The first calculation phase completes at t = 17, but the second phase needs a further 2 cycles, so a processing delay occurs. The allowable delay I cannot be calculated by the earliest time at which transmission of packet 3 can start, and real-time performance can no longer be guaranteed. This example shows that whether the adjustment range I can be calculated within the time limit depends on the packet size.

FIG. 22 is a timing chart for the case where the arithmetic circuit is parallelized using two circuits, circuit A and circuit B, so that the adjustment range can be calculated within the time limit. The packet size is 7 flits, as in the example of FIG. 21. This configuration corresponds to the calculation execution unit 141 of FIG. 15 with a parallel number of 2.

While circuit A performs the predictive calculation of 1/Rp until t = 8 and the adjustment range I is updated at t = 10, the parallel circuit B performs the predictive calculation of 1/Rp between t = 6 and t = 17 and the adjustment range I is then updated. Doubling the division circuit that executes the first calculation phase in this way removes the delay in calculating the allowable delay I and secures real-time performance. This example shows that whether the adjustment range I can be calculated within the time limit also depends on whether the arithmetic circuit is parallelized.

Now consider the case where the initiator newly issues data accesses in which one packet consists of 4 flits. In this case real-time performance cannot be maintained even with the two parallel arithmetic circuits of FIG. 22.

If such data accesses can occur, the division circuit that executes the first calculation phase must be parallelized further, which increases the circuit area. Depending on the specifications of the product in which the NoC is implemented, however, further parallelization of the arithmetic circuit may be acceptable.

The present embodiment therefore shows a configuration in which the update interval p of the adjustment range I is controlled according to the minimum packet size a transmitted by the initiator, given the parallel number N of first-phase division circuits that could be implemented under the circuit area constraint. This makes it possible to secure real-time performance. The number of cycles each division circuit needs for its calculation is denoted n.

FIG. 23 is a timing chart for a parallel number N = 3, a minimum packet size a = 2, and a division circuit calculation time of n = 9 cycles. In this case the control interval determination unit 144 determines the control interval p by the following expression, which uses a ceiling function.

The ceiling function maps a real number x to the smallest integer greater than or equal to x. Under the above conditions n/(N·a) is 1.5, so the control interval p is calculated as 2. The transmission interval control unit 125 then monitors the packet buffer 123 and updates the transmission interval every time two packets have been transmitted. In FIG. 23, the predictive calculation is started when the first of the two flits of the second packet is transmitted. In this way the allowable delay I is updated every fixed number of transmitted packets, and that allowable delay I is used to control the enlargement and reduction of the transmission interval (FIG. 18). As a result, the remaining access count Rp decreases monotonically, which makes it possible to calculate 1/Rp predictively.
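
The expression itself is not reproduced in this text. The worked values above (n/(N·a) = 1.5 giving p = 2, with a ceiling function) suggest the form p = ceil(n / (N × a)). The following Python sketch assumes that form; the function name `control_interval` is hypothetical and does not come from the specification.

```python
def control_interval(n: int, N: int, a: int) -> int:
    """Number of packets between updates of the allowable delay I.

    Assumed form of the expression: p = ceil(n / (N * a)), where
    n = cycles one division circuit needs for its calculation,
    N = number of parallel division circuits,
    a = minimum packet size in flits.
    """
    if n <= 0 or N <= 0 or a <= 0:
        raise ValueError("n, N and a must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (n + N * a - 1) // (N * a)


# Conditions of FIG. 23: N = 3, a = 2, n = 9, so n / (N * a) = 1.5.
print(control_interval(n=9, N=3, a=2))  # prints 2
```

Under the same assumption, a single circuit with 9-flit packets gives p = 1, which matches the FIG. 20 case where every packet can be served in time.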

FIG. 24 is a timing chart for the case where, during operation under the conditions of FIG. 23, the minimum packet size transmitted by the initiator changes from 2 flits (a = 2) to 4 flits (a = 4).

In general, the values of N, a, and n are stored in registers or the like on the NIC. The initiator updates the register when the minimum packet size changes. This allows the control interval determination unit 144 to be notified of the change in conditions by the initiator or the NIC.

When the control interval determination unit 144 detects that the register holding the value of a has been changed, or receives a notification of the change, it recalculates and updates the control interval p using Equation 8. In this case p = 1, so the transmission interval is recalculated for every packet, which improves responsiveness to bus congestion. Dynamically calculating the most responsive control interval achievable with the limited number of parallel arithmetic circuits that can be implemented thus provides efficient control while securing real-time performance.

Refer again to FIG. 17. In step S12, the adjustment method selection unit 132 determines whether the value I in the adjustment range information is non-negative or negative. If it is non-negative, the process proceeds to step S14; if it is negative, the process proceeds to step S16.

In step S14, the allowable delay calculation unit 131 acquires load information in order to estimate the current load on the network bus. The latency value L required for the most recent transaction can be used as this information. Latency is the time from issuing a request until the response returns.

One specific way of obtaining the latency value L is as follows. First, the packet transmission/reception unit 124 records, in the packet header, the value of the cycle counter Tc at the time the request transaction is sent to the network bus. When the target-side NIC sends the reply transaction, it copies the cycle counter value recorded in the header of the previously received packet into the header of the reply transaction packet. The allowable delay calculation unit 131 may then obtain L by subtracting the copied cycle counter value (that is, the cycle counter value when the request transaction was sent) from the cycle counter value at the time the reply transaction packet arrives at the initiator-side NIC.
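
The round trip described above can be sketched in Python as follows. This is only an illustration: the header is modeled as a dictionary, and the field name `tc_sent` and the three function names are hypothetical. Cycle counter wraparound is not handled.

```python
def stamp_request(header: dict, cycle_counter: int) -> None:
    """Initiator-side NIC: record the cycle counter Tc when the request is sent."""
    header["tc_sent"] = cycle_counter


def make_reply_header(request_header: dict) -> dict:
    """Target-side NIC: copy the recorded counter value into the reply header."""
    return {"tc_sent": request_header["tc_sent"]}


def latency(reply_header: dict, cycle_counter_at_arrival: int) -> int:
    """Initiator-side NIC: L = counter at reply arrival - counter at request send."""
    return cycle_counter_at_arrival - reply_header["tc_sent"]


request = {}
stamp_request(request, cycle_counter=100)   # request leaves at cycle 100
reply = make_reply_header(request)          # target echoes the timestamp
print(latency(reply, cycle_counter_at_arrival=142))  # prints 42
```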

In the above description, the time at which the initiator-side NIC 120 sends the request transaction to the network bus is taken as the starting point, and the elapsed time from that point is measured as the latency. Alternatively, the starting point may be the time at which the request transaction is received from the initiator, so that the time from when the initiator issues the request until the response returns is measured as the latency.

A plurality of initiators may also communicate directly with one another. In that case, the starting point may be the time at which the request transaction is received from an initiator, and the time from when that initiator issues a request to another initiator until the response returns may be measured as the latency.

In the next step S18, the adjustment method selection unit 132 compares the calculated latency value L with preset thresholds L1 and L2 (L1 < L2) and selects an adjustment method. L > L2 means that the load is high, that is, many packets are in transit, so the transmission interval must be increased to lower the packet density. L ≤ L1 means that the load is low, that is, few packets are in transit, so the transmission interval can be reduced to raise the packet density. L2 > L ≥ L1 means that the load is moderate.

When L2 > L ≥ L1, the process proceeds to step S20, and the transmission interval determination unit 133 enlarges the transmission interval stepwise. The new transmission interval G is obtained as G + K1.

K1 is the step adjustment width and is defined as in Equation 9.

With this definition, the smaller the proportion of cycles in which an initiator transmits data within the burst period, the larger the adjustment width of its transmission interval can be. Here k is an adjustment parameter, a constant determined by simulation or the like.

Equation 8 is explained in more detail. The term (Np·Tp) in Equation 8 is the product of the number of accesses Np generated within a single burst period and the number of cycles Tp required to transfer one packet, so it is the number of cycles required within one burst period. Dividing the burst period Tb by (Np·Tp) therefore gives the time length per cycle during the burst access period. This specification calls the value obtained in this way the “burst access density”. K1 in Equation 8 is the burst access density multiplied by the adjustment parameter k, and so takes a value that depends on the burst access density.
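
Read literally, the paragraph above describes K1 = k × Tb / (Np × Tp). The equation image is not reproduced in this text, so the Python sketch below is an assumption based on that prose. The function name `step_width`, the parameter values, and the choice k = 1.0 are illustrative only.

```python
def step_width(k: float, Tb: int, Np: int, Tp: int) -> float:
    """Step adjustment width K1 = k * (burst access density).

    Burst access density, as described in the text: Tb / (Np * Tp), where
    Tb = burst period in cycles,
    Np = number of accesses generated in one burst period,
    Tp = cycles needed to transfer one packet.
    """
    if Np <= 0 or Tp <= 0:
        raise ValueError("Np and Tp must be positive")
    return k * Tb / (Np * Tp)


# An initiator that is busy for 1000 of 4000 cycles gets a small step.
print(step_width(k=1.0, Tb=4000, Np=250, Tp=4))  # prints 4.0
# An initiator that is busy for 16 of 400 cycles gets a larger step.
print(step_width(k=1.0, Tb=400, Np=4, Tp=4))     # prints 25.0
```

The two calls illustrate the stated property: the smaller the share of the burst period an initiator spends transmitting, the larger its adjustment width.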

When L ≤ L1, the process proceeds to step S22, and the transmission interval determination unit 133 reduces the transmission interval stepwise. The new transmission interval G is obtained as G - K2.

K2 is the step adjustment width. K2 may be set to 1, or it may be defined in the same way as K1 in Equation 8.

When L > L2, the process proceeds to step S24, and the adjustment method selection unit 132 performs an adjustment that enlarges the transmission interval non-stepwise.

In step S24, the transmission interval determination unit 133 updates the transmission interval G according to Equation 9.

Note that step S24 in FIG. 17 is shown with a = 0.

Non-stepwise adjustment is not essential, but it may allow a quick escape from an overloaded state in which stepwise adjustment does not improve the latency. Here a is the lower limit of the interval from which a random number is generated; it may be set to a = I/2, a = K1, or the like. A uniform distribution may be used as the probability distribution of the random number.

The adjustment method selection unit 132 notifies the transmission interval determination unit 133 of the selected adjustment method as interval control information. The process then proceeds to steps S26 and S28.

In step S26, the transmission interval determination unit 133 compares the transmission interval G with the allowable delay I. When the transmission interval G is smaller than the allowable delay I, the process proceeds to step S30 and the transmission interval G is recorded. When the transmission interval G is equal to or greater than the allowable delay I, the process proceeds to step S28, where the allowable delay I is set as the transmission interval G, and the transmission interval G is then recorded in step S30.
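
Steps S12 to S28 can be summarized in a short Python sketch. It is a simplification under stated assumptions: the function name `adjust_interval` is hypothetical, the non-stepwise value of step S24 is passed in as `G_nonstep` because its equation is not reproduced in this text, and the boundary cases L = L1 and L = L2 are resolved here as "reduce" and "stepwise enlarge" respectively, which the text does not state unambiguously.

```python
def adjust_interval(G, I, L, L1, L2, K1, K2, G_nonstep):
    """Return the updated transmission interval G for one intermediate access.

    G: current transmission interval, I: allowable delay,
    L: measured latency, L1 < L2: latency thresholds,
    K1, K2: step adjustment widths,
    G_nonstep: result of the non-stepwise rule of step S24 (supplied by caller).
    """
    if I < 0:
        return 0                # S16: no slack remains, so send without a gap
    if L > L2:
        G = G_nonstep           # S24: heavy load, non-stepwise enlargement
    elif L > L1:
        G = G + K1              # S20: moderate load, stepwise enlargement
    else:
        G = G - K2              # S22: light load, stepwise reduction
    return min(G, I)            # S26/S28: G never exceeds the allowable delay


# Moderate load: the interval grows by K1.
print(adjust_interval(G=10, I=100, L=50, L1=20, L2=80, K1=4, K2=1, G_nonstep=60))  # prints 14
```

The final `min` expresses the comparison of steps S26 and S28 in one operation.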

Two thresholds, L1 and L2, are used in this example, but this is only one possibility. A single threshold L1 may be used instead. In that case, the transmission interval may be enlarged stepwise when L > L1 and reduced stepwise when L ≤ L1.

In this case, the transmission interval G can be obtained by the following process.

First, when the transmission interval is enlarged stepwise (L > L1), the transmission interval determination unit 133 updates the transmission interval G according to Equation 10.

K1 is as shown in Equation 9 above.

When the transmission interval is reduced stepwise (L ≤ L1), the transmission interval determination unit 133 updates the transmission interval G according to Equation 12.

K2 may be set to 1, or it may be defined as in Equation 9.

Note that the “min” in Equations 11 and 12 covers the processing of steps S26 and S28 in FIG. 17.

The transmission interval determination unit 133 contains a decrement counter. Only while the value held by the decrement counter is zero does it assert the transmission permission signal, which allows the initiator to issue a request transaction. When a non-zero value is written to the decrement counter, the transmission permission signal is negated, and the initiator holds its memory accesses until a new request transaction can be issued. The decrement counter is decremented each time the cycle counter is incremented, so its value gradually approaches zero.
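
The counter-based gating described above can be modeled with a few lines of Python. This is a behavioral sketch only, not a description of the actual circuit; the class name `SendGate` and its method names are hypothetical.

```python
class SendGate:
    """Behavioral model of the decrement counter and the transmission permission signal."""

    def __init__(self) -> None:
        self.counter = 0  # zero means the initiator may issue a request

    @property
    def send_enabled(self) -> bool:
        # The permission signal is asserted only while the counter holds zero.
        return self.counter == 0

    def load(self, cycles: int) -> None:
        # Writing a non-zero value negates the permission signal.
        if cycles < 0:
            raise ValueError("cycles must be non-negative")
        self.counter = cycles

    def tick(self) -> None:
        # Called once per increment of the cycle counter.
        if self.counter > 0:
            self.counter -= 1


gate = SendGate()
gate.load(3)                 # block the initiator for three cycles
for _ in range(3):
    assert not gate.send_enabled
    gate.tick()
print(gate.send_enabled)     # prints True
```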

In step S32, executed for the first access, the remaining access count Rp is decremented (Rp = Rp - 1), and in step S34 Tp + G is written to the decrement counter. Then, in step S38, the transmission interval determination unit 133 sets the transmission permission signal to “DISABLE”. This prohibits the initiator from issuing a request transaction during the next Tp + G cycles.

In step S16, executed for an intermediate access, when the transmission interval determination unit 133 detects from the adjustment range information that the allowable delay I is negative, it updates the transmission interval to G = 0 in order to secure the real-time performance of the processing performed by the initiator. The process then proceeds to step S30, where the value of the transmission interval G is recorded. In the next step S32, the remaining access count Rp is decremented (Rp = Rp - 1), and in step S34 Tp + G is written to the decrement counter. Then, in step S38, the transmission interval determination unit 133 sets the transmission permission signal to “DISABLE”. This prohibits the initiator from issuing a request transaction during the next Tp + G cycles.

In step S8, executed for the last access, the transmission interval determination unit 133 clears the remaining access count Rp to zero, and in step S36 it writes Sb + Tb to the decrement counter. Then, in step S38, the transmission interval determination unit 133 sets the transmission permission signal to “DISABLE”. This prohibits the initiator from issuing a request transaction during the next Tp + Sb cycles.

Thereafter, when it is detected in step S40 that the counter has been reset, the transmission interval determination unit 133 sets the transmission permission signal to “ENABLE” in step S42. This permits the initiator to issue a request transaction.

Reducing the transmission interval as well as enlarging it allows the Np accesses to be completed early, so that accesses from other initiators that may occur later can be accommodated within the bus bandwidth as far as possible. For example, the CPU in FIG. 4 is used for interaction with the user, Internet browsing, and the like, so the amount of memory access generated by the processing the user requests is difficult to predict in advance. Even in such a case, if each initiator has completed all of its memory accesses early thanks to the reduced transmission interval, the accesses of the CPU processing started by the user can be accommodated within the bus bandwidth, and the utilization efficiency of the bus bandwidth improves.

With the transmission interval control unit 125 operating as described above, real-time performance can be secured on a network bus whose links are shared among initiators, and the utilization efficiency of the bus bandwidth is also improved. FIGS. 25 and 26 show the results of modeling the accesses of ENC, DEC, and DMAC and comparing the latency using a software simulator. The horizontal axis shows time (the value of the cycle clock) and the vertical axis shows the latency (in cycles). The access models of the three types of initiator were configured as follows.
ENC: Tb = 4000, Np = 250, Tp = 4 (gray)
DEC: Tb = 4000, Np = 250, Tp = 4 (gray)
DMAC: Tb = 400, Np = 4, Tp = 4 (black)

FIG. 25 shows the packet latency when transmission interval control is not performed. Once the burst accesses of ENC and DEC start at cycle 0, the latency grows at a constant slope and reaches about 850 cycles. The DMAC memory accesses issued while the bus is overloaded by the burst accesses show a similar increase in latency.

FIG. 26, in contrast, shows the packet latency when the transmission interval control of the present embodiment is performed. For the burst accesses of ENC and DEC that start at cycle 0, the transmission interval is controlled as the latency rises, and the latency is held to about 50 cycles. The free cycles created by this control also allow the DMAC accesses to complete with low latency. Furthermore, because the control also reduces the transmission interval, the 4000 accesses each of ENC and DEC that should be executed between cycle 0 and cycle 4000 are all completed by cycle 2000. The simulation thus confirms that real-time performance is secured for each initiator and that the utilization efficiency of the bus bandwidth is improved.

In the present embodiment, the transmission interval control unit has been described as part of the initiator-side NIC, but it may be located outside the NIC functional block. A fly network has been assumed as the topology of the network bus connecting the initiators and the targets, but other topologies such as a mesh network or a torus network may be used. The present embodiment describes the case of a plurality of targets, but there may be a single target.

(Embodiment 2)
In the present embodiment, the number of calculation cycles n required to execute the first calculation phase described in Embodiment 1, the minimum packet size a transmitted by the initiator, and the control interval p representing the required control granularity are defined as required specifications at circuit design time. The required parallel number of arithmetic circuits can then be determined from the relational expression shown in Equation 13.

Because N determines the implementation scale and area of the circuit, this relation can be used to estimate the circuit scale at design time, and it can also be used as a design method or a design tool.

For example, designers are likely to have a strong requirement to control the transmission interval on every packet transmission. In that case, the control interval p may be fixed at 1. A transmission interval control device with the optimum number of parallel arithmetic circuits can thus be designed even under the condition that the value of p does not change.

The present invention can be implemented not only as a NIC or transmission interval control device mounted on a chip (NoC), but also as a simulation program used for the design and verification carried out before mounting on a chip. Such a simulation program is executed by a computer. In the present embodiment, each component shown in FIGS. 12, 13, and 15 is implemented as an object class in the simulation program. By reading a predetermined simulation scenario, each class realizes on the computer the operation corresponding to each component of the embodiments described above. In other words, the operation corresponding to each component is executed serially or in parallel as a processing step of the computer.

For example, to obtain the optimum parallel number N of the parallel arithmetic circuits, the user inputs to the computer a predetermined value of the control interval p, the expected minimum packet size a transmitted by the initiator, and the processing time n that a parallel arithmetic circuit needs to execute its calculation. The computer can then obtain the optimum number N of parallel arithmetic circuits by substituting these values into Equation 13.
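
Equation 13 is not reproduced in this text. If the control interval of Embodiment 1 has the form p = ceil(n / (N × a)), which is itself inferred from the worked example there, then the smallest N that achieves a given p is N = ceil(n / (p × a)). The Python sketch below rests on that assumption, and the function name `required_parallelism` is hypothetical.

```python
def required_parallelism(n: int, a: int, p: int) -> int:
    """Smallest number N of parallel division circuits for a target control interval p.

    Assumes p = ceil(n / (N * a)), which holds when N >= n / (p * a),
    so the minimum integer N is ceil(n / (p * a)).
    n = cycles per division, a = minimum packet size in flits,
    p = packets between updates of the allowable delay.
    """
    if n <= 0 or a <= 0 or p <= 0:
        raise ValueError("n, a and p must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (n + p * a - 1) // (p * a)


# Per-packet control (p = 1) with 2-flit packets and a 9-cycle divider.
print(required_parallelism(n=9, a=2, p=1))  # prints 5
```

With p = 2 the same inputs give N = 3, which agrees with the FIG. 23 conditions of Embodiment 1.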

The present invention is applicable to packet-communication buses on semiconductor LSIs that perform real-time processing in various embedded devices, including AV appliances such as TVs and recorders and mobile devices such as mobile phones. For example, the present invention can be used as a transmission interval control device, a control method, and a control program that let a plurality of initiators sharing a bus communicate efficiently by controlling the packet transmission interval.

121 Packetization unit
122 Depacketization unit
123 Packet buffer unit
124 Packet transmission/reception unit
125 Transmission interval control unit
131 Allowable delay calculation unit
132 Adjustment method selection unit
133 Transmission interval control unit
141 Calculation execution unit
142 Calculation result storage unit
143 Allowable delay calculation unit
144 Control interval determination unit

Claims

1. A control device that, in a semiconductor circuit having a networked bus through which a plurality of packets are transmitted, controls the transmission timing of each packet and transmits each packet to the bus, the control device comprising:
an allowable delay calculation unit that calculates, at a timing determined according to the transmission completion time of a predetermined number of packets, an allowable delay indicating the transmission delay amount allowed for packets to be transmitted subsequently, and outputs the allowable delay as adjustment range information;
a transmission interval determination unit that determines the transmission interval of each packet based on at least the adjustment range information, and permits or prohibits transmission of data by an adjacently connected initiator according to the determination result; and
a transmission/reception unit that transmits to the bus at least one packet generated based on data received from the initiator permitted to transmit the data.

2. The control device according to claim 1, wherein the allowable delay calculation unit includes:
N operation execution units arranged in parallel, each of which takes a time n to execute a predetermined operation; and
a control interval determination unit that calculates a control interval p based on a minimum size a of packets transmitted to the bus, the parallel number N of the operation execution units, and the time n, and wherein the predetermined number is defined by the control interval p.

3. The control device according to claim 2, wherein the control interval determination unit determines the control interval p according to the expression shown in Equation 1 below.

4. The control device according to claim 1, wherein the transmission interval determination unit determines the transmission interval of each packet based on information on a load L of the bus in addition to the adjustment range information.

5. The control device according to claim 4, wherein the transmission interval determination unit widens the transmission interval of each packet when the load L is greater than a predetermined value, and narrows the transmission interval of each packet when the load L is equal to or less than the predetermined value.

6. The control device according to claim 4, wherein another initiator or a target is connected to the bus, and
the transmission interval control unit uses the transmission/reception unit to obtain a latency, which is the time from when an access request is transmitted to the other initiator or the target until a response is returned, and uses the latency as the load on the bus.

7. A design method for determining the number N of arithmetic circuits provided in parallel in a control device that, in a semiconductor circuit having a networked bus through which a plurality of packets are transmitted, controls the transmission timing of each packet and transmits each packet to the bus,
wherein the control device comprises:
an allowable delay calculation unit that calculates, at a timing determined according to the transmission completion time of a predetermined number of packets, an allowable delay indicating the transmission delay amount allowed for packets to be transmitted subsequently, and outputs the allowable delay as adjustment range information;
a transmission interval determination unit that determines the transmission interval of each packet based on at least the adjustment range information, and permits or prohibits transmission of data by an adjacently connected initiator according to the determination result; and
a transmission/reception unit that transmits to the bus at least one packet generated based on data received from the initiator permitted to transmit the data,
wherein the allowable delay calculation unit includes:
N arithmetic circuits arranged in parallel, each of which takes a time n to execute a predetermined operation; and
a control interval determination unit that calculates a control interval p, which is the predetermined number, based on a minimum size a of packets transmitted to the bus, the parallel number N of the arithmetic circuits, and the time n,
the design method comprising:
a step of inputting a value of a predetermined control interval p;
a step of obtaining a predetermined value of the minimum size a and a value of the time n; and
a step of determining the parallel number N of the arithmetic circuits by the expression shown in Equation 2 below: