JP2001517024A

JP2001517024A - Method and system for fast routing lookup

Info

Publication number: JP2001517024A
Application number: JP2000512323A
Authority: JP
Inventors: ブロドレック，アンドレイ; デゲルマーク，ミカエル; カールソン，スバンテ; ピンク，ステファン
Original assignee: エフィシャントネットワーキングアクティエボラーグ
Priority date: 1997-09-15
Filing date: 1998-05-11
Publication date: 2001-10-02
Also published as: IL134835A0; SK3692000A3; ID24241A; KR20010030607A; CA2303118A1; EA200000328A1

Abstract

(57) Abstract: In a method of IP routing lookup in a routing table comprising entries of arbitrary-length prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded, a representation of the routing table is stored in the form of a complete prefix tree (7) defined by the prefixes of all routing table entries. Further, a representation of a bit vector (8) comprising data of a cut through the prefix tree (7) at a current depth (D), and an array of pointers comprising indices to the next-hop table and to next-level chunks, are stored. The bit vector (8) is divided into bit masks, and a representation of the bit masks is stored in a map table. Then an array of code words, each encoding a row index into the map table and a pointer offset, and an array of base addresses are stored. Finally, the lookup is performed.

Description

DETAILED DESCRIPTION OF THE INVENTION

FIELD OF THE INVENTION

[0001] The present invention relates generally to a method and system for IP routing lookup in a routing table comprising entries of arbitrary-length prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded.

DESCRIPTION OF THE PRIOR ART

[0002] The Internet is an interconnected collection of networks in which each constituent network retains its identity, and special mechanisms are needed for communication across multiple networks. The constituent networks are referred to as subnetworks.

[0003] Each subnetwork in the Internet supports communication among the devices connected to that subnetwork. Furthermore, subnetworks are connected by devices referred to as interworking units (IWUs). A particular IWU is a router, which is used to connect two networks that may or may not be similar. The router employs the Internet Protocol (IP), which is present in each router and in each host of the network.

[0004] IP provides a connectionless, or datagram, service between stations. Routing is generally accomplished by maintaining in each station and router a routing table that indicates, for each possible destination network, the next router to which the IP datagram should be sent. An IP router performs a routing lookup in the routing table to determine where to forward an IP datagram. The result of this operation is the next hop on the path toward the destination. An entry in the routing table is conceptually an arbitrary-length prefix with associated next-hop information. A routing lookup must find the routing entry with the longest matching prefix.

[0005] Because IP routing lookups have been inherently slow and complex, operation with prior-art solutions has led to a proliferation of techniques that avoid using them. Various link-layer switching technologies below IP, IP-layer bypass methods (disclosed in Proceedings of the Conference on Computer Communications (IEEE Infocom), San Francisco, California, March 1996; Proceedings of the Gigabit Networking Workshop, Boston, April 1995; and Proceedings of ACM SIGCOMM '95, pages 49-58, Cambridge, Massachusetts, August 1995), and the development of alternative network layers based on virtual-circuit technologies such as ATM are, to some degree, results of a wish to avoid IP routing lookups.

[0006] The use of switching link layers below the IP level and of flow or tag switching architectures adds complexity and redundancy to the network. Modern IP router designs use caching techniques, in which the routing entries of recently used destination addresses are kept in a cache. The technique relies on there being enough locality in the traffic that the cache hit rate is sufficiently high and the cost of a routing lookup is amortized over several packets. Such caching has worked well in the past. However, as the rapid growth of the Internet increases the required size of address caches, hardware caches may become uneconomical.

[0007] The traditional implementation of routing tables uses a version of the Patricia tree (disclosed in Journal of the ACM, 15(4):514-534, October 1968), a data structure invented almost 30 years ago, modified for longest prefix matching. A straightforward implementation of Patricia trees for routing lookup purposes, as in the NetBSD 1.2 implementation, uses 24 bytes for leaves and internal nodes. With 40,000 entries, the tree structure alone is almost 2 megabytes, and in a perfectly balanced tree 15 or 16 nodes must be traversed to find a routing entry.
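The memory and depth figures for the Patricia tree can be checked with a little arithmetic. The sketch below assumes one leaf per routing entry and one fewer internal node than leaves (a general property of trees whose internal nodes all have two children); the variable names are illustrative only.

```python
import math

entries = 40_000
node_bytes = 24                    # per leaf or internal node, as stated for NetBSD 1.2
nodes = entries + (entries - 1)    # assumption: N leaves plus N - 1 internal nodes
tree_bytes = nodes * node_bytes    # size of the tree structure alone
depth = math.log2(entries)         # depth of a perfectly balanced tree

print(tree_bytes, "bytes; balanced depth about", round(depth, 1))
```

Under these assumptions the total comes out just under two million bytes, and the balanced depth falls between 15 and 16.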

[0008] In some cases, because of the longest-matching-prefix rule, additional nodes must be traversed to find the proper routing information, since the initial search is not guaranteed to find the proper leaf. Optimizations exist that can reduce the size of a Patricia tree and improve lookup speed. Nevertheless, the data structure is large, and too many expensive memory references are needed to search it. Current Internet routing tables are too large to fit into on-chip caches, and off-chip memory references to DRAM are too slow to support the required routing speeds.

[0009] Early work on improving IP routing performance by avoiding full routing lookups (disclosed in Proceedings of the Conference on Computer Communications (IEEE Infocom), New Orleans, Louisiana, March 1988) found that a small destination-address cache can improve routing lookup performance by at least 65 percent. Fewer than 10 slots were needed to obtain a hit rate of over 90 percent. Such small destination-address caches are not sufficient for the traffic intensity and number of hosts in today's Internet.

[0010] ATM (Asynchronous Transfer Mode) avoids routing lookups by having a signaling protocol that passes addresses to the network during connection setup. Forwarding state, accessed by a virtual circuit identifier (VCI), is installed in the switches along the path of the connection during setup. ATM cells contain the VCI, which is used as a direct index into a table holding the forwarding state or as the key to a hash function. The routing decision is simple for ATM. However, when packets are larger than 48 bytes, more ATM routing decisions need to be made.

[0011] Tag switching and flow switching (disclosed in Proceedings of the Conference on Computer Communications (IEEE Infocom), San Francisco, California, March 1996) are two IP bypass methods intended to operate over ATM. The general idea is to let IP control the link-level ATM hardware that performs the actual data forwarding. Special-purpose protocols (disclosed in Request for Comments RFC 1953, Internet Engineering Task Force, May 1996) are needed so that routers can agree on which ATM virtual circuit identifiers to use and which packets use which VCI.

[0012] Another approach with the same goal of avoiding IP processing is taken in the IP/ATM architecture (disclosed in Proceedings of the Gigabit Networking Workshop, Boston, April 1995, and Proceedings of ACM SIGCOMM '95, pages 49-58, Cambridge, Massachusetts, August 1995), in which an ATM backplane connects a number of line cards and routing cards. IP processing elements located on the routing cards process the IP headers. When a packet stream arrives, only the first IP header is examined, and the later packets are routed in the same way as the first. The main purpose of this shortcut appears to be to amortize the cost of IP processing over many packets.

[0013] IP router designs can also use special-purpose hardware to do IP processing, as in the IBM router (disclosed in Journal of High Speed Networks, 1(4):281-288, 1993). This is an inflexible solution: any change in the IP format or protocol invalidates the design. The flexibility of software and the rapid performance increase of general-purpose processors make software-based solutions preferable. Another hardware approach is to use CAMs to perform routing lookups (disclosed in Proceedings of the Conference on Computer Communications (IEEE Infocom), volume 3, pages 1382-1391, San Francisco, 1993). This is a fast but expensive solution.

[0014] BBN is currently building a pair of multi-gigabit routers that use general-purpose processors as forwarding engines. Little information has been published so far. The plan, however, appears to be to use Alpha processors as forwarding engines and to do all IP processing in software. The publication Gigabit Networking (Addison-Wesley, Reading, Massachusetts, 1993) shows that IP processing can be done in as few as 200 instructions, assuming a hit in a route cache. The secondary cache of the Alpha is used as a large LRU (least recently used) cache of destination addresses. The scheme presumes locality in traffic patterns. With low locality, the cache hit rate can become too low and performance suffers.

SUMMARY OF THE INVENTION

[0015] It is therefore an object of the present invention to provide an improved IP routing lookup method and system that performs a full routing lookup for each IP packet at up to gigabit speeds, and that overcomes the disadvantages mentioned above.

[0016] A further object is to increase routing lookup speed on a conventional microprocessor. Another object is to minimize lookup time in the forwarding table. Yet another object of the invention is to provide a data structure that fits entirely in the cache of a conventional microprocessor.

[0017] As a result, memory accesses are many times faster than if the data structure had to reside in memory consisting of, for example, relatively slow DRAM. These objects are achieved by the IP routing lookup method and system according to the invention, which provide a data structure that can represent large routing tables in a very compact form and that can be searched quickly using few memory references.

[0018] To explain the invention in more detail and to illustrate its advantages and features, a preferred embodiment is described in detail below with reference to the accompanying drawings.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Referring to FIG. 1, a router design comprises a number of network inbound interfaces 1, network outbound interfaces 2, forwarding engines 3, and a network processor 4, all of which are interconnected by a connection fabric 5. An inbound interface 1 sends packet headers to a forwarding engine 3 through the connection fabric 5. The forwarding engine 3 in turn determines which outbound interface 2 the packet should be sent to. This information is sent back to the inbound interface 1, which forwards the packet to the outbound interface 2. The only task of a forwarding engine 3 is to process packet headers. All other tasks, such as participating in routing protocols, reserving resources, handling packets that need special attention, and other administrative duties, are handled by the network processor 4.

[0019] Each forwarding engine 3 makes its routing decisions using a forwarding table, a local version of the routing table that is downloaded from the network processor 4 and stored in storage means in the forwarding engine 3. It is not necessary to download a new forwarding table for each routing update. Routing updates can be frequent, but because routing protocols need some time to converge, forwarding tables can be allowed to grow slightly stale and need not change more than about once per second (disclosed at the Stanford University Workshop on Fast Routing and Switching, December 1996, http://tiny-tera.stanford.edu/Workshop_Dec96/).

[0020] The network processor 4 needs a dynamic routing table designed for fast updates and fast generation of forwarding tables. The forwarding tables, on the other hand, can be optimized for lookup speed and need not be dynamic. To minimize lookup time, two parameters must be minimized in the forwarding table data structure: the number of memory accesses required during a lookup and the size of the data structure.

[0021] Reducing the number of memory accesses required during a lookup is important because memory accesses are relatively slow and are usually the bottleneck of the lookup procedure. If the data structure can be made small enough, it fits entirely in the cache of a conventional microprocessor. Memory accesses are then orders of magnitude faster than if the data structure had to reside in memory consisting of relatively slow DRAM, as is the case for Patricia trees.

[0022] Even if the forwarding table does not fit entirely in the cache, it is still beneficial if a large fraction of the table can reside there. Locality in traffic patterns keeps the most frequently used parts of the data structure in the cache, so most lookups are fast. Moreover, it becomes feasible to use fast SRAM for the small amount of external memory that is needed. SRAM is expensive, and faster SRAM is even more expensive. For a given cost, the SRAM can be faster if less of it is needed.

[0023] As a secondary design goal, the data structure should require few instructions during a lookup and should keep entities naturally aligned as far as possible, to avoid expensive instructions and cumbersome bit-extraction operations. To determine quantitative design parameters for the data structure, a number of large routing tables have been investigated, as described below. The number of distinct routing entries in these tables is fairly small: 40,000. If the next hop is identical, the rest of the routing information is also identical, so all routing entries that specify the same next hop can share routing information. The number of distinct next hops in the routing table of a router is limited by the number of other routers or hosts that can be reached in one hop, so it is not surprising that this number is small even for large backbone routers. However, if a router is connected to, for instance, a large ATM network, the number of next hops can be higher.

[0024] In this embodiment, the forwarding table data structure is designed to accommodate 2¹⁴, or 16K, distinct next hops, which is sufficient in most cases. If there are fewer than 256 distinct next hops, an index into the next-hop table fits in a single byte, and the forwarding table described here can be modified, in another embodiment, to occupy considerably less space.

[0025] The forwarding table is essentially a tree with three levels. Searching one level requires one to four memory accesses, so the maximum number of memory accesses is twelve. However, with conventional routing tables the vast majority of lookups need to search only one or two levels, so in most cases the number of memory accesses is eight or fewer.

[0026] To understand the data structure, imagine a binary tree 6, shown in FIG. 2, that spans the entire IP address space. Its depth is 32 and it has 2³² leaves, one for each possible IP address. The prefix of a routing table entry defines a path in the tree that ends at some node. All IP addresses (leaves) in the subtree rooted at that node are routed according to that routing entry. In this way each routing table entry defines a range of IP addresses with identical routing information.
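The correspondence between a prefix and a range of leaves can be made concrete with a short sketch. The function name `prefix_range` and the integer representation of addresses are illustrative assumptions, not part of the described embodiment.

```python
def prefix_range(prefix, length):
    """Return (first, last) address covered by a prefix of the given bit length.

    `prefix` is a 32-bit integer; bits beyond the first `length` are ignored.
    """
    size = 1 << (32 - length)                   # number of leaves under the prefix node
    first = prefix & ~(size - 1) & 0xFFFFFFFF   # clear the bits below the prefix
    return first, first + size - 1

first, last = prefix_range(0x0A000000, 8)       # the prefix 10.0.0.0/8
print(hex(first), hex(last))
```

The range length is always a power of two, since it is the leaf count of a full subtree.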

[0027] If several routing entries cover the same IP address, the rule of the longest match is applied: for a given IP address, the routing entry with the longest matching prefix is used. This situation is shown in FIG. 3, where routing entry e1 is hidden by e2 for addresses in the range r.
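The longest-match rule can be illustrated with a deliberately naive linear scan. This is only a reference sketch for the rule itself, not the lookup method of the invention; the entry tuples and the names `longest_match`, `e1` and `e2` (borrowed from FIG. 3) are illustrative.

```python
def longest_match(addr, entries):
    """entries: list of (prefix, length, next_hop) with 32-bit integer prefixes.

    Return the next hop of the longest prefix matching addr, or None.
    """
    best_len, best_hop = -1, None
    for prefix, length, hop in entries:
        shift = 32 - length
        if (addr >> shift) == (prefix >> shift) and length > best_len:
            best_len, best_hop = length, hop
    return best_hop

table = [(0x0A000000, 8, "e1"), (0x0A010000, 16, "e2")]
print(longest_match(0x0A010203, table))   # inside the range where e2 hides e1
print(longest_match(0x0A020304, table))   # outside that range, only e1 matches
```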

[0028] The forwarding table is a representation of the part of the binary tree 6 that is spanned by all routing entries, the prefix tree 7. The prefix tree is required to be complete, that is, each node in the tree has either two children or none. Nodes with a single child must be expanded to have two children. Children added in this way are always leaves, and their next-hop information is the same as that of the closest ancestor with next-hop information, or the "undefined" next hop if no such ancestor exists.
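A minimal sketch of this completion step follows, assuming a small trie of dictionary nodes built from prefixes given as bit strings. The node layout and the names `build`, `complete` and `leaves` are hypothetical and chosen only for illustration; `None` stands in for the "undefined" next hop.

```python
UNDEFINED = None  # stands for the "undefined" next hop

def new_node(hop=UNDEFINED):
    return {"hop": hop, "kids": [None, None]}

def build(entries):
    """entries: list of (bit string, next hop). Returns the root of a binary trie."""
    root = new_node()
    for bits, hop in entries:
        node = root
        for b in bits:
            i = int(b)
            if node["kids"][i] is None:
                node["kids"][i] = new_node()
            node = node["kids"][i]
        node["hop"] = hop
    return root

def complete(node, inherited=UNDEFINED):
    """Give every one-child node a second child: a leaf with the closest ancestor's hop."""
    hop = node["hop"] if node["hop"] is not UNDEFINED else inherited
    kids = node["kids"]
    if kids[0] is None and kids[1] is None:
        node["hop"] = hop
        return
    for i in (0, 1):
        if kids[i] is None:
            kids[i] = new_node(hop)      # added child is always a leaf
        else:
            complete(kids[i], hop)

def leaves(node, path=""):
    kids = node["kids"]
    if kids[0] is None and kids[1] is None:
        return [(path, node["hop"])]
    return leaves(kids[0], path + "0") + leaves(kids[1], path + "1")

tree = build([("1", "A"), ("101", "B")])
complete(tree)
print(leaves(tree))
```

After completion the leaves partition the address space, and each added leaf carries the next hop it would have received under the longest-match rule.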

[0029] This procedure, illustrated in FIG. 4, increases the number of nodes in the prefix tree 7 but makes it possible to build a small forwarding table. It is not necessary, however, to actually build the prefix tree in order to build the forwarding table. The prefix tree is used only to simplify the explanation. The forwarding table can be built during a single pass over all routing entries.

[0030] A set of routing entries divides the IP address space into sets of IP addresses. The problem of finding the proper routing information is similar to the interval set membership problem (disclosed in SIAM Journal on Computing, 17(1):1093-1102, December 1988). In this case, each interval is defined by a node in the prefix tree and therefore has properties that can be used to compress the forwarding table. For example, each range of IP addresses has a length that is a power of two.

[0031] As shown in FIG. 5, level 1 of the data structure covers the prefix tree down to depth 16, level 2 covers depths 17 to 24, and level 3 covers depths 25 to 32. Wherever a part of the prefix tree extends below depth 16, a level-2 chunk describes that part of the tree. Similarly, level-3 chunks describe the parts of the prefix tree that are deeper than 24. The result of searching one level of the data structure is either an index into the next-hop table or an index into an array of chunks for the next level.
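The depth ranges of the three levels correspond to a 16/8/8 split of the address bits. The following sketch only shows that split; how levels 2 and 3 use their bits inside a chunk is not described in this passage, and the function name `split_levels` is an assumption.

```python
def split_levels(addr):
    """Split a 32-bit IPv4 address into the bits covered by levels 1, 2 and 3."""
    level1 = addr >> 16           # depths 1-16: upper 16 bits
    level2 = (addr >> 8) & 0xFF   # depths 17-24: next 8 bits
    level3 = addr & 0xFF          # depths 25-32: last 8 bits
    return level1, level2, level3

print([hex(part) for part in split_levels(0xC0A80A01)])   # 192.168.10.1
```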

[0032] For example, at level 1 of the data structure in FIG. 5 there is a cut through the prefix tree 7 at depth 16. The cut is stored in a bit vector with one bit for each possible node at depth 16. Thus 2¹⁶ bits = 64 Kbits = 8 Kbytes are required. To find the bit corresponding to the initial part of an IP address, the upper 16 bits of the address are used as an index into the bit vector.
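As a rough illustration of the size and indexing of this bit vector, the sketch below packs the 2¹⁶ bits into a byte array. The bit ordering inside each byte and the helper names `set_bit` and `get_bit` are assumptions made for the example.

```python
bit_vector = bytearray((1 << 16) // 8)   # 2^16 bits packed into 8192 bytes

def set_bit(vec, index):
    vec[index >> 3] |= 0x80 >> (index & 7)   # assumption: bit 0 is the MSB of byte 0

def get_bit(vec, index):
    return (vec[index >> 3] >> (7 - (index & 7))) & 1

addr = 0xC0A80A01
index = addr >> 16          # the upper 16 bits of the address select the bit
set_bit(bit_vector, index)
print(len(bit_vector), get_bit(bit_vector, index))
```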

[0033] If there is a node in the prefix tree at depth 16, the corresponding bit in the vector is set. In addition, when the tree has a leaf at a depth less than 16, the lowest bit in the interval covered by that leaf is set. All other bits are zero. A bit in the bit vector is therefore one of the following: a one representing that the prefix tree continues below the cut, a root head (bits 6, 12 and 13 in FIG. 6); a one representing a leaf at depth 16 or less, a genuine head (bits 0, 4, 7, 8, 14 and 15 in FIG. 6); or a zero, which means that this value is a member of a range covered by a leaf at a depth less than 16 (bits 1, 2, 3, 5, 9, 10 and 11 in FIG. 6). A member has the same next hop as the largest head smaller than the member.
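The bit positions listed for FIG. 6 can be turned into a small worked example. The sketch below rebuilds that 16-bit stretch and finds, for any position, the head that governs it; the function name `governing_head` is hypothetical, and the loop assumes position 0 is a head, as it is here.

```python
# One 16-bit stretch of the vector with the bit positions listed for FIG. 6.
ROOT_HEADS = {6, 12, 13}
GENUINE_HEADS = {0, 4, 7, 8, 14, 15}
bits = [1 if i in ROOT_HEADS | GENUINE_HEADS else 0 for i in range(16)]

def governing_head(bits, i):
    """Walk down from position i to the nearest set bit at or below it."""
    while bits[i] == 0:
        i -= 1
    return i

members = [i for i in range(16) if bits[i] == 0]
print(members)
print(governing_head(bits, 10))   # a member of the range starting at head 8
```

A linear walk like this is only for illustration; the code words and map table described later replace it with a constant number of table references.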

[0034] For a genuine head, an index into the next-hop table must be stored. A member uses the same index as the largest head smaller than the member. For a root head, an index to the level-2 chunk that represents the corresponding subtree must be stored. The head information is encoded in 16-bit pointers stored in an array. Two bits of each pointer encode what kind of pointer it is, and the remaining 14 bits are an index either into the next-hop table or into an array containing level-2 chunks.
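One possible encoding of such a 16-bit pointer is sketched below. The text only states that two bits give the kind and fourteen bits give the index; placing the kind in the top two bits and the particular type codes `NEXT_HOP` and `CHUNK` are assumptions for illustration.

```python
NEXT_HOP, CHUNK = 0, 1   # hypothetical type codes; the text only says two bits are used

def pack_pointer(kind, index):
    assert 0 <= kind < 4 and 0 <= index < (1 << 14)
    return (kind << 14) | index      # assumption: kind stored in the top two bits

def unpack_pointer(ptr):
    return ptr >> 14, ptr & 0x3FFF

ptr = pack_pointer(CHUNK, 1234)
print(hex(ptr), unpack_pointer(ptr))
```

The 14-bit index matches the design limit of 2¹⁴ distinct next hops.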

[0035] To find the proper pointer, the bit vector is divided into bit masks of length 16, of which there are 2¹² = 4096. The position of a pointer in the array is obtained by adding three entities: a base index, a 6-bit offset, and a 4-bit offset. The base index plus the 6-bit offset determines where the pointers corresponding to a particular bit mask are stored. The 4-bit offset specifies which of those pointers to retrieve. FIG. 7 shows how these entities are found. The following paragraphs explain the procedure in detail.

[0036] Since the bit masks are generated from a complete prefix tree, not all combinations of the 16 bits are possible. A nonzero bit mask of length 2n is either a combination of two nonzero bit masks of length n or the bit mask with value 1. Let a(n) be the number of possible nonzero bit masks of length 2ⁿ. Then a(n) is defined by the following recurrence.

[0037] a(0) = 1, a(n) = 1 + a(n−1)²

Thus the number of possible bit masks of length 16 is a(4) + 1 = 678; the one is added because the bit mask can also be zero. An index into a table with one entry for each bit mask therefore needs only 10 bits.
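The count can be checked both from the recurrence and by enumerating the masks directly. In the sketch below, masks are tuples with bit index 0 first, so the "bit mask with value 1" is the one with only its lowest position set; the function names are illustrative.

```python
def a(n):
    """Number of possible nonzero bit masks of length 2**n, from the recurrence."""
    return 1 if n == 0 else 1 + a(n - 1) ** 2

def masks(length):
    """Enumerate the same masks directly (tuples, bit index 0 first)."""
    if length == 1:
        return [(1,)]
    half = masks(length // 2)
    lone_head = (1,) + (0,) * (length - 1)      # one leaf covering the whole interval
    return [lone_head] + [x + y for x in half for y in half]

total = a(4) + 1                        # add the all-zero mask
index_bits = (total - 1).bit_length()   # bits needed to index `total` rows
print([a(n) for n in range(5)], total, index_bits)
```

The intermediate values a(1) = 2, a(2) = 5 and a(3) = 26 show how quickly the count falls behind the 2¹⁶ unconstrained bit patterns.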

[0038] This table, the map table, is kept to map bit numbers within a bit mask to 4-bit offsets. The offset specifies how many pointers to skip over to find the wanted one, so it is equal to the number of set bits smaller than the bit index. These offsets are the same for all forwarding tables, regardless of what values the pointers happen to have. The map table is constant; it is generated once and for all.
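A single map-table row can be computed as sketched below, using the FIG. 6 bit pattern as input. The sketch reads the rule as follows, which is an interpretation rather than something the text spells out: for a head, the offset is the number of set bits below it, and a member reuses the offset of the head that governs it. The function name `offsets` is hypothetical.

```python
def offsets(mask):
    """mask: 16 bits, index 0 first.

    Entry i is the number of pointers to skip inside this mask to reach
    the pointer of the head governing position i.
    """
    row, heads_seen = [], 0
    for bit in mask:
        heads_seen += bit
        row.append(max(heads_seen - 1, 0))   # 0 for an all-zero mask as well
    return row

fig6 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
row = offsets(fig6)
print(row)
```

Every entry is at most 15, so each fits in four bits.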

[0039] The possible bit masks have the property that any interval of bits whose length is a power of two, and which starts at a bit index that is a multiple of the same power of two, either 1) contains only zeros or 2) has its least significant bit set. The mapping from bit masks and bits to offsets provided by the map table, together with the expansion of prefixes that makes the prefix tree complete and thereby enables the use of the mapping, is the key to achieving a small forwarding table size.

[0040] The actual bit masks are not needed. Instead of the bit vector, an array of 16-bit code words is kept, each consisting of a 10-bit index into the map table plus a 6-bit offset. A 6-bit offset covers up to 64 pointers, so one base index is needed for every four code words. There can be at most 64K pointers, so the base indices need to be at most 16 bits (2¹⁶ = 64K).
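A possible layout of the code word, and the array sizes that follow from it, can be sketched as below. The text gives only the field widths; placing the 10-bit map-table index in the upper bits is an assumption, and the names `pack_codeword` and `unpack_codeword` are illustrative.

```python
def pack_codeword(ten, six):
    assert 0 <= ten < (1 << 10) and 0 <= six < (1 << 6)
    return (ten << 6) | six      # assumption: map-table index in the upper ten bits

def unpack_codeword(word):
    return word >> 6, word & 0x3F

codewords = (1 << 16) // 16      # one code word per 16-bit mask
bases = codewords // 4           # one base index per four code words
pointers_per_base = 4 * 16       # at most this many pointers between two base indices

print(codewords, bases, pointers_per_base, unpack_codeword(pack_codeword(677, 48)))
```

Four masks hold at most 64 heads, which is why a 6-bit offset suffices between consecutive base indices.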

[0041] The procedure illustrated in FIG. 7 requires the following steps of pseudocode to search the first level of the data structure. The array of code words is called code and the array of base addresses is called base; ix is a first index part of the IP address, into the array of code words; bit is a column index part of the IP address, into the map table; ten is a row index part of the code word, into the map table; six is the 6-bit offset part of the code word; bix is a second index part of the IP address, into the array of base addresses; and pix is a pointer into the array of pointers, which contains indices into the next-hop table as well as indices to level-2 chunks.

【００４２】
    ix := high 12 bits of IP address
    bit := low 4 of high 16 bits of IP address
    codeword := code[ix]
    ten := ten bits from codeword
    six := six bits from codeword
    bix := high 10 bits of IP address
    pix := base[bix] + six + maptable[ten][bit]
    pointer := level1_pointers[pix]
That is, only a few bit extractions, array references, and additions are needed. No multiplication or division instructions are required, apart from the implicit multiplications when indexing an array.
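The pseudo-code steps can be rendered in C roughly as below. This is a sketch under stated assumptions rather than the implementation of the specification: the `struct level1` container and the function name are hypothetical, the map table is stored unpacked as one byte per offset in a flat row-major array, and the codeword is assumed to hold the 10-bit index in its low bits.

```c
#include <stdint.h>

/* Hypothetical container for the first level; the field names follow the
 * pseudo-code, the struct itself is an assumption of this sketch. */
struct level1 {
    const uint16_t *code;      /* 4096 codewords */
    const uint16_t *base;      /* 1024 base indices */
    const uint8_t  *maptable;  /* 16 offsets per row, row-major */
    const uint16_t *pointers;  /* level 1 pointer array */
};

static uint16_t level1_lookup(const struct level1 *l1, uint32_t ipaddr)
{
    uint32_t ix  = ipaddr >> 20;            /* high 12 bits */
    uint32_t bit = (ipaddr >> 16) & 0xfu;   /* low 4 of high 16 bits */
    uint16_t codeword = l1->code[ix];
    uint32_t ten = codeword & 0x3ffu;       /* map table row */
    uint32_t six = (uint32_t)codeword >> 10; /* offset within the group */
    uint32_t bix = ipaddr >> 22;            /* high 10 bits */
    uint32_t pix = l1->base[bix] + six + l1->maptable[(ten << 4) | bit];
    return l1->pointers[pix];
}
```

The body consists only of shifts, masks, array references, and additions, in line with the remark that no explicit multiplication or division is needed.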

【００４３】A total of 7 bytes must be accessed to search the first level: a 2-byte codeword, a 2-byte base address, 1 byte (actually 4 bits) in the map table, and finally a 2-byte pointer. The size of the first level is 8 Kbytes for the codeword array and 2 Kbytes for the array of base indices, plus the pointers. The 5.3 Kbytes required by the map table are shared among all three levels.
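The size figures can be checked with a few lines of C. The sketch below is only an arithmetic aid: the constant and function names are hypothetical, the 4096-codeword count assumes one codeword per 16 bits of the depth-16 cut, and the 676-row map table size is the figure given in the next paragraph.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    L1_CODEWORDS    = 1 << 12,          /* 2^16 bits / 16 bits per codeword */
    L1_BASES        = L1_CODEWORDS / 4, /* one base index per four codewords */
    MAPTABLE_ROWS   = 676,
    OFFSETS_PER_ROW = 16
};

static size_t codeword_array_bytes(void)
{
    return L1_CODEWORDS * sizeof(uint16_t);
}

static size_t base_array_bytes(void)
{
    return L1_BASES * sizeof(uint16_t);
}

/* Sixteen 4-bit offsets per row, that is 8 bytes per row. */
static size_t maptable_bytes(void)
{
    return (size_t)MAPTABLE_ROWS * OFFSETS_PER_ROW * 4 / 8;
}
```

These give 8192, 2048, and 5408 bytes respectively; 5408 bytes is about 5.3 Kbytes.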

【００４４】When the bit mask is zero or has a single bit set, the pointer must be an index into the next-hop table. Such pointers can be encoded directly into the codeword, so the map table need not contain entries for bit masks 1 and 0. The number of map table entries is thereby reduced to 676 (indices 0 to 675). When the ten bits in the codeword (ten above) are larger than 675, the codeword represents a direct index into the next-hop table. The six bits from the codeword are used as the lowest six bits of the index, and (ten - 676) gives the upper bits. This encoding allows at most (1024 - 676) × 2⁶ = 22272 indices, which is more than the 16K the design targets. The optimization eliminates three memory references when a routing entry is located at depth 12 or more, and considerably reduces the number of pointers in the pointer array. The cost is a comparison and a conditional branch.
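The direct encoding of next-hop indices can be sketched as follows. The function name and the return convention are hypothetical, and the codeword layout (10-bit index in the low bits, 6 bits above it) is an assumption of this example.

```c
#include <stdint.h>

#define MAPTABLE_ROWS 676u   /* valid map table rows are 0..675 */

/* Returns 1 and stores the next-hop index in *index if the codeword
 * encodes one directly; returns 0 if the map table must be consulted. */
static int direct_nexthop(uint16_t codeword, uint32_t *index)
{
    uint32_t ten = codeword & 0x3ffu;
    uint32_t six = (uint32_t)codeword >> 10;
    if (ten < MAPTABLE_ROWS)
        return 0;                                /* ordinary codeword */
    *index = ((ten - MAPTABLE_ROWS) << 6) | six; /* six = low bits of index */
    return 1;
}
```

The single comparison against 676 is the extra conditional branch mentioned above; the largest encodable index is 22271, one less than the 22272 indices the encoding allows.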

【００４５】The mapping is illustrated by the following data and functions in the C programming language. The important feature of this code is the mapping it provides from individual bit masks to arrays of offsets. The mapping stays the same under changes such as permuting the entries of mtable (and mt). There are also other ways to represent the array of offsets, for example as two 32-bit words instead of four 16-bit words.
    /* map table */
    #define MAPVECLEN 4
    typedef uint16 MAPVEC[MAPVECLEN];
    typedef MAPVEC MCOMPACT;
    MCOMPACT mt[TMAX];
    /* mt is the map table used at lookup time. */

【００４６】
    /* mt is initialized from mtable, which is used during construction. */
    typedef struct mentry {
        uint16 mask;   /* 16-bit pattern (LSB is set!) */
        uint16 len;    /* 8-bit len (values 1 to 16) */
        MAPVEC map;    /* sixteen 4-bit offsets in groups of 4 */
    } MENTRY, *MP;

    void mentry2mcompact(MENTRY *from, MCOMPACT *to)
    /* take the map part from "from" and put it in "to" */
    {
        int i;
        for (i = 0; i < MAPVECLEN; i++) {
            (*to)[i] = from->map[i];
        }
    }

    extern void mtable_compact()
    /* initialize mt from mtable */
    {
        register int i;
        for (i = 0; i < TMAX; i++) {
            mentry2mcompact(&mtable[i], &mt[i]);
        }
    }
Levels 2 and 3 of the data structure consist of chunks. A chunk covers a subtree of height 8 and can contain up to 2⁸ = 256 heads. A root head at level n-1 points to a chunk at level n.
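To make the "sixteen 4-bit offsets in groups of 4" representation concrete, the sketch below packs sixteen offsets into four 16-bit words and reads one back. The function names are hypothetical, and the nibble order (bit position b in word b/4, at nibble b mod 4 counted from the low end) is an assumption chosen to agree with the bit2o macro in the lookup listing later in this description.

```c
#include <stdint.h>

#define MAPVECLEN 4

/* Pack sixteen 4-bit offsets into four 16-bit words, four nibbles per word. */
static void pack_offsets(const uint8_t off[16], uint16_t map[MAPVECLEN])
{
    for (unsigned i = 0; i < MAPVECLEN; i++)
        map[i] = 0;
    for (unsigned bit = 0; bit < 16; bit++) {
        unsigned shift = (bit & 0x3u) << 2;   /* nibble position in the word */
        map[bit >> 2] |= (uint16_t)((off[bit] & 0xfu) << shift);
    }
}

/* Read the offset for one bit position back out of the packed form. */
static unsigned unpack_offset(const uint16_t map[MAPVECLEN], unsigned bit)
{
    return ((unsigned)map[bit >> 2] >> ((bit & 0x3u) << 2)) & 0xfu;
}
```

A representation using two 32-bit words, as the text suggests, would change only the word index and shift computations, not the mapping itself.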

【００４７】There are three kinds of chunk, depending on how many heads the imaginary bit vector contains. When there are 1 to 8 heads, the chunk is sparse and is represented by an array of the 8-bit indices of the heads plus eight 16-bit pointers, 24 bytes in total. When there are 9 to 64 heads, the chunk is dense. It is represented like level 1, except for the number of base indices: because the 6-bit offsets can cover all 64 pointers, a single base index serves all 16 codewords. A total of 34 bytes is needed, plus 18 to 128 bytes for pointers.
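The chunk kinds and their sizes can be summarized in a short C sketch. The names are hypothetical, the head count is assumed to lie in the range 1 to 256, and the 65 to 256 range with its 40-byte header is taken from the description of very dense chunks in the following paragraph.

```c
#include <stddef.h>

enum chunk_kind { CHUNK_SPARSE, CHUNK_DENSE, CHUNK_VERY_DENSE };

/* Classify a chunk by its number of heads (assumed to be 1..256). */
static enum chunk_kind classify_chunk(unsigned heads)
{
    if (heads <= 8)
        return CHUNK_SPARSE;
    if (heads <= 64)
        return CHUNK_DENSE;
    return CHUNK_VERY_DENSE;
}

/* Total bytes for a chunk, including its 16-bit pointers. */
static size_t chunk_bytes(unsigned heads)
{
    switch (classify_chunk(heads)) {
    case CHUNK_SPARSE:
        return 8 * 1 + 8 * 2;                      /* fixed: 24 bytes */
    case CHUNK_DENSE:
        return 16 * 2 + 1 * 2 + (size_t)heads * 2; /* 34 + pointers */
    default:
        return 16 * 2 + 4 * 2 + (size_t)heads * 2; /* 40 + pointers */
    }
}
```

For example, a dense chunk with 9 heads takes 34 + 18 = 52 bytes, while a sparse chunk always takes 24 bytes regardless of how many of its eight slots are in use.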

【００４８】When there are 65 to 256 heads, the chunk is very dense. It is represented like level 1. The 16 codewords and 4 base indices give a total of 40 bytes. In addition, the 65 to 256 pointers require 130 to 512 bytes. Dense and very dense chunks are searched in the same way as the first level. A sparse chunk is searched by a special-purpose binary search tailored to its 8 elements. This is considerably faster than a linear search or a general-purpose binary search. Moreover, on processor architectures with conditional move instructions, the search can be implemented without conventional conditional jumps.
    /* search function for sparse chunks */
    static inline uint16 findsparse(SPARSECHUNK *chu, uint32 val)
    /* chu->vals is a sorted array 0..7 of 8-bit values. */

【００４９】
    /* chu->rinfo is the corresponding array 0..7 of pointers.
       val is the search key */
    {
        uint8 *p, *q;
        p = q = &(chu->vals[0]);
        p += (*(p+3) > val) << 2;   /* skip four values if vals[3] is above the key */
        p += (*(p+1) > val) << 1;   /* then two */
        p += (*p > val);            /* then one */
        return (chu->rinfo[p-q]);
    }
Dense and very dense chunks are optimized in the same way as level 1, as described. In sparse chunks, two consecutive heads can be merged and represented by the smaller one if their next hops are identical. This merging is taken into account when deciding whether a chunk is sparse or dense, so a chunk is deemed sparse when the number of merged heads is 8 or fewer. Many of the leaves that were added to make the tree complete occur in sequence and have identical next hops. The heads corresponding to such leaves are merged in sparse chunks.
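A self-contained variant of the sparse-chunk search is sketched below so that its behavior can be checked in isolation. It rests on assumptions that the text does not spell out: the struct layout and names are hypothetical, the head indices are taken to be stored in decreasing order, and the last slot is taken to hold index 0 so that every key finds a head at or below it.

```c
#include <stdint.h>

/* Hypothetical layout of a sparse chunk. */
struct sparse_chunk {
    uint8_t  vals[8];    /* head indices, sorted in decreasing order */
    uint16_t rinfo[8];   /* pointer belonging to each head */
};

/* Branch-free binary search over exactly eight elements: returns the
 * pointer of the first head (in storage order) that is <= val. */
static uint16_t find_sparse(const struct sparse_chunk *chu, uint32_t val)
{
    const uint8_t *q = chu->vals;
    const uint8_t *p = q;
    p += (p[3] > val) << 2;   /* answer lies in slots 4..7 */
    p += (p[1] > val) << 1;   /* narrow to a pair */
    p += (p[0] > val);        /* pick within the pair */
    return chu->rinfo[p - q];
}
```

Each step adds the result of a comparison to the pointer instead of branching on it, which is what allows a compiler to use conditional move instructions on architectures that provide them.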

【００５０】This optimization shifts the distribution of chunks from the larger dense chunks towards the smaller sparse chunks. For large tables, the size of the forwarding table is typically reduced by 5 to 15 percent. The data structure can accommodate considerable growth in the number of routing entries. The current design has two limits.

【００５１】1. The number of chunks of each kind is limited to 2¹⁴ = 16384 per level. Table 1 shows that this is about 16 times more than is currently used. If the limit is ever exceeded, the data structure can be modified so that pointers are encoded differently to give more room for indices, or so that the pointer size is increased.

【００５２】2. The number of pointers in levels 2 and 3 is limited by the size of the base indices. The current implementation uses 16-bit base indices and can accommodate a growth factor of 3 to 5. If the limit is exceeded, it is straightforward to increase the size of the base pointers to 3 bytes. The chunk size then grows by 3 percent for dense chunks and 10 percent for very dense chunks. Sparse chunks are not affected.

【００５３】It is clear that the data structure can accommodate a large increase in the number of routing entries. Its size grows almost linearly with the number of routing entries. To investigate the performance of the forwarding tables, a number of IP routing tables were collected. Internet routing tables are currently available at the web site of the Internet Performance Measurement and Analysis (IPMA) project (http://www.ra.net/statistics/); they were previously made available by the now terminated Routing Arbiter project (http://www.ra.net/statistics/). The collected routing tables are daily snapshots of the routing tables used at various large Internet interconnection points. Some of the routing entries in these tables contain multiple next hops. In that case, one of them was chosen at random as the next hop to use in the forwarding table.

【００５４】Table 1 in FIG. 8 shows data on forwarding tables constructed from various routing tables. For each site, the table shows data and results for the routing table that produced the largest forwarding table. Routing entries is the number of routing entries in the routing table, and next hops is the number of distinct next hops found in the table. Leaves is the number of leaves in the prefix tree after leaves have been added to make it complete.

【００５５】The build time in Table 1 is the time required to generate the forwarding table from an in-memory binary tree representation of the routing table. Times were measured on a 333 MHz Alpha 21164 running DEC OSF1. The subsequent columns show the total number of sparse, dense, and very dense chunks in the generated table, followed by the number of chunks in the lowest level of the data structure.

【００５６】It is clear from Table 1 that new forwarding tables can be generated quickly. At a regeneration frequency of 1 Hz, less than one tenth of the Alpha's capacity is consumed. As explained above, regeneration frequencies higher than 1 Hz are not required. The larger tables in Table 1 do not fit entirely in the Alpha's 96 Kbyte secondary cache. It is feasible, however, to have a small amount of very fast SRAM in a third-level cache for the pieces that do not fit in the secondary cache, and so reduce the cost of a miss in the secondary cache. Given the locality of traffic patterns, most memory references would go to the secondary cache.

【００５７】An interesting observation is that the size of these tables is comparable to what it would take just to store all the prefixes in an array. For the larger tables, no more than 5.6 bytes per prefix are needed. More than half of these bytes are consumed by pointers. In the Sprint table there are 33469 pointers, which require over 65 Kbytes of storage. Clearly, a further reduction in forwarding table size could be achieved by reducing the number of pointers.

【００５８】Measurements of the lookup routine were made on a C function compiled with the GNU C compiler gcc (described in Using and Porting GNU CC, Free Software Foundation, November 1995, ISBN 1-882114-66-3). The reported times do not include the function call or the memory access to the next-hop table. gcc generates code that uses about 50 Alpha instructions to search one level of the data structure in the worst case. On a Pentium Pro, gcc generates code that uses 35 to 45 instructions per level in the worst case.

【００５９】The following C code shows the lookup function used in the measurements. The level 1 codeword array is called int1 and the base index array base1. The pointers of level 1 are stored in the array htab1. Chunks are stored in the arrays cis, cid, and cidd, where i is the level. Base addresses and pointers for dense and very dense chunks are stored in the arrays baseid, baseidd and htabid, htabidd respectively.
    /* lookup function for forwarding table */
    #include "conf.h"
    #include "forward.h"
    #include "mtentry.h"
    #include "mtable.h"
    #include "bit2index.h"
    #define TABLE_LOOKUP
    #include "sparse.h"
    #include "timing.h"
    #include "lookup.h"

    /* these macros work only when ip is a 32-bit unsigned int (uint32) */
    #define EXTRACT(start, bits, ip) (((ip)<<(start))>>(32-(bits)))
    #define GETTEN(m) (((m)<<22)>>22)
    #define GETSIX(m) ((m)>>10)
    #define bit2o(ix,bit) ((mt[(ix)][(bit)>>2]>>(((bit)&0x3)<<2))&0xf)

    /* lookup(ipaddr) -- index to routing table entry for ipaddr */
    unsigned int lookup(uint32 ipaddr)
    {
        uint32 ix;     /* index to codeword array */
        uint32 code;   /* 16-bit codeword */
        int32 diff;    /* difference from TMAX */
        uint32 ten;    /* index to mt */
        uint32 six;    /* six "extra" bits of code */
        int32 nhop;    /* pointer to next hop */
        uint32 off;    /* pointer offset */
        uint32 hbase;  /* base for hash table */
        uint32 pntr;   /* hash table entry */
        int32 kind;    /* kind of pntr */
        uint32 chunk;  /* chunk index */
        uint8 *p, *q;  /* pointers for search of sparse chunks */
        uint32 key;    /* search key for sparse */

        /* timed from **HERE** */

【００６０】[0060]

【表１】 [Table 1]

【００６１】[0061]

【表２】 [Table 2]

【００６２】[0062]

【表３】 [Table 3]

【００６３】[0063]

【表４】 [Table 4]

【００６４】
        /* timed until HERE! */
        return nhop;
    }
It is possible to read the current value of the clock cycle counter on both the Alpha and the Pentium Pro. This facility was used to measure lookup times with high precision. One clock tick is 5 nanoseconds at 200 MHz and 3 nanoseconds at 333 MHz.

【００６５】Ideally, the entire forwarding table would be placed in the cache, so that lookups are performed with an undisturbed cache. That would emulate the cache behavior of a dedicated forwarding engine. However, the measurements were made on conventional general-purpose workstations, and it is difficult to control the cache contents on such systems. The cache is disturbed whenever I/O is performed, an interrupt occurs, or another process starts running. It is not even possible to print out measurement data or read a new IP address from a file without disturbing the cache.

【００６６】The method used is to perform each lookup twice and measure the lookup time of the second lookup. In this way, the first lookup is done with a disturbed cache, and the second with a cache in which all the necessary data has been brought into the primary cache by the first lookup. After each pair of lookups, measurement data is printed out and a new address is fetched, a procedure that disturbs the cache again.

【００６７】The second lookup performs better than a lookup in a forwarding engine would, because data and instructions have moved into the primary cache closest to the processor. To obtain an upper limit on the lookup time, the additional time required for memory accesses to the secondary cache must be added to the measured times. To test all paths through the forwarding table, lookup times were measured for each entry in the routing table, including the entries added by the expansion to a complete tree.

【００６８】Average lookup times cannot be inferred from these experiments, because a realistic traffic mix is unlikely to access every routing entry with equal probability. Moreover, locality in traffic patterns keeps the frequently accessed parts of the data structure in the primary cache, which reduces the average lookup time. The performance figures calculated below are conservative, because they assume that every memory access misses in the primary cache and that the worst-case execution time always occurs. Realistic lookup speeds would be higher.

【００６９】Table 1 shows that there are very few chunks in level 3 of the data structure. The vast majority of lookups therefore need to search no more than two levels to find the next hop. Accordingly, the additional time for memory accesses to the secondary cache is calculated for eight memory accesses rather than for the worst case of twelve. If a significant fraction of lookups were to access those few level 3 chunks, they would migrate into the primary cache, and all twelve memory accesses would become less expensive.

【００７０】An experiment was performed on an Alpha 21164 with a clock frequency of 333 MHz. One cycle takes 3 nanoseconds. Accesses to the 8 Kbyte primary data cache complete in 2 cycles, and accesses to the 96 Kbyte secondary cache require 8 cycles. See Table 2 in FIG. 9. FIG. 10 shows the distribution of clock ticks elapsed during the second lookup on the Alpha for the Sprint routing table of January 1st. The fastest observed lookups require 17 clock cycles. This is the case when the codeword in the first level directly encodes the index into the next-hop table. There are very few such routing entries. However, each of them covers many IP addresses, so actual traffic may contain many such destination addresses. Some lookups take 22 cycles, which must be the same case as before. Experiments have confirmed that when the clock cycle counter is read by two consecutive instructions, the difference is sometimes 5 cycles instead of the expected 0.

【００７１】The next spike in FIG. 10 is at 41 clock cycles, which is the case when the pointer found in the first level is an index into the next-hop table. Traditional class B addresses fall into this category. The spikes at 52 to 53, 57, 62, 67, and 72 ticks correspond to finding the pointer after examining one, two, three, four, or five values in a sparse level 2 chunk. The large spikes at 75 and 83 ticks arise because that many ticks are required to search a dense and a very dense chunk, respectively. The few observations above 83 ticks correspond to pointers found after searching a sparse level 3 chunk, probably affected by variations in execution time. Cache conflicts in the secondary cache, or differences in the state of the pipelines and the cache system before the lookup, can cause such variations. The tail of observations above 100 clock cycles is due either to such variations or to cache misses. 300 nanoseconds should be sufficient when all data is in the primary cache.

【００７２】The difference between a data access in the primary cache and one in the secondary cache is 8 - 2 = 6 cycles. In the worst case, a search of two levels of the data structure therefore requires 8 × 6 = 48 clock cycles more than indicated in FIG. 10. This means that, when two levels are sufficient, a worst-case lookup requires at most 100 + 48 = 148 cycles, or 444 nanoseconds. The Alpha should thus be able to perform 2.2 million routing lookups per second with the forwarding table in the secondary cache.

【００７３】Another experiment was performed on a Pentium Pro with a clock frequency of 200 MHz. One cycle takes 5 nanoseconds. The primary 8 Kbyte cache has a latency of 2 cycles, and the 256 Kbyte secondary cache has a latency of 6 cycles. See Table 2. FIG. 11 shows the distribution of clock ticks elapsed during the second lookup on the Pentium Pro, for the same forwarding table as used with the Alpha 21164. The sequence of instructions that fetches the clock cycle counter takes 33 clock cycles. When two fetches occur immediately after each other, the counter values differ by 33. For this reason, all reported times have been reduced by 33.

【００７４】The fastest observed lookups take 11 clock cycles, about the same speed as on the Alpha 21164. The spike corresponding to the case where the next-hop index is found immediately after the first level occurs at 25 clock cycles. The spikes corresponding to sparse level 2 chunks are clustered close together in the range of 36 to 40 clock cycles. The different cache structure of the Pentium appears to handle linear scans better than that of the Alpha 21164.

【００７５】When the second-level chunk is dense or very dense, the lookup requires 48 or 50 cycles, respectively. There are some additional irregular spikes up to 69 cycles, above which there are very few observations. It is clear that 69 cycles (345 nanoseconds) are sufficient to perform a lookup when all data is in the primary cache.

【００７６】The difference in access time between the primary and secondary caches is 20 nanoseconds (4 cycles). When two levels need to be examined, the worst-case lookup time on the Pentium Pro is 69 + 8 × 4 = 101 cycles, or 505 nanoseconds. The Pentium Pro can thus perform about 2 million routing lookups per second with the forwarding table in the secondary cache.
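The worst-case estimates for both processors follow the same arithmetic, which the C sketch below reproduces. The struct and function names are hypothetical, and the model is the simplified one used in the text: a whole number of nanoseconds per cycle, and every one of the eight memory accesses paying the difference between secondary and primary cache latency.

```c
#include <stdint.h>

/* Hypothetical description of a processor's cache timing. */
struct cache_model {
    unsigned ns_per_cycle;   /* 3 for the 333 MHz Alpha, 5 for the 200 MHz Pentium Pro */
    unsigned l1_cycles;      /* primary cache latency */
    unsigned l2_cycles;      /* secondary cache latency */
};

/* Measured cycles plus the penalty for serving every access from L2. */
static unsigned worst_case_cycles(const struct cache_model *m,
                                  unsigned measured_cycles, unsigned accesses)
{
    return measured_cycles + accesses * (m->l2_cycles - m->l1_cycles);
}

/* Lookups per second for a lookup that takes `cycles` cycles. */
static unsigned long lookups_per_second(const struct cache_model *m,
                                        unsigned cycles)
{
    return 1000000000ul / ((unsigned long)cycles * m->ns_per_cycle);
}
```

With the figures from the text, this gives 148 cycles and about 2.25 million lookups per second for the Alpha, and 101 cycles and about 1.98 million lookups per second for the Pentium Pro, in line with the rounded figures quoted.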

【００７７】It should be apparent that the present invention provides an improved IP routing lookup method and system that fully satisfies the aims and advantages set forth above. Although the invention has been described in conjunction with specific embodiments, alternatives, modifications, and variations will be apparent to those skilled in the art. In the embodiments described above, the lookup function is implemented in the C programming language. To those skilled in the art of programming it will be clear that other programming languages can be used. The lookup function can also be implemented in hardware using standard digital design techniques, as will likewise be clear to those skilled in the art of hardware design.

【００７８】For example, the invention is applicable to computer system configurations other than those using an Alpha 21164 or a Pentium Pro, to other ways of cutting the prefix tree, to different numbers of levels in the tree, to other ways of representing the map table, and to other ways of encoding the codewords. Furthermore, the invention can be used for firewall routing lookups.

[Brief description of the drawings]

[FIG. 1] A schematic view of a router design.

[FIG. 2] A diagram of a binary tree spanning the entire IP address space.

[FIG. 3] A diagram of routing entries defining ranges of IP addresses.

[FIG. 4] A diagram of the step of expanding the prefix tree to make it complete.

[FIG. 5] A diagram of the three levels of the data structure according to the invention.

[FIG. 6] A diagram of part of a cut through the prefix tree at depth 16.

[FIG. 7] A diagram of the first-level search of the data structure.

[FIG. 8] A table showing data on forwarding tables constructed from various routing tables.

[FIG. 9] A table showing processor and cache data.

[FIG. 10] A graph of the lookup time distribution for the Alpha 21164.

[FIG. 11] A graph of the lookup time distribution for the Pentium Pro.

[Procedural amendment] Submission of translation of amendment under Article 34 of the Patent Cooperation Treaty

[Submission date] March 15, 2000

[Procedural amendment 1]

[Document to be amended] Specification

[Item to be amended] Claims

[Method of amendment] Change

[Content of amendment]

[Claims]

Continuation of front page

(81) Designated states: EP (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD, TG), AP (GH, GM, KE, LS, MW, SD, SZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE, GH, GM, GW, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, UA, UG, UZ, VN, YU, ZW

(72) Inventor: Carlsson, Svante; Nystigen 4, S-977 53 Luleå, Sweden

(72) Inventor: Pink, Stephen; Viveka Trolles Gränd 8, S-165 57 Hässelby, Sweden

Claims


1. A method of IP routing lookup in a routing table that includes entries of arbitrary-length prefixes with associated next hop information in a next hop table, for determining where to forward an IP datagram, the method comprising the steps of:
storing in a storage means a representation of the routing table in the form of a complete prefix tree (7), defined by the prefixes of all routing table entries and completed so that each node has either no children or two children, wherein every added child is a leaf whose next hop information is the same as that of its nearest ancestor with next hop information, or an unspecified next hop if no such ancestor exists;
storing in said storage means a representation of a bit vector (8) containing data of a cut through said prefix tree (7) at a current depth (D), with one bit for each possible node at said current depth, wherein the bit is set if a node is present in said prefix tree (7);
storing in said storage means an array of pointers, each pointer being an index into the next hop table for a pure head and an index to a next-level chunk for a root head;
dividing said bit vector (8) into bit masks of a certain length;
storing in said storage means a map table containing representations of the possible bit masks;
storing in said storage means an array of codewords, each encoding a row index into the map table and a pointer offset;
storing in said storage means an array of base addresses;
accessing a codeword at a position in the array of codewords corresponding to a first index portion (ix) of an IP address;
accessing a map table entry portion at a position in the map table corresponding to a column index portion (bit) of the IP address and the row index portion (ten) of said codeword;
accessing a base address at a position in the array of base addresses corresponding to a second index portion (bix) of the IP address; and
accessing a pointer at a position in the array of pointers corresponding to said base address plus the pointer offset (six) of said codeword plus said map table entry portion.
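The four access steps at the end of claim 1 can be illustrated with a toy example. The following Python sketch is not the claimed implementation. The sizes (4-bit masks, two codewords per base address, a 16-position cut), the table contents, and the names `MAPTABLE`, `CODEWORDS`, `BASE`, `POINTERS` and `lookup` are all assumptions chosen for readability. As a further simplification, each map table entry here holds the number of set bits up to and including the column, and slot 0 of the pointer array is an unused placeholder, so that the sum of base address, pointer offset and map table entry indexes the pointer array directly.

```python
# Toy first-level lookup. All sizes and table contents are illustrative
# assumptions, not values taken from the patent.

MASK_LEN = 4   # bits per bit mask (hypothetical, kept small for readability)
GROUP = 2      # codewords sharing one base address (hypothetical)

# Cut through a complete prefix tree with 16 positions; a set bit marks a head:
#   1000 1011 1000 0000   -> heads at positions 0, 4, 6, 7, 8
# Map table: one row per distinct mask. The entry in column c is the number
# of set bits in columns 0..c of that mask (a simplification of this sketch).
MAPTABLE = [
    [0, 0, 0, 0],  # row 0: mask 0000
    [1, 1, 1, 1],  # row 1: mask 1000
    [1, 1, 2, 3],  # row 2: mask 1011
]

# One codeword (ten, six) per mask: ten is the map table row, six is the
# number of heads in earlier masks of the same group.
CODEWORDS = [(1, 0), (2, 1), (1, 0), (0, 1)]

# One base address per group: number of heads before the group starts.
BASE = [0, 4]

# One pointer per head, in order. Slot 0 is an unused placeholder.
POINTERS = [None, "A", "B", "C", "D", "E"]


def lookup(key):
    """Return the pointer governing a 4-bit key at the cut."""
    ix = key // MASK_LEN             # first index part: selects the codeword
    bit = key % MASK_LEN             # column within the bit mask
    bix = key // (MASK_LEN * GROUP)  # second index part: selects the base
    ten, six = CODEWORDS[ix]
    return POINTERS[BASE[bix] + six + MAPTABLE[ten][bit]]


if __name__ == "__main__":
    for key in (0, 5, 6, 7, 13):
        print(key, lookup(key))
```

In this toy, key 5 falls in the second mask (`ix = 1`), column 1, so the lookup reads `BASE[0] + 1 + MAPTABLE[2][1] = 2` and returns the pointer of the head at position 4, which covers positions 4 and 5.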

2. A system for IP routing lookup in a routing table that includes entries of arbitrary-length prefixes with associated next hop information in a next hop table, for determining where to forward an IP datagram, the system comprising:
a routing table in the form of a complete prefix tree (7), defined by the prefixes of all routing table entries and completed so that each node has either no children or two children, wherein every added child is a leaf whose next hop information is the same as that of its nearest ancestor with next hop information, or an unspecified next hop if no such ancestor exists;
a representation of a bit vector (8) containing data of a cut through said prefix tree (7) at a current depth (D), with one bit for each possible node at said current depth, wherein the bit is set if a node is present in said prefix tree (7);
an array of pointers, each pointer being an index into the next hop table for a pure head and an index to a next-level chunk for a root head;
said bit vector (8) divided into bit masks of a certain length;
a map table containing representations of the possible bit masks;
an array of codewords, each encoding a row index into the map table and a pointer offset; and
an array of base addresses.
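As a rough illustration of how the arrays listed in claim 2 relate to the bit vector, the following Python sketch derives a map table, a codeword array and a base address array from a list of bits. It is a sketch under stated assumptions, not the patented encoding. The function name `build_level`, its parameters `mask_len` and `group`, and the choice to store inclusive bit counts in the map table are hypothetical. Map table rows are created only for masks that actually occur, in order of first appearance.

```python
def build_level(bits, mask_len=16, group=4):
    """Derive (maptable, codewords, base) from a bit vector.

    bits     -- list of 0/1 values, one per possible node at the cut;
                its length is assumed to be a multiple of mask_len
    mask_len -- length of each bit mask (assumed parameter)
    group    -- number of codewords sharing one base address (assumed)
    """
    rows = {}        # mask (as a tuple) -> row index in the map table
    maptable = []    # row -> count of set bits in columns 0..c, per column c
    codewords = []   # one (ten, six) pair per mask
    base = []        # heads seen before each group of masks
    heads = 0        # running number of set bits seen so far

    for start in range(0, len(bits), mask_len):
        mask = tuple(bits[start:start + mask_len])

        # A new group begins: record how many heads precede it.
        if (start // mask_len) % group == 0:
            base.append(heads)

        # Add a map table row the first time a mask is seen.
        if mask not in rows:
            rows[mask] = len(maptable)
            running, row = 0, []
            for b in mask:
                running += b
                row.append(running)
            maptable.append(row)

        # ten = map table row, six = heads in earlier masks of this group.
        codewords.append((rows[mask], heads - base[-1]))
        heads += sum(mask)

    return maptable, codewords, base


if __name__ == "__main__":
    # Toy cut: 1000 1011 1000 0000, using 4-bit masks and groups of two.
    bits = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    maptable, codewords, base = build_level(bits, mask_len=4, group=2)
    print(maptable)
    print(codewords)
    print(base)
```

With these definitions, the sum of base address, pointer offset and map table entry for any position equals the number of set bits up to and including that position, which identifies the head governing it. Because identical masks share one map table row, the map table stays small when few distinct masks occur.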