JP7088868B2

JP7088868B2 - Communication device, communication method, and program

Info

Publication number: JP7088868B2
Application number: JP2019055532A
Authority: JP
Inventors: 幸杜上野
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2019-03-22
Filing date: 2019-03-22
Publication date: 2022-06-21
Anticipated expiration: 2039-03-22
Also published as: JP2020156058A

Description

The present invention relates to a communication device, a communication method, and a program.

In IPv4 (Internet Protocol version 4) and IPv6 (Internet Protocol version 6) communication, the process by which a node on the communication path selects a forwarding destination according to the destination address of a packet (that is, the IP address set as the destination) is called IP lookup. IP lookup is at the core of the communication function of routers and end nodes, and higher performance is demanded of both software and hardware implementations.

Several methods have been proposed for processing IP lookup at high speed in software. One such conventional method is described in Patent Document 1.

In recent years, with the spread of NFV (Network Functions Virtualization), a wide range of network functions have come to be implemented in software. Because NFV functions include DPI (Deep Packet Inspection) and other functions that require more complex processing than a routing-only device, the processing load of IP lookup needs to be reduced.

Patent Document 1: Japanese Patent No. 5960863

However, the computation time required for IP lookup processing cannot be said to be sufficiently short relative to the transmission speeds currently used by telecommunications carriers, and further reduction of the computation time is still required. For example, 100Gbps Ethernet is currently the main transmission standard in carrier networks, but it is difficult for a software router to process IP lookup for 100Gbps-class traffic without delay on a single CPU (Central Processing Unit) core.

Packet processing, including IP lookup, can also be accelerated by using a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), or similar hardware in addition to the CPU, but such methods require the GPU, FPGA, or other hardware to be added. The advantages of software packet processing are that it can run on general-purpose servers with a low introduction cost and that it is easy to modify, which lowers the development cost of new functions. Adding special hardware such as a GPU or FPGA weakens these advantages.

The present invention has been made in view of the above points, and its object is to speed up IP lookup processing.

To achieve the above object, a communication device according to an embodiment of the present invention includes: receiving means for receiving packets; lookup processing means for acquiring, in parallel, information on the next hop for the destination address of each of a plurality of packets received by the receiving means, by longest match using SIMD operations on an array composed of a plurality of array elements, each of which contains information on a next hop and information on the next level when the route table is represented as a Multiway Trie; and transmitting means for transmitting each of the plurality of packets received by the receiving means in accordance with the corresponding next-hop information acquired by the lookup processing means.

IP lookup processing can be sped up.

FIG. 1 is a diagram showing an example of the overall configuration of the communication device according to the present embodiment.
FIG. 2 is a diagram for explaining an example of longest match using a Multiway Trie.
FIG. 3 is a diagram for explaining an example of the array expansion of a route table.
FIG. 4 is a diagram (part 1) for explaining an example of optimization.
FIG. 5 is a diagram (part 2) for explaining an example of optimization.
FIG. 6 is a flowchart showing an example of the IP lookup processing (IPv4) according to the present embodiment.
FIG. 7 is a diagram for explaining an example of extracting the region used for lookup (first time).
FIG. 8 is a diagram for explaining an example of generating the index of the array element to be looked up first.
FIG. 9 is a diagram for explaining an example of fetching nodes from the array-expanded route table.
FIG. 10 is a diagram for explaining an example of extracting the inf value.
FIG. 11 is a diagram for explaining an example of extracting the jmp value.
FIG. 12 is a diagram for explaining an example of holding the inf value of targets whose lookup is continuing.
FIG. 13 is a diagram for explaining an example of extracting the region used for lookup (second and subsequent times).
FIG. 14 is a diagram for explaining an example of generating the index of the array element to be looked up next.
FIG. 15 is a flowchart showing an example of the IP lookup processing (IPv6) according to the present embodiment.
FIG. 16 is a diagram for explaining an example of extracting the first 64 bits of an IPv6 address.
FIG. 17 is a diagram for explaining an example of extracting the first 32 bits of an IPv6 address.
FIG. 18 is a diagram for explaining an example of reordering.
FIG. 19 is a diagram showing an example of the hardware configuration of the communication device according to the present embodiment.

An embodiment of the present invention (hereinafter also called "the present embodiment") is described below. The present embodiment describes a communication device 10 that speeds up IP lookup processing by performing IP lookup on a plurality of destination addresses in parallel using SIMD (Single Instruction/Multiple Data) operations.

A SIMD operation processes multiple pieces of data simultaneously with a single instruction. SIMD operations are highly effective for problems that lend themselves to parallel computation, but because the available instructions are limited and conditional branching carries a high overhead, applying them as-is to existing software often does not improve performance. In the present embodiment, therefore, as described later, IP lookup processing is sped up by devising both a route table data structure that allows IP lookup to be processed with SIMD operations and the procedure of the IP lookup processing itself.

The communication device 10 according to the present embodiment is assumed to be a computer (for example, a general-purpose server or a PC (personal computer)) that implements packet processing, including IP lookup processing, in software (a program).

<Overall configuration of communication device 10>
First, the overall configuration of the communication device 10 according to the present embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the overall configuration of the communication device 10 according to the present embodiment.

As shown in FIG. 1, the communication device 10 according to the present embodiment has, as functional units, a receiving unit 101, a lookup processing unit 102, and a transmitting unit 103.

The receiving unit 101 receives packets transmitted or forwarded from other equipment or devices (for example, another communication device 10).

The lookup processing unit 102 executes IP lookup processing using SIMD operations. That is, the lookup processing unit 102 selects a forwarding address from the route table according to the destination address of each packet received by the receiving unit 101. Here, the forwarding address is the IP address of the other equipment or device to which the packet is to be forwarded, and is also called "route information" or the "next hop". Hereinafter, the forwarding address is mainly referred to as route information.

Here, the route table is a table that associates destination networks (network addresses) with route information, and is also called a routing table. In the present embodiment, as described later, the route table is a FIB (Forwarding Information Base) that holds the route information in the form of an array.

The transmitting unit 103 forwards each packet received by the receiving unit 101 to the forwarding destination selected by the lookup processing unit 102.

The overall procedure of the present embodiment consists of the following steps 1 to 3.

1. Hold the route information as a RIB (Routing Information Base).

2. Convert the RIB into a FIB. The FIB is held in the form of an array.

3. Perform longest match on the destination addresses using SIMD operations on the FIB, and select the route information that serves as the forwarding destination (that is, execute IP lookup processing using SIMD operations).

Of these, steps 1 and 2 must be performed in advance, before the IP lookup processing is executed. Steps 1 and 2 may be performed by the lookup processing unit 102 or by a functional unit other than the lookup processing unit 102 (for example, a functional unit, not shown, called a "route table creation unit").

<1. Holding route information as a RIB>
First, the procedure for holding the route information as a RIB is described. In the present embodiment, the RIB is held as a Multiway Trie. A Multiway Trie is a trie in which one level represents a width of 2 bits or more. Compared with a trie built as a binary tree (a so-called Binary Trie), a Multiway Trie, which is built as a multiway tree, generally has the advantage of requiring fewer tree traversal operations during matching.

In longest match on a Multiway Trie, route information is selected through the following Step 1 to Step 4.

Step 1) Cut out the part of the destination address that corresponds to the width of the current level of the Multiway Trie, and match on the cut-out bits to obtain the corresponding node in that level.

Step 2) If the node holds route information, temporarily save that route information.

Step 3) If the node holds a pointer to the next level, return to Step 1.

Step 4) If the node obtained in Step 1 does not hold a pointer to the next level, the route information most recently saved in Step 2 is the forwarding address (that is, the result of the longest match).

In the RIB according to the present embodiment, one level of the Multiway Trie is represented as one array. Each element of the array (hereinafter also called an "array element") holds route information and a pointer to the next level. FIG. 2 shows an example of longest match using this RIB. In the example shown in FIG. 2, a routing table with the destination networks "1.0.0.0/8", "1.0.1.0/24", and "192.0.2.0/24" is represented as a RIB, and longest match is performed for the destination address (Destination) "192.0.2.1". In this example, the table field of each array element of the RIB holds a pointer to the next level (which may be NULL), and the hop field holds route information.

In the RIB (and the FIB) of the present embodiment, to speed up lookup, the first level is 16 bits wide and the second and subsequent levels are 8 bits wide. In the example shown in FIG. 2, Tbl0 is the table for the array representing the first level (2¹⁶ = 65536 elements), and Tbl1 and Tbl2 are the tables for the arrays belonging to the second level (2⁸ = 256 elements each).
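The longest match of Step 1 to Step 4 over a RIB with a 16-bit first level and 8-bit lower levels might be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Level`, `insert`, and `longest_match`, the route labels "A" and "B", and the way a short prefix is expanded over every slot it covers within a level are not taken from the source; only the `hop`/`table` fields, the strides, and the example routes are.

```python
import ipaddress

STRIDES = (16, 8, 8)  # first level 16 bits wide, later levels 8 bits wide


class Level:
    """One trie level: an array whose elements hold `hop` and `table`."""

    def __init__(self, width):
        size = 1 << width
        self.hop = [None] * size    # route information, or None
        self.plen = [-1] * size     # prefix length that set `hop` (assumed helper)
        self.table = [None] * size  # pointer to the next level, or None


def insert(root, prefix, hop):
    net = ipaddress.ip_network(prefix)
    addr, plen = int(net.network_address), net.prefixlen
    level, consumed = root, 0
    for depth, width in enumerate(STRIDES):
        chunk = (addr >> (32 - consumed - width)) & ((1 << width) - 1)
        if plen <= consumed + width:
            # The prefix ends inside this level: fill every slot it covers,
            # but never overwrite a slot set by a longer prefix.
            span = 1 << (consumed + width - plen)
            for slot in range(chunk, chunk + span):
                if level.plen[slot] <= plen:
                    level.hop[slot], level.plen[slot] = hop, plen
            return
        if level.table[chunk] is None:
            level.table[chunk] = Level(STRIDES[depth + 1])
        level, consumed = level.table[chunk], consumed + width


def longest_match(root, dst):
    addr = int(ipaddress.ip_address(dst))
    level, consumed, best = root, 0, None
    for width in STRIDES:
        # Step 1: cut out this level's bits and find the node.
        chunk = (addr >> (32 - consumed - width)) & ((1 << width) - 1)
        # Step 2: remember the route information if the node has any.
        if level.hop[chunk] is not None:
            best = level.hop[chunk]
        # Step 3: follow the pointer if present; Step 4: otherwise stop.
        level = level.table[chunk]
        if level is None:
            break
        consumed += width
    return best


rib = Level(STRIDES[0])
insert(rib, "1.0.0.0/8", "A")
insert(rib, "1.0.1.0/24", "B")
insert(rib, "192.0.2.0/24", "X")
print(longest_match(rib, "192.0.2.1"))  # prints X
```

With the 16-bit first level, the destination "192.0.2.1" is resolved in two array accesses, whereas an 8-bit first level would need three.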

The first level is made 16 bits wide for the following reasons: (1) in practice, most routes have a prefix length of 8 to 24 bits, so an 8-bit first level would require two or more lookups in many cases; (2) the lookup uses the byte-granular shuffle instruction of the SIMD instruction set, so the width of a level can only be changed in units of 8 bits; and (3) a 24-bit-wide level would use the memory space of the FIB inefficiently. If considerations (1) to (3) are disregarded, however, the first level may be 8 or 24 bits wide.

<2. Converting the RIB into a FIB>
Next, the procedure for converting the RIB into a FIB is described. The FIB according to the present embodiment is constructed by converting the Multiway Trie built as the RIB into array form (that is, by expanding it into a single array). FIG. 3 shows the array expansion of the RIB shown in FIG. 2. As described above, the route table according to the present embodiment is the FIB obtained by expanding the RIB into an array.

As a rule, each row of the FIB represents one 8-bit-wide level (256 elements) of the Multiway Trie. However, row 0 is reserved for a special purpose, and rows 1 to 256 together represent the single 16-bit-wide level (65536 elements).

Each array element of the FIB occupies 32 bits and holds an inf value, which is the index number of the route information used when the node corresponding to that array element is matched, and a jmp value, which is the index number of the next level. The FIB is therefore a two-dimensional array with 256 32-bit array elements per row. In the present embodiment, however, the array element in row i, column j of this two-dimensional array is given the index number i × 256 + j, and the FIB is handled as a one-dimensional array.
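The element layout and the row/column indexing could be written down roughly as below. The helper names are hypothetical. The split into a 16-bit inf value in the upper half and a 16-bit jmp value in the lower half is taken from the later description of steps S104 to S106, not from this paragraph.

```python
ROW_WIDTH = 256  # array elements per FIB row


def flat_index(row, col):
    """Index number of the element in row i, column j: i * 256 + j."""
    return row * ROW_WIDTH + col


def pack(inf, jmp):
    """Build one 32-bit array element from a 16-bit inf and a 16-bit jmp value."""
    if not (0 <= inf < 1 << 16 and 0 <= jmp < 1 << 16):
        raise ValueError("inf and jmp must each fit in 16 bits")
    return (inf << 16) | jmp


def unpack(element):
    """Split a 32-bit array element into (inf, jmp)."""
    return element >> 16, element & 0xFFFF


print(flat_index(193, 0))     # prints 49408
print(unpack(pack(7, 258)))   # prints (7, 258)
```

Because a jmp value is a row number, 16 bits allow up to 65536 rows of 256 elements in this layout.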

The array elements of row 0 (the special-purpose array elements) are used to perform meaningless lookups for destination addresses whose lookup has already finished; the jmp value of every array element in row 0 is set to 0 (jmp = 0). Because the IP lookup processing according to the present embodiment looks up a plurality of destination addresses in parallel using SIMD operations, the same lookup must keep being repeated for destination addresses whose lookup has already finished until the lookup of all destination addresses has finished. The IP lookup processing ends when all destination addresses have reached row 0 of the FIB.

When the destination address "192.0.2.1" is looked up using the FIB shown in FIG. 3, a lookup is first performed using "192.0", the first 16 bits of the destination address, and the array elements of rows 1 to 256; the array element at row 193, column 0 matches. Next, a lookup is performed using "2", bits 17 to 24 of the destination address, and the array elements of row 258, which corresponds to the jmp value "258" of the matched array element; the array element at row 258, column 2 matches. At this point, the inf value "X" of the matched array element is held.

Finally, a lookup is performed using "1", bits 25 to 32 of the destination address, and the array elements of row 0, which corresponds to the jmp value "0" of the previously matched array element; the array element at row 0, column 1 matches, and the longest match ends. The route information corresponding to the finally held inf value "X" is the forwarding address (that is, the result of the longest match).
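The walk just described can be reproduced with a scalar (non-SIMD) sketch over a hand-built one-dimensional FIB. Several details are assumptions made for illustration: the number 1 stands in for the inf value "X", an inf value of 0 is treated as "no route information", and only the two elements named in the walk are populated. The element layout (inf in the upper 16 bits, jmp in the lower 16 bits) follows the later description of steps S105 and S106.

```python
ROW = 256
NO_INF = 0  # assumed encoding of "no route information"
X = 1       # hypothetical index number standing in for route information "X"


def build_example_fib():
    # Row 0 is the special row, rows 1..256 are the first level,
    # rows 257 and 258 are second-level rows.
    fib = [0] * (259 * ROW)
    fib[193 * ROW + 0] = (NO_INF << 16) | 258  # "192.0": continue in row 258
    fib[258 * ROW + 2] = (X << 16) | 0         # "192.0.2": route X, no next level
    return fib


def lookup(fib, dst):
    a, b, c, d = (int(part) for part in dst.split("."))
    idx = ROW + ((a << 8) | b)  # the first level occupies rows 1..256
    held, visited = NO_INF, []
    for next_byte in (c, d, None):  # always three accesses, no early exit
        inf, jmp = fib[idx] >> 16, fib[idx] & 0xFFFF
        visited.append(divmod(idx, ROW))  # (row, column) of the matched element
        if inf != NO_INF:
            held = inf
        if next_byte is not None:
            idx = jmp * ROW + next_byte   # jmp == 0 leads into row 0
    return held, visited


held, visited = lookup(build_example_fib(), "192.0.2.1")
print(held == X, visited)  # prints True [(193, 0), (258, 2), (0, 1)]
```

The loop always performs three accesses, so an address that finishes early simply lands in row 0, mirroring the dummy lookups that the parallel version relies on.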

In the present embodiment, the following two optimizations are applied when the RIB is converted into the FIB.

・Optimization 1: Pre-expanding the index number of the route information (inf value) to the leaf nodes
In the RIB according to the present embodiment, each time one level is matched, the matched node is checked for route information, and any route information found is temporarily held. However, at least one CPU instruction is needed to check whether a node holds route information. By recursively expanding the route information of an upper level in advance into the lower-level nodes, up to the point where a node with its own route information appears (that is, into every node for which the longest match result is the upper-level route information), the CPU instruction for this check becomes unnecessary. FIG. 4 shows an example of this optimization.

In the example shown in FIG. 4, the inf value "X" held by the upper node (the array element at row 2, column 0) is expanded as the inf value of each array element in row 257. This removes the need to check or determine, during lookup, whether each node holds route information (that is, whether the inf value of the array element corresponding to the node is NULL), which speeds up the IP lookup processing.
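One way to picture this pre-expansion on the one-dimensional FIB is the recursive pass below. It is a sketch under assumptions, not the construction used in the embodiment: an inf value of 0 is taken to mean NULL, the numbers 1 and 2 stand in for inf values "X" and "Y", the route behind "Y" is invented for the example, and the function names are hypothetical.

```python
ROW = 256
NO_INF = 0  # assumed encoding of a NULL inf value


def push_inf_down(fib, row, inherited=NO_INF):
    """Fill empty inf values in `row` with the nearest upper-level inf value,
    then recurse into every row reachable through a jmp value."""
    for col in range(ROW):
        idx = row * ROW + col
        inf, jmp = fib[idx] >> 16, fib[idx] & 0xFFFF
        if inf == NO_INF:
            inf = inherited
            fib[idx] = (inf << 16) | jmp
        if jmp != 0:
            push_inf_down(fib, jmp, inf)


def pre_expand(fib):
    for row in range(1, 257):  # rows 1..256 form the first level
        push_inf_down(fib, row)
    return fib


X, Y = 1, 2  # hypothetical inf values
fib = [0] * (258 * ROW)
for col in range(ROW):
    fib[2 * ROW + col] = X << 16      # a /8 route covering all of row 2
fib[2 * ROW + 0] |= 257               # row 2, column 0 also points at row 257
fib[257 * ROW + 1] = Y << 16          # a more specific route in row 257
pre_expand(fib)
print(fib[257 * ROW + 0] >> 16, fib[257 * ROW + 1] >> 16)  # prints 1 2
```

After the pass, a lookup can take the inf value of the last element it visits before reaching row 0 without testing whether that value is empty.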

・Optimization 2: Route aggregation
Route aggregation is a well-known technique that treats multiple routes with the same next hop as a single route with a shorter prefix length. FIG. 5 shows an example of this optimization.

In the example shown in FIG. 5, the information held by the array elements of row 257 can be aggregated into the upper node (the array element at row 2, column 0), so row 257 is deleted. Because this reduces the size of the FIB, an improved CPU cache hit rate can be expected.
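A simplified version of this aggregation on the one-dimensional FIB might look like the following. The rule used here (fold a row into its parent element when all 256 of its elements hold the same inf value and no jmp value) is an assumption chosen to cover the FIG. 5 case. The function names and the stand-in value 1 for "X" are hypothetical, and the renumbering of the remaining rows after a deletion is left out.

```python
ROW = 256


def aggregate(fib, row, freed):
    """Bottom-up pass: fold uniform child rows into their parent elements."""
    for col in range(ROW):
        idx = row * ROW + col
        inf, jmp = fib[idx] >> 16, fib[idx] & 0xFFFF
        if jmp == 0:
            continue
        aggregate(fib, jmp, freed)  # handle deeper rows first
        child = fib[jmp * ROW:(jmp + 1) * ROW]
        uniform = all(element == child[0] for element in child)
        if uniform and (child[0] & 0xFFFF) == 0:
            merged_inf = (child[0] >> 16) or inf  # an empty child keeps the parent's inf
            fib[idx] = merged_inf << 16           # the parent's jmp value becomes 0
            freed.append(jmp)                     # the row can now be deleted


def aggregate_all(fib):
    freed = []
    for row in range(1, 257):  # rows 1..256 form the first level
        aggregate(fib, row, freed)
    return freed


X = 1  # hypothetical inf value
fib = [0] * (259 * ROW)
fib[2 * ROW + 0] = (X << 16) | 257    # parent of row 257
for col in range(ROW):
    fib[257 * ROW + col] = X << 16    # row 257 only repeats the parent's inf value
fib[193 * ROW + 0] = 258              # parent of row 258
fib[258 * ROW + 2] = X << 16          # row 258 is not uniform, so it must stay
print(aggregate_all(fib))             # prints [257]
```

Row 258 survives because its elements differ, while row 257 carries no information beyond what its parent element already holds.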

<3. IP lookup processing using SIMD operations>
Next, IP lookup processing using SIMD operations is described.

≪IP lookup processing (IPv4)≫
The following describes the IP lookup processing for the case where the IP addresses are IPv4 addresses, with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the IP lookup processing (IPv4) according to the present embodiment.

As an example, the present embodiment assumes Intel AVX2 as the SIMD extension instruction set and describes the case where eight destination addresses are processed simultaneously using 256-bit-wide registers. The present embodiment is, however, equally applicable to architectures other than Intel AVX2.

In SIMD, there may be several ways to implement the same operation or obtain the same result. In every such case, the present embodiment adopts the implementation with the lowest CPU instruction latency on Intel AVX2.

Step S101: The lookup processing unit 102 loads eight destination addresses (IPv4 addresses) from memory into a register using a load instruction (for example, VMOVDQA), which is one of the SIMD instructions. These eight destination addresses are assumed to be stored contiguously in memory in advance. Each of the eight destination addresses is the destination address of a packet received by the receiving unit 101.

Step S102: Next, the lookup processing unit 102 extracts from the register the region used for the lookup. For example, as shown in FIG. 7, let the eight destination addresses loaded into the register be Dst1, Dst2, ..., Dst8, and let dst denote the region made up of Dst1 to Dst8. The lookup processing unit 102 uses a shuffle instruction (for example, VPSHUFB), which is one of the SIMD instructions, to copy the upper 16 bits of each of Dst1 to Dst8 into the lower 16 bits of the corresponding 32-bit region of another register. Hereinafter, the 32-bit regions that receive the upper 16 bits of Dst1 to Dst8 are called Tgt1 to Tgt8, and the region made up of these eight regions is called tgt.
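To make the byte movement concrete, the sketch below emulates a 256-bit byte shuffle in plain Python and applies it to eight addresses. It rests on assumptions the text does not state: the addresses are laid out in memory in network byte order, and each 32-bit region is read as a little-endian value, as on x86. The function names and the control-byte table are illustrative. The shuffle rule itself (each output byte is either zero or a byte chosen from the same 128-bit half) is how VPSHUFB behaves on 256-bit registers.

```python
import ipaddress
import struct


def vpshufb_256(src, ctrl):
    """Emulate a 256-bit byte shuffle on 32-byte values."""
    assert len(src) == 32 and len(ctrl) == 32
    out = bytearray(32)
    for i in range(32):
        half = i & 16  # 0 for bytes 0..15, 16 for bytes 16..31
        if not (ctrl[i] & 0x80):  # bit 7 set means "write zero"
            out[i] = src[half + (ctrl[i] & 0x0F)]
    return bytes(out)


# For the address at offset 4*k within a 128-bit half: the lowest byte of the
# region takes the second address byte, the next byte takes the first address
# byte, and the two remaining bytes are zeroed.
CTRL_UPPER16 = bytes(
    b for _half in range(2) for k in range(4) for b in (4 * k + 1, 4 * k, 0x80, 0x80)
)


def extract_upper16(addresses):
    """Eight dotted-quad strings -> list of the eight Tgt values."""
    dst = b"".join(ipaddress.IPv4Address(a).packed for a in addresses)
    return list(struct.unpack("<8I", vpshufb_256(dst, CTRL_UPPER16)))


dsts = ["192.0.2.1", "1.0.1.7", "10.1.2.3", "8.8.8.8",
        "172.16.0.1", "203.0.113.9", "198.51.100.2", "127.0.0.1"]
print(extract_upper16(dsts)[0])  # prints 49152
```

The value 49152 is 192 × 256 + 0, the first 16 bits "192.0" of the first address.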

In the present embodiment the first level is 16 bits wide. If the first level were 8 bits wide, for example, the lookup processing unit 102 would copy the upper 8 bits of each of Dst1 to Dst8 into the lower 8 bits of the corresponding 32-bit region of another register. Similarly, if the first level were 24 bits wide, the lookup processing unit 102 would copy the upper 24 bits of each of Dst1 to Dst8 into the lower 24 bits of the corresponding 32-bit region of another register.

Step S103: Next, the lookup processing unit 102 generates the indexes of the array elements to be looked up first. For example, as shown in FIG. 8, the lookup processing unit 102 uses an addition instruction (VPADDD), which is one of the SIMD instructions, to add 256 to each of Tgt1 to Tgt8, which make up the tgt obtained in step S102. This corresponds to skipping row 0 of the route table represented by the FIB (that is, skipping the array elements with index numbers "0" to "255") and directing the next lookup to rows 1 to 256 (that is, to the array elements of the first level). Hereinafter, the results of adding 256 to Tgt1 to Tgt8 are called Idx1 to Idx8, and the region made up of Idx1 to Idx8 is called idx. At this point, Idx1 to Idx8 are the index numbers of the array elements (nodes) used for the first lookup of Dst1 to Dst8, respectively.
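The effect of the addition can be checked with a small lane-wise emulation. The function names are hypothetical, and plain Python lists stand in for the 256-bit registers.

```python
MASK32 = 0xFFFFFFFF


def vpaddd(a, b):
    """Lane-wise 32-bit addition with wraparound."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def first_indices(tgt):
    """Idx_n = Tgt_n + 256, which skips row 0 of the FIB."""
    return vpaddd(tgt, [256] * len(tgt))


tgt = [0xC000, 0x0100, 0x0000, 0xFFFF, 0x0A01, 0x0808, 0xAC10, 0x7F00]
idx = first_indices(tgt)
print(divmod(idx[0], 256))  # prints (193, 0)
print(divmod(idx[3], 256))  # prints (256, 255)
```

The smallest and largest 16-bit values land in row 1 and row 256, so the first lookup never touches the special row 0.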

Step S104: Next, the lookup processing unit 102 fetches nodes (that is, array elements) from the array in memory (that is, the array-expanded route table). For example, as shown in FIG. 9, the lookup processing unit 102 uses a gather instruction (for example, VPGATHERDD), which is one of the SIMD instructions, to fetch from the route table the array elements whose index numbers are stored in Idx1 to Idx8, and stores them in a register. Hereinafter, the regions holding the fetched array elements are called Val1 to Val8, and the region made up of these eight regions is called val. Each of Val1 to Val8 thus holds an inf value (16 bits), which is the index number of the route information, and a jmp value (16 bits), which is the index number of the next level. Note that "jmp value × 256 + the 8-bit value to be looked up next" is the index number of the array element to be looked up next in the array-expanded route table (FIB).
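In scalar terms the gather amounts to eight independent array reads, as the following emulation suggests. The function names and the tiny hand-built FIB are assumptions for illustration, and a Python list stands in for the register.

```python
ROW = 256


def vpgatherdd(table, idx):
    """Emulate a gather: region n receives table[idx[n]]."""
    return [table[i] for i in idx]


def next_index(jmp, next_byte):
    """jmp value x 256 + the 8-bit value to be looked up next."""
    return jmp * ROW + next_byte


fib = [0] * (259 * ROW)
fib[193 * ROW + 0] = (0 << 16) | 258   # inf = 0 (assumed empty), jmp = 258

idx = [49408] + [256] * 7              # Idx1 points at row 193, column 0
val = vpgatherdd(fib, idx)
jmp1 = val[0] & 0xFFFF
print(jmp1, next_index(jmp1, 2))       # prints 258 66050
```

Index 66050 is row 258, column 2, the element reached for "192.0.2" in the FIG. 3 walk.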

Gather instructions used to perform poorly, which made it difficult to run table lookup processing such as IP lookup at high speed. In recent years, however, the performance of gather instructions has improved, and the processing of step S104 can be executed at high speed with a gather instruction.

Step S105: Next, the lookup processing unit 102 obtains the inf values from val. For example, as shown in FIG. 10, the lookup processing unit 102 uses a shuffle instruction (for example, VPSHUFB), which is one of the SIMD instructions, to copy the upper 16 bits of each of Val1 to Val8 into the lower 16 bits of the corresponding 32-bit region of another register. Hereinafter, the 32-bit regions that receive the upper 16 bits of Val1 to Val8 are called Inf1 to Inf8, and the region made up of these eight regions is called inf. Inf1 to Inf8 thus hold the inf values stored in Val1 to Val8, respectively.

Step S106: Next, the lookup processing unit 102 acquires the jmp values from val. For example, as shown in FIG. 11, the lookup processing unit 102 uses a shuffle instruction (for example, VPSHUFB), which is one of the SIMD operation instructions, to copy the lower 16 bits of each of Val1 to Val8 into the lower 16 bits of the corresponding 32-bit area of another register. Hereinafter, the 32-bit areas into which the lower 16 bits of Val1 to Val8 are copied are referred to as Jmp1 to Jmp8, and the area composed of these eight areas is referred to as jmp. As a result, Jmp1 to Jmp8 hold the jmp values stored in Val1 to Val8, respectively.

Note that steps S105 and S106 above may be executed in either order. That is, the processing of step S105 may be executed after the processing of step S106.
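Steps S105 and S106 amount to splitting each 32-bit element of val into its two 16-bit halves. The following is a minimal scalar sketch in Python, assuming the layout described above (inf in the upper 16 bits, jmp in the lower 16 bits). The function name `split_val` is hypothetical, and the embodiment performs the same split with shuffle instructions rather than shifts.

```python
def split_val(val):
    """Split each 32-bit lane into (inf, jmp).

    inf: upper 16 bits of the lane, placed in the low 16 bits of the result.
    jmp: lower 16 bits of the lane.
    """
    inf = [(v >> 16) & 0xFFFF for v in val]
    jmp = [v & 0xFFFF for v in val]
    return inf, jmp
```

Because the two results are computed independently from val, the order in which they are produced does not matter, which matches the note on steps S105 and S106.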

Step S107: Next, the lookup processing unit 102 retains the inf values of the targets whose lookup is continuing (that is, the destination addresses whose jmp value is not 0). For example, as shown in FIG. 12, the lookup processing unit 102 first uses a comparison instruction (for example, VPCMPEQD), which is one of the SIMD operation instructions, to create a mask (msk) that selects only the targets whose jmp value is not 0. That is, the lookup processing unit 102 uses the comparison instruction to compare the jmp value stored in each of Jmp1 to Jmp8 with 0, thereby creating Msk1 to Msk8, and the area composed of Msk1 to Msk8 is referred to as msk. Here, Msk1 is, for example, a 32-bit area whose bits are all 1 when the jmp value stored in Jmp1 is 0, and a 32-bit area whose bits are all 0 when the jmp value stored in Jmp1 is not 0. The same applies to Msk2 to Msk8.

Next, the lookup processing unit 102 uses a blend instruction (for example, VPBLENDVB), which is one of the SIMD operation instructions, to retain in another register, based on msk and inf, the inf values of the targets whose lookup is continuing. That is, the lookup processing unit 102 uses the blend instruction to mix (blend) Msk1 to Msk8 with Inf1 to Inf8, respectively, and overwrites the corresponding 32-bit areas with the results. These 32-bit areas are referred to as Nhop1 to Nhop8, and the area composed of these eight areas is referred to as nhop. Nhop1 to Nhop8 hold the inf values of the targets whose lookup is continuing. Here, mixing (blending) means conditionally copying each element of two bit strings. In this case, the most significant bits of Msk1 to Msk8 are used as the condition, and Inf1 to Inf8 are mixed with Nhop1 to Nhop8, respectively, to produce the new Nhop1 to Nhop8. Thus, for example, when the most significant bit of Msk1 is 0, Inf1 is copied to the new Nhop1, and when the most significant bit of Msk1 is 1, Nhop1 is copied to the new Nhop1. The same applies to Inf2 to Inf8 and Nhop2 to Nhop8.

Note that because optimization 1 above (that is, pre-expanding the index numbers of the route information (inf values) to the terminal nodes) is applied when the RIB is converted to the FIB, the results of mixing (blending) Msk1 to Msk8 with Inf1 to Inf8 can always be written over Nhop1 to Nhop8.
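The mask creation and blend of step S107 can be sketched as a scalar model in Python, following the mask convention stated in the text: a lane of msk is all ones when the jmp value is 0 and all zeros otherwise, and the blend takes Inf when the most significant mask bit is 0 and keeps Nhop when it is 1. The function names are hypothetical. VPBLENDVB selects per byte, but since every mask lane here is either all ones or all zeros, a per-lane model gives the same result.

```python
ALL_ONES = 0xFFFFFFFF


def make_mask(jmp):
    """Model of the compare-with-zero: all ones where jmp == 0, else 0."""
    return [ALL_ONES if j == 0 else 0 for j in jmp]


def blend(msk, inf, nhop):
    """Per lane: mask MSB 0 -> take inf, mask MSB 1 -> keep the old nhop."""
    return [n if (m >> 31) & 1 else i for m, i, n in zip(msk, inf, nhop)]
```

For example, with jmp values of 0 in the first lane and 3 in the second lane, the first lane keeps its previous Nhop value and the second lane is overwritten with its Inf value.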

Step S108: Next, the lookup processing unit 102 determines whether or not all the jmp values are 0. That is, the lookup processing unit 102 uses a test instruction (for example, VPTEST), which is one of the SIMD operation instructions, to determine whether or not all of Jmp1 to Jmp8 constituting jmp are 0.

When it is determined in step S108 above that all the jmp values are 0, the lookup processing unit 102 ends the IP lookup process. In this case, the route information indicated by the inf value (that is, the index number of the route information) stored in each of Nhop1 to Nhop8 is the next hop (forwarding destination address) for the destination address stored in the corresponding one of Dst1 to Dst8. The transmission unit 103 then forwards each packet to its next hop.

On the other hand, when it is not determined in step S108 that all the jmp values are 0, the lookup processing unit 102 proceeds to step S109.

Step S109: The lookup processing unit 102 extracts from the register the area to be used for the next lookup. For example, as shown in FIG. 13, in the second lookup, the lookup processing unit 102 uses a shuffle instruction (for example, VPSHUFB), which is one of the SIMD operation instructions, to copy the 8 bits from the 17th to the 24th bit, counted from the most significant bit, of each of Dst1 to Dst8 into the lower 8 bits of the corresponding 32-bit area of another register. These 32-bit areas are again referred to as Tgt1 to Tgt8, and the area composed of these eight areas is referred to as tgt. In the third lookup, the lookup processing unit 102 may likewise use a shuffle instruction to copy the 8 bits from the 25th to the 32nd bit, counted from the most significant bit, of each of Dst1 to Dst8 into the lower 8 bits of the corresponding 32-bit area of another register.
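The chunk selection of step S109 can be sketched as follows. This is a minimal scalar model in Python, assuming that each lane of dst holds an IPv4 address as a 32-bit integer whose most significant byte is the first octet. The actual byte positions in a register depend on how the addresses were loaded, which the shuffle control mask accounts for in the embodiment. The function name `extract_chunk` is hypothetical.

```python
def extract_chunk(dst, lookup_round):
    """Return, per lane, the 8-bit chunk used in the given lookup round.

    Round 2 uses bits 17 to 24 counted from the most significant bit,
    round 3 uses bits 25 to 32.
    """
    if lookup_round not in (2, 3):
        raise ValueError("this sketch covers the second and third lookups only")
    shift = 8 if lookup_round == 2 else 0
    return [(d >> shift) & 0xFF for d in dst]
```

For the address 192.168.10.1 (0xC0A80A01), the second lookup would use 0x0A and the third lookup would use 0x01.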

Note that the control mask of the shuffle instruction used in steps S102 and S109 above (that is, the mask that controls which bits of dst are copied to tgt) is updated, for example, by addition using _mm256_add_epi32(), which is one of the SIMD operations.

Step S110: Next, the lookup processing unit 102 generates the indexes of the array elements to be looked up next. For example, as shown in FIG. 14, the lookup processing unit 102 shifts each of Jmp1 to Jmp8 of the current jmp left by 1 byte (8 bits) and then uses an OR instruction (for example, VPOR), which is one of the SIMD operation instructions, to OR the result with tgt. This means that the jmp value stored in each of Jmp1 to Jmp8 is multiplied by 256 and then ORed with the corresponding one of Tgt1 to Tgt8. The 32-bit areas holding the results of these OR operations are again referred to as Idx1 to Idx8, and the area composed of Idx1 to Idx8 is referred to as idx. Idx1 to Idx8 are the index numbers of the array elements (nodes) used in the next lookup for Dst1 to Dst8, respectively.
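The index generation of step S110 can be sketched as a scalar model in Python. The function name `next_indices` is hypothetical. Because the lower 8 bits of the shifted jmp value are always 0 and tgt occupies only the lower 8 bits, the OR gives the same result as the sum "jmp value × 256 + the 8-bit value to be looked up next".

```python
def next_indices(jmp, tgt):
    """Per lane: (jmp << 8) | tgt, kept within 32 bits."""
    return [((j << 8) | (t & 0xFF)) & 0xFFFFFFFF for j, t in zip(jmp, tgt)]
```

For example, a jmp value of 2 combined with a next chunk of 0x0A yields index 522, which is 2 × 256 + 10.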

The lookup processing unit 102 then returns to step S104. As a result, the processing from step S104 onward (that is, the lookup of the next 8-bit portion of each destination address) is executed using the idx above. In other words, steps S104 to S110 above are repeated for each predetermined 8-bit portion of each destination address.

Although the present embodiment has described an IP lookup process that looks up eight destination addresses in parallel, multiple IP lookup processes can also be executed in parallel by using pipeline processing. For example, implementing the IP lookup process with the loop unrolled twice makes it possible, through pipeline processing, to look up 16 destination addresses at the same time (strictly speaking, processing within the pipeline is sequential, so the lookups are sequential rather than simultaneous). Likewise, the process can be implemented with the loop unrolled three or more times.

The present embodiment assumes that the communication device 10 is installed on a communication carrier's network and repeatedly executes the IP lookup process for every eight destination addresses. However, the communication device 10 according to the present embodiment can also be installed, for example, on a relatively low-speed network. On a relatively low-speed network, eight destination addresses may not be available when execution of the IP lookup process starts. In such a case, for example, a 32-bit bit string consisting entirely of 0s may be regarded as a destination address, and the IP lookup process may be executed on eight destination addresses that include one or more such addresses.
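The padding described above can be sketched as follows, as a minimal Python example. The function name `pad_batch` and the idea of returning the count of real addresses alongside the padded batch are assumptions made for illustration. The text only states that all-zero 32-bit values may be treated as destination addresses.

```python
def pad_batch(addrs, lanes=8):
    """Pad a short batch of 32-bit destination addresses with all-zero entries.

    Returns the padded batch and the number of real addresses, so that the
    caller can ignore the lookup results of the padding lanes.
    """
    if len(addrs) > lanes:
        raise ValueError("batch is larger than the number of lanes")
    count = len(addrs)
    return list(addrs) + [0] * (lanes - count), count
```

A caller would then use only the first `count` lookup results and discard the rest.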

≪IP lookup process (IPv6)≫
Hereinafter, the IP lookup process for the case where the IP address is an IPv6 address will be described with reference to FIG. 15. FIG. 15 is a flowchart showing an example of the IP lookup process (IPv6) according to the present embodiment.

Whereas an IPv4 address is 32 bits long, an IPv6 address is 128 bits long. In the IP lookup process (IPv6), therefore, 32 bits are cut out from the beginning of each of eight IPv6 addresses, and processing similar to the IP lookup process (IPv4) is performed. When the lookup of those 32 bits is finished, if any destination address is still being looked up, the next 32 bits are cut out, and this is repeated. In addition, because the order of the lookup results (that is, the inf values indicating the next hops) does not match the order of the destination addresses once the lookup ends, the lookup results are reordered.
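The 32-bit slicing described above can be illustrated with a short Python sketch that splits one IPv6 address into the four 32-bit portions that would be looked up in turn. The function name `ipv6_words` is hypothetical. The sketch uses the standard `ipaddress` module only to parse the textual address and does not model the SIMD registers.

```python
import ipaddress


def ipv6_words(addr):
    """Split an IPv6 address into four 32-bit words, first 32 bits first."""
    value = int(ipaddress.IPv6Address(addr))
    return [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]
```

For 2001:db8::1, the first word is 0x20010DB8. Each word can then be handled in the same way as an IPv4 destination address, moving on to the next word only while the lookup is still continuing.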

Step S201: The lookup processing unit 102 uses a load instruction (for example, VMOVDQA), which is one of the SIMD operation instructions, to load destination addresses (IPv6 addresses) from memory into registers two at a time, eight in total. These eight destination addresses are assumed to be held contiguously in memory in advance. Each of the eight destination addresses is the destination address of a packet received by the receiving unit 101.

Step S202: Next, the lookup processing unit 102 extracts the first 64 bits of the destination addresses (IPv6), four addresses at a time. For example, as shown in FIG. 16, let the destination addresses be Dst1, Dst2, ..., Dst8, and assume that Dst1 and Dst2, Dst3 and Dst4, Dst5 and Dst6, and Dst7 and Dst8 are each loaded into the same register. The area composed of Dst1 and Dst2 is referred to as dst_1, the area composed of Dst3 and Dst4 as dst_2, the area composed of Dst5 and Dst6 as dst_3, and the area composed of Dst7 and Dst8 as dst_4. Further, for example, the 1st to 32nd bits of Dst1 are written as Dst1[0:31], and the 33rd to 64th bits as Dst1[32:63]. The same notation applies to the other bit ranges and to the other destination addresses Dst2 to Dst8.

Here, the lookup processing unit 102 uses an unpack instruction (for example, VPUNPCKLDQ), which is one of the SIMD operation instructions, to extract the first 64 bits of each of Dst1, Dst2, Dst5, and Dst6 and copy them to another register. Hereinafter, the area of this register is referred to as dst_1_3. Note that the latter 32 bits (that is, Dst1[32:63], Dst5[32:63], Dst2[32:63], and Dst6[32:63]) are discarded in the next step S203.

Similarly, the lookup processing unit 102 uses the unpack instruction to extract the first 64 bits of each of Dst3, Dst4, Dst7, and Dst8 and copy them to another register. Hereinafter, the area of this register is referred to as dst_2_4. Note that the latter 32 bits (that is, Dst3[32:63], Dst7[32:63], Dst4[32:63], and Dst8[32:63]) are discarded in the next step S203.

Step S203: Next, the lookup processing unit 102 extracts the first 32 bits of the eight destination addresses (IPv6). For example, as shown in FIG. 17, the lookup processing unit 102 uses an unpack instruction (for example, VPUNPCKLDQ), which is one of the SIMD operation instructions, to extract the first 32 bits of each of Dst1, Dst2, Dst5, and Dst6 and copy them to another register. Hereinafter, the area of this register is referred to as upk.
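The two rounds of unpacking in steps S202 and S203 can be traced with a small scalar model in Python. The model assumes that each 256-bit register is a list of eight 32-bit elements, that element 0 of each address holds its first 32 bits, and that upk is obtained by applying the same unpack operation to dst_1_3 and dst_2_4. The last point is an interpretation of steps S202 and S203 rather than something the text states explicitly. The function names are hypothetical. `unpack_lo32` follows the documented behaviour of the 256-bit VPUNPCKLDQ, which interleaves the two low 32-bit elements of its operands within each 128-bit half.

```python
def unpack_lo32(a, b):
    """Model of 256-bit VPUNPCKLDQ on lists of eight 32-bit elements.

    Within each 128-bit half, the result is a[0], b[0], a[1], b[1].
    """
    out = []
    for base in (0, 4):
        out += [a[base], b[base], a[base + 1], b[base + 1]]
    return out


def first_words(dsts):
    """Return upk for eight addresses, each given as four 32-bit words."""
    dst_1, dst_2, dst_3, dst_4 = (
        list(dsts[i]) + list(dsts[i + 1]) for i in (0, 2, 4, 6)
    )
    dst_1_3 = unpack_lo32(dst_1, dst_3)  # first 64 bits of Dst1, Dst5, Dst2, Dst6
    dst_2_4 = unpack_lo32(dst_2, dst_4)  # first 64 bits of Dst3, Dst7, Dst4, Dst8
    return unpack_lo32(dst_1_3, dst_2_4)
```

Under these assumptions, upk holds the first 32 bits of the addresses in the order Dst1, Dst3, Dst5, Dst7, Dst2, Dst4, Dst6, Dst8, which is why the lookup results need reordering at the end of the process.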

Step S204: Next, the lookup processing unit 102 executes steps S102 to S103 of FIG. 6 using upk. That is, the lookup processing unit 102 regards upk as dst and executes steps S102 to S103 of the IP lookup process (IPv4) shown in FIG. 6.

Step S205: Next, the lookup processing unit 102 executes steps S104 to S107 of FIG. 6.

Step S206: Next, as in step S108 of FIG. 6, the lookup processing unit 102 determines whether or not all the jmp values are 0.

When it is not determined in step S206 above that all the jmp values are 0 (that is, when the lookup is continuing), the lookup processing unit 102 proceeds to step S207.

On the other hand, when it is determined in step S206 that all the jmp values are 0 (that is, when the lookup is not continuing), the lookup processing unit 102 proceeds to step S209.

Step S207: When 32 bits' worth of lookup has been performed (that is, when all bits of each of the 32-bit areas Dst1 to Dst8 of upk, regarded as dst, have been looked up), the lookup processing unit 102 uses a logical shift instruction (for example, VPSRLDQ), which is one of the SIMD operation instructions, to shift each of dst_1 to dst_4 left by 32 bits. If 32 bits' worth of lookup has not yet been performed, this step is not executed. Here, all bits of each of the 32-bit areas Dst1 to Dst8 having been looked up means, for example, that four lookups have been performed when lookups are done 8 bits at a time, or that three lookups have been performed when a 16-bit lookup is followed by lookups 8 bits at a time.

The lookup processing unit 102 then executes steps S202 to S203 using dst_1 to dst_4 shifted left by 32 bits. As a result, the next 32-bit portion of each destination address (IPv6) is extracted as upk.
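The effect of the 32-bit shift in step S207 can be sketched with a scalar model in Python. The model assumes that each register is a list of eight 32-bit elements holding two addresses, with element 0 of each 128-bit half holding the first 32 bits of its address. Under that assumption, shifting each 128-bit half by 4 bytes drops the word that has already been looked up and moves the next word to the front. The function name `shift_out_word` is hypothetical.

```python
def shift_out_word(reg):
    """Per 128-bit half: drop the first 32-bit element and append a zero."""
    out = []
    for base in (0, 4):
        out += list(reg[base + 1:base + 4]) + [0]
    return out
```

After this shift, repeating the unpacking of steps S202 to S203 picks up the next 32 bits of every address.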

Step S208: The lookup processing unit 102 executes steps S109 to S110 of FIG. 6. That is, the lookup processing unit 102 regards upk as dst and executes steps S109 to S110 of the IP lookup process (IPv4) shown in FIG. 6. The lookup processing unit 102 then returns to step S205 above.

Step S209: The lookup processing unit 102 reorders Nhop1 to Nhop8, in which the lookup results are stored. For example, as shown in FIG. 18, in the present embodiment nhop is arranged in the order Nhop1, Nhop3, Nhop5, Nhop7, Nhop2, Nhop4, Nhop6, Nhop8. Nhop1 to Nhop8 hold the inf values (that is, the index numbers of the route information) indicating the next hops of Dst1 to Dst8, respectively. Here, the lookup processing unit 102 uses a reordering instruction (for example, VPSRLDQ), which is one of the SIMD operation instructions, to rearrange Nhop1 to Nhop8 into the order Nhop1, Nhop2, Nhop3, Nhop4, Nhop5, Nhop6, Nhop7, Nhop8. As a result, the next hop of each destination address (IPv6) is obtained.

In the present embodiment, the reordering is performed after the lookup ends, but Dst1 to Dst8 constituting upk may instead be reordered into the order Dst1, Dst2, Dst3, Dst4, Dst5, Dst6, Dst7, Dst8 before the lookup. When reordering before the lookup, however, reordering is required every time 32 bits are cut out. In contrast, reordering after the lookup ends, as in the present embodiment, requires only a single reordering and is therefore more efficient.
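The reordering of step S209 is a fixed permutation of the eight lanes. It can be sketched as a scalar model in Python, assuming the lane order Nhop1, Nhop3, Nhop5, Nhop7, Nhop2, Nhop4, Nhop6, Nhop8 given above. The names `LANE_ORDER` and `reorder` are hypothetical, and the sketch does not model any particular SIMD instruction.

```python
# Which destination address (1-based) each lane of nhop belongs to.
LANE_ORDER = [1, 3, 5, 7, 2, 4, 6, 8]


def reorder(nhop):
    """Return the lookup results in destination-address order."""
    out = [0] * len(LANE_ORDER)
    for lane, dst_no in enumerate(LANE_ORDER):
        out[dst_no - 1] = nhop[lane]
    return out
```

Since the permutation is the same for every batch, it needs to be applied only once, after the last 32-bit portion has been looked up.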

＜Hardware configuration of communication device 10＞
Next, the hardware configuration of the communication device 10 according to the present embodiment will be described with reference to FIG. 19. FIG. 19 is a diagram showing an example of the hardware configuration of the communication device 10 according to the present embodiment.

As shown in FIG. 19, the communication device 10 according to the present embodiment includes, as hardware, an input device 201, a display device 202, an external I/F 203, a communication I/F 204, a processor 205, and a storage device 206. These pieces of hardware are communicably connected to one another via a bus B.

The input device 201 is, for example, a keyboard, a mouse, or a touch panel. The display device 202 is, for example, a display. Note that the communication device 10 need not include at least one of the input device 201 and the display device 202.

The external I/F 203 is an interface with an external device. External devices include a recording medium 203a. The communication device 10 can read from and write to the recording medium 203a via the external I/F 203. The recording medium 203a may store, for example, one or more programs that implement the functional units of the communication device 10 (that is, the receiving unit 101, the lookup processing unit 102, and the transmission unit 103).

The communication I/F 204 is an interface for connecting the communication device 10 to a communication network.

The processor 205 is, for example, a CPU. Note that the processor 205 has built-in registers. Each functional unit of the communication device 10 is implemented by one or more programs stored in the storage device 206 causing the processor 205 to execute processing.

The storage device 206 is, for example, a RAM (Random Access Memory) or an auxiliary storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The storage device 206 may include a ROM (Read Only Memory). The storage device 206 stores various programs and data. These programs and data include, for example, an OS (Operating System), application programs, one or more programs that implement the functional units of the communication device 10, and the route table.

By having the hardware configuration shown in FIG. 19, the communication device 10 according to the present embodiment can implement the IP lookup process described above. Although the example in FIG. 19 shows the communication device 10 implemented by a single device (computer), the configuration is not limited to this. The communication device 10 according to the present embodiment may be implemented by a plurality of devices (computers). In addition, a single device (computer) may include a plurality of processors 205 and a plurality of storage devices 206.

＜Summary＞
As described above, the communication device 10 according to the present embodiment can perform IP lookup on a plurality of destination addresses in parallel by using SIMD operations. The communication device 10 according to the present embodiment can therefore speed up the IP lookup process.

Moreover, because the communication device 10 according to the present embodiment uses SIMD operations to speed up the IP lookup process, it offers the following two advantages.

Advantage 1: Low cost
Because SIMD functionality is in many cases a standard feature of the server processors in wide circulation today, the IP lookup process can be sped up without providing additional special hardware (for example, a GPU or an FPGA). In addition, because the development process for code that uses SIMD operations is no different from that for ordinary software, development costs do not increase.

Advantage 2: Scalability to multiple cores
Because SIMD operations are a mechanism available on each individual CPU core, the multi-core scaling techniques that have been used conventionally can be applied at the same time.

The present invention is not limited to the specifically disclosed embodiment described above, and various modifications and changes can be made without departing from the scope of the claims.

10 Communication device
101 Receiving unit
102 Lookup processing unit
103 Transmission unit

Claims

A communication device comprising:
receiving means for receiving packets;
lookup processing means for, using an array composed of a plurality of array elements each containing information about a next hop and information about a next layer when a routing table is represented as a Multiway Trie, extracting, for each layer of the routing table, a bit string of a predetermined length corresponding to that layer from each of a plurality of destination addresses acquired from a plurality of packets received by the receiving means, matching the extracted bit strings against the array elements constituting the layer by longest match using SIMD operations, and repeatedly acquiring the information about the next hop and the information about the next layer contained in the matched array elements, thereby acquiring from the array, in parallel, the information about the next hop for the destination addresses of the plurality of packets;
extraction means for, when the destination address is an IPv6 address, extracting a bit string from the destination address 32 bits at a time until the information about the next layer acquired by the lookup processing means reaches a predetermined value; and
transmission means for transmitting each of the plurality of packets received by the receiving means in accordance with the respective information about the next hops acquired by the lookup processing means,
wherein the lookup processing means
regards the bit string extracted by the extraction means as an IPv4 destination address and repeatedly acquires, in parallel from the array, the information about the next hop and the information about the next layer for that destination address.

The communication device according to claim 1, wherein, as an optimization, the array has undergone at least one of: pre-expansion of the information about the next hop into a lower layer; and deletion of a lower array element when a higher array element containing information about the same next hop exists.

The communication device according to claim 1 or 2, wherein the lookup processing means:
when the destination address is an IPv4 address and the layer composed of the array elements to be matched is the first layer, extracts a bit string 16 bits long from the beginning of each of the plurality of destination addresses, matches the extracted bit strings against the array elements constituting the layer, and acquires the information about the next hop and the information about the next layer contained in the matched array elements; and
when the layer composed of the array elements to be matched is the second or a later layer, extracts, for each layer, a bit string 8 bits long from the beginning of the not-yet-extracted portion of each of the plurality of destination addresses, matches the extracted bit strings against the array elements constituting the layer, and repeatedly acquires the information about the next hop and the information about the next layer contained in the matched array elements.

The communication device according to any one of claims 1 to 3, wherein the lookup processing means,
where N is the number of destination addresses for which the information about the next hop is acquired in parallel by longest match using the SIMD operations and M is the number of parallel executions by pipeline processing, acquires the information about the next hop for N × M destination addresses simultaneously by the pipeline processing.

A communication method in which a computer executes:
a reception procedure for receiving packets;
a lookup processing procedure for acquiring, in parallel from an array, information regarding the next hop of the destination addresses of a plurality of packets, the array being composed of a plurality of array elements each containing information regarding the next hop and information regarding the next layer when the routing table is represented as a Multiway Trie, by, for each layer of the routing table, extracting a bit string of a predetermined length according to that layer from each of a plurality of destination addresses acquired from the plurality of packets received in the reception procedure, matching the extracted plurality of bit strings against the array elements constituting the layer by longest match using a SIMD operation, and repeatedly acquiring the information regarding the next hop and the information regarding the next layer contained in the matched array elements;
an extraction procedure for, when the destination address is an IPv6 address, extracting a bit string from the destination address 32 bits at a time until the information regarding the next layer acquired in the lookup processing procedure reaches a predetermined value; and
a transmission procedure for transmitting each of the plurality of packets received in the reception procedure according to the respective information regarding the next hop acquired in the lookup processing procedure,
wherein, in the lookup processing procedure, the bit string extracted in the extraction procedure is regarded as an IPv4 destination address, and the information regarding the next hop of that destination address and the information regarding the next layer are repeatedly acquired from the array in parallel.
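The IPv6 handling in this claim, where each 32-bit piece is looked up as if it were an IPv4 destination address, can be sketched for a single address as follows. The sentinel `DONE`, the notion of a per-piece root identifier, the `lookup32` callback and the demo routes are all assumptions made for illustration; the claim states only that extraction continues until the next-layer information reaches a predetermined value.

```python
import ipaddress
from typing import Callable, Dict, List, Tuple

DONE = 0  # assumed "predetermined value" meaning no deeper layer exists


def split_ipv6(addr: int) -> List[int]:
    """Split a 128-bit address into four 32-bit pieces, most significant first."""
    return [(addr >> s) & 0xFFFFFFFF for s in (96, 64, 32, 0)]


def lookup_ipv6(addr: int, lookup32: Callable[[int, int], Tuple[int, int]], root: int = 1) -> int:
    """Treat each 32-bit piece as an IPv4 address and chain the lookups.

    lookup32(root, word) is a hypothetical IPv4-style lookup returning
    (next_hop, next_root); next_hop == 0 means no match at this piece.
    """
    best_hop = 0
    for word in split_ipv6(addr):
        hop, root = lookup32(root, word)
        if hop:
            best_hop = hop        # a longer match overrides a shorter one
        if root == DONE:
            break                 # next-layer information reached the sentinel
    return best_hop


# Stub standing in for the 32-bit lookup; routes are hypothetical:
# 2001:db8::/32 -> hop 7, 2001:db8:0:1::/64 -> hop 9
ROUTES: Dict[Tuple[int, int], Tuple[int, int]] = {
    (1, 0x20010DB8): (7, 2),
    (2, 0x00000001): (9, DONE),
}


def stub_lookup32(root: int, word: int) -> Tuple[int, int]:
    return ROUTES.get((root, word), (0, DONE))


addr = int(ipaddress.IPv6Address("2001:db8:0:1::5"))
print(lookup_ipv6(addr, stub_lookup32))  # 9
```

A batched version would apply the same chaining to all lanes at once, masking out the lanes whose next-layer information has already reached the sentinel.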

A program causing a computer to execute:
a reception procedure for receiving packets;
a lookup processing procedure for acquiring, in parallel from an array, information regarding the next hop of the destination addresses of a plurality of packets, the array being composed of a plurality of array elements each containing information regarding the next hop and information regarding the next layer when the routing table is represented as a Multiway Trie, by, for each layer of the routing table, extracting a bit string of a predetermined length according to that layer from each of a plurality of destination addresses acquired from the plurality of packets received in the reception procedure, matching the extracted plurality of bit strings against the array elements constituting the layer by longest match using a SIMD operation, and repeatedly acquiring the information regarding the next hop and the information regarding the next layer contained in the matched array elements;
an extraction procedure for, when the destination address is an IPv6 address, extracting a bit string from the destination address 32 bits at a time until the information regarding the next layer acquired in the lookup processing procedure reaches a predetermined value; and
a transmission procedure for transmitting each of the plurality of packets received in the reception procedure according to the respective information regarding the next hop acquired in the lookup processing procedure,
wherein, in the lookup processing procedure, the bit string extracted in the extraction procedure is regarded as an IPv4 destination address, and the information regarding the next hop of that destination address and the information regarding the next layer are repeatedly acquired from the array in parallel.