GB2371381A - Tree based search method - Google Patents


Publication number
GB2371381A
Authority
GB
United Kingdom
Prior art keywords
search
leaf
key
pattern
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0108545A
Other versions
GB2371381B (en)
GB0108545D0 (en)
Inventor
Brian M Bass
Jean L Calvignac
Marco C Heddes
Antonios Maragkos
Michael S Siegel
Fabrice J Verplanken
Piyush Patel
Clark D Jeffries
Mark A Rinaldi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/544,992 (US6947931B1)
Priority claimed from US09/543,531 (US6675163B1)
Priority claimed from US09/545,100 (US7107265B1)
Application filed by International Business Machines Corp
Publication of GB0108545D0
Publication of GB2371381A
Application granted
Publication of GB2371381B
Anticipated expiration
Expired - Fee Related

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/74: Address processing for routing
    • H04L45/745: Address table lookup; Address filtering
    • H04L45/74591: Address table lookup; Address filtering using content-addressable memories [CAM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method of performing a search based upon a search criterion using a tree is proposed. In use, an input is read as a search key and the most significant bits are used as an index to a search table representing a plurality of search nodes. Each non-empty entry in the search table contains a pointer to the next branch of the tree. The search table may use a hash function to generate an index key. A determination is then made as to whether the pointer points to a leaf or a branch of the tree. If the pointer is to a branch, the procedure is repeated until a leaf object is identified and returned to the calling application. If no entry is found that matches the search, a no match is returned. The search criterion may be a longest prefix match, in which instance the method is executed to find the position of the distinguishing bit.

Description

SEARCH ALGORITHM IMPLEMENTATION FOR A NETWORK PROCESSOR

FIELD OF THE INVENTION

The present invention relates generally to pattern matching algorithms and, more particularly, to search algorithms that can be implemented in a network processing device.

BACKGROUND OF THE INVENTION

The demand for hardware-integrated processing to support more and more complex tasks at media speed has led to the creation of network processors. Network processors provide wirespeed frame processing and forwarding capability with function flexibility through a set of embedded, programmable protocol processors and complementary system coprocessors.
Network processors are expected to become the fundamental building block for networks in the manner that microprocessors are for today's personal computers. Network processors offer real-time processing of multiple data streams, providing enhanced security and IP packet handling and forwarding capabilities. In addition, they provide speed improvements through advanced architectures, such as parallel distributed processing and pipeline processing designs. These capabilities can enable efficient search engines, increase data handling throughput, and provide rapid execution of complex tasks. The programmable features of network processors provide network product developers an easier migration path to implement new protocols and technologies without requiring new custom Application Specific Integrated Circuit (ASIC) designs.
Network processors provide a highly customizable, scalable technology for the development of interconnect solutions for Internet or enterprise network providers. A network processor provides the basis for a wide range of solutions from a low-end, stand-alone device to a large multirack solution. Scaling of this nature is accomplished through the use of high performance, non-blocking packet routing switch technology and proprietary interfaces such as IBM Corporation's Data Aligned Serial Link (DASL) interface which can be adapted to other industry switch technologies.
As a programmable communications integrated circuit, the network processor provides very efficient packet classification, multi-table lookups per frame, packet modification, queue/policy management, and other packet processing capabilities. The network processor integrates a switching engine, search engine, frame processors and Ethernet MACs on one device to support the needs of customers who require high capacity, media weight switching frames based on frame content at any protocol layer.
Hardware accelerators perform frame forwarding, frame filtering and frame alteration. The network processor's ability to enforce hundreds of rules with complex range and action specifications sets a new benchmark for filtering capabilities, making a network processor-based system uniquely suited for high capacity server farm applications.
A typical system developed with a network processor uses a distributed software model, with each programmable network processor executing tasks concurrently. Some functions are performed in the control point (CP) processor, which can be internal or external to the network processor. The CP provides support for layer 2 and layer 3 routing protocols, and layer 4 and layer 5 network applications and systems management. Wirespeed forwarding and filtering functions are performed by a combination of the network processor hardware and resident picocode.
In communication networks, comprising a number of interconnected nodes, data can be sent from one node to any other node or network.
Specialized nodes called routers are responsible for forwarding the data to their destinations. Any data sent through a communication network contains information about the destination address, generally as part of a header.
Each router compares this information, or at least part of it, with a list of addresses stored internally. If a match is found between stored addresses and the destination address, the router establishes a path leading to the destination node. Depending on the network size and structure, the data are either directly forwarded to their destination or sent to another intermediate router. The International Organization for Standardization (ISO) promulgated a routing standard in which a router stores routing information for partial addresses. The router then sends the packet to the best matching partial address it has in its database.
The ISO standard allows a hierarchical structure of nodes to be built using a given number of digits or a given header length. Main routers are addressed by the initial part of the address, subrouters by the middle part, and the final destination by the last digits of the address.
Therefore, it is sufficient for any router to read the digits assigned to the level of the hierarchy to which the data are to be sent.
The routing of the received packet is based on the accompanying address string. The address string is used as a search key in a database which contains the address string along with other pertinent details, such as which router is next in the delivery of a packet. The database is referred to as a routing table, while the link between the current router and the next router is called the next hop in the progress of the packet.
The routing table search process depends on the structure of the address as well as the organization of the tables. For example, a search key of a size less than 8 bits and having a nonhierarchical structure would most efficiently be found in a routing table organized as a series of address entries. The search key would be used as an index in the table to locate the right entry. For a search key of a larger size, say thirty-two bits, the corresponding routing table may have more than 10,000 entries.
Organizing the database as a simple table to be searched directly by an index would waste a large amount of memory space, because most of the table would be empty.
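The memory argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of the described invention: the function name is hypothetical, the 7-bit key and its 100 entries are illustrative values, and only the 32-bit key with roughly 10,000 routes comes from the text.

```python
def direct_table_occupancy(key_bits: int, stored_entries: int) -> float:
    """Fraction of a directly indexed table that holds a real entry."""
    slots = 2 ** key_bits  # one slot per possible key value
    return stored_entries / slots


# A key shorter than 8 bits needs at most 128 slots, so direct indexing is cheap.
small = direct_table_occupancy(7, 100)

# A 32-bit key needs 2**32 slots; 10,000 routes occupy a tiny fraction of them.
large = direct_table_occupancy(32, 10_000)

print(f"7-bit key:  {small:.2%} of slots used")
print(f"32-bit key: {large:.6%} of slots used")
```

Under these assumptions almost every slot of the 32-bit table would be empty, which is the waste the paragraph describes.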
Conventional routers break up the search process into several steps.
The first step is to determine whether the router is directly connected to the destination host computer. In this case, the message is one hop from the destination and should be routed in that direction. If the destination computer is not directly connected to the router, the next step is to determine the topological direction of the destination network. If the direction is determined from the topological layout, the message is routed that way. Otherwise, the final step is to route the message along a default link.
Typically, the first step is performed using a linear search through a table containing the thirty-two bit addresses of host computers directly connected to the router. Reflecting the local topology, each entry in the address table is connected to a corresponding output interface leading directly to the addressed computer. When a destination address is received by a router, the full thirty-two bits are compared with each of the destination addresses in a table. If a match is found, the message is sent directly to the corresponding destination via the specified router interface.
The second step, that of determining the direction of the destination network, is not usually performed by a linear search through a table since the number of network addresses would make such a table difficult to manage and use. In the prior art, when address strings conformed to the three-level hierarchy of network address, subnet address and host identification, routers performed the determination using one of several well-known techniques, such as hashing, Patricia-tree searching, and multilevel searching.
In hashing, a hash function reduces the network portion of the address, producing a small, manageable index. The hash index is used to index a hash table and to search for a matching hash entry. Corresponding to each hash entry of the hash table is the address of an output interface pointing in the topological direction of a corresponding network. If a match is found between the hashed network portion and a hash entry, the message is directed towards the corresponding interface and destination network.
Hashing reduces a large, unmanageable field to a small manageable index. In the process, however, there is a chance that two or more fields may generate the same hash index. This occurrence is referred to as a collision, since these fields must be stored in the same location in the hash table. Further searching is needed to differentiate the entries during a collision. Therefore, collisions reduce the efficiency obtained from using the hashing search, and in the worst case, where all permissible addresses reduce to a single index, hashing is rendered practically useless as a search process.
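The collision behaviour described above can be illustrated with a small sketch. The XOR-folding hash, the class name and the interface labels are all assumptions chosen for illustration; the prior-art routers discussed here are not stated to use this particular hash.

```python
from collections import defaultdict


def hash_index(network: int, table_bits: int = 8) -> int:
    """Fold a 32-bit network number into a small index (illustrative hash only)."""
    mask = (1 << table_bits) - 1
    folded = network ^ (network >> 8) ^ (network >> 16) ^ (network >> 24)
    return folded & mask


class HashRouteTable:
    """Hash table in which colliding networks share one bucket."""

    def __init__(self, table_bits: int = 8):
        self.table_bits = table_bits
        self.buckets = defaultdict(list)

    def insert(self, network: int, interface: str) -> None:
        index = hash_index(network, self.table_bits)
        self.buckets[index].append((network, interface))

    def lookup(self, network: int):
        """Return (interface, comparisons); more than one comparison means
        a collision had to be resolved by further searching."""
        bucket = self.buckets.get(hash_index(network, self.table_bits), [])
        for count, (stored, interface) in enumerate(bucket, start=1):
            if stored == network:
                return interface, count
        return None, len(bucket)


table = HashRouteTable()
table.insert(0x0A000000, "eth0")
table.insert(0x000A0000, "eth1")   # folds to the same index as the first entry
print(table.lookup(0x000A0000))    # needs a second comparison inside the bucket
```

In the worst case mentioned in the text, every address would land in one bucket and the lookup would degrade to a linear scan.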
Patricia-tree searching avoids the collisions encountered by hashing methods. This method of searching requires that all address strings and accompanying information, such as related route information, be stored in a binary tree. Starting from the most significant bit position within the address string, the search process compares the address, bit by bit, with the tree nodes. A matched bit value guides the search to visit either the left or the right child node and the process is repeated for the next bit of the address. The search time is proportional to the size of the longest address string stored. In Patricia-tree searching, the difference between the average search time and the worst case search time is not very large.
In addition, the routing table is organized quite efficiently. It requires less memory than comparable routing tables of hashing methods.
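The bit-by-bit walk can be sketched as follows. For brevity this is a plain, uncompressed binary trie rather than a true path-compressed Patricia tree, and the class and field names are hypothetical.

```python
class BitTrie:
    """Uncompressed binary trie over fixed-width keys, walked from the
    most significant bit towards the least significant bit."""

    def __init__(self, width: int = 32):
        self.width = width
        self.root = {}  # each node is a dict: {0: child, 1: child, "info": value}

    def insert(self, key: int, info) -> None:
        node = self.root
        for pos in range(self.width - 1, -1, -1):
            bit = (key >> pos) & 1
            node = node.setdefault(bit, {})  # create the child on demand
        node["info"] = info

    def search(self, key: int):
        """Return (info, steps); steps counts the nodes visited."""
        node = self.root
        steps = 0
        for pos in range(self.width - 1, -1, -1):
            bit = (key >> pos) & 1
            if bit not in node:
                return None, steps  # no stored key shares this prefix
            node = node[bit]
            steps += 1
        return node.get("info"), steps


trie = BitTrie(width=8)
trie.insert(0b10100000, "route A")
print(trie.search(0b10100000))
print(trie.search(0b10100001))
```

The number of steps never exceeds the key width, which is why the search time is bounded by the length of the longest stored address string.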
Patricia-tree searching handles the worst case searches better than the hashing methods, but in most cases it takes significantly longer to locate a match. Therefore, many conventional routers use a combination of hashing and Patricia-tree searching, called multilevel searching.
A cache stores a hash table containing a subset of the most recently, and presumably most commonly, routed network addresses, while a Patricia-tree stores the full set of network addresses. As the message is received, the destination address is hashed onto the table. If it is not located within a pre-determined period of time, the address is passed to the Patricia-tree search engine, which ensures that the address, if stored, will be found.
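A minimal sketch of the two-level idea follows. It makes several simplifying assumptions: the time-out described above is replaced by a plain cache miss, the cache evicts the least recently used address, and the full tree search is represented by any callable passed in as `full_lookup`. All names are hypothetical.

```python
from collections import OrderedDict


class MultilevelLookup:
    """Small cache of recent addresses in front of a slower, complete lookup."""

    def __init__(self, full_lookup, cache_size: int = 2):
        self.full_lookup = full_lookup   # stand-in for the Patricia-tree search
        self.cache = OrderedDict()       # recently routed addresses
        self.cache_size = cache_size

    def lookup(self, address: int):
        """Return (result, source) where source is "cache" or "tree"."""
        if address in self.cache:
            self.cache.move_to_end(address)  # mark as most recently used
            return self.cache[address], "cache"
        result = self.full_lookup(address)
        if result is not None:
            self.cache[address] = result
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict the least recently used
        return result, "tree"


routes = {1: "if-a", 2: "if-b", 3: "if-c"}
lookup = MultilevelLookup(routes.get, cache_size=2)
print(lookup.lookup(1))   # first access goes to the full table
print(lookup.lookup(1))   # second access is served from the cache
```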
In the prior art, there are a number of known tree search algorithms including fixed match trees, longest prefix match trees and software managed trees. Fixed match trees are used for fixed size patterns requiring an exact match, such as layer 2 Ethernet MAC tables. Longest prefix match trees are used for variable length patterns requiring only partial matches, such as IP subnet forwarding. Software managed trees are used for patterns that are defined as ranges or bit masks, such as filter rules. In general, lookup is performed with the aid of a tree search engine (TSE).
SUMMARY OF THE INVENTION

It is the object of this invention to provide an efficient tree search algorithm to perform searches such as full match, longest prefix, or pattern range.
A search mechanism is provided that does not require storage of the previous pointer and uses only a forward pointer along with a next bit or group of bits to test thereby reducing storage space for nodes.
The concept of the invention is that a key is input, a direct table (DT) is accessed, and the tree is walked through pattern search control blocks (PSCBs) and ends up with a leaf. Alternatively a hash function may be performed on the key.
A problem solved is the design of a set of data structures that can be located in a few registers and regular memory, and then used to build a Patricia-tree structure that can be manipulated by a relatively simple hardware macro. In the Patricia-tree, both keys and corresponding information needed for retrieval are stored.
The key is the information that is to be searched on and matched.
Initially, the key is placed in a register and can be hashed. The result is the hash key and the actual search will happen on the hash key. The hash function could be the null hash, and then the hash key will be exactly the same as the key. The hash function provides an n -> n mapping of the bits of the key to the bits of the hash key.
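The relationship between key and hash key can be sketched as follows. The null hash follows directly from the text; the bit-reversal function is only a stand-in chosen because it maps n bits to n bits, and it is not claimed to be the hash function used by the invention.

```python
def null_hash(key: int, n: int) -> int:
    """Null hash: the hash key is exactly the n-bit key."""
    return key & ((1 << n) - 1)


def bit_reverse_hash(key: int, n: int) -> int:
    """Illustrative n-bit to n-bit mapping that reverses the bit order."""
    out = 0
    for _ in range(n):
        out = (out << 1) | (key & 1)  # move the lowest key bit to the top of out
        key >>= 1
    return out


print(bin(null_hash(0b1011, 4)))
print(bin(bit_reverse_hash(0b0011, 4)))
```

Because the example mapping keeps all n bits and is invertible, two different keys never produce the same hash key, so the final comparison against the leaf can be done on the hash key itself.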
The data structure that is used to store the hash keys and the related information in the tree is called a leaf. Retrieving the leaf is the purpose of this algorithm. Each leaf corresponds to one or more keys according to the search criteria. In this implementation the leaf contains the one or more keys, and appended to it is the additional information to be stored. For example in a Full Match search tree a leaf contains a single key that matches exactly with the input key. The length of the leaf is programmable, as is the length of the key. The leaf is stored in random access memory and is implemented as a single memory entry. If the key is located in the direct table (DT) then it is called a direct leaf.
Typically, to find the match to a search criterion in a tree, one has to compare a bit at a time until the best match is found. Comparing bitwise in this way requires "n" comparisons or memory accesses to identify the closest matching pattern. The described approach addresses full match, longest prefix match, and software managed trees with the least number of comparisons. The trees are built in such a way that the matched result is guaranteed to be a best match and is found using fewer comparisons.
Preferably, within the search tree, the starting bit number and number of bits to be compared at a branch of the tree can be specified.
This enables a tree to be built in which branches need only be included where keys, contained in leaves, under the branch differ and all branches eventually lead, possibly through other branches, to a leaf. This ensures that bit comparisons need only be carried out for bit positions where key contents of leaves under a given branch differ and a match can be found using the reduced number of comparisons.
The novel data structure employed by the invention can be used for the implementation in hardware of a Full Match tree search algorithm for Patricia-trees. The invention describes how the memory structures are set up so that they can serve the purpose of the algorithm, and how the hardware processes these structures. For full match searches a leaf contains a single key.
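The full match walk described in this summary (direct table, then PSCBs, then a leaf) can be sketched in software as follows. This is a simplified model under stated assumptions: each PSCB tests a single bit, bit positions are counted from the most significant bit starting at 0, and the class and function names are hypothetical rather than the hardware formats shown in the figures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Leaf:
    pattern: int   # the single key stored in a full match leaf
    info: str      # additional information appended to the key


@dataclass
class PSCB:
    nbt: int       # next bit to test, counted from the most significant bit
    children: List[Optional[Union["PSCB", Leaf]]] = field(
        default_factory=lambda: [None, None]
    )


def bit_at(key: int, pos: int, width: int) -> int:
    """Bit of key at position pos, where position 0 is the most significant bit."""
    return (key >> (width - 1 - pos)) & 1


def fm_search(direct_table, key: int, width: int, dt_bits: int) -> Optional[Leaf]:
    """Index the direct table with the top dt_bits bits, follow PSCBs to a
    leaf, then compare the whole key once against the leaf pattern."""
    entry = direct_table[key >> (width - dt_bits)]
    while isinstance(entry, PSCB):
        entry = entry.children[bit_at(key, entry.nbt, width)]
    if isinstance(entry, Leaf) and entry.pattern == key:
        return entry
    return None


# Two keys share direct table entry 0b10 and first differ at bit position 4,
# so a single PSCB testing that bit is enough to separate them.
leaf_a = Leaf(0b10010000, "A")
leaf_b = Leaf(0b10011000, "B")
leaf_c = Leaf(0b01000000, "C")          # stored directly in the table
dt = [None, leaf_c, PSCB(4, [leaf_a, leaf_b]), None]

print(fm_search(dt, 0b10011000, width=8, dt_bits=2))
```

Only bit positions where stored keys differ are tested on the way down, so the final full compare is what rejects a key such as 0b10111000 that happens to follow the same path.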
The novel data structure employed by the invention can also be used for longest prefix match search, which provides the mechanism for searching tables efficiently with variable length patterns or prefixes. This approach allows a very efficient and simple implementation with the least amount of storage and search time. In modern communications networks, it is very important to identify the best match prefix very quickly due to the speed and volume of traffic. An example is the IP layer 3 forwarding table. Typically, when a forwarding engine is looking for a given IP address/key, the matching result could be full match/exact match for a host address or it could be a prefix for a network address. This requires both exact full match search followed by all prefix matches to determine the most appropriate match result. The described approach addresses longest prefix match with the least number of comparisons.
Using a trail of birds, which are special leaves that represent partial prefix matches of the search key, together with their prefix lengths allows going to the correct prefix result from the trail. Since the full pattern and its prefix length are stored in the bird, this allows the backtracking of the trail for tree management functions. By construction, the tree provides the best matching prefix at or after the first compare during walking the trail or tree. This concept is scalable to support various combinations of values for address, next bit to test (NBT), leaf sizes and other components used.
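The role of prefix lengths and the distinguishing position can be illustrated with a simplified sketch. It assumes the trail is already available as a list of (pattern, prefix length, information) entries and scans that list linearly, which is a swapped-in simplification and not the tree walk of the invention. The function names are hypothetical.

```python
def dist_pos(a: int, b: int, width: int) -> int:
    """Position, counted from the most significant bit (0-based), of the
    first bit where a and b differ; returns width when they are equal."""
    diff = a ^ b
    if diff == 0:
        return width
    return width - diff.bit_length()


def longest_prefix_match(trail, key: int, width: int):
    """Return the info of the longest prefix in trail that matches key.

    A prefix of length L matches when the first differing bit between the
    key and the stored pattern lies at position L or later."""
    best = None
    for pattern, prefix_len, info in trail:
        if dist_pos(key, pattern, width) >= prefix_len:
            if best is None or prefix_len > best[0]:
                best = (prefix_len, info)
    return None if best is None else best[1]


trail = [
    (0b10000000, 1, "short prefix"),
    (0b10010000, 4, "network prefix"),
    (0b10011000, 8, "host"),
]
print(longest_prefix_match(trail, 0b10011111, width=8))
```

A full-length entry behaves like an exact host match, while shorter entries act as fallbacks, mirroring the host address and network address cases mentioned above.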
The novel data structure employed by the invention can be used for a Software Managed Tree (SMT) which provides a mechanism to create tree structures that follow a search mechanism defined by a control point. An example would be an Internet Protocol (IP) 5-tuple filtering table, containing IP source address (IPSA), IP destination address (IPDA), source port (SP), destination port (DP) and communications protocol. In contrast to Full Match (FM) or Longest Prefix Match (LPM) trees, SMT trees allow support for range compares. For example, a leaf can be used to specify that the source port must be in the range between x and y.
This approach allows a very efficient and simple implementation with efficient storage and search time. This approach also allows various filter rules to be chained.
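A range compare on a 5-tuple, with rules chained one after another, can be sketched as follows. The field names follow the abbreviations in the text; the rule layout, the inclusive (low, high) ranges, the first-match policy and the action strings are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


class FiveTuple(NamedTuple):
    ipsa: int
    ipda: int
    sp: int
    dp: int
    protocol: int


class FilterRule(NamedTuple):
    ipsa: Range
    ipda: Range
    sp: Range
    dp: Range
    protocol: Range
    action: str


def rule_matches(rule: FilterRule, key: FiveTuple) -> bool:
    """True when every key field lies inside the corresponding rule range."""
    return all(low <= value <= high for value, (low, high) in zip(key, rule[:5]))


def first_match(rules, key: FiveTuple) -> Optional[str]:
    """Walk the chain of rules in order and return the first matching action."""
    for rule in rules:
        if rule_matches(rule, key):
            return rule.action
    return None


ANY32 = (0, 2**32 - 1)
ANY16 = (0, 2**16 - 1)
rules = [
    FilterRule(ANY32, ANY32, (1024, 2047), ANY16, (6, 6), "permit"),
    FilterRule(ANY32, ANY32, ANY16, ANY16, (0, 255), "deny"),
]
print(first_match(rules, FiveTuple(0x0A000001, 0x0A000002, 1500, 80, 6)))
```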
BRIEF DESCRIPTION OF THE DRAWINGS

A preferred embodiment of the present invention will now be described with reference to the accompanying drawings in which:
Fig. 1 illustrates an exemplary architecture for a network processor in accordance with a preferred embodiment of the present invention;
Fig. 2 illustrates an exemplary embodiment for an embedded processor complex in accordance with a preferred embodiment of the present invention;
Fig. 3 illustrates an exemplary protocol processor structure in accordance with a preferred embodiment of the present invention;
Fig. 4 illustrates exemplary ingress and egress frame flows in accordance with a preferred embodiment of the present invention;
Fig. 5 illustrates a tree data structure for the full match search algorithm in accordance with a preferred embodiment of the present invention;
Fig. 6 illustrates the effect on exemplary data structures of using a direct table in accordance with a preferred embodiment of the invention;
Fig. 7 illustrates the effect on exemplary data structures of having direct leaves enabled in accordance with a preferred embodiment of the present invention;
Fig. 8 illustrates an exemplary structure of a DT entry and pattern search control block (PSCB) line formats in a Full Match search tree in accordance with a preferred embodiment of the present invention;
Fig. 9 illustrates an example of a search using a Full Match search in accordance with a preferred embodiment of the present invention;
Fig. 10 illustrates the processing logic of the Full Match (FM) search algorithm in accordance with a preferred embodiment of the present invention;
Fig. 11 illustrates the internal structure of an exemplary lookup definition table in accordance with a preferred embodiment of the present invention;
Fig. 12 illustrates the internal format of a PSCB register in accordance with a preferred embodiment of the present invention;
Fig. 13 illustrates the fixed leaf format for FM trees in accordance with a preferred embodiment of the present invention;
Fig. 14 illustrates an exemplary structure of a DT entry and pattern search control block (PSCB) line formats in a Longest Prefix Match search tree in accordance with a preferred embodiment of the present invention;
Fig. 15 illustrates an example of a search using a Longest Prefix Match search in accordance with a preferred embodiment of the present invention;
Fig. 16 illustrates examples of the calculation of a distinguishing position (DistPos) between an input key and a leaf pattern in a Longest Prefix Match search in accordance with a preferred embodiment of the present invention;
Figs. 17A-17B illustrate the processing logic of the Longest Prefix Match (LPM) search algorithm in accordance with a preferred embodiment of the present invention;
Fig. 18 illustrates the fixed leaf format for LPM trees in accordance with a preferred embodiment of the present invention;
Fig. 19 illustrates examples of fields in the input key and leaf patterns for a Software Managed Tree (SMT) in accordance with a preferred embodiment of the present invention;
Fig. 20 illustrates an exemplary format for a compare definition table entry for a Software Managed Tree in accordance with a preferred embodiment of the present invention;
Fig. 21 illustrates the processing logic of the Software Managed Tree search algorithm in accordance with a preferred embodiment of the present invention;
Fig. 22 illustrates the fixed leaf format for SMT trees in accordance with a preferred embodiment of the present invention;
Fig. 23 illustrates an exemplary architecture for a tree search engine in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention will be described in the context of a network processor in which the invention is embedded. The network processor 10 is a programmable switching and routing system on a single chip, an architecture of which is depicted in Fig. 1. It provides media interfaces for 10/100 Ethernet, Gigabit Ethernet and Packet Over SONET (POS) as well as data aligned serial links (DASL) for attachment to switch interfaces.
Internal hardware accelerators increase performance and efficiency. An embedded processor complex (EPC) 12 includes protocol processors and an internal control point processor for frame processing, configuration and management support.
Up to N parallel protocol processors are available. In an embodiment of 16 protocol processors, 16,384 words of internal picocode instruction store and 32,768 words of external picocode instruction store are available to provide 2,128 million instructions per second (MIPS) of aggregate processing capability. In addition, each protocol processor has access to M hardware accelerator coprocessors which provide high speed pattern search, data manipulation, internal chip management functions, frame parsing, and data prefetching support. In a preferred embodiment, control storage for the protocol processors is provided by both internal and external memories: 32K of internal static random access memory (SRAM) 28 for immediate access, external zero bus turnaround (ZBT) SRAM 30 for fast access, and external double data rate (DDR) dynamic random access memory (DRAM) 32 for large storage requirements.
Using embedded hardware accelerators in conjunction with preprocessing algorithms, operating on the attached control point processor 34, the network processor 10 is capable of processing frames through one hundred or more filter rules with complex range, priority, and action specifications at wirespeed. This makes a network processor-based system well suited for gateways, server farm applications, and filtering tasks associated with processing a mix of traffic.
Control point software provides automatic logic checking when a network administrator enters filter rules to a coherent, user-friendly interface. Using novel flow control based upon stability theory, the network processor 10 withstands higher rates of temporary oversubscription without Transmission Control Protocol (TCP) collapse than commonly-used random early discard methods. The network processor 10 also delivers differentiated services by automatically allocating bandwidth, relieving network administrators from having to predict the effects of setting dozens of thresholds on the basis of momentary or assumed traffic statistics.
A single network processor 10 provides media speed switching for up to 40 Fast Ethernet or four Gigabit Ethernet ports. It can also be configured to support OC-48c, OC-48, four OC-12 or sixteen OC-3 ports. For scalability, the two 3.5 Gbps serial DASL links can be used to interconnect two network processors to double the port density, or to attach switch fabrics to create switching solutions with up to 64 network processors.
The two DASL links, one primary and one secondary, can also provide connection to a redundant switch fabric for increased system availability.
One exemplary embodiment of a network processor 10 includes the following major sections as illustrated in Fig. 1:
1. An embedded processor complex (EPC) 12 including up to 16 programmable processors plus coprocessors;
2. An enqueue-dequeue-scheduling logic 14 for frames travelling from the Ethernet physical layer devices to the switch fabric (EDS-Ingress);
3. An enqueue-dequeue-scheduling logic 16 for frames travelling from the switch fabric to the Ethernet physical layer devices (EDS-Egress);
4. An ingress switch interface (Switch Ingress) 18 and egress switch interface (Switch Egress) 20 DASL links for interconnection to another network processor or intermediate switch;
5. A physical MAC multiplexer 22 receiving frames from the Ethernet or POS physical layer devices 26 (PMM-Ingress) and the physical MAC multiplexer 24 transmitting frames to the Ethernet or POS physical layer devices 26 (PMM-Egress).
Fig. 2 illustrates an exemplary embodiment for an embedded processor complex. It includes 16 protocol processors providing 2,128 MIPS of processing power. Each protocol processor 40 includes a 3-stage pipeline (fetch, decode and execute), general purpose registers, special purpose registers, an eight-instruction cache, a dedicated arithmetic logic unit (ALU) and coprocessors all running at 133 MHz. Two of the protocol processors are specialised: one for handling guided frames (the guided frame handler) and one for building lookup data in control memory (the generic tree handler).
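The aggregate MIPS figure is consistent with the processor count and clock rate given above, as the following short check shows. It assumes each protocol processor completes one instruction per clock cycle, which the text does not state explicitly.

```python
PROTOCOL_PROCESSORS = 16
CLOCK_MHZ = 133
INSTRUCTIONS_PER_CYCLE = 1  # assumption: one instruction retired per cycle

# Million instructions per second summed over all protocol processors.
aggregate_mips = PROTOCOL_PROCESSORS * CLOCK_MHZ * INSTRUCTIONS_PER_CYCLE
print(aggregate_mips)
```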
Fig. 3 illustrates an exemplary embodiment of a protocol processor.
The coprocessors associated with each of the programmable protocol processors 40 provide the following functions:
1. A data store coprocessor 64 interfaces frame buffer memory 42, 44 (ingress and egress directions) to provide direct memory access (DMA) capability;
2. A checksum coprocessor 62 calculates header checksums;
3. An enqueue coprocessor 66 controls access to the 256-bit working register, containing key frame parameters. This coprocessor interfaces with the completion unit 46 to enqueue frames to the switch and target port queues;
4. An interface coprocessor provides all protocol processors access to internal registers, counters and memory for debug or statistics gathering;
5. A string copy coprocessor enables efficient movement of data within the EPC;
6. A counter coprocessor manages counter updates for the protocol processors 40;
7. A policy coprocessor examines flow control information and checks for conformance with pre-allocated bandwidth.
Hardware accelerators 48 perform frame forwarding, frame filtering, frame alteration and tree searches. Other features incorporated into the network processor include innovative filter rule processing, hash functions and flow control.
The protocol processors 40 can enforce one hundred or more frame filter rules with complex range and action specifications. Filtering is essential for network security, and network processor hardware assists 48 provide wirespeed enforcement of these complex rule sets. Filter rules can deny or permit a frame or allocate quality of service (QoS) based on IP header information. Control point software for preprocessing rules automatically corrects logic errors. After a logically correct rule set has been entered, keys are formed from packet header information and are tested at wirespeed using the network processor's software managed trees.
Geometric hash functions exploit statistical structures in IP headers to outperform ideal random hashes. Consequently, the low collision rates enable high speed look-ups in full match tables without additional resolution searches.
Operating in parallel with protocol processor execution, the tree search engine 70 performs tree search instructions (including memory read, write or read-write), memory range checking and illegal memory access notification. Fig. 23 illustrates an exemplary embodiment of a tree search engine.
Two system control options are available within the network processor 10. An internal processor 34 can function as the control point (CP) processor for the system or, alternatively, an external processor can be connected to one of the four Ethernet macros for initialisation and configuration. The CP processor 34 communicates with other processor entities within the network processors by building special Ethernet frames called guided frames. Guided frames can be forwarded across the DASL links to other devices allowing one CP processor attached to a single Ethernet port to communicate with and control all of the network processor devices contained within the subsystem. The internal processor 34 of each network processor 10 can also communicate using a separate 32-bit PCI bus.
The network processor 10 usually resides on a subsystem board and provides the protocol layer (i.e., layer 2, layer 3, layer 4 and higher) frame processing. Software running on a CP processor 34 in the CP subsystem provides the management and route discovery functions. The CP code, picocode running on the protocol processors, and picocode running on the guided frame handler enable initialisation of this system, maintenance
of the forwarding paths, and management of the system. As a distributed system, the CP and each network processor subsystem contain multiple processors which operate in parallel and communicate using guided frames for increased efficiency and performance.
Data frames are received from the media by the PMM 22 and transferred to the data storage buffers 42. The PMM also performs CRC checking and frame validation during the receive process. The dispatcher 50 sends up to 64-bytes of frame information to an available protocol processor 40 for frame lookups. The classifier hardware assist 48 supplies control data to identify frame formats. The protocol processor 40 uses the control data to determine the tree search algorithm to apply including fixed match trees, longest prefix match trees, or software managed trees.
Lookup is performed with the aid of a tree search engine (TSE) 70.
The TSE 70 performs control memory 72 accesses, enabling the protocol processor 40 to continue execution. The control memory 72 stores all tables, counters and any other data needed by the picocode. For efficiency, a control memory arbiter 52 manages control memory operations by allocating memory cycles between the protocol processors 40 and a variety of on-chip and off-chip control memory options 54.
The protocol processor 40 contains a primary data buffer, a scratch pad data buffer and control registers (collectively, 72) for data store operations. Once a match is found, ingress frame alterations, such as VLAN header insertion or overlay, can be applied. These alterations are not performed by the EPC 12. Instead, the ingress switch interface hardware 18 performs the alteration if the hardware flags are set. Other frame alterations can be accomplished by the picocode and the data store coprocessor 64 by modifying the frame contents held in the ingress data store 42.
Control data is gathered and used to build switch headers and frame headers prior to sending frames to the switch fabric. Control data includes switch information such as the destination of the frame, as well as information for the egress network processor, to help it expedite frame lookup of destination ports, multicast or unicast operations, and egress frame alterations.
Fig. 4 illustrates exemplary ingress and egress frame flows. Upon completion, the enqueue coprocessor 66 builds the necessary formats for enqueuing the frame to the queue control block (QCB) 74 and forwards them to the completion unit 46. The completion unit 46 guarantees frame order from the up to 16 protocol processors 40 to the switch fabric queues 76.
Frames from the switch fabric queues 76 are segmented into 64-byte cells with switch header and frame header bytes inserted as they are transmitted by the switch fabric 76.
Frames received from the switch fabric 76 are placed in egress data store buffers 78 using information provided by the reassembly control block (RCB) 80 and the EDS-Egress 44 and are enqueued to the EPC 12. A portion of the frame is sent by the dispatcher 50 to any idle protocol processor 40 for performing the frame lookups. Frame data is dispatched to the protocol processor 40 along with data from the classifier hardware assist 48. The classifier hardware assist 48 uses frame control data created by the ingress network processor to help determine the beginning instruction address for egress processing.
Egress tree searches support the same algorithms as are supported for ingress searches. Lookup is performed with the TSE 70, freeing the protocol processor 40 to continue execution. All control memory operations are managed by the control memory arbiter 52, which allocates memory access among the processor complexes.
Egress frame data is accessed through the data store coprocessor 64.
The results of a successful lookup contain forwarding information and, in some cases, frame alteration information. Egress frame alterations can include VLAN header deletion, time to live increment (IPX) or decrement (IP), IP header checksum recalculation, Ethernet frame CRC overlay and MAC destination address or source address overlay or insertion. IP header checksums are prepared by the checksum coprocessor 62. Alterations are not performed by the embedded processor complex 12, but rather hardware flags are created and PMM egress hardware 24 performs the alterations. Upon completion, the enqueue coprocessor 66 is used to build the necessary formats for enqueuing the frame in the EDS egress queues 44 and forwards them to the completion unit 46. The completion unit 46 guarantees frame order from the up to 16 protocol processors to the EDS egress queues 44 feeding the egress Ethernet MACs. The completed frames are finally sent by the PMM egress hardware 24 to the Ethernet MACs or the POS interface and out the physical ports.
The tree search engine (TSE) 70 as depicted in Fig. 13 uses the concept of trees to store and retrieve information. Retrieval (i.e., tree searches) as well as inserts and deletes are done based on a key, which is a bit pattern such as, for example, a MAC source address, or the concatenation of an IP source address and an IP destination address. An exemplary tree data structure 100 for use in the present invention is depicted in Fig. 5. Information is stored in a control block called a leaf
116, 118, 120, 122, which contains at least the key 102 (the stored bit pattern is actually the hashed key 106). A leaf can also contain additional information such as ageing information, or user information, which can be forwarding information such as target blade and target port numbers. The format of a leaf is defined by picocode; the object is placed into an internal or external control store.
The search algorithm for trees operates on input parameters including the key 102, performs a hash 104 on the key, accesses a direct table (DT) 108, walks the tree through pattern search control blocks (PSCBs) 110, 112, 114 and ends up at a leaf 116, 118, 120, 122. Each type of tree has its own search algorithm causing the tree-walk to occur according to different rules. For example, for fixed match (FM) trees, the data structure is a Patricia tree. When a leaf has been found, this leaf is the only possible candidate that can match the input key 102. A "compare at the end" operation compares the input key 102 with the pattern stored in the leaf.
This verifies if the leaf really matches the input key 102. The result of this search will be success (OK) when the leaf has been found and a match has occurred, or failure (KO) in all other cases.
The input to a search operation contains the following parameters:
key: The 176-bit key must be built using special picocode instructions prior to the search or insert/delete. There is only one key register. However, after the tree search has started, the key register can be used by the picocode to build the key for the next search concurrently with the TSE 70 performing the search. This is because the TSE 70 hashes the key and stores the result in an internal 192-bit HashedKey register 106.
key length: This 8-bit register contains the key length minus one bit. It is automatically updated by the hardware during the building of the key.
LUDefIndex: This is an 8-bit index into the lookup definition table (LUDefTable), which contains a full definition of the tree in which the search occurs. The internal structure of the LUDefTable is illustrated in Fig. 11.
TSRNr: This 1-bit parameter specifies whether the search results are stored in Tree Search Result Area TSR0 or TSR1. While the TSE is searching, the picocode can access the other TSR to analyse the results of a previous search.
Color: For trees which have color enabled (specified in the LUDefTable), the contents of the 16-bit color register 124 are inserted in the key during the hash operation.
The input key is hashed into a HashedKey 106, as shown in Fig. 5. In the preferred embodiment, no hash function is performed on the input key for LPM trees, and the hashed output equals the input key. The hash algorithm (including no hash for LPM trees) that will be used is specified in the LUDefTable.
The lookup definition table is the main structure which manages tree search memory. The LUDefTable is an internal memory structure and contains 128 entries for creating trees. The LUDefTable contains entries that define the physical memory the tree exists in (e. g., DRAM, SRAM, internal RAM), whether caching is enabled, the size of the key and leaf, and the type of search action to perform. The LUDefTable is implemented as three separate random access memories-one RAM that is accessible only by the general processor tree handler (GTH) and two RAMs that are duplicates of each other and are accessible by all picoprocessors.
The output of the hash function 104 is always a 176-bit number which has the property that there is a one-to-one correspondence between the original input key 102 and the output of the hash function 104. As will be explained below, this property minimises the depth of the tree that starts after the direct table 108.
If colors are enabled for the tree, which is the case in the example of Fig. 5, the 16-bit color register 124 is inserted in the 176-bit hash function output and the final result is a 192-bit number, called the HashedKey 106. The insertion occurs directly after the bits used to index the direct table 108.
If the direct table 108 contains 2^N entries, then the 16-bit color value is inserted at bit position N, as shown in Fig. 5. The output of the hash function, together with the inserted color value, is stored in the HashedKey register 106. If colors are disabled for a tree, the 176-bit hash function output is taken unmodified, and 16 zeros are appended to the hash output to produce the 192-bit final HashedKey.
Colors can be used to share a single direct table 108 among multiple independent trees. For example, one use of a color could be a VLAN ID in a MAC source address (SA) table. In this case, the input key 102 would be the MAC SA, and the color 124 would be the VLAN ID (since the VLAN ID is 12
bits, four bits of the color would be unused, i.e., set to zero). After the hash function 104, the pattern used is 48 + 16 = 64 bits. The color is now part of the pattern and will distinguish between MAC addresses of different VLANs.
The hash function 104 is defined such that most entropy in its output resides in the highest bits. The N highest bits of the HashedKey register 106 are used to calculate an index into the direct table (DT) 108.
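The formation of the HashedKey and the direct table index described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names `build_hashed_key` and `dt_index` are hypothetical, bit 0 is taken to be the most significant bit, and the hash output is modelled as a plain Python integer rather than a hardware register.

```python
KEY_BITS = 176                               # width of the hash function output
COLOR_BITS = 16                              # width of the color register
HASHED_KEY_BITS = KEY_BITS + COLOR_BITS      # 192-bit HashedKey


def build_hashed_key(hash_out, color, n, color_enabled=True):
    """Return a 192-bit HashedKey (hypothetical helper, not from the source).

    hash_out: 176-bit hash function output
    color:    16-bit color value
    n:        number of bits used to index a direct table with 2**n entries
    """
    if not 0 <= hash_out < (1 << KEY_BITS):
        raise ValueError("hash output must fit in 176 bits")
    if not 0 <= color < (1 << COLOR_BITS):
        raise ValueError("color must fit in 16 bits")
    if not 0 <= n <= KEY_BITS:
        raise ValueError("n out of range")
    if not color_enabled:
        # Colors disabled: hash output unmodified, 16 zeros appended.
        return hash_out << COLOR_BITS
    low_bits = KEY_BITS - n                  # bits that follow the DT index bits
    top = hash_out >> low_bits               # the n highest bits
    rest = hash_out & ((1 << low_bits) - 1)  # the remaining bits
    # Color is inserted at bit position n, right after the DT index bits.
    return (top << (COLOR_BITS + low_bits)) | (color << low_bits) | rest


def dt_index(hashed_key, n):
    """Index into the direct table: the n highest bits of the HashedKey."""
    return hashed_key >> (HASHED_KEY_BITS - n)
```

With a direct table of 8 entries (n = 3), a hash output whose three highest bits are 101 selects DT entry 5 regardless of the color, while the color occupies the 16 bits that follow.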
For SMT trees, the input key and color together form a 192-bit input pattern. In this view, the color register should be seen as an extension to the key register. SMT trees must have the color enable bit set to '0'.
Furthermore, SMT trees must use a hash function, which takes the 176-bit input key 102 together with the 16-bit color register 124 and produces a 192-bit HashedKey 106, whereby the key forms the 176 leftmost bits and the color forms the 16 rightmost (LSB) bits. Thus, the hash function for SMT trees is not really a hash function, but a means to use the color as an extension of the key.
The first structure that implements the tree is called the direct table (DT) 108. Each entry in a DT table with N elements corresponds to a key whose first log2(N) bits are the same as the index of that entry in the DT table, in binary form. For example, the entry at index 5 in a 16-entry DT table would correspond to keys whose first 4 bits are "0101". If there are no
leaves that correspond to a key with the first log2(N) bits the same as the index in the DT, then that entry is marked as empty. If there is only a single leaf that matches those bits, then inside that entry there is a pointer to a leaf. This pointer is the address in memory at which the leaf is stored. If there is more than one leaf that corresponds to keys with the same first bits, then the DT entry points to a PSCB structure 110, and also contains the next bit(s) to test (NBT) field 126. These two structures will be described below.
The DT table 108 is implemented in memory, and its size (length) and starting point are programmable. Another programmable feature is the use of what are called direct leaves. Instead of having the DT entry point to a leaf, which then must be read afterwards, the leaf can be stored in the location of the DT entry. This is called a direct leaf. The cost is, of course, a trade-off of speed against the use of more memory for the DT entry. The memory size (its width) must be enough to accommodate a leaf, and not all of the DT entries will have leaves stored in them.
However, a good hash function of the key could result in most of the leaves each being attached to a DT entry of their own, so the speed gain could be large.
In summary, a DT entry can be empty. In this case, no leaves are attached to this DT entry. The DT entry can point to a single leaf attached to this DT entry. The DT entry can point to a pattern search control block (PSCB) and also contain the next bit(s) to test (NBT) for that PSCB. In this case there is more than one leaf attached to this DT entry. Finally, the DT entry can contain a direct leaf.
A PSCB represents a branch in the tree. In the preferred embodiment there is a 0-branch and a 1-branch. The number of branches emanating from a PSCB is variable depending on the number of bits used to designate the
branches. If n bits are used, then 2^n branches are defined at the PSCB.
Each PSCB is also associated with a bit position p. All leaves that can be reached from the PSCB through the 0-branch have a '0' at position p in the pattern, and the leaves that can be reached through the 1-branch have a '1' at position p. Furthermore, all leaves that can be reached from a PSCB will always have patterns at which bits 0...p-1 are identical, i.e., the patterns start to differ at position p. The bit position associated with a PSCB is stored in the previous PSCB or in a DT entry and is called the NBT (i.e., next bit to test).
Thus, PSCBs are only inserted in the tree at positions where leaf patterns differ. This allows efficient search operations since the number of PSCBs, and thus the search performance, depends only on the number of leaves in a tree and not on the length of the patterns. The PSCB register format is depicted in Fig. 12.
In summary, a PSCB entry can be empty, can point to a leaf, or can point to another PSCB, and also contain the next bit to test (NBT) for that PSCB.
A PSCB can represent a branch that corresponds to more than one bit.
In this case, for example, a PSCB that corresponds to 2 bits would have four PSCB entries, a 00 branch entry, a 01 branch entry, a 10 branch entry and a 11 branch entry. Each tree can have PSCBs that correspond to a different number of bits. In this case, the previous PSCB will also have the number of bits that correspond to the next PSCB, as well as the bit number that these bits represent.
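As a minimal sketch of how a branch could be selected at a PSCB that tests one or more bits, the following extracts n bits starting at bit position p of a pattern. The helper name `branch_index` is hypothetical, and the sketch assumes bit 0 is the most significant bit of the pattern, in line with the numbering used in the FM example later in the text.

```python
def branch_index(pattern, pattern_bits, p, n=1):
    """Select one of 2**n PSCB entries (hypothetical helper).

    pattern:      the pattern (e.g. the HashedKey) as an integer
    pattern_bits: total width of the pattern in bits
    p:            bit position to test, where bit 0 is the MSB
    n:            number of bits tested at this PSCB
    """
    shift = pattern_bits - p - n
    if p < 0 or n < 1 or shift < 0:
        raise ValueError("bit range lies outside the pattern")
    return (pattern >> shift) & ((1 << n) - 1)
```

For a 2-bit PSCB the result is 0, 1, 2 or 3, corresponding to the 00, 01, 10 and 11 branch entries.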
In the actual implementation, the key is inserted in a special key register 102. It is then hashed 104, and the results are stored in a hashed key register 106. The hash function 104 is programmable, and one of the functions is the null hash function (i.e., no hash). The first n bits of the hashed key are used as an index to the DT table 108. One programmable feature is the insertion of a bit vector right after the bits
used to index into the DT. This bit vector is called a "color" value (register 124), and the result of the hashed key and the inserted color value is stored inside the hashed key register 106.
The search starts with an access into the direct table 108, i.e., a DT entry is read from the direct table 108. The address used to read the DT entry is calculated from the N highest bits of the HashedKey, as well as from tree properties as defined in the lookup definition table (LUDefTable).
The DT entry can be seen as the root of a tree. The actual tree data structure depends on the tree-type. A Patricia tree data structure is used for FM trees, and extensions to Patricia trees are used for LPM and SMT trees.
An example of the use of an 8-entry DT 108 is shown in Fig. 6. It can be seen that the search time, i.e., the number of PSCBs that must be accessed, can be reduced by using a DT 108. Thus, by increasing the DT size, a trade-off can be made between memory usage and search performance.
For performance reasons, it is inefficient to read a DT entry only to find that it contains a pointer to a leaf, after which the leaf itself must be read. This situation will occur very often for FM trees, which have many single leaf entries per DT entry. The concept of a direct leaf allows a trade-off between more memory usage and better performance.
A tree can have direct leaves enabled, which is specified in the lookup definition table (LUDefTable). The difference between trees with direct leaves enabled and disabled is illustrated in Fig. 7. When direct leaves are enabled and a DT entry contains a single leaf, this leaf 130 is stored directly in the DT entry itself. Otherwise, the DT entry will contain a pointer to the leaf.
Shaping is a feature of the tree search memory (TSM) and is used to specify how an object, like a leaf or PSCB, is stored in the TSM. The shape is defined by the parameters width and height. The height of an object denotes the number of consecutive address locations at which the object is stored. The width of an object denotes the number of consecutive banks at which the object is stored. For width and height, the hardware automatically reads the appropriate number of locations. From a picocode point of view, an object is an atomic unit of access. The width must always be 1 for objects stored in SRAM. The width may be greater than 1 for objects in DRAM. Objects that are small enough to fit within a single memory location are defined to have a height of one and a width of one. The shape of a DT entry with direct leaves disabled is always (W=1, H=1). When the DT entry is stored in dynamic random access memory (DRAM), it occupies
exactly 64-bits. The shape of a DT entry with direct leaves enabled equals the shape of the leaf, which is specified in the LUDefTable. In general, this causes more memory to be used by the DT 108. It also causes an impact of the leaf shape on the DT entry address calculation.
During the tree search, after a DT entry has been read and assuming the DT entry does not contain a direct leaf nor is it empty, the search continues by walking the tree that starts at the DT entry. The tree-walk may pass several PSCBs (pattern search control blocks), until a leaf has been reached.
When a PSCB is encountered during a search, the tree search engine hardware 70 will continue the tree-walk on the 0-branch or the 1-branch, depending on the value of bit p of the HashedKey.
A cache can be used for increasing the search performance in trees.
Use of a cache can be enabled in the LUDefTable on a per tree basis.
During a search, the tree search engine 70 will first check in the cache to determine if a leaf is present that matches the HashedKey. If such a leaf is found, it is returned and no further search is required. If such a leaf is not found, a normal search starts.
For the tree search engine hardware 70, a cache lookup is exactly identical with a normal search. Thus, the input key is hashed into a HashedKey, and a direct table 108 access is performed. The direct table 108 acts as a cache. When the cache search returns OK (success), the search ends. Otherwise, the tree search engine 70 starts a second search in the full tree, except that no hash operation is performed. The contents of the HashedKey register 106 are reused.
It can be specified in the LUDefTable if a cache search is used. If a cache search uses LUDefTable entry I and the search ends KO (failure), another search using LUDefTable entry I+1 starts automatically. In principle, this allows multiple searches to be chained, although it is recommended that the full tree be stored under LUDefTable entry I+1.
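The chaining of a cache search to a full-tree search can be sketched as follows. This is a deliberately simplified model, not the hardware behaviour: each LUDefTable entry is represented by a hypothetical dictionary with an `is_cache` flag and a `leaves` mapping from HashedKey to leaf, and the function name `chained_search` is an assumption made for illustration.

```python
def chained_search(lu_def_table, index, key, hash_fn):
    """Search entry `index`; on KO in a cache entry, continue with index + 1.

    The key is hashed only once and the HashedKey is reused for every
    search in the chain, as described in the text.
    """
    hashed_key = hash_fn(key)                    # hashed a single time
    i = index
    while i < len(lu_def_table):
        entry = lu_def_table[i]
        leaf = entry["leaves"].get(hashed_key)   # stand-in for a tree search
        if leaf is not None:
            return "OK", leaf
        if not entry["is_cache"]:
            break                                # full tree failed: give up
        i += 1                                   # cache miss: try entry I+1
    return "KO", None
```

In this model, placing the full tree directly at entry I+1 ends the chain after at most two searches, which matches the recommendation in the text.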
The basic tree structure and processing described above with a direct table, PSCBs and leaves can be adapted for full match searches, longest prefix match searches, and software managed tree searches. To enable this, the contents and method of processing of direct table entries, PSCBs and leaves can be modified and extended. This is now discussed.
FM trees provide a mechanism for searching tables efficiently with fixed sized patterns. An example of this would be a layer-2 Ethernet
unicast MAC table. Ethernet unicast MAC addresses are a fixed six bytes and must have an exact match, otherwise, the destination is unknown.
FM trees are the best performing trees since they benefit significantly from the hashing function. The tree search engine provides multiple fixed hashing functions that offer very low collision rates.
Assuming that the DT 108 is large enough, the probability of having multiple leaves associated with a single DT entry is very low. This is the 1+ epsilon rule, whereby epsilon represents the number of collisions in a DT entry. A DT entry with one leaf has an epsilon = 0. Thus, with the hashing functions and using FM trees, the value of epsilon should be very small.
The structure of a DT entry in an FM tree can be seen in Fig. 8.
Each DT entry is 36 bits wide and contains one of the following formats:
* Empty DT entry. There are no leaves associated with this DT entry.
* Pointer to next PSCB. The DT entry contains a pointer to a PSCB. The next PSCB address (NPA) and next bit to test (NBT) fields are valid.
* Pointer to leaf. There is a single leaf associated with the DT entry. The leaf control block address (LCBA) contains the pointer to this leaf.
* Direct leaf. There is a single leaf associated with a DT entry and the leaf is stored in the DT entry itself. The first field of a leaf must be the NLA rope, which implies that direct leaves must have the rope enabled. A rope is a circular linked list that is used to link leaves in a tree together. Picocode can "walk the rope" or sequentially inspect all leaves in a rope. It should be noted that the first two bits in the NLA are reserved to denote '10' such that they automatically encode "direct". Direct leaves will only be used for a given tree if this is enabled in the LUDefTable.
FM PSCBs have the same structure as an FM DT entry except that they consist of two PSCB lines, whereby each PSCB line can have one of the two formats shown in Fig. 8. The two PSCB lines are allocated consecutively in memory and are used as a branch for walking the tree. The next bit to test (NBT) field signifies the offset into the key to use as the bit comparison for walking the PSCBs and denotes which of the two PSCB lines to use. FM PSCBs always have a shape defined by a width of one and a height of one.
The steps in processing the DT entry, in an FM tree, are as follows:
* The DT entry is read from memory.
* If the DT entry is a null entry, this means that there are no leaves in the tree that have the same first "n" bits as the hashed key, so the search fails.
* If the DT entry has a pointer to a leaf, then the leaf is read from memory using the pointer from the DT 108 as the address of the leaf. The leaf is stored in a register and is compared with the key. This step is called compare at the end. If there is a full match, the tree search succeeds. Otherwise, the tree search fails.
* If the DT entry has a pointer to a PSCB 110 and an NBT, the NBT is first stored in a specific register. Then the NBT number is used to find the bit in the key in location NBT. That bit (0 or 1) is used along with the pointer to the PSCB to extract the correct PSCB entry: the bit is appended at the end of the pointer and that gives the full address in memory of the PSCB. The PSCB is read and stored in a specific register; the hardware then processes the PSCB entry. At this point, the algorithm is starting to walk down the tree.
The steps in processing the PSCB entry, in an FM tree, are as follows:
* If the PSCB entry is a null entry, this means that there are no leaves in the tree that have the same first NBT bits as the key, so the search fails.
* If the PSCB has a pointer to a leaf, then the leaf is read from memory using the pointer from the PSCB as the address of the leaf. The leaf is stored in a register and is compared with the key. This step is called compare at the end. If there is a full match, the tree search succeeds. Otherwise, the tree search fails.
* If the PSCB has a pointer to a PSCB and an NBT, the NBT is first stored to the specific register, and this becomes the current NBT. Then this NBT number is used to find the bit in the key in location NBT. That bit (0 or 1) is used along with the pointer to the PSCB to extract the correct next PSCB entry. The bit is appended at the end of the pointer and gives the full address in memory of the PSCB. The PSCB is read and stored in the specific register. Then the hardware will repeat this processing of a PSCB entry.
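The address formation used in the steps above, in which the tested bit is appended to the PSCB pointer, can be sketched as below. The function name `pscb_line_address` is hypothetical, addresses are modelled as plain integers, and bit 0 of the key is assumed to be the most significant bit.

```python
def pscb_line_address(npa, key, key_bits, nbt):
    """Return the address of the PSCB line to read next (illustrative only).

    npa:      pointer to the PSCB taken from the DT entry or previous PSCB
    key:      the (hashed) key as an integer
    key_bits: width of the key in bits
    nbt:      next bit to test, counted from the MSB (bit 0)
    """
    if not 0 <= nbt < key_bits:
        raise ValueError("NBT lies outside the key")
    bit = (key >> (key_bits - 1 - nbt)) & 1   # bit of the key at location NBT
    return (npa << 1) | bit                   # bit appended at the end of the pointer
```

Because the two PSCB lines are allocated consecutively in memory, appending the bit selects the first line for a '0' and the second line for a '1'.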
A leaf in an FM tree contains control information including a pattern. The pattern identifies the leaf as unique in the tree. A leaf also contains the data needed by the application that initiated the tree search. The data contained in a leaf is application dependent and its size or memory requirements are defined by the LUDefTable entry for the tree.
Fig. 13 illustrates the fixed leaf format for FM trees.
During an FM tree walk, not all bits of the HashedKey are tested, but only those bits for which there is a PSCB. Therefore, when a leaf has been found, the pattern in the leaf must still be compared with the HashedKey to make sure that all bits match. Note that it is the HashedKey that is stored in the leaf and not the original input key. When an FM leaf is found, the following operations are performed:
Step 1: The leaf pattern is compared with the HashedKey. When a match occurs, the operation proceeds with Step 2. Otherwise, if the leaf contains a chain-pointer to another leaf, this leaf is read and the pattern is compared again with the HashedKey. Without a match and without an NLA field, the search ends with failure (KO).
Step 2: If a vector mask is enabled, the bit with number VectorIndex is read from the leaf's vector mask. This bit is returned as part of the search result. The search ends with success (OK).
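Steps 1 and 2 can be illustrated with a small sketch. The `Leaf` class and its field names are hypothetical, the chain of leaves is assumed to end with `None`, and the vector mask is assumed to be indexed from its least significant bit, since the text does not state the bit numbering of the mask.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    pattern: int                          # stored HashedKey
    data: str = ""                        # application-dependent contents
    next_leaf: Optional["Leaf"] = None    # chain pointer (assumed None-terminated)
    vector_mask: Optional[int] = None     # None models "vector mask disabled"


def compare_at_end(leaf, hashed_key, vector_index=0):
    """Return (status, matching leaf, vector bit) for an FM leaf chain."""
    while leaf is not None:
        # Step 1: compare the leaf pattern with the HashedKey.
        if leaf.pattern == hashed_key:
            # Step 2: optionally read bit VectorIndex of the vector mask.
            bit = None
            if leaf.vector_mask is not None:
                bit = (leaf.vector_mask >> vector_index) & 1  # LSB-first assumed
            return "OK", leaf, bit
        leaf = leaf.next_leaf             # follow the chain pointer, if any
    return "KO", None, None
```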
Everything that is described as "programmable" can be set in a specific register value that corresponds to that tree. If the engine needs to support N trees, then N of these values are placed in a register array.
In this register are encoded the programmable values, i.e., the hash function to use, the beginning of the DT table, its size, etc.
One capability of the hardware is an automatic insert (a hardware insert) of a key. As the search for the (hashed) key proceeds, when there is a mismatch (KO), the leaf can be automatically inserted at that point by using the hardware to create the PSCB on the fly. In this case, the concept of the full match tree can be used as a cache.
Fig. 10 illustrates the processing logic of the Full Match search algorithm of the present invention. Processing starts in logic block 1000 with reading of an input key. The input key is then run through a hash function as indicated in logic block 1002. Hashing of the input key into a hashed key is optional. The hash function is chosen such that the entropy is highest at the leftmost bits of the hashed key, i.e., those bits that are used to address a direct table. The hash function is reversible, i.e.,
there exists a reverse hash function that can transform the hashed key into the input key. Next, in logic block 1004, the direct table is read. The upper N bits (whereby N is configurable) of the hashed key are used as an index into the direct table. When the entry that has been read is empty, the search returns KO (no match found). This is indicated by termination block 1006. As indicated in decision block 1008, a determination is made as to whether or not the entry points to a leaf. If the DT entry points to a leaf, then as indicated in logic block 1010 the leaf is read. Otherwise, the DT entry points to a PSCB. In this case, the appropriate part of a PSCB is read as indicated in logic block 1012. For a full match search, a PSCB includes two entries: a 0-part and a 1-part. The previous PSCB (or DT entry) contains a bit number (NBT: next bit to test). The NBT selects a bit in the hashed key (i.e., 0 or 1) which selects which PSCB entry to use.
The PSCB entry either contains a pointer to a leaf, or a pointer to another PSCB. Processing then loops back to decision block 1008. Once a leaf is found in decision block 1008, and read in logic block 1010, the pattern stored in the leaf is compared bit-wise with the hashed key as indicated by logic block 1014. If all bits match, as indicated in decision block 1016, the search returns OK (successful match) as indicated in termination block 1018. The contents of the leaf are then passed to the application.
Otherwise, the search returns KO (failure) as indicated in termination block 1020. As an extension to this processing logic, a PSCB may consist of 2^b entries, such that b bits from the hashed key select which entry to read from the PSCB. This increases performance at a cost of more memory usage.
An example of searching an FM tree can be seen in Fig. 9 where a 7-bit value is stored in the tree. The example is simplified by using the three most significant bits (MSB) of the key as a hash into the FM DT 108. There are five leaf entries (L0-L4) stored in this tree.
As a first example, assume a binary input key of 1110011. The first three bits '111' index into DT entry 7, where an LCBA pointing to leaf L0 is present. The leaf L0 is read by the TSE 70 and the pattern in L0 is compared with the input pattern. In this example, an exact match occurs and the TSE will return OK (success).
Assume now an input pattern of 1001110. DT entry 4 contains a pointer to PSCB0 with an NBT field of 3. This means that the fourth bit in the key, '1' (bit 0 is the MSB or leftmost bit), determines which branch of the tree is taken. Since the fourth bit is a '1', the bottom half of PSCB0 is used; had it been a '0', the upper half of PSCB0 would have been used.
Each PSCB is essentially a two-element array of PSCB lines where a value of '0' for the bit selected by the NBT indexes into the first element and a value of '1' indexes
into the second element. Thus, the search continues because PSCB line 1 of PSCB0 contains an NBT of 6 and a next PSCB address (NPA) pointing to PSCB2.
With an NBT of 6 and bit 6 of the input pattern equalling '0', the upper half of PSCB2 is used, containing a pointer to L3. Reading leaf L3 and performing the full compare operation of the pattern in L3 with the input pattern returns an OK (success).
A search on the input pattern 1001100 will follow exactly the same path in the tree as in the previous example, but the compare at the end operation will not match, such that the search will return a KO (failure).
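The three searches walked through above can be reproduced with a minimal sketch of the FM search. Only the parts of the Fig. 9 tree that the text describes are modelled (DT entries 7 and 4, PSCB0 line 1, PSCB2 line 0, leaves L0 and L3); every other entry is left empty, which is an assumption, as are the tuple encoding of entries and the function names. The hash is the null hash, with the three MSBs of the key used as the DT index.

```python
KEY_BITS = 7   # width of the patterns in the example
DT_BITS = 3    # the three MSBs index an 8-entry direct table

# Entry encodings (hypothetical): None = empty,
# ("leaf", name, pattern) or ("pscb", nbt, [line0, line1]).
L0 = ("leaf", "L0", 0b1110011)
L3 = ("leaf", "L3", 0b1001110)
PSCB2 = [L3, None]                      # line 1 not described in the text
PSCB0 = [None, ("pscb", 6, PSCB2)]      # line 0 not described in the text
DT = [None] * 8
DT[7] = L0
DT[4] = ("pscb", 3, PSCB0)


def bit_at(key, pos):
    """Bit `pos` of the key, where bit 0 is the MSB."""
    return (key >> (KEY_BITS - 1 - pos)) & 1


def fm_search(dt, key):
    """Walk DT entry and PSCBs to a leaf, then compare at the end."""
    entry = dt[key >> (KEY_BITS - DT_BITS)]
    while entry is not None:
        if entry[0] == "leaf":
            _, name, pattern = entry
            # Compare at the end: all bits must match.
            return ("OK", name) if pattern == key else ("KO", None)
        _, nbt, lines = entry
        entry = lines[bit_at(key, nbt)]  # tested bit selects the PSCB line
    return ("KO", None)                  # empty DT entry or PSCB line


print(fm_search(DT, 0b1110011))  # ('OK', 'L0')
print(fm_search(DT, 0b1001110))  # ('OK', 'L3')
print(fm_search(DT, 0b1001100))  # ('KO', None)
```

The last search reaches L3 along the same path as the second one because bits 3 and 6 are identical; only the final compare detects the differing bit 5.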
LPM trees provide a mechanism for searching tables efficiently with variable length patterns or prefixes. An example of this would be a layer 3 Internet Protocol (IP) forwarding table. IP addresses can be full match addresses such as a host address or can be a prefix for a network address.
LPM trees are managed by the CP and also require assistance from the GCH for inserting and removing leaf entries.
The structure of an LPM DT entry differs from an FM DT entry: the LPM DT entries contain both a node (NPA) and leaf (LCBA) address within the same entry. In the FM DT, an entry cannot contain both a node and a leaf address. This difference is due to the searching strategy used between the two tree types.
The structure of a DT entry for an LPM tree is illustrated in Fig. 14. Each DT entry is 64 bits wide and contains one of five possible entry formats that are currently defined:
* Empty DT entry (format = 00 and NPA = 0). There are no leaves associated with this DT entry (the next PSCB address (NPA), next bit to test (NBT) and leaf control block address (LCBA) fields contain all zeros).
* LCBA not valid and NPA/NBT valid (format = 00 and NPA <> 0). The DT entry contains a pointer to a PSCB. The NPA and NBT fields are valid. The LCBA pointer contains all zeros. This code point may seem redundant and is added in the hardware only for the case that the PSCB or the DT entry is stored in a 36-bit wide memory. In this case, the hardware can skip reading the memory location containing the LCBA, which improves TSM bandwidth and therefore search performance.
* LCBA valid and NPA/NBT not valid (format = 01 and NPA = 0). There is a single leaf associated with the DT entry. The LCBA contains a pointer to this leaf. There is no pointer to a next PSCB (NPA = 0).
* LCBA valid and NPA/NBT valid (format = 01 and NPA <> 0). There is a leaf associated with the DT entry (the LCBA contains a pointer to this leaf) and there is a pointer to the next PSCB (NPA <> 0).
* Direct leaf (format = 10). There is a single leaf associated with the DT entry and the leaf is stored in the DT entry itself. The first field of the leaf must be the NLA rope, which implies that direct leaves must have the rope enabled. A rope is a circular linked list that is used to link leaves in a tree together. Picocode can "walk the rope" or sequentially inspect all leaves in a rope. It should be noted that the first two bits in the NLA are reserved to denote '10' such that they automatically encode "direct". Direct leaves will only be used for a given tree if this is enabled in the LUDefTable.
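The five entry formats reduce to a decision on the two format bits and on whether the NPA is zero. The following is a minimal sketch of that decision; the function name and the returned labels are illustrative assumptions, and the actual bit positions of the fields within the 64-bit entry are not modelled.

```python
def classify_lpm_dt_entry(fmt: int, npa: int) -> str:
    """Map the 2-bit format field and the NPA to one of the five entry kinds."""
    if fmt == 0b10:
        return "direct leaf"
    if fmt == 0b00:
        # No valid LCBA; the NPA decides between empty and PSCB pointer.
        return "empty" if npa == 0 else "pscb only"
    if fmt == 0b01:
        # Valid LCBA; a non-zero NPA means the leaf is also followed by a PSCB.
        return "leaf only" if npa == 0 else "leaf and pscb"
    raise ValueError("format code not defined in the text")
```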
LPM PSCBs have the same structure as an LPM DT entry except that they consist of two PSCB lines, whereby each PSCB line can have one of the formats shown in the figure. The two PSCB lines are allocated consecutively in memory and are used as a branch for walking the tree.
Note that one of the two LPM PSCB lines can be an empty line which is not allowed for FM PSCBs.
LPM PSCBs have either a shape defined by a width of one and a height of one, or a width of one and a height of two, depending on the memory in which the PSCB resides. A memory that has a line width of at least 64 bits (H0, H1, H2, DRAM) should be used with a height of one. A memory of 36 bits (H3, H4, ZBT) should be used with a height of two.
A leaf in an LPM tree contains control information including a pattern. The pattern identifies the leaf as unique in the tree. A leaf also contains the data needed by the application that initiated the tree search. The data contained in a leaf is application dependent and its size or memory requirements are defined by the LUDefTable entry for the tree.
Fig. 18 illustrates the leaf format for LPM trees.
The high level algorithm flow for the longest prefix match search is as follows:
1. Read the DT entry.
   a. if NBT < current_keylen then read the next PSCB and store the bird/LCBA and the previous NBT in the stack (if present);
   b. if NBT > current_keylen then read the leaf at the LCBA and go to the leaf evaluation step;
   c. if NBT is not valid and a direct leaf is valid, read the leaf contents and go to the leaf evaluation step;
   d. if NBT is not valid and/or the leaf/bird is not present, return KO, i.e., failure for the search result and completion flag as done.
2. Read the PSCB/NPA entry.
   a. if NBT < current_keylen then read the next PSCB and store the bird in the stack (if present);
   b. if NBT > current_keylen then read the leaf at the LCBA and go to the leaf evaluation step;
   c. if NBT is not valid and a direct leaf is valid, read the leaf contents and go to the leaf evaluation step;
   d. if NBT and/or NPA is not valid and a leaf/bird is not present then read the bird stack and read the leaf at the most recent (last) valid LCBA and then go to the leaf evaluation step.
3. Leaf evaluation: compare the pattern (key) and the pattern stored in the leaf and compute the mismatch point, i.e., DistPos value.
   a. compare the DistPos value with the NBT field within the stack and read the corresponding leaf (i.e., the LCBA) with the closest matching NBT and return with OK (success);
   b. if all the NBTs are greater than DistPos, return the result with KO (failure) since no matching leaf/subnet was found.
If the stack is full before the end of the trail, there will be a need for reading the leaf at the intermediate point also to determine whether to flush the trail.
The trail stack makes it possible to find the longest prefix result after the first compare without walking the tree to the tail again. The use of a smaller trail stack is possible but requires comparison of the leaf every time the trail gets full. It is possible to arrive at the longest prefix result/leaf without having a trail stack, but in that case one has to walk the trail again until the bird is located at the NBT = DistPosVal (first mismatch position) or the last valid bird is located for prefix_length < DistPosVal. The trail stack supports various memories and it is scalable.
The bit/register width values described herein are exemplary and can be changed to different values to optimise the available memories, performance requirements, etc.
Figs. 17A-17B illustrate the processing logic of the Longest Prefix Match search algorithm of the present invention. The algorithm begins in logic block 1700 with the reading of an input key. In an LPM search, there is no hash function, therefore the hashed key is identical to the input key. As indicated by logic block 1702, the direct table is next read. The upper N bits (whereby N is configurable) of the hashed key are used as an index into a direct table. When the entry that has been read is empty, the search returns KO (nothing found) as indicated by termination block 1704.
It should be noted that for Internet Protocol version 4 (IPv4), a special mechanism for class A addresses can be employed. If the entry points to a leaf in decision block 1706, then processing continues at block 1718 in Fig. 17B. Otherwise, the entry points to a PSCB. The appropriate part of a PSCB is then read as indicated in logic block 1708. For an LPM search, a PSCB includes two entries: a zero-part and a one-part. The previous PSCB (or DT entry) contains a bit number (NBT: next bit to test).
The NBT selects a bit in the hashed key (i.e., zero or one), which selects the PSCB entry to use. The PSCB entry either contains a pointer to a leaf, or a pointer to another PSCB, or both. In the latter case, the leaf is called a "bird". That is, a PSCB can contain a bird and a pointer to another PSCB; thus, a leaf is always an endpoint in a tree branch.
A bird is always in the middle of a tree branch, since a pointer to a PSCB represents a continuation of a tree branch. When a PSCB entry is read, and it contains a bird, the bird is remembered on a bird stack, together with its bit position as indicated in logic block. When the bird stack is not full in decision block 1710, the search continues with reading the next PSCB by returning to decision block 1706. When the bird stack is full in decision block 1710, it will be flushed as follows. The bird contents are read from memory as indicated by logic block 1712. Next, as indicated in logic block 1714, the pattern in the bird is compared with the hashed key. When they are exactly equal bit-for-bit and have the same length, the search can end with OK. Otherwise, the value of DistPos is calculated. This is the first bit at which the bird pattern and the hashed key differ.
For example, DistPos (1010111, 10001000100) = 2, i.e., bit 2, counting from 0, is the first bit where the two patterns differ. If there is a bird in the bird stack with a bit number equal to DistPos, this bird is selected; otherwise the bird with the largest bit number that is still smaller than the DistPos is selected. This bird is kept in the bird stack; all other birds are removed. Therefore, the bird stack contains exactly one bird.
The search continues. It should be noted that the LPM search can be aborted as soon as the bit number of the PSCB exceeds the length of the hashed key. This processing is indicated by logic block 1716. From logic block 1716, processing returns to decision block 1706 to continue testing for a leaf being found.
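The DistPos calculation and the bird stack flush can be sketched as follows. Bit strings stand in for hardware registers, and a bird is modelled as a (bit_number, address) tuple; both are assumptions made for illustration. The case where one pattern is a prefix of the other is not spelled out at this point in the text, so the sketch returns the shorter length, which is also an assumption.

```python
def dist_pos(a: str, b: str) -> int:
    """First bit position (bit 0 = leftmost) at which a and b differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Assumption: if one string is a prefix of the other, use the shorter length.
    return min(len(a), len(b))

def flush_bird_stack(stack, dp):
    """Keep only the bird with the largest bit number not exceeding dp."""
    candidates = [bird for bird in stack if bird[0] <= dp]
    return [max(candidates, key=lambda bird: bird[0])] if candidates else []

print(dist_pos("1010111", "10001000100"))
print(flush_bird_stack([(3, "A"), (4, "B"), (6, "C")], 5))
```

If no bird has a bit number at or below DistPos, the sketch leaves the stack empty; how the hardware treats that case is not stated here.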
When a leaf is found in decision block 1706, then processing continues at logic block 1718 in Fig. 17B. Once the leaf is read, it is compared with the hashed key (input key). When the hashed key and leaf pattern are exactly equal bit-for-bit and have the same length, the search can end with OK. This processing is indicated by decision block 1722 and termination block 1728, respectively. Otherwise, the DistPos is calculated and the appropriate bird in the bird stack is selected as indicated in logic block 1724. If a bird exists as indicated in decision block 1726, then the search returns OK as indicated in termination block 1728. Otherwise, the search returns KO as indicated in termination block 1730.
Internet Protocol version 4 (IPv4) class A addresses have a prefix length of 8 bits or higher. This means that it must be possible to store patterns in the tree with a length of 8 or higher. A problem may occur with a DT size larger than 256 (2^8). Assume as an example a DT size of 64K entries, which represents a 16-bit address to index into the DT. Assume also a class A address equal to "0101010101", with length 10, that must be stored in a table. Any input pattern that has this 10-bit prefix must return the above pattern. This poses a problem with a 16-bit DT index.
For example, input key k1 "0101010101000000" and input key k2 "0101010101000001" should both find the same result; however, they address different entries in the DT. This problem can be solved in two ways.
First, the address can be duplicated multiple times in the tree. In this example, the address must be duplicated 2^6 = 64 times. Second, a different way of calculating the DT index can be used. When an input key is a class A address, which is the case if the leftmost bit equals zero, the eight rightmost bits in the DT index are set to zero. In the current example, both input keys k1 and k2 use "01010101" as an index in the DT.
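Both remedies can be checked with a few lines of arithmetic. The sketch below assumes a 16-bit DT index held as an integer; the names dt_index and duplicates_needed are hypothetical. With the eight rightmost index bits cleared, k1 and k2 land on the same entry, whose upper eight bits are 01010101.

```python
DT_BITS = 16  # 64K-entry direct table, as in the example

def dt_index(key: str) -> int:
    """DT index; for class A keys (leftmost bit 0) the eight rightmost bits are zeroed."""
    index = int(key[:DT_BITS].ljust(DT_BITS, "0"), 2)
    if key[0] == "0":
        index &= 0xFF00
    return index

def duplicates_needed(prefix_len: int) -> int:
    """Number of DT entries a prefix shorter than the index would occupy if duplicated."""
    return 2 ** (DT_BITS - prefix_len) if prefix_len < DT_BITS else 1

k1 = "0101010101000000"
k2 = "0101010101000001"
print(format(dt_index(k1), "016b"), format(dt_index(k2), "016b"))
print(duplicates_needed(10))
```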
An example of an LPM tree is shown in Fig. 15. The tree contains an 8-entry DT (thus using 3-bit DT addressing), three LPM PSCBs, four leaves and two "birds." A bird is actually identical to a leaf, although it is called a "bird" only when the PSCB line contains both an LCBA pointer (pointing to the bird) and an NPA pointer (pointing to the next PSCB). The leaf is called a leaf when the PSCB line only contains an LCBA pointer and no NPA pointer. It can be seen in the figure that Bird0 (with pattern 100) is a prefix of Bird1 (with pattern 1001), which is in turn a prefix of leaf L3 (with pattern 1001110).
As an example, assume a search for input key 1001110. In this case, the tree walk proceeds in exactly the same way as with an FM search and the algorithm will reach leaf L3. Like an FM search, an LPM search also performs a compare at the end operation when a leaf has been reached in the tree. When the compare matches exactly, as is the case in this example, the correct leaf has been found and the search returns OK (success).
In contrast with an FM search, the LPM algorithm performs the following extra action to find a subnet when there is no exact match. The distinguishing position (DistPos), which is the first bit in which the input key and leaf pattern differ, is calculated by hardware. Assume for example an input key of 10011 and a leaf pattern 1011010. The DistPos (10011, 1011010) = 2 since the first bit where these two patterns are different is bit two. Other examples are shown in Fig. 16. Once the DistPos has been determined, there are two possibilities:
1. The DistPos equals the length of the input key and the length of the leaf pattern is smaller than the length of the input key. This would occur with an input key of 10011100, which during a search by the TSE would also find leaf L3. In this case, the leaf is the longest prefix of the input key and the algorithm returns an OK (success).
2. For all other keys, the TSE checks if there is a bird that represents a prefix of the input key. If the input key is 10011, the search again will find leaf L3 and begin looking for a prefix bird. Note that during the tree-walk, two birds would be encountered, Bird0 at bit 2 and Bird1 at bit 3. It should be noted that the bit position of a bird always equals the length of the bird. The DistPos (10011, 1001110) = 4. Given the DistPos, the appropriate bird, i.e., the longer prefix, is the bird with the highest bit position, which in this example is Bird1. Thus, the TSE will read Bird1 from the tree search memory and return OK (success). It should be noted that a compare at the end operation is not required, since the bird is known to be a prefix of the input key.
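The expected results for the three input keys discussed above can be cross-checked with a brute-force reference. This is not the tree walk or the bird stack mechanism; it simply tests every stored pattern named in the text (Bird0, Bird1 and L3) as a prefix of the key and keeps the longest one. The function name is a hypothetical choice.

```python
# Patterns taken from the Fig. 15 description in the text.
PATTERNS = {"Bird0": "100", "Bird1": "1001", "L3": "1001110"}

def longest_prefix(key: str):
    """Brute-force longest prefix match over the stored patterns."""
    matches = [(len(p), name) for name, p in PATTERNS.items() if key.startswith(p)]
    return max(matches)[1] if matches else None

for key in ("1001110", "10011100", "10011"):
    print(key, longest_prefix(key))
```

A reference of this kind is only useful for validating a tree implementation on small tables, since it inspects every pattern on every search.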
SMTs provide a mechanism to create tree structures that follow a defined search mechanism. An example of this would be an IP 5-tuple filtering table, containing IP source address (IPSA), IP destination address (IPDA), source port, destination port and protocol. In contrast to FM and LPM trees, SMTs provide support for ranges. For example, a leaf can be used to specify that the source port must be in a range 100... 110.
In a SMT Tree, the first leaf after a PSCB must have the shape defined in the LUDefTable. Any other leaf, in a leaf chain, has the shape defined by 5 bits in the chaining pointer, which is the NLASMT field in a leaf.
In a SMT Tree, the formats of DT and PSCB entries are identical and include the following parts:
1. SCB (search control block): 2 bits.
2. NPA (next pattern address): points to the next PSCB address, or LCBA (leaf control block address): points to the leaf/result.
3. NBT (next bit or bits to test): can be the next pair or group of "x" (x = 1 or n) bits to test. The number of bits to be tested is determined based on the storage efficiency, etc.
4. Direct leaf.
Each entry in this implementation is 36 bits wide and contains one of four possible currently defined entry formats:
1. Empty DT entry: SCB = 00 and NPA = 0 and NBT is invalid or zero.
2. The NPA/NBT is valid: SCB = 00 and NPA and NBT are valid. For a DT entry, the NPA points to the first intermediate node and the NBT points to the bit or bits to be tested. In the case of a PSCB entry, the NPA points to other nodes in the trail.
3. The LCBA is valid: SCB = 01. The LCBA points to an associated leaf address, i.e., the search result.
4. Direct leaf: SCB = 10 and the rest of the data contains the search result or leaf. Part of the leaf data can include chaining of leaf addresses to support large search result storage.
With regard to memory allocation of DT and PSCB entries: SMT PSCBs have the same structure as a SMT DT entry except that they always consist of 2**no_of_bits_to_be_tested addresses, i.e., in pairs/groups. These pairs or groups of addresses are allocated consecutively in memory and are used as a branch/jump pointer for walking the search tree.
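Because the 2**no_of_bits_to_be_tested entries of a PSCB are consecutive, the entry to read can be computed from the PSCB base address and the tested key bits. The sketch below assumes that consecutive entries are one address apart and that the tested bits are read as an unsigned integer; neither detail is stated in the text.

```python
def pscb_entry_address(base: int, key: str, nbt: int, bits: int) -> int:
    """Address of the entry selected inside a group of 2**bits consecutive entries."""
    selector = int(key[nbt:nbt + bits], 2)  # tested bits, bit 0 = leftmost
    return base + selector

# With one tested bit the PSCB is a pair; with two bits it is a group of four.
print(hex(pscb_entry_address(0x100, "1001110", 3, 1)))
print(hex(pscb_entry_address(0x100, "1001110", 3, 2)))
```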
The format of a leaf in a SMT tree contains control information including two patterns. The two patterns are used to define range compares. Leaves in a SMT can be chained using the NLASMT field. Fig. 22 illustrates the format for a SMT tree. When the first leaf is reached after a PSCB, a compare at the end operation is performed. When this returns OK, the search stops. However, when the compare at the end returns KO and there is a non-zero NLASMT field, the next leaf is read and another compare at the end operation is performed. This process continues until either a compare at the end returns OK, or until the NLASMT field equals zero, in which case the search returns with KO.
For the purpose of compare operations, the input key (and similarly, the two patterns stored in the leaf) can logically be divided into multiple fields. An example is illustrated in Fig. 19. For each field, one of two compares can be performed:
* Compare under mask. The bits in the input key are compared with the bits in leaf pattern0 under a mask specified in leaf pattern1. A '1' in the mask denotes that the corresponding bit in the input key must equal the corresponding bit in pattern0; a '0' in the mask denotes that the corresponding bit in the input key has no influence on the compare. Only if all bits compare OK does the entire field compare OK.
* Compare under range. The bits in the input key are treated as an integer which is checked to determine if it is in the range given by the Min and Max (both are inclusive in the range). If this is the case, the field compares OK, otherwise the field compares KO.
Only if all fields compare OK does the entire compare at the end return OK; otherwise the compare at the end returns KO.
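The two per-field compares and their combination can be written down directly. The sketch below treats each field as an unsigned integer; the function names are illustrative, not identifiers from the specification.

```python
def compare_under_mask(key_field: int, pattern0: int, mask: int) -> bool:
    """Bits where mask is 1 must match pattern0; bits where mask is 0 are ignored."""
    return (key_field & mask) == (pattern0 & mask)

def compare_under_range(key_field: int, minimum: int, maximum: int) -> bool:
    """Field value must lie in [minimum, maximum], both inclusive."""
    return minimum <= key_field <= maximum

def compare_at_end(field_results) -> str:
    """The whole compare is OK only if every field compare is OK."""
    return "OK" if all(field_results) else "KO"

# One masked field (upper four bits must equal 0xA) and one range field (100...110).
print(compare_at_end([
    compare_under_mask(0xA7, 0xA0, 0xF0),
    compare_under_range(105, 100, 110),
]))
```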
The logical fields are defined in the compare definition table (CompDefTable), of which an example of an entry format is given in Fig. 20. By default, a field is a compare under mask field, unless there is an entry in the CompDefTable that specifies otherwise.
Each entry in the CompDefTable specifies one or two range compares. As will be explained below, multiple entries can be used to specify more than two range compares. Each range compare is defined by two parameters:
* Offset, which is the position of the first bit of the field. The offset must be at a 16-bit boundary and can have the following values: 0, 16, 32, 48, 64, 80, 96, 112 or 128.
* Length of the field in bits. Lengths can have the following values: 8, 16, 24, or 32.
For example, for the key shown in Fig. 19, the compare under range for the source port field would have Offset0 set to 64 and Min/MaxLength0 set to 16, and the compare under range for the destination port field would have Offset1 set to 80 and Min/MaxLength1 set to 16. If more than two range compares are required, the continue bit must be set to 1, which causes the next entry in the CompDefTable to be used for another one or two compare under range definitions. The index in the CompDefTable that is used for the SMT compare is specified in the leaf.
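The offset and length parameters select a slice of the key counted from the leftmost bit. The sketch below extracts the two port fields from a 104-bit 5-tuple key; the overall key layout (32-bit IPSA, 32-bit IPDA, 16-bit ports, 8-bit protocol) and the sample values are assumptions chosen to agree with the offsets 64 and 80 given above, since Fig. 19 is not reproduced here.

```python
def extract_field(key: int, key_len: int, offset: int, length: int) -> int:
    """Return `length` bits starting `offset` bits from the left of a key_len-bit key."""
    shift = key_len - offset - length
    return (key >> shift) & ((1 << length) - 1)

KEY_LEN = 104  # assumed: IPSA(32) + IPDA(32) + source port(16) + destination port(16) + protocol(8)
key = (0x0A000001 << 72) | (0xC0A80001 << 40) | (105 << 24) | (80 << 8) | 6

source_port = extract_field(key, KEY_LEN, offset=64, length=16)
destination_port = extract_field(key, KEY_LEN, offset=80, length=16)
print(source_port, destination_port)
print(100 <= source_port <= 110)  # compare under range on the source port field
```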
For performance reasons, it is recommended that as few compare-under-range operations be used as possible. Each compare-under-range costs one extra clock cycle (7.5 nsec) to execute.
Therefore, if a range is a power of two (e.g., 128-255), no compare-under-range is required and this kind of range can be handled using the compare-under-mask operation. When an SMT compare at the end fails, and the NLASMT field in the leaf is non-zero, then the TSE 70 reads the next leaf and performs another compare at the end operation, until either the compare returns OK, or until NLASMT equals zero.
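The replacement of a range by a mask can be made concrete. The sketch below converts a range into a (pattern, mask) pair when the range size is a power of two and the lower bound is a multiple of that size; the alignment condition is an added observation, needed for the mask to select exactly that range. The function name is hypothetical.

```python
def range_to_mask(minimum: int, maximum: int, width: int):
    """Return (pattern, mask) if [minimum, maximum] is an aligned power-of-two block, else None."""
    size = maximum - minimum + 1
    if size <= 0 or size & (size - 1) or minimum % size:
        return None
    mask = ((1 << width) - 1) & ~(size - 1)  # ignore the low bits that span the range
    return minimum, mask

print(range_to_mask(128, 255, 16))   # usable with compare under mask
print(range_to_mask(100, 110, 16))   # needs compare under range
```

For the 16-bit field, 128-255 becomes pattern 0x0080 with mask 0xFF80: the upper nine bits are fixed and the lower seven are don't-care.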
The high level algorithm flow for the software management tree search is as follows:
1. Read the DT entry.
   a. if SCB = 00 and NPA and NBT are valid then read the NPA and NBT to generate the next PSCB address;
   b. if NBT is not valid and a direct leaf is valid, read the leaf contents and go to the leaf evaluation step;
   c. if NBT is not valid and/or a leaf is not present, return KO, i.e., failure for the search result and completion flag as done.
2. Leaf evaluation: compare the pattern (key) and the pattern stored in the leaf. The SMT always contains two patterns. These two patterns are used to define a range compare. Also, leaves in a SMT can be chained (i.e., using NLASMT which represents the next leaf address). When the first leaf is hit after the PSCB, a compare-at-the-end operation is performed. When this returns OK (success), the search stops with OK. However, when this compare-at-the-end returns KO (failure) and there is a non-zero, i.e., valid NLASMT field, the next leaf is read and another compare-at-the-end operation is performed. This operation continues until either a compare-at-the-end returns with OK (success) or until the NLASMT field is no longer valid, in which case the search returns with KO (failure).
The bit/register width values described herein are exemplary and can be changed to different values to optimise the available memories, performance requirements, etc.
Fig. 21 illustrates the processing logic of the software management tree search algorithm of the present invention. The algorithm begins in logic block 2100 with the reading of an input key. In an SMT search, the input key is optionally hashed into a hashed key. The hash function may reverse the order of the bits in the input key. For example, it may swap bit one and bit seven. When a hash function is used, it may be programmable, i.e., it can be programmed as to which bits are swapped. As indicated by logic block 2102, the direct table is next read. The upper N bits (whereby N is configurable) of the hashed key are used as an index into a direct table. When the entry that has been read is empty, the search returns KO (nothing found) as indicated by termination block 2104.
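A programmable bit-swapping hash of the kind mentioned above can be sketched as a list of position pairs applied to the key. The representation of the program as a list of (i, j) pairs is an assumption made for illustration. Because each swap is its own inverse, applying the pairs in reverse order recovers the input key.

```python
def swap_bits_hash(key: str, swaps) -> str:
    """Apply a programmable list of bit swaps (bit 0 = leftmost) to a bit string."""
    bits = list(key)
    for i, j in swaps:
        bits[i], bits[j] = bits[j], bits[i]
    return "".join(bits)

program = [(1, 7)]                       # swap bit one and bit seven
hashed = swap_bits_hash("01000000", program)
restored = swap_bits_hash(hashed, list(reversed(program)))
print(hashed, restored)
```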
If the entry points to a leaf in decision block 2106, then processing continues at logic block 2110 with the reading of the contents of the leaf.
Otherwise, the entry points to a PSCB and the appropriate part of a PSCB is read as indicated in logic block 2108. For a SMT search, a PSCB can have the format of a FM PSCB. Alternatively, it can have a more advanced format as described below.
From logic block 2108 processing returns to decision block 2106 to determine if a leaf is found. When a leaf is found in decision block 2106, the leaf is read as indicated by logic block 2110 and then compared with the input key as indicated by logic block 2112. If there is a match between the pattern stored in the leaf and the input key in decision block 2114, then the search returns OK (success) and passes the contents of the leaf to the application as indicated by termination block 2118. If there is no match in decision block 2114, then processing continues as indicated in decision block 2116 with a determination of whether a next leaf exists. If there is a next leaf in the leaf chain, processing moves back to logic block 2110 to read the contents of the leaf. Otherwise, if there is no next leaf, the search returns KO (failure) as indicated by termination block 2120.
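The loop formed by blocks 2110 to 2116 is a walk along the leaf chain. In the sketch below a leaf is a small record holding a match predicate (standing in for the compare at the end), the NLASMT address of the next leaf (0 meaning end of chain) and the leaf data; this record layout and the names Leaf and search_chain are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Leaf:
    matches: Callable[[int], bool]  # stands in for the compare at the end
    nlasmt: int                     # address of the next leaf, 0 = end of chain
    data: str

def search_chain(leaves: Dict[int, Leaf], first: int, key: int) -> Optional[str]:
    """Read leaves along the chain until one compares OK or the chain ends."""
    addr = first
    while addr != 0:
        leaf = leaves[addr]
        if leaf.matches(key):
            return leaf.data        # OK: leaf contents are passed to the application
        addr = leaf.nlasmt
    return None                     # KO: no leaf in the chain matched

leaves = {
    1: Leaf(lambda k: 100 <= k <= 110, 2, "rule A"),
    2: Leaf(lambda k: (k & 0xFF80) == 0x0080, 0, "rule B"),
}
print(search_chain(leaves, 1, 105), search_chain(leaves, 1, 200), search_chain(leaves, 1, 50))
```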
The compare operation between the leaf pattern and the hashed key is more complicated than it is with FM and LPM searches. The leaf contains two patterns, p1 and p2. The hashed key, p1 and p2 are each logically divided into N fields. The compare operation includes N sub-compares that all must return OK (success) in order for the total compare to return OK.
Each sub-compare can be performed according to two modes: (i) p1 represents a minimum and p2 a maximum, or (ii) p1 represents a pattern and p2 represents a mask. In the first case, the sub-compare returns OK when p1 ≤ hashed key ≤ p2. In the second case, the sub-compare returns OK when (hashed key AND p2) = (p1 AND p2).
The definition of the fields (the size of each field and the position), as well as the compare mode for each field can be stored in encoded form in the leaf, or it can be stored in a special lookup table.
In the preferred embodiment, a CompDefTable is used, whereby the index of the table is stored in the leaf.
As an extension, a PSCB may consist of 2^b entries, such that b bits from the hashed key select which entry to read from the PSCB. This increases performance at a cost of more memory usage. Furthermore, a PSCB can be extended such that each entry contains one or two patterns (p1 and p2) that operate in the same way as described with the leaf compare. For example, assume that the PSCB contains p1 and p2 with length L, which is also stored in the PSCB. Then L bits are taken from the hashed key at the position given by NBT. These L bits are interpreted as an integer I. When p1 ≤ I ≤ p2, then entry 1 is used from the next PSCB, otherwise entry 0 is used from the next PSCB.
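The range-based branch selection in the extended PSCB can be sketched in a few lines. The key is modelled as a bit string with bit 0 leftmost, and the function name select_entry is hypothetical.

```python
def select_entry(key: str, nbt: int, length: int, p1: int, p2: int) -> int:
    """Take `length` bits at position nbt as integer I; entry 1 if p1 <= I <= p2, else entry 0."""
    i = int(key[nbt:nbt + length], 2)
    return 1 if p1 <= i <= p2 else 0

# Three bits starting at bit 3, tested against the range 4...7.
print(select_entry("1001110", 3, 3, 4, 7))
print(select_entry("1000010", 3, 3, 4, 7))
```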
The present invention can be realised in hardware, software, or a combination of the two. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software could be a general purpose computer system that, when loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system, is able to carry out these methods.
Computer program instructions or computer program in the present context mean any expression, in any language, code (e.g., picocode instructions) or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following occur: a) conversion to another language, code or notation; b) reproduction in a different material form.
Those skilled in the art will appreciate that many modifications to the preferred embodiment of the present invention are possible without departing from the spirit and scope of the present invention. In addition, it is possible to use some of the features of the present invention without the corresponding use of other features. Accordingly, the foregoing description of the preferred embodiment is provided for the purpose of illustrating the principles of the present invention and not in limitation thereof, since the scope of the present invention is defined solely by the appended claims.

Claims (54)

  1. CLAIMS 1. A method for performing a search, based on a search criterion and for a variable length search key, by a computer processing device, comprising the steps of: reading an input key as a search key; using the N most significant bits of the search key as an index into a table representing a plurality of root nodes of search trees wherein each non-empty entry contains a pointer to a next branch in the search tree or a leaf; determining if the pointer in a non-empty table entry points to a leaf or a next branch of the corresponding search tree; reading the next branch contents if the pointer does not point to the leaf of the corresponding search tree; reading the leaf contents when the leaf of a corresponding search tree is reached and comparing at least one leaf pattern with the search key according to the search criterion; and returning the contents of the leaf found to the requesting application if the at least one leaf pattern meets the search criterion for the search key.
  2. 2. A method as claimed in claim 1 wherein the table representing a plurality of root nodes of search trees contains 2N entries.
  3. 3. A method as claimed in claim 1 or claim 2 wherein the computer processing device is a network processor.
  4. 4. A method as claimed in any preceding claim further comprising the step of: returning a no match found indication if the index into the table is to an empty entry.
  5. 5. A method as claimed in any preceding claim further comprising the steps of:
    if the leaf contains a chain pointer to another leaf, comparing at least one leaf pattern, from the another leaf, with the search key according to the search criterion; and returning an indication of match found if the at least one pattern stored in the another leaf meets the search criterion. returning an indication of no match found if the at least one leaf pattern stored in the another leaf does not meet the search criterion and the another leaf does not contain a pointer to a next leaf in the chain.
  6. 6. A method as claimed in any preceding claim wherein the next branch contents of the corresponding search tree include an address which points to another next branch;
  7. 7. A method as claimed in any preceding claim wherein the next branch contents of the corresponding search tree include an address which points to a leaf;
  8. 8. A method as claimed in any preceding claim wherein the next branch contents of the corresponding search tree include includes at least 2 addresses, each address pointing to another next branch or a leaf, and a value associated with each next branch or leaf and the method further comprises the step of: selecting the next branch or leaf to read by matching one or more bits from the search key with corresponding bits from the value associated the next branch or leaf.
  9. 9. A method as claimed in claim 8, wherein the next branch contents of the corresponding search tree include a bit number associated with each another next branch or a leaf, the bit number indicating the position, in the search key, of the one or more bits to match with corresponding bits in the value associated the next branch or leaf.
  10. 10. A method as claimed in claim 9 further comprising the step of terminating the search when the bit number of the next branch exceeds the length of the search key.
  11. 11. A method as claimed in any preceding claim wherein the input key is hashed using a hash function to generate the search key.
  12. 12. A method as claimed in claim 11 wherein the hash function used on the input key is a reversible hash function that can transform the hashed input key into the input key.
  13. 13. A method as claimed in any preceding claim further comprising appending the contents of a colour register to the search key.
  14. 14. A method as claimed in any of claims 1 to 12 further comprising appending a string of zeros to the search key.
  15. 15. A method as claimed in any preceding claim wherein the search criterion is a full match and a leaf contains one leaf pattern.
  16. 16. A method as claimed in any of claims 1 to 10 wherein the search criterion is a longest prefix match, a leaf contains one leaf pattern and the reading the next branch step further compares the prefix represented by the next branch with the input key to find a distinguishing bit position.
  17. 17. A method as claimed in claim 16 wherein the next branch contents of the corresponding search tree include an address which points to a bird wherein the bird represents a special type of leaf that represents a partial prefix match of the search key.
  18. 18. A method as claimed in claim 17 wherein the bird is placed on a bird stack along with an associated bit position.
  19. 19. A method as claimed in claim 18 further comprising steps of: responsive to the bird stack not being full: reading the contents of the next branch of the corresponding search tree; and responsive to the bird stack being full: flushing the bird stack.
  20. 20. A method as claimed in claim 19 wherein the step of flushing the bird stack comprises the steps of: reading the contents of the birds from a memory location; comparing the search key with the pattern stored in the contents of the bird memory location; determining a distinguishing position which represents a first bit at which the bird pattern and the search key differ;
    selecting the bird with the largest bit number that does not exceed the distinguishing position to keep in the bird stack ; and removing all other birds in the bird stack.
  21. A method as claimed in any of claims 1 to 14 wherein the search criterion is a pattern range, a leaf contains a pair of leaf patterns and the step of reading the leaf contents compares the search key with the pair of leaf patterns to determine if the range defined by the pair of leaf patterns includes the search key.
  22. A method as claimed in claim 21 wherein the step of comparing a pair of patterns comprises a compare under range operation in which the bits in the search key are treated as an integer that is checked to determine if it is in a range defined by the pair of patterns.
  23. A method as claimed in claim 21 wherein the step of comparing a pair of patterns comprises a compare under mask operation in which the bits in the search key are compared with the bits in a first leaf pattern under a mask specified in a second leaf pattern.
  24. A computer program product comprising instructions which, when executed on a data processing system having a non-volatile memory storage device, cause said system to carry out a method as claimed in any preceding claim.
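Illustrative sketch (not part of the claims): the bit-level operations recited in claims 20, 22 and 23 might be expressed in Python roughly as follows. The function names, the (bird address, bit position) tuple layout, the convention that bit 0 is the most significant bit, the inclusive range bounds, and the reading of a set mask bit as "compare this bit" are all assumptions made for this example. The flush is also simplified: the distinguishing position is computed once by the caller and passed in.

```python
from typing import List, Tuple

Bird = Tuple[str, int]  # (hypothetical bird address, associated bit position)


def distinguishing_position(key: int, pattern: int, width: int) -> int:
    """Return the index of the first bit (0 = most significant, by assumption)
    at which key and pattern differ, or `width` if all bits agree."""
    diff = (key ^ pattern) & ((1 << width) - 1)
    if diff == 0:
        return width
    return width - diff.bit_length()


def flush_bird_stack(stack: List[Bird], dist_pos: int) -> List[Bird]:
    """Keep only the bird with the largest bit number that does not exceed
    the distinguishing position; all other birds are removed."""
    candidates = [bird for bird in stack if bird[1] <= dist_pos]
    if not candidates:
        return []
    return [max(candidates, key=lambda bird: bird[1])]


def compare_under_range(key: int, low: int, high: int) -> bool:
    """Treat the key bits as an integer and check it against the pair of
    leaf patterns (bounds assumed inclusive)."""
    return low <= key <= high


def compare_under_mask(key: int, pattern: int, mask: int) -> bool:
    """Compare key with the first leaf pattern, looking only at the bits
    selected by the mask held in the second leaf pattern."""
    return (key & mask) == (pattern & mask)
```

For example, an 8-bit key 0b10110000 and a bird pattern 0b10100000 first differ at bit 3 under the assumed numbering, so a flush with that distinguishing position keeps the bird recorded at bit position 3 and drops one recorded at bit position 6.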
  25. A computer readable medium containing a plurality of data structures for performing a search based on a search criterion for a variable length search key, comprising: a direct table that stores a first address location for a search tree; a plurality of pattern search control blocks that each represent a branch in the search tree; and a plurality of leaves wherein each leaf stores at least one pattern to compare with the search key.
  26. A computer readable medium as claimed in claim 25 further comprising a lookup definition table that manages a tree search memory.
  27. A computer readable medium as claimed in claim 26 wherein the lookup definition table comprises entries that define a physical memory that the tree resides in, a size of the key and leaf, and a type of search to be performed.
  28. A computer readable medium as claimed in claim 26 or 27 wherein the lookup definition table is implemented in a plurality of memories.
  29. A computer readable medium as claimed in any of claims 25 to 28 wherein a format for a pattern search control block includes at least one of a search control block; a next pattern address that points to a next pattern search control block; a leaf control block address that points to a leaf or result; and a next bit or bits to test.
  30. A computer readable medium as claimed in any of claims 25 to 29 wherein a leaf data structure includes at least one of a leaf chaining pointer; a prefix length; at least one pattern to be compared to the search key; and variable user data.
  31. A computer readable medium as claimed in any of claims 25 to 30 wherein a format for a direct table entry includes at least one of a search control block; a next pattern address that points to a next pattern search control block; a leaf control block address that points to a leaf or result; a next bit or bits to test; and a direct leaf.
  32. A computer readable medium as claimed in claim 31 wherein the direct leaf is stored directly in a direct table entry and includes a search control block and at least one pattern to be compared to a search key.
  33. A computer readable medium as claimed in any of claims 25 to 32 wherein a pattern search control block is inserted in the search tree at a position where leaf patterns differ.
  34. A computer readable medium as claimed in any of claims 25 to 33 wherein the search criterion is a full match.
  35. A computer readable medium as claimed in any of claims 25 to 33 wherein the search criterion is a longest prefix match and the medium further comprises: at least one bird representing a partial match of the search key.
  36. A computer readable medium as claimed in any of claims 25 to 33 wherein the search criterion is a pattern range comprising a pair of pattern values, each leaf includes a pair of patterns and the medium further comprises: a compare table that specifies at least one range compare associated with each entry.
  37. A computer readable medium as claimed in claim 34 or 36 wherein a pattern search control block has a shape defined by a width of one and a height of one and is stored in a memory that has a line length of at least 36 bits.
  38. A computer readable medium as claimed in claim 35 wherein a pattern search control block has a shape defined by a width of one and a height of one and is stored in a memory that has a line length of at least 64 bits.
  39. A computer readable medium as claimed in claim 35 wherein a pattern search control block has a shape defined by a width of one and a height of two and is stored in a memory of at least 36 bits.
  40. A computer readable medium as claimed in claim 36 wherein the compare table comprises entries that define at least one range compare, each range compare being defined by an offset parameter which is a position of the first bit of the field and a length parameter which is the length of the field in bits.
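Illustrative sketch (not part of the claims): one way to picture the data structures of claims 25 to 33 and the offset/length field of claim 40 is the following Python model. The class and field names, the use of a dictionary for the direct table, the indexing of the direct table by the leading bits of the key, the single-bit test per pattern search control block, and the numbering of bits from the most significant end are all assumptions made for this example, not details taken from the claims.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    pattern: int                        # pattern compared with the search key
    prefix_length: int
    user_data: object = None            # variable user data
    next_leaf: Optional["Leaf"] = None  # leaf chaining pointer


@dataclass
class PSCB:
    """Pattern search control block: one branch point in the search tree."""
    bit_to_test: int                    # next bit to test (0 = MSB, assumed)
    branches: Dict[int, Union["PSCB", Leaf]]  # bit value -> next PSCB or leaf


@dataclass
class RangeCompare:
    offset: int  # position of the first bit of the field
    length: int  # length of the field in bits


def extract_field(key: int, key_width: int, rc: RangeCompare) -> int:
    """Pull the field described by a compare-table entry out of the key."""
    shift = key_width - rc.offset - rc.length
    return (key >> shift) & ((1 << rc.length) - 1)


def full_match_search(direct_table: Dict[int, Union[PSCB, Leaf]],
                      key: int, key_width: int, dt_bits: int) -> Optional[Leaf]:
    """Walk from the direct table through PSCBs to a leaf, then compare the
    leaf pattern with the whole key (full match criterion)."""
    node = direct_table.get(key >> (key_width - dt_bits))
    while isinstance(node, PSCB):
        bit = (key >> (key_width - 1 - node.bit_to_test)) & 1
        node = node.branches.get(bit)
    if isinstance(node, Leaf) and node.pattern == key:
        return node
    return None
```

In this model a pattern search control block sits at the first bit where two leaf patterns differ, as in claim 33: leaves 0b10110000 and 0b10100000 share a direct table entry and are separated by a block that tests bit 3. The final comparison against the leaf pattern is what rejects keys that merely follow the same path through the tree.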
  41. An apparatus fabricated on a semiconductor substrate for performing a search based on a search criterion for a variable length search key, comprising: an embedded processor complex including a plurality of protocol processors and an internal control point processor that provide frame processing; a plurality of hardware accelerator coprocessors accessible to each protocol processor and providing high speed pattern searching, data manipulation, and frame parsing; a plurality of programmable memory devices that store a plurality of data structures that represent at least one search tree, wherein the data structures include a direct table, a pattern search control block and a leaf; and a control memory arbiter that controls the access of each protocol processor to the plurality of memory devices.
  42. An apparatus as claimed in claim 41 further comprising a tree search engine that operates in parallel with protocol processor execution to perform tree search instructions including memory reads and writes and memory range checking.
  43. An apparatus as claimed in claim 41 or claim 42 wherein the plurality of memory devices further comprise at least one of internal static random access memory, external static random access memory, and external dynamic random access memory.
  44. An apparatus as claimed in any of claims 41 to 43 wherein the control memory arbiter manages control memory operations by allocating memory cycles between the plurality of protocol processors and the plurality of memory devices.
  45. An apparatus as claimed in any of claims 41 to 44 wherein each protocol processor comprises a primary data buffer, a scratch pad data buffer and control registers for data store operations.
  46. An apparatus as claimed in any of claims 41 to 45 further comprising a programmable search key register and a programmable hashed key register.
  47. An apparatus as claimed in claim 46 further comprising a programmable colour key register to enable sharing a single table data structure among a plurality of independent search trees.
  48. An apparatus as claimed in claim 47 wherein the contents of the colour register, if enabled, are appended to the hash output to produce a final hashed key.
  49. An apparatus as claimed in claim 47 wherein, if the colour register is not enabled, an equivalent number of zeros is appended to the hash output to produce a final hashed key.
  50. An apparatus as claimed in any of claims 41 to 49 wherein the search criterion is a full match.
  51. An apparatus as claimed in any of claims 41 to 49 wherein the search criterion is a longest prefix match and the data structures further comprise a bird representing a partial match of the search key.
  52. An apparatus as claimed in any of claims 41 to 49 wherein the search criterion is a pattern range comparison, a leaf includes a pair of patterns and the data structures further comprise a compare table.
  53. An apparatus as claimed in claim 50 or 52 further comprising a hash box component that performs a geometric hash function on the search key.
  54. An apparatus as claimed in claim 53 further comprising a hash box component that performs a no hash function on the search key with a resulting hashed key being equal to the search key.
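Illustrative sketch (not part of the claims): the colour handling of claims 48 and 49 and the "no hash" function of claim 54 could be modelled in Python along these lines. The function names and the `colour_bits` parameter are hypothetical, and "appended" is read here as placing the colour in the low-order bits after the hash output, which is an assumption.

```python
def no_hash(search_key: int) -> int:
    """The 'no hash' function: the hashed key equals the search key."""
    return search_key


def final_hashed_key(hash_output: int, colour: int,
                     colour_bits: int, colour_enabled: bool) -> int:
    """Append the colour register contents to the hash output when the
    colour is enabled; otherwise append the same number of zero bits."""
    suffix = colour & ((1 << colour_bits) - 1) if colour_enabled else 0
    return (hash_output << colour_bits) | suffix
```

With an assumed 16-bit colour, a hash output of 0xAB and a colour of 0x1234 give 0xAB1234 when the colour is enabled and 0xAB0000 when it is not. The final key therefore has the same length either way, and trees sharing one table are told apart only by their colour bits.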
GB0108545A 2000-04-06 2001-04-05 Search algorithm implementation for a network processor Expired - Fee Related GB2371381B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/544,992 US6947931B1 (en) 2000-04-06 2000-04-06 Longest prefix match (LPM) algorithm implementation for a network processor
US09/543,531 US6675163B1 (en) 2000-04-06 2000-04-06 Full match (FM) search algorithm implementation for a network processor
US09/545,100 US7107265B1 (en) 2000-04-06 2000-04-06 Software management tree implementation for a network processor

Publications (3)

Publication Number Publication Date
GB0108545D0 GB0108545D0 (en) 2001-05-23
GB2371381A true GB2371381A (en) 2002-07-24
GB2371381B GB2371381B (en) 2004-09-01

Family

ID=27415410

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0108545A Expired - Fee Related GB2371381B (en) 2000-04-06 2001-04-05 Search algorithm implementation for a network processor

Country Status (1)

Country Link
GB (1) GB2371381B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11315062B2 (en) * 2016-09-16 2022-04-26 General Electric Company System and method for autonomous service operation validation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0419889A2 (en) * 1989-09-28 1991-04-03 Bull HN Information Systems Inc. Prefix search tree with partial key branching
WO1996000945A1 (en) * 1994-06-30 1996-01-11 International Business Machines Corp. Variable length data sequence matching method and apparatus
JPH10162013A (en) * 1996-11-28 1998-06-19 Nippon Telegr & Teleph Corp <Ntt> Digital searching device
US5857196A (en) * 1996-07-19 1999-01-05 Bay Networks, Inc. Method for storing a tree of potential keys in a sparse table
US5946679A (en) * 1997-07-31 1999-08-31 Torrent Networking Technologies, Corp. System and method for locating a route in a route table using hashing and compressed radix tree searching
GB2350534A (en) * 1999-05-26 2000-11-29 3Com Corp Packet-based network device with forwarding database having a trie search facility
WO2001016779A1 (en) * 1999-08-27 2001-03-08 International Business Machines Corporation Network processor, memory organization and methods

Also Published As

Publication number Publication date
GB2371381B (en) 2004-09-01
GB0108545D0 (en) 2001-05-23

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 20080329

PCNP Patent ceased through non-payment of renewal fee

Effective date: 20190405