US20140219279A1 - Methods and systems for network address lookup engines - Google Patents

Methods and systems for network address lookup engines

Info

Publication number
US20140219279A1
US20140219279A1 (application US14/175,108)
Authority
US
United States
Prior art keywords
input
data
address
neural network
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/175,108
Inventor
Warren GROSS
Naoya Onizawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Royal Institution for the Advancement of Learning
Original Assignee
Royal Institution for the Advancement of Learning
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Royal Institution for the Advancement of Learning filed Critical Royal Institution for the Advancement of Learning
Priority to US14/175,108 priority Critical patent/US20140219279A1/en
Assigned to THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARNING / MCGILL UNIVERSITY reassignment THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARNING / MCGILL UNIVERSITY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GROSS, WARREN, ONIZAWA, NAOYA
Publication of US20140219279A1 publication Critical patent/US20140219279A1/en
Priority to US15/211,335 priority patent/US10469235B2/en
Abandoned legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0053Allocation of signaling, i.e. of overhead other than pilot signals
    • H04L5/0057Physical resource allocation for CQI
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04JMULTIPLEX COMMUNICATION
    • H04J11/00Orthogonal multiplex systems, e.g. using WALSH codes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04JMULTIPLEX COMMUNICATION
    • H04J13/00Code division multiplex systems
    • H04J13/0007Code type
    • H04J13/004Orthogonal
    • H04J13/0048Walsh
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04JMULTIPLEX COMMUNICATION
    • H04J13/00Code division multiplex systems
    • H04J13/10Code generation
    • H04J13/12Generation of orthogonal codes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0023Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
    • H04L1/0026Transmission of channel quality indication
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0023Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
    • H04L1/0028Formatting
    • H04L1/0029Reduction of the amount of signalling, e.g. retention of useful signalling or differential signalling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/74Address processing for routing
    • H04L45/745Address table lookup; Address filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/0001Arrangements for dividing the transmission path
    • H04L5/0014Three-dimensional division
    • H04L5/0016Time-frequency-code
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0044Arrangements for allocating sub-channels of the transmission path allocation of payload
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0048Allocation of pilot signals, i.e. of signals known to the receiver
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0048Allocation of pilot signals, i.e. of signals known to the receiver
    • H04L5/005Allocation of pilot signals, i.e. of signals known to the receiver of common pilots, i.e. pilots destined for multiple users or terminals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0053Allocation of signaling, i.e. of overhead other than pilot signals
    • H04L5/0055Physical resource allocation for ACK/NACK
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0058Allocation criteria
    • H04L5/006Quality of the received signal, e.g. BER, SNR, water filling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/0091Signaling for the administration of the divided path
    • H04L5/0094Indication of how sub-channels of the path are allocated
    • H04L61/1552
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/69Spread spectrum techniques
    • H04B1/707Spread spectrum techniques using direct sequence modulation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/0057Block codes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/12Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/1607Details of the supervisory signal
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/46Cluster building
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/0001Arrangements for dividing the transmission path
    • H04L5/0003Two-dimensional division
    • H04L5/0005Time-frequency
    • H04L5/0007Time-frequency the frequencies being orthogonal, e.g. OFDM(A), DMT
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0053Allocation of signaling, i.e. of overhead other than pilot signals

Definitions

  • one destination address may match more than one routing table entry.
  • the most specific table entry, the one with the highest subnet mask, is called the longest prefix match.
  • IPv4 Internet Protocol version 4
  • IPv6 Internet Protocol version 6
  • a low-power large-scale IP lookup engine may be implemented exploiting clustered neural networks (CNNs).
  • CNNs clustered neural networks
  • FIGS. 2A and 2B depict an IP lookup engine exploiting CNNs without wildcards in learning and retrieving processes
  • FIGS. 3A to 3C depict an IP lookup engine according to an embodiment of the invention exploiting wildcards in local storing, dummy neuron activation, and global storing processes;
  • FIGS. 11A and 11B depict circuit schematics for memory blocks providing local storing (MLS) and global storing (MGS) forming portions of an IP lookup engine according to the second implementation architecture in FIG. 10 ;
  • FIGS. 13A and 13B depict circuit schematics for decoding modules based upon ILSW and MILSW respectively forming portions of IP lookup engines according to the second implementation architecture in FIG. 10 for ILSW and MILSW variants;
  • FIGS. 14A and 14B depict circuit schematics for an ambiguity elimination block and output selector forming portions of IP lookup engines according to the second implementation architecture in FIG. 10 ;
  • CNNs clustered neural networks
  • a CNN is a neural network which stores data using only binary weighted connections between several clusters, see for example Gripon et al. in “Sparse neural networks with large learning diversity” (IEEE Trans. Neural Networks, Vol. 22, No. 7, pp. 1087-1096).
  • HNN Hopfield Neural Network
  • the CNN requires less complex functions while learning (storing) a larger number of messages.
  • a hardware implementation for a CNN has been reported by Jarollahi et al. in “Architecture and implementation of an associative memory using sparse clustered networks” (Proc. ISCAS 2012, pp. 2901-2904).
  • IP addresses and their corresponding rules are stored.
  • a k-th (1 ≤ k ≤ N) learning address m_k is composed of c sub-messages m_{k1} … m_{kc} and a k-th learning rule m′_k is composed of c′ sub-messages m′_{k1} … m′_{kc′}.
  • N is the number of stored addresses.
  • the length of the address is c·log₂ l bits and that of the rule is c′·log₂ l′ bits.
  • the learnt message 1.9.10.X has a wildcard.
  • the wildcard is replaced by XORing 9 ⊕ 1, the two sub-messages in the first two input clusters, hence the wildcard becomes 8.
  • process (1) is performed by using md_k instead of m_k in order to make connections between the input and output clusters and then these connections are stored.
  • the dummy sub-message (sub-address) is converted to an l-bit one-hot signal that activates the corresponding dummy neuron in an input cluster associated with the wildcard.
  • the M stored messages are now defined by M_1, …, M_M, which include the updated input addresses and output rule (port), and the connections between the activated input and output neurons are stored as shown in FIG. 3C and are given by Equation (10).
  • Equations (5) and (6) are executed.
  • Table 1 the learnt messages as a result of the “Local decoding” and “Global decoding” processes described supra in respect of FIGS. 3 and 4 are presented showing the IP address, IP address with dummy and the associated rule.
  • in the MILSW there are two types of “wrongly” activated neurons in the input clusters from (c − c_b) to c. These two types of activated neurons affect the probability.
  • the first type relates to (12): when a sub-message that is not stored is wrongly treated as a “stored sub-message”, the corresponding neuron is wrongly activated.
  • the average number of wrongly activated neurons per input cluster of the first type (ω_1) is given by Equation (19).
  • P_amb without wildcards is defined by Equation (7).
  • P_amb with wildcards was evaluated by simulations. Unlike IPv4, packet traces for the long prefix table are not available to the public and the prefix table is still small, see for example “Border Gateway Protocol (BGP)” (http://bgp.potaroo.net); addresses were therefore chosen randomly for the evaluation. The stored addresses are uniformly distributed. Random-length wildcards appear in the last half of the addresses (72 bits). If the range of addresses that have wildcards is changed, the prefix length can be changed.
  • BGP Border Gateway Protocol
  • FIG. 7 there is depicted an overall structure 700 of an IP lookup engine according to an embodiment of the invention with wildcards.
  • the learning process is “Local learning” using Equation (8) and “Global learning” as presented in respect of Equations (9) and (1).
  • the retrieving process employed exploits the following process:
  • FIG. 11 shows the MLS contains (c − 1) l²-bit sub-memory blocks and the MGS contains c·c′ l·l′-bit sub-memory blocks. Both the ILSW and the MILSW use the same memory block.
  • an external processing unit e.g. a central processing unit (CPU) or microprocessor.
  • CPU central processing unit
  • l bits of w′_{(i,j)(i′,j′)} are serially sent from the CPU and are stored in the SRAM shown in FIG.
  • FIG. 12A depicts a circuit diagram of the input-replacement module based on the ILSW.
  • the updated input address (ms_in) is generated using the stored connections read from the MLS in (12) at the first clock cycle.
  • a flip-flop is enabled to store ms_{inj_1} and transfers it to the MGS.
  • FIG. 12B depicts a circuit diagram of the dummy generator for the MILSW.
  • the matched word selects its corresponding port from the registers through a one-hot encoder and a multiplexer and then m out is transferred to an output selector shown in FIG. 14B .
  • the signal (mismatch) is low when the matched word is found in the TCAM and high when it is not found. If the matched word is found, m out is selected as an output port in the output selector. Otherwise, the output port is selected from the global decoding module.
  • the proposed IP lookup engine described in respect of Section 4.2 Implementation 2 and FIGS. 10 to 14 respectively was designed based upon the Taiwan Semiconductor Manufacturing Company Ltd. (TSMC) 65 nm CMOS technology.
  • the MLS and the MGS both exploit 15 SRAM blocks (256 kb) and 32 SRAM blocks (256 kb), respectively.
  • a reference TCAM was also designed.
  • the TCAM cell is designed using a NAND-type cell that consists of 16 transistors, as per Pagiamtzis et al. in “Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey” (IEEE J. Solid-State Circuits, Vol. 41, No. 3, pp. 712-727).
  • the worst-case delay is 1.31 ns in the block that includes the max-function block. This delay is 89.1% of the previous method (Onizawa2) that includes an ambiguity checker after Global decoding.
  • the delay of the max-function block is 65.8% of the whole delay.
  • the worst-case delay is 0.62 ns.
  • throughput may be defined by (address length)/(worst-case delay)/(retrieving clock cycles); the MILSW offers increased throughput compared to the previous method (Onizawa2) and the ILSW.
  • IP lookup search engines and context driven search engines discussed supra
  • other applications of embodiments of the invention include, but are not limited to, CPU fully associative cache controllers and translation lookaside buffers, database search engines, database engines, data compression, artificial neural networks, and electronic intrusion prevention systems.
  • Implementation of the techniques, blocks, steps and means described above may be done in various ways. For example, these techniques, blocks, steps and means may be implemented in hardware, software, or a combination thereof.
  • the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above and/or a combination thereof.
  • ASICs application specific integrated circuits
  • DSPs digital signal processors
  • DSPDs digital signal processing devices
  • PLDs programmable logic devices
  • FPGAs field programmable gate arrays
  • processors controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above and/or a combination thereof.
  • the methodologies described herein are, in one or more embodiments, performable by a machine which includes one or more processors that accept code segments containing instructions. For any of the methods described herein, when the instructions are executed by the machine, the machine performs the method. Any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine is included.
  • a typical machine may be exemplified by a typical processing system that includes one or more processors.
  • Each processor may include one or more of a CPU, a graphics-processing unit, and a programmable DSP unit.
  • the processing system further may include a memory subsystem including main RAM and/or a static RAM, and/or ROM.
  • a bus subsystem may be included for communicating between the components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Internet routers are a key component of today's Internet. Each router forwards received packets toward their final destinations based upon a Longest Prefix Matching (LPM) algorithm that selects the entry from a routing table determining the closest location to the final packet destination among several candidates. Prior art solutions to LPM lookup offer different tradeoffs, and a design methodology providing low-power, large-scale IP lookup engines would be beneficial in addressing their limitations. According to embodiments of the invention, a low-power large-scale IP lookup engine may be implemented exploiting clustered neural networks (CNNs). In addition to reduced power consumption, embodiments of the invention provide reduced transistor count, providing for reduced semiconductor die footprints and hence reduced die cost.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This patent application claims the benefit of U.S. Provisional Patent Application 61/761,998 filed on Feb. 7, 2013 entitled “Methods and Systems for Network Address Lookup Engines.”
  • FIELD OF THE INVENTION
  • This invention relates to address look-up engines and more particularly to address look-up engines with low power, die area efficiency, and compatibility with high-speed packet data rates.
  • BACKGROUND OF THE INVENTION
  • In its 2011 5th Annual Visual Networking Index Forecast, networking giant Cisco Systems forecast that global Internet traffic will reach approximately 1 zettabyte (966×10^18 bytes) a year by 2015. Breaking this down, this equates to approximately 80 exabytes per month in 2015, up from approximately 20 exabytes per month in 2010, or approximately 245 terabits per second. This global Internet traffic will arise from approximately 3 billion people using approximately 6 billion portable and fixed electronic devices. At the same time, average broadband access speeds will have increased to nearly 30 megabits per second from approximately 7 megabits per second in 2011. This Internet traffic and consumer-driven broadband access are supported through a variety of local, metropolitan, and wide area networks, together with long-haul, trunk, submarine, and backbone networks operating at OC-48 (2.5 Gb/s), OC-192 (10 Gb/s), and OC-768 (40 Gb/s) with coarse and dense wavelength division multiplexing to provide overall channel capacities in some instances in excess of 1 Tb/s.
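  • As a back-of-the-envelope consistency check on these figures (an illustration; 3.15×10^7 is the approximate number of seconds in a year):

    $$\frac{966\times10^{18}\ \text{bytes/year}}{3.15\times10^{7}\ \text{s/year}} \approx 3.1\times10^{13}\ \text{bytes/s} \approx 2.45\times10^{14}\ \text{b/s} \approx 245\ \text{Tb/s},$$

    consistent with the per-month figure, since 966/12 ≈ 80 exabytes per month.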
  • Dispersed between and within these networks are Internet routers, which have become key to the Internet backbone. Such routers include, but are not limited to, edge routers, subscriber edge routers, inter-provider border routers, and core routers. Accordingly, the overall data handled by these Internet routers by 2015 will be many times the approximately 1 zettabyte actually provided to users, who will have expectations not only of high access speeds but of low latency. Internet routers therefore require fast IP-lookup operations over hundreds of thousands of entries or more. Each router forwards received packets toward their final destinations based upon a Longest Prefix Matching (LPM) algorithm that selects the entry from a routing table determining the closest location to the final packet destination among several candidates. As an entry in a routing table may specify a network, one destination address may match more than one routing table entry. The most specific table entry, the one with the highest subnet mask, is called the longest prefix match. With the length of a lookup address being up to 32 bits for Internet Protocol version 4 (IPv4) and 144 bits for Internet Protocol version 6 (IPv6), it is evident that even at OC-48 (2.5 Gb/s) with maximum-length IPv6 packets over 17 million packets are received per second. These packets contain binary strings, and routing entries may additionally contain wildcards.
  • The hardware of the LPM has been designed within the prior art using several approaches including, but not limited to:
      • Ternary Content-Addressable Memory (TCAM), see for example Gamache et al. in “A fast ternary CAM design for IP networking applications” (Proc. 12th IEEE ICCCN, pp. 434-439), Noda et al. in “A cost-efficient high-performance dynamic TCAM with pipelined hierarchical searching and shift redundancy architecture” (IEEE JSSC, Vol. 40, No. 1, pp. 245-253), Maurya et al. in “A dynamic longest prefix matching content addressable memory for IP routing” (IEEE TVLSI, Vol. 19, No. 6, pp. 963-972), and Kuroda et al. in “A 200 Msps, 0.6 W eDRAM-based search engine applying full-route capacity dedicated FIB application” (Proc. CICC 2012, pp. 1-4);
      • Trie-based schemes, see for example Eatherton et al. in “Tree bitmap: hardware/software IP lookups with incremental updates” (SIGCOMM Comput. Commun. Rev., Vol. 34, No. 2, pp. 97-122), and Bando et al. in “Flashtrie: Beyond 100-Gb/s IP route lookup using hash-based prefix-compressed trie” (IEEE/ACM Trans. Networking, Vol. 20, No. 4, pp. 1262-1275); and
      • Hash-based schemes, see for example Hasan et al. in “Chisel: A storage-efficient, collision-free hash-based network processing architecture” (Proc. 33rd ISCA, pp. 203-215, June 2006) and Dharmapurikar et al. in “Longest prefix matching using bloom filters” (IEEE/ACM Trans. Networking 2006, Vol. 14, No. 2, pp. 397-409).
  • Unlike random access memory (RAM), which returns the data word stored at a supplied memory address, a Content Addressable Memory (CAM) searches its entire memory to see whether a supplied data word is stored anywhere within it. If the data word is found, the CAM returns a list of the one or more storage addresses where the word was found. A Ternary Content Addressable Memory (TCAM) allows a third matching state of “X” (or “don't care”), in addition to “0” and “1”, for one or more of the bits within the stored data word, thus adding flexibility to the search. Beneficially, TCAMs search all stored entries in parallel and therefore allow high-speed lookup operations. However, the large cell area, exploiting 16 transistors versus the 6 transistors of a static RAM (SRAM) cell, and the brute-force searching methodology result in large power dissipation and inefficient hardware architectures for large forwarding tables. In contrast, trie-based schemes exploit ordered tree data structures to store prefixes and locations, based upon a binary-tree structure created from portions of the stored Internet Protocol (IP) addresses. Searching is performed by traversing the tree until an LPM is found and may be implemented in hardware using SRAMs rather than TCAMs, which potentially lowers power dissipation. However, deep trees require multi-step lookups, slowing the determination of the LPM. Hash-based schemes use one or more hash tables to store prefixes, where the benefit is scalability, as table size is increased with length-independent searching speed. However, hash-based schemes have a possibility of collisions, which requires post-processing to decide on only one output and requires reading many hash tables for each length of stored strings, thereby slowing the process.
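  • By way of illustration only, the following minimal software sketch models the ternary matching and longest-prefix selection described above; the TcamEntry structure, the toy 8-bit table, and the rule names are invented for this example and are not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class TcamEntry:
    value: int       # bit pattern stored in the entry
    care_mask: int   # 1 = bit must match, 0 = "don't care" (X)
    prefix_len: int  # number of leading non-wildcard bits
    rule: str        # rule held in the companion SRAM

def tcam_lookup(entries, address):
    # A real TCAM compares all entries in parallel; the loop models that.
    matches = [e for e in entries
               if (address & e.care_mask) == (e.value & e.care_mask)]
    if not matches:
        return None
    # The priority encoder keeps the longest prefix match.
    return max(matches, key=lambda e: e.prefix_len).rule

# 8-bit toy table: 0b1010xxxx is less specific than 0b10100110.
table = [TcamEntry(0b10100000, 0b11110000, 4, "rule A"),
         TcamEntry(0b10100110, 0b11111111, 8, "rule D")]
print(tcam_lookup(table, 0b10100110))  # -> rule D (longest prefix wins)
print(tcam_lookup(table, 0b10100001))  # -> rule A (wildcard bits match)
```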
  • Accordingly, it is evident that prior art solutions to LPM lookup offer different tradeoffs and that a design methodology providing low-power, large-scale IP lookup engines addressing the limitations within the prior art would be beneficial. With carriers looking to add picocells, for example with ranges of a few hundred metres, to augment microcells and base stations in order to address capacity demands in dense urban environments, power consumption becomes an important factor against conventional IP router deployment scenarios. According to embodiments of the invention, a low-power large-scale IP lookup engine may be implemented exploiting clustered neural networks (CNNs). In addition to reduced power consumption, embodiments of the invention provide reduced transistor count, providing for reduced semiconductor die footprints and hence reduced die cost.
  • Beneficially, low-cost TCAMs would allow for their deployment within a variety of other applications where to date they have not been feasible due to cost, as well as others where their deployment had not previously been considered. For example, TCAMs would enable routers to perform additional functions beyond address lookups, including, but not limited to, virus detection and intrusion detection.
  • Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
  • SUMMARY OF THE INVENTION
  • It is an object of the present invention to address limitations in the prior art for address look-up engines and more particularly to provide address look-up engines with low power, die area efficiency, and compatibility with high-speed packet data rates.
  • In accordance with an embodiment of the invention there is provided a device comprising a clustered neural network processing algorithms storing ternary data.
  • In accordance with an embodiment of the invention there is provided a device comprising:
    • a first plurality of input clusters forming a first predetermined portion of a clustered neural network, each input cluster comprising a first predetermined number of input neurons; and
    • a second plurality of output clusters forming a second predetermined portion of the clustered neural network, each output cluster comprising a second predetermined number of output neurons; wherein,
    • the clustered neural network stores a plurality of network addresses and routing rules relating to the network addresses as associations.
  • In accordance with an embodiment of the invention there is provided a method comprising:
    • providing an address lookup engine for a routing device employing a clustered neural network capable of processing ternary data;
    • teaching the address lookup engine about a plurality of addresses and their corresponding rules for routing data packets received by the routing device in dependence upon at least an address forming a predetermined portion of the data packet; and
    • routing data packets using the address lookup engine.
  • Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments of the present invention will now be described, by way of example only, with reference to the attached Figures, wherein:
  • FIG. 1 depicts a prior art IP lookup engine exploiting TCAMs;
  • FIGS. 2A and 2B depict an IP lookup engine exploiting CNNs without wildcards in learning and retrieving processes;
  • FIGS. 3A to 3C depict an IP lookup engine according to an embodiment of the invention exploiting wildcards in local storing, dummy neuron activation, and global storing processes;
  • FIGS. 4A to 4C depict a retrieving process for use within an IP lookup scheme with wildcards according to an embodiment of the invention;
  • FIGS. 5A to 5D depict a retrieving process for use within an IP lookup scheme with wildcards according to an embodiment of the invention;
  • FIG. 6 depicts ambiguity probability versus number of learnt messages for an IP engine according to an embodiment of the invention;
  • FIG. 7 depicts a first implementation architecture for an IP lookup engine according to an embodiment of the invention;
  • FIG. 8 depicts circuit schematics for a dummy generator and input selection block forming portions of an IP lookup engine according to the first implementation architecture in FIG. 7;
  • FIG. 9 depicts a circuit schematic for a decoding module forming part of an IP lookup engine according to the first implementation architecture in FIG. 7;
  • FIG. 10 depicts a second implementation architecture for an IP lookup engine according to an embodiment of the invention;
  • FIGS. 11A and 11B depict circuit schematics for memory blocks providing local storing (MLS) and global storing (MGS) forming portions of an IP lookup engine according to the second implementation architecture in FIG. 10;
  • FIGS. 12A and 12B depict circuit schematics for input replacement based upon ILSW and a dummy generator based upon MILSW forming portions of IP lookup engines according to the second implementation architecture in FIG. 10 for ILSW and MILSW variants;
  • FIGS. 13A and 13B depict circuit schematics for decoding modules based upon ILSW and MILSW respectively forming portions of IP lookup engines according to the second implementation architecture in FIG. 10 for ILSW and MILSW variants;
  • FIGS. 14A and 14B depict circuit schematics for an ambiguity elimination block and output selector forming portions of IP lookup engines according to the second implementation architecture in FIG. 10;
  • FIG. 15 depicts the average and maximum number of ambiguous entries versus the number of stored addresses for ILSW and MILSW IP lookup engines according to embodiments of the invention;
  • FIG. 16 depicts the maximum Nerror versus number of stored tables (N) with correlated table patterns for ILSW and MILSW IP lookup engines according to embodiments of the invention;
  • DETAILED DESCRIPTION
  • The present invention is directed to address look-up engines and more particularly to address look-up engines with low power, die area efficiency, and compatibility with high-speed packet data rates.
  • The ensuing description provides exemplary embodiment(s) only, and is not intended to limit the scope, applicability or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiment(s) will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It being understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope as set forth in the appended claims.
  • Within the specification a variety of subscripts are employed in respect of the mathematical descriptions of embodiments of the invention. Where possible the standard format R_S, where R is a variable and S is an integer, is employed. However, where R_S is itself a subscript to another term, these are denoted RS for clarity, to avoid multiple levels of subscripts, which are difficult to interpret in the published patent application.
  • 1. Background
  • As noted supra, embodiments of the invention for low-power large-scale IP lookup engines are implemented exploiting clustered neural networks (CNNs). A CNN is a neural network which stores data using only binary weighted connections between several clusters, see for example Gripon et al. in “Sparse neural networks with large learning diversity” (IEEE Trans. Neural Networks, Vol. 22, No. 7, pp. 1087-1096). Compared with a classical Hopfield Neural Network (HNN), see for example Hopfield in “Neural networks and physical systems with emergent collective computational abilities” (Proc. Nat. Academy of Sciences, Vol. 79, No. 8, pp. 2554-2558), the CNN requires less complex functions while learning (storing) a larger number of messages. A hardware implementation for a CNN has been reported by Jarollahi et al. in “Architecture and implementation of an associative memory using sparse clustered networks” (Proc. ISCAS 2012, pp. 2901-2904).
  • However, within an IP lookup engine there is a requirement for ternary, not binary, logic, as LPM searching algorithms exploit three values, namely “0”, “1”, and “don't care” (X), which makes them incompatible with prior art CNNs. Accordingly, as described below in respect of embodiments of the invention, the inventors have extended the CNN concept to process ternary logic. Further, unlike TCAMs, which store the IP addresses themselves, hardware embodiments of the invention store the associations between IP addresses and output rules, increasing memory efficiency. The output rule may, in some embodiments of the invention, be determined through low-complexity hardware using associations read from SRAMs, thereby reducing the search power dissipation compared with that of prior art TCAMs, which require brute-force searches. As both IP addresses and their rules can be stored as associations in the proposed IP lookup engine, the additional SRAM that stores rules within a conventional TCAM-based IP lookup engine is not required.
  • 2. Prior Art TCAM Based IP Lookup Scheme
  • FIG. 1 depicts an IP lookup scheme using a TCAM and an SRAM according to the prior art of Pagiamtzis et al. in “Content-addressable memory (CAM) circuits and architectures: a tutorial and survey” (IEEE JSSC 2006, Vol. 41, No. 3, pp. 712-727). Ternary content-addressable memories (TCAMs) contain a large number of entries, from hundreds to several hundred thousand. Each TCAM entry contains binary address information and possibly wildcards (X), while only binary information is stored in binary CAMs. The size of each entry is several dozen to hundreds of bits, e.g. 128 or 144 bits for IPv6, see for example Huang et al. in “A 65 nm 0.165 fJ/bit/search 256×144 TCAM macro design for IPv6 lookup tables” (IEEE JSSC 2011, Vol. 46, No. 2, pp. 507-519). In operation, an input address is broadcast onto all entries through search lines and one or more entries are matched. A priority encoder finds the longest prefix match among these matched entries and determines a matched location that is an address of an SRAM containing the rules. Since the matched location in the TCAM corresponds to an address of the SRAM, the corresponding rule is read.
  • As depicted in FIG. 1, an input address is 42.120.10.23 and this matches two entries, 42.120.10.23 and 42.120.10.X, as X is a wildcard. Although two matched locations are activated, the matched location corresponding to 42.120.10.23 is selected as the longer prefix. Finally, “rule D” is read from the SRAM. TCAMs perform high-speed matching in one clock cycle with a small number of entries. However, there are drawbacks when the number of entries is large, as in network routing. Since the search lines are connected to all entries, large search-line buffers are required, causing large power dissipation. The power dissipation of the search lines is the main portion of the overall power dissipation. In terms of die footprint, the number of transistors in a TCAM cell is 16, while it is 6 in an SRAM cell, resulting in an area-inefficient hardware implementation.
  • 3. IP Lookup Based on Clustered Neural Network
  • 3.1 IP Lookup Scheme without Wildcards
  • As noted supra, IP lookup engines according to embodiments of the invention are based upon extending the prior art binary CNN, see for example Gripon. Referring to FIG. 2A, there is depicted an IP lookup scheme without wildcards (ILS) which provides the functions of a TCAM and an SRAM by storing associations between the IP addresses and their corresponding rules. There are c input clusters and c′ output clusters. Each input cluster consists of l neurons and each output cluster consists of l′ neurons. In the example depicted in FIG. 2, c = 4 and l = 16 for the input clusters, while c′ = 1 and l′ = 16 for the output cluster. The input address has 16 (= c·log₂ l) bits and the output port has 4 (= c′·log₂ l′) bits.
  • 3.1.1 Learning Process:
  • In the learning process (also referred to as a storing process), IP addresses and their corresponding rules (also referred to as ports) are stored. Suppose that a k-th (1 ≤ k ≤ N) learning address m_k is composed of c sub-messages m_{k1} … m_{kc} and a k-th learning rule m′_k is composed of c′ sub-messages m′_{k1} … m′_{kc′}, where N is the number of stored addresses. The length of the address is c·log₂ l bits and that of the rule is c′·log₂ l′ bits. The address is partitioned into the c sub-messages m_{kj} (1 ≤ j ≤ c), whose size is κ = log₂ l bits. Each sub-message is converted into an l-bit one-hot signal, which activates the corresponding neuron in the input cluster; the l-bit one-hot signal has one of its l bits set to “1” and the others to “0”. The rule is likewise partitioned into the c′ sub-messages m′_{kj′} (1 ≤ j′ ≤ c′), whose size is κ′ = log₂ l′ bits. Each sub-message is converted into an l′-bit one-hot signal, which activates the corresponding neuron in the output cluster. During the learning process of M messages m_1 … m_M that include input addresses and rules, the corresponding patterns (activated neurons) C(m) are learnt. Depending on these patterns, connections w_{(i_1,j_1)(i_2,j_2)} between the i_1-th neuron of the j_1-th input cluster and the i_2-th neuron of the j_2-th output cluster are stored according to Equation (1).
  • $$w_{(i_1,j_1)(i_2,j_2)} = \begin{cases} 1, & \text{if } \exists\, m \in \{m_1 \dots m_M\} : C(m)_{j_1} = i_1 \text{ and } C(m)_{j_2} = i_2 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
  • where C(m) is a function that maps a message to its pattern of activated neurons.
  • This process is called “Global learning”. Suppose that M messages are uniformly distributed. An expected density d defined as memory utilization for “Global learning” is given by Equation (2).
  • $$d = 1 - \left(1 - \frac{1}{l\, l'}\right)^{M} \qquad (2)$$
  • For example, d = 0.3 means that stored bits of “1” are 30% of the whole bits in the memory. The density is related to the probability of ambiguity of output rules, as described below.
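  • As an illustrative software model of this learning step (a minimal sketch under the notation above; the parameter values and sample message are chosen arbitrarily, with Python standing in for the hardware):

```python
import math

c, l = 4, 16    # input clusters, neurons per input cluster
cp, lp = 1, 16  # output clusters (c') and neurons per output cluster (l')

w = set()  # sparse store of the binary connections of Equation (1)

def split(msg, clusters, neurons):
    """Partition a message into sub-messages of kappa = log2(neurons) bits,
    most significant sub-message first."""
    kappa = int(math.log2(neurons))
    return [(msg >> (kappa * (clusters - 1 - j))) & (neurons - 1)
            for j in range(clusters)]

def learn(address, rule):
    """Global learning: store a connection between every activated input
    neuron (i1, j1) and every activated output neuron (i2, j2)."""
    for j1, i1 in enumerate(split(address, c, l)):
        for j2, i2 in enumerate(split(rule, cp, lp)):
            w.add(((i1, j1), (i2, j2)))

learn(0x5304, 0x1)  # store address 5.3.0.4 with rule 1, as in FIG. 2

# Expected density after M uniformly distributed messages (Equation (2)).
M = 1000
d = 1 - (1 - 1 / (l * lp)) ** M
print(len(w), round(d, 3))  # 4 connections stored; d ~ 0.98
```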
  • 3.1.2 Retrieving Process:
  • Within the retrieving process, an output neuron (rule) is retrieved using a c·κ-bit input message. The input message m_in is partitioned into c sub-messages m_{inj} (1 ≤ j ≤ c), which are converted into l-bit one-hot signals. In each input cluster, the one neuron corresponding to the l-bit one-hot signal is activated. The value of each neuron v(n_{i_1,j_1}) (0 ≤ i_1 < l, 1 ≤ j_1 ≤ c) in the input clusters is given by Equation (3) below.
  • $$v(n_{i_1,j_1}) = \begin{cases} 1, & \text{if } m_{\mathrm{in}j_1} = i_1 \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
  • This process is called “Local decoding”. Then, the values of the neurons v(n′_{i_2,j_2}) in the output clusters, where n′_{i_2,j_2} (0 ≤ i_2 < l′, 1 ≤ j_2 ≤ c′) is the i_2-th neuron of the j_2-th output cluster, are updated using Equation (4).
  • $$v(n'_{i_2,j_2}) = \sum_{i_1=0}^{l-1} \sum_{j_1=1}^{c} w_{(i_1,j_1)(i_2,j_2)}\, v(n_{i_1,j_1}) \qquad (4)$$
  • In each output cluster, the maximum value v_{max,j_2} is found and the output neurons are then activated using Equations (5) and (6) respectively.
  • $$v_{\max,j_2} = \max_{i_2} v(n'_{i_2,j_2}) \qquad (5)$$
    $$v(n'_{i_2,j_2}) = \begin{cases} 1, & \text{if } v(n'_{i_2,j_2}) = v_{\max,j_2} \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$
  • The process described by Equations (4) through (6) is called “Global decoding”. In the example shown in FIG. 2B, the input message is 5.3.0.4. After applying Equation (4), the value of the neuron “1” in the output cluster is 4, the maximum value. Hence, the rule “1” is retrieved.
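  • A matching sketch of the retrieval side, continuing the learning example above (same illustrative parameters and helper functions):

```python
# Continuing the learning sketch above: Local decoding (Equation (3))
# followed by Global decoding (Equations (4)-(6)).
def retrieve(address):
    ins = split(address, c, l)  # one activated neuron per input cluster
    winners = {}
    for j2 in range(cp):
        # Equation (4): score each output neuron by its stored connections.
        score = [sum(1 for j1, i1 in enumerate(ins)
                     if ((i1, j1), (i2, j2)) in w) for i2 in range(lp)]
        vmax = max(score)       # Equation (5)
        # Equation (6): activate every neuron reaching the maximum.
        winners[j2] = [i2 for i2 in range(lp) if score[i2] == vmax]
    return winners  # more than one winner per cluster signals ambiguity

print(retrieve(0x5304))  # -> {0: [1]}: rule "1" is retrieved
```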
  • For learnt messages, there may be ambiguity in that two or more neurons (rules) are retrieved in the output cluster after “Global decoding”. The probability of ambiguity P_amb is given by Equation (7).

  • $$P_{amb} = 1 - \left(1 - d^{c}\right)^{c'(l'-1)} \qquad (7)$$
  • Pamb is calculated based on learnt messages that are uniformly distributed. If the learnt messages are correlated, then the inventors through simulations described below have established that these correlated patterns do not affect Pamb significantly.
  • If the number of stored addresses (M) is increased, an input address might activate an output neuron that does not correspond to the input address because of the multiplicity of stored connections.
  • 3.2 IP Lookup Scheme with Wildcards
  • 3.2.1 Learning
  • As an IP routing table contains wildcards (X), the IP lookup scheme is extended to store wildcards in the table. Referring to FIGS. 3A to 3C, there is depicted a learning process according to an embodiment of the invention for an IP lookup scheme with wildcards (ILSW). There are two different sets of connections: w′_{(i,j)(i′,j′)}, which represents connections between activated neurons within the input clusters, and w_{(i_1,j_1)(i_2,j_2)}, which represents connections between activated neurons in the input and output clusters. In the learning process for w′_{(i,j)(i′,j′)} (0 ≤ i, i′ < l; 1 ≤ j, j′ ≤ c) depicted in FIG. 3A, a learnt address is partitioned into c sub-messages of size κ = log₂ l bits. If a sub-message is binary information, it is converted into an l-bit one-hot signal, which activates the corresponding neuron in the input cluster. If it is a wildcard, no neuron is activated in the input cluster. The output neurons are activated to store connections using the same, or a comparable, algorithm to that employed within the IP lookup scheme without wildcards as depicted in FIG. 2. During the learning of M messages m′_1, …, m′_M that include only input connections, and depending on the corresponding patterns (activated neurons) C(m), connections w′_{(i,j)(i′,j′)} between the i-th neuron of the j-th input cluster and the i′-th neuron of the j′-th input cluster are stored based upon Equation (8).
  • $$w'_{(i,j)(i',j')} = \begin{cases} 1, & \text{if } \exists\, m \in \{m'_1 \dots m'_M\} : j' = j+1 \text{ and } C(m)_{j} = i \text{ and } C(m)_{j'} = i' \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
  • This process is called “Local learning”.
  • FIG. 3B shows the process of wildcard replacement for connections between the input and the output clusters, w_{(i_1,j_1)(i_2,j_2)} (0 ≤ i_1 < l; 0 ≤ i_2 < l′; 1 ≤ j_1 ≤ c; 1 ≤ j_2 ≤ c′), wherein dummy neurons are activated when learnt messages include wildcards. Suppose that the last half of the learnt messages have wildcards distributed randomly. If a sub-message is a wildcard, the wildcard is replaced by dummy information that is a binary value. The dummy information is determined by a function of the first half of the learnt message. According to an embodiment of the invention, the function is realized by XORing two sub-messages of the first half of the learnt message. The k-th learning sub-messages m_{kj} that contain wildcards (X) are replaced by md_{kj} according to Equation (9).
  • $$md_{kj} = \begin{cases} m_{k(j-\frac{c}{2})} \oplus m_{k\left((j-\frac{c}{2}+1) \bmod \frac{c}{2}\right)}, & \text{if } m_{kj} = X \text{ and } j > c_b \\ m_{kj}, & \text{otherwise} \end{cases} \qquad (9)$$
  • Alternatively, this may be viewed as follows: if a sub-address is a wildcard, the wildcard is replaced by a generated dummy sub-address md_{kj} (1 ≤ j ≤ c), where 1 ≤ k ≤ N. Accordingly, suppose that the first c_b sub-addresses are known and the last (c − c_b) sub-addresses are either known or wildcards. The dummy sub-address is generated using a function that employs the first c_b sub-addresses.
  • In the example shown in FIG. 3B, the learnt message 1.9.10.X has a wildcard. The wildcard is replaced by XORing 9 ⊕ 1, the two sub-messages in the first two input clusters; hence the wildcard becomes 8. After making the dummy neurons, process (1) is performed using md_k instead of m_k in order to make connections between the input and output clusters, and these connections are then stored. For example, the dummy sub-message (sub-address) is converted to an l-bit one-hot signal that activates the corresponding dummy neuron in the input cluster associated with the wildcard. Accordingly, the M stored messages are now defined by M_1, …, M_M, which include the updated input addresses and output rule (port), and the connections between the activated input and output neurons are stored as shown in FIG. 3C and are given by Equation (10).
  • $$w_{(i_1,j_1)(i_2,j_2)} = \begin{cases} 1, & \text{if } \exists\, M \in \{M_1, \dots, M_M\} : C(M)_{j_1} = i_1 \text{ and } C(M)_{j_2} = i_2 \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$
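  • The learning side of the ILSW may be sketched in software as follows (a minimal illustration of Equations (8)-(10) under assumed small parameters; the message 1.9.10.X follows FIG. 3, and Python again stands in for the hardware):

```python
# Sketch of ILSW learning with wildcards (illustrative parameters:
# c = 4 input clusters of l = 16 neurons, c_b = 2 known sub-messages,
# one output cluster).
c, l, cb = 4, 16, 2
W = None          # wildcard marker

w_local = set()   # w' of Equation (8): links between adjacent input clusters
w_global = set()  # w of Equation (10): links from input to output neurons

def dummy(subs, j):
    """XOR dummy of Equations (9)/(12)/(16) for 0-indexed position j >= cb."""
    return subs[j - cb] ^ subs[(j - cb + 1) % cb]

def learn_ilsw(subs, out_neuron):
    # Local learning (Equation (8)): chain adjacent non-wildcard neurons.
    for j in range(c - 1):
        if subs[j] is not W and subs[j + 1] is not W:
            w_local.add(((subs[j], j), (subs[j + 1], j + 1)))
    # Wildcard replacement (Equation (9)), then Global learning (Eq. (10)).
    md = [s if s is not W else dummy(subs, j) for j, s in enumerate(subs)]
    for j1, i1 in enumerate(md):
        w_global.add(((i1, j1), (out_neuron, 0)))

learn_ilsw([1, 9, 10, W], 8)  # store 1.9.10.X: wildcard -> 9 XOR 1 = 8
```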
  • 3.2.2 Retrieving
  • In the retrieving process, the proposed IP lookup scheme with wildcards (ILSW) checks whether the input sub-messages m_{inj} (1 ≤ j ≤ c) are stored or not using the stored connections. If the activated neuron corresponding to an input sub-message m_{inj_1} in the j_1-th input cluster does not have a connection to an activated neuron in the preceding input cluster, m_{inj_1} can be judged a “non-stored message”. Hence, that neuron is de-activated and a dummy neuron is activated in its place according to the rules in Equations (11) and (12). This process initially partitions an input message m_in into c sub-messages, which are themselves converted into l-bit one-hot signals; where the activated neuron corresponding to an input sub-address m_{inj_1} in the j_1-th input cluster has no connection to an activated neuron in the (j_1 − 1)-th input cluster, m_{inj_1} is treated as a non-stored message and is not used, but rather a dummy sub-message is generated according to Equations (11) and (12) respectively. First, the number of connections con_{j_1} up to the j_1-th input cluster is given by Equation (11).
  • $$con_{j_1} = \sum_{j=2}^{j_1} w'_{(m_{\mathrm{in}(j-1)},\, j-1)(m_{\mathrm{in}j},\, j)} \qquad (11)$$
  • If con_{j_1} is less than (j_1 − 1), the input sub-message is not stored. The input sub-message m_{inj_1} is then replaced by ms_{inj_1} containing dummy information as determined by Equation (12).
  • $$ms_{\mathrm{in}j_1} = \begin{cases} m_{\mathrm{in}(j_1-\frac{c}{2})} \oplus m_{\mathrm{in}\left((j_1-\frac{c}{2}+1) \bmod \frac{c}{2}\right)}, & \text{if } con_{j_1} < j_1 - 1 \text{ and } j_1 > c_b \\ m_{\mathrm{in}j_1}, & \text{otherwise} \end{cases} \qquad (12)$$
  • The rules in Equations (11) and (12) define processes referred to as “Input selection” or “Input replacement”. Then, v(n_{i_1,j_1}) is obtained from Equation (3) by using ms_{inj_1} instead of m_{inj_1}. In the example shown initially in FIG. 4A, an input message 5.3.0.6 is added with an output rule 1. Subsequently, as depicted in FIG. 4B, an input message 1.9.10.6 is input. Since the activated neuron “6” in the 4th input cluster does not have a connection to the activated neuron “10” in the 3rd input cluster, the activated neuron is de-activated. Instead, the dummy neuron “8” in the 4th input cluster is activated.
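  • Continuing the software sketch begun above, input replacement can be modelled as follows (again illustrative, using the same assumed parameters):

```python
# Continuing the ILSW sketch: "Input replacement" at retrieval time
# (Equations (11)-(12)). A neuron lacking a chain connection from the
# previous cluster marks a non-stored sub-message, which is replaced
# by the dummy generated from the first c_b sub-messages.
def input_replace(subs):
    ms, con = list(subs), 0
    for j in range(1, c):
        if ((subs[j - 1], j - 1), (subs[j], j)) in w_local:
            con += 1                   # Equation (11)
        if j >= cb and con < j:        # connection count fell short
            ms[j] = dummy(subs, j)     # Equation (12)
    return ms

# 1.9.10.6 was never stored: neuron "6" has no connection from "10",
# so it is replaced by the dummy 9 XOR 1 = 8, matching stored 1.9.10.8.
print(input_replace([1, 9, 10, 6]))   # -> [1, 9, 10, 8]
```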
  • The subsequent process is “Global decoding”, exploiting the processes defined in Equations (4) through (6) respectively. As the first-half input clusters contain only binary information, the summation of w_{(i_1,j_1)(i_2,j_2)} v(n_{i_1,j_1}) over the first-half clusters must equal half the number of input clusters if an output neuron is a candidate rule. In the IP lookup scheme with wildcards (ILSW), Equation (4) is re-defined by Equation (13).
  • $$ena_{i_2,j_2} = \sum_{i_1=0}^{l-1} \sum_{j_1=1}^{c_b} w_{(i_1,j_1)(i_2,j_2)}\, v(n_{i_1,j_1}) \qquad (13)$$
  • Accordingly, if an input address is stored, ena_{i_2,j_2} must equal c_b, v(n_{i_1,j_1}) being equal to 1 for each of its stored connections. If ena_{i_2,j_2} is less than c_b, the corresponding output neuron is not related to the input address. Hence the summation in Equation (4) is redefined by Equations (14) and (15).
  • $$sum_{i_2,j_2} = \sum_{i_1=0}^{l-1} \sum_{j_1=\frac{c}{2}+1}^{c} w_{(i_1,j_1)(i_2,j_2)}\, v(n_{i_1,j_1}) \qquad (14)$$
    $$v(n'_{i_2,j_2}) = \begin{cases} sum_{i_2,j_2}, & \text{if } ena_{i_2,j_2} = \frac{c}{2} \\ 0, & \text{otherwise} \end{cases} \qquad (15)$$
  • After these processes, the processes established by Equations (5) and (6) are executed. Referring to Table 1, the learnt messages resulting from the “Local decoding” and “Global decoding” processes described supra in respect of FIGS. 3 and 4 are presented, showing the IP address, the IP address with dummy, and the associated rule.
  • TABLE 1
    Learnt Messages from Process Depicted in FIGS. 3 and 4

    IP Address    IP Address with Dummy    Rule
    5.3.0.6       5.3.0.6                     1
    1.9.10.12     1.9.10.12                  14
    1.9.10.X      1.9.10.8                    8
  • 3.3 Global Decoding with Max Functions
  • 3.3.1 Learning
  • The proposed ILSW described supra in Section 3.2 requires a max function in the Global decoding, which results in large hardware overhead in terms of speed. The reason a max function is required is that the maximum value of v(n′_{i_2,j_2}) varies depending on the input message, where v(n′_{i_2,j_2}) can be up to (c − c_b). In the Input replacement, an input sub-message that is not stored might be wrongly treated as a “stored sub-address”, especially when many addresses are stored. In this case, the input sub-address activates a wrong neuron, which is not later replaced by the dummy neuron. Once the wrong neuron is activated in the input cluster, there may not exist a connection to an output candidate. As a result, the maximum value of v(n′_{i_2,j_2}) may be less than (c − c_b), which needs to be detected using the max function.
  • To eliminate the max function, the maximum value has to be stable. Referring to FIGS. 5A to 5D, there is depicted a retrieving process according to an embodiment of the invention exploiting a modified proposed IP lookup scheme with wildcards (MILSW). In the retrieving process, ms_in, which contains dummy sub-addresses, is generated using Equations (11) and (12). In addition, additional dummy sub-addresses ma_{inj_1} (1 ≤ j_1 ≤ c) are generated using the process defined by Equation (16).
  • $$ma_{\mathrm{in}j_1} = \begin{cases} m_{\mathrm{in}(j_1-c_b)} \oplus m_{\mathrm{in}\left((j_1-c_b+1) \bmod c_b\right)}, & \text{if } j_1 > c_b \\ m_{\mathrm{in}j_1}, & \text{otherwise} \end{cases} \qquad (16)$$
  • Through Equation (16), all dummy sub-addresses related to an input address are generated. Using ms_in and ma_in, the input neurons and all dummy neurons in the input clusters are activated based on (3) instead of using m_in. In this case, all the stored connections for Global storing relating to the input address are retrieved. Hence the maximum value of v(n′_{i_2,j_2}) in Equation (15) is (c − c_b). The summation and max function can be replaced by an AND-OR function when the maximum value is fixed, see for example the inventors in Onizawa et al., “Clockless Stochastic Decoding of Low-Density Parity-Check Codes: Architecture and Simulation Model” (J. Signal Processing Systems, pp. 2523-2526, 2013, hereinafter Onizawa1). As a result, the output neuron v(n′_{i_2,j_2}) is given by Equation (17).

  • $$v(n'_{i_2,j_2}) = \bigwedge_{j_1=1}^{c} \bigvee_{i_1=0}^{l-1} w_{(i_1,j_1)(i_2,j_2)}\, v(n_{i_1,j_1}) \qquad (17)$$
  • In the example shown in FIGS. 5A to 5D, one additional entry is stored beyond Table 1, with input address 5.3.10.6 and output port 5, where c_b is equal to 3. In FIGS. 5A to 5D the solid lines indicate connections for Local storing and the dashed lines indicate connections for Global storing. When the input address is 1.9.10.6, the neuron “6” in the 4th input cluster is wrongly activated, because a connection between the neuron “6” in the 4th input cluster and the neuron “10” in the 3rd input cluster exists, as shown in FIG. 5B. Regardless of the activated neuron “6”, a dummy neuron “8” is activated following (16), using two sub-addresses in the first two clusters, as shown in FIG. 5C. Using all activated input neurons, including the wrongly activated and the dummy neurons, the output neuron is activated as shown in FIG. 5D. In this case, the maximum value of v(n′_{i_2,j_2}) is 4, which is the number of input clusters (c). In contrast, when (12) is used, the maximum value is 3, which cannot be detected using the AND-OR function, because the dummy neuron “8” in the 4th cluster is not activated.
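  • Under the same assumed toy parameters as the sketches above (and not the FIG. 5 configuration, which uses c_b = 3), the MILSW retrieval can be sketched as:

```python
# Sketch of MILSW retrieval (Equations (16)-(17)), continuing the ILSW
# sketch: every position past c_b activates both its (possibly wrong)
# input neuron and its dummy neuron, so a stored rule always collects a
# connection from all c clusters and the max function reduces to an
# AND of ORs.
def retrieve_milsw(subs, out_neurons):
    ms = input_replace(subs)                      # Equations (11)-(12)
    ma = [dummy(subs, j) if j >= cb else subs[j]  # Equation (16)
          for j in range(c)]
    active = [{ms[j], ma[j]} for j in range(c)]   # activated neurons/cluster
    return [i2 for i2 in out_neurons              # Equation (17): AND over
            if all(any(((i1, j), (i2, 0)) in w_global   # clusters of OR
                       for i1 in active[j]) for j in range(c))]

print(retrieve_milsw([1, 9, 10, 6], out_neurons=[8]))  # -> [8]
```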
  • 3.3.2 Probability of Generating Ambiguous Outputs
  • The MILSW also has a probability of generating ambiguous outputs at each entry, similar to that of the ILS in (7). For this probability, an expected density for Local storing is needed. In Local storing, the density differs depending on the positions of the stored sub-messages, because the message lengths excluding wildcards range from (c − c_b)·log₂ l to c·log₂ l bits. Now suppose the numbers of sub-messages excluding wildcards are uniformly distributed; then the density from the j-th input cluster to the (j+1)-th cluster for Local storing is given by Equation (18).
  • $$d_j = 1 - \left(1 - \frac{1}{l^2}\right)^{N(c-j)/(c-c_b+1)} \qquad (18)$$
  • In the MILSW, there are two types of “wrongly” activated neurons in the input clusters from (c − c_b) to c. These two types of activated neurons affect the probability. In the first type, related to (12), when a sub-message that is not stored is wrongly treated as a “stored sub-message”, the corresponding neuron is wrongly activated. The average number of wrongly activated neurons per input cluster of the first type (ω_1) is given by Equation (19).
  • $$\omega_1 \approx \frac{1}{c-c_b+1} \sum_{j=c_b}^{c-1} \sum_{k=1}^{c-j} \left( k \prod_{r=j}^{j+k-1} d_r \left(1 - d_{k+1}\right) \right) \qquad (19)$$
  • In the second type related to (16), a dummy neuron is activated even if a correct neuron that is actually stored is activated in the same input cluster. In this case, the dummy neuron is “wrongly” activated. The average number of wrongly activated neurons per input cluster in the second type (ω2) is given by Equation (20).
  • $$\omega_2 \approx \frac{1}{c-c_b+1} \sum_{j=1}^{c-c_b} j \qquad (20)$$
  • Hence, the average number of activated neurons per input cluster from c_b to c (ω_a) is given by Equation (21). The activated neurons, including the “wrongly” activated neurons, affect the probability (p_t) in the MILSW, where p_t is obtained by modifying (7) to yield Equation (22), in which c_i is the number of independent activated neurons in the input clusters, which is affected by the function used in (9), (12) and (16).
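  • A short numeric sketch of these quantities (Equations (18) and (20); the value of N is assumed, chosen only for illustration):

```python
# Quick numeric look at the analysis above (illustrative only): the
# local-storing density of Equation (18) and the second-type wrongly
# activated neurons of Equation (20), using the parameters of the
# analysis below (l = 512, c = 16, c_b = 8) and an assumed N = 100,000.
l, c, cb, N = 512, 16, 8, 100_000

def d_local(j):
    """Expected density from cluster j to cluster j+1 (Equation (18))."""
    return 1 - (1 - 1 / l**2) ** (N * (c - j) / (c - cb + 1))

omega2 = sum(range(1, c - cb + 1)) / (c - cb + 1)  # Equation (20)
print([round(d_local(j), 3) for j in (1, 8, 15)], omega2)  # omega2 = 4.0
```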
  • 3.4 Ambiguity Analysis
  • The probability of generating ambiguous outputs at each entry (P_amb) is evaluated using a condition of l = 512, c = 16, l′ = 1024, and c′ = 1. Two different values of c_b are used, 4 and 8. The functions generating dummy sub-addresses in (16) are designed using XOR functions and are summarized as an example in Table 2. The functions are the same in (9), (12) and (16). As the dummy sub-addresses are generated using the first c_b sub-addresses, some functions can be the same for c_b = 4. Supposing the numbers of stored sub-addresses excluding wildcards are uniformly distributed, c_i is about 14.3 in the case of c_b = 4.
  • TABLE 2
    Example of Dummy Sub-Message Generation in (16)
    f in cb = 8 f in cb = 4
    main0 min0 min0
    main1 min1 min1
    main2 min2 min2
    main3 min3 min3
    main4 min4 min0 ⊕ min1 ⊕ min2 ⊕ min3
    main5 min5 min1 ⊕ min2 ⊕ min3 ⊕ min0
    main6 min6 min2 ⊕ min3 ⊕ min0 ⊕ min1
    main7 min7 min3 ⊕ min0 ⊕ min1 ⊕ min2
    main8 min0 ⊕ min1 min0 ⊕ min1 ⊕ min2
    main9 min1 ⊕ min2 min1 ⊕ min2 ⊕ min3
    main10 min2 ⊕ min3 min2 ⊕ min3 ⊕ min0
    main11 min3 ⊕ min4 min3 ⊕ min0 ⊕ min1
    main12 min4 ⊕ min5 min0 ⊕ min1
    main13 min5 ⊕ min6 min1 ⊕ min2
    main14 min6 ⊕ min7 min2 ⊕ min3
    main15 min7 ⊕ min0 min3 ⊕ min0
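  • Below is a minimal software sketch of the Table 2 generation pattern, assuming each sub-address mink is a log2 l = 9-bit integer and that the higher dummy sub-addresses wrap around modulo cb (as in the cb=8 column, e.g. main8 = min0 ⊕ min1 through main15 = min7 ⊕ min0); the function name and list-based interface are illustrative assumptions, not the patented circuit.

```python
def dummy_sub_addresses(m_in, c=16, c_b=8):
    # The first c_b dummy sub-addresses equal the input sub-addresses
    # themselves (main_j = min_j in Table 2).
    ma = list(m_in[:c_b])
    # The remaining dummy sub-addresses XOR consecutive sub-addresses
    # modulo c_b, mirroring the "f in cb = 8" column of Table 2.
    for k in range(c - c_b):
        ma.append(m_in[k % c_b] ^ m_in[(k + 1) % c_b])
    return ma

# Example: sixteen dummy sub-addresses from eight 9-bit sub-addresses.
print(dummy_sub_addresses([5, 3, 10, 6, 1, 9, 2, 7]))
```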
  • FIG. 6 shows the probability of ambiguity Pamb versus the number of learnt messages (M) in the IP lookup schemes (l=512, c=16, l′=1024, c′=1). Pamb is the probability that two or more neurons are activated for learnt messages. In this condition, the proposed IP lookup schemes store 144-bit (c log2 l) addresses (messages) for IPv6 headers. The minimum stored (non-wildcard) length is 36 bits for cb=4 and 72 bits for cb=8; when the stored length is 36 bits, the wildcard length is 108 bits.
  • Pamb without wildcards is defined by Equation (7). Pamb with wildcards was evaluated by simulations. Unlike IPv4, packet traces for a long-prefix (IPv6) table are not publicly available and the prefix table is still small, see for example "Border Gateway Protocol (BGP)" (http://bgp.potaroo.net); accordingly, addresses were chosen randomly for the evaluation. The stored addresses are uniformly distributed. Random-length wildcards appear in the last half of the addresses (72 bits). If the range of addresses that have wildcards is changed, the prefix length can be changed.
  • It is evident from FIG. 6 that Pamb is strongly dependent on M. Within the IP lookup scheme according to embodiments of the invention with wildcards, if a dummy neuron is the same neuron as one already stored, then both outputs (rules) might be retrieved, which slightly increases Pamb compared with that of the IP lookup scheme without wildcards. Adding dummy neurons, however, is clearly very effective at lowering Pamb; the reduction is about five orders of magnitude relative to the scheme without dummy neurons at M=78,643. As evident in FIG. 6, the IP lookup scheme according to an embodiment of the invention with wildcards can store 100,000 144-bit IP addresses with a negligibly low probability of ambiguity (Pamb<10−8).
  • Pamb was also simulated when the learnt messages are correlated. The word length in the IP lookup scheme with wildcards is 64 bits (l=256, c=8, l′=1024, c′=1). The first 8 bits of the learnt messages are selected from 64 fixed patterns out of 256 (2^8). The rest of each learnt message is uniformly distributed. Random-length wildcards appear in the last half of the addresses (32 bits). At M=10,000, Pamb using the correlated patterns was 5.30×10−7 while Pamb using uniformly distributed patterns was 1.69×10−7.
  • Unlearnt input messages are detected as "mismatch" via a two-step process. In the first step, the number of local connections conc/2 in Equation (10) is checked in "Input selection". If it is not equal to ((c/2)−1), the unlearnt input message can be detected as "mismatch" because not all local connections related to the input message are stored ("Mismatch 1"). If an unlearnt input message is instead detected as "match" through other stored connections, the number of global connections enai2,j2 is checked at "Global decoding". If no enai2,j2 is equal to (c/2), the unlearnt input message can be detected as "mismatch" because not all global connections between the input message and any output rule are stored ("Mismatch 2").
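  • The two-step detection can be summarised by the following minimal sketch; the argument names and the count-based interface are illustrative assumptions.

```python
def detect_unlearnt(num_local_connections, ena_counts, c=16):
    # Step 1 ("Mismatch 1"): a learnt address has all (c/2 - 1) consecutive
    # local connections stored; any shortfall flags an unlearnt address.
    if num_local_connections != (c // 2) - 1:
        return "Mismatch 1"
    # Step 2 ("Mismatch 2"): at Global decoding, some output neuron must be
    # reached by all c/2 global connections; otherwise no stored rule matches.
    if all(ena != c // 2 for ena in ena_counts):
        return "Mismatch 2"
    return "Match"

# Example: 7 local connections (c/2 - 1 = 7) and one output neuron reached
# by all 8 global connections, so the address is treated as learnt.
print(detect_unlearnt(7, [3, 8, 0], c=16))
```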
  • 4. Hardware Implementation
  • 4.1 Implementation 1
  • Referring to FIG. 7 there is depicted an overall structure 700 of an IP lookup engine according to an embodiment of the invention with wildcards. The learning process comprises "Local learning" using Equation (8) and "Global learning" as presented in respect of Equations (9) and (1). The retrieving process proceeds as follows:
      • 1) each input sub-message is replaced by a dummy message if it is not stored ("Input selection") according to Equations (10) and (11);
      • 2) connections between c activated neurons in the input clusters are read from a memory block 1 ("Local decoding") established in dependence upon Equation (3); and
      • 3) an output neuron is retrieved based on connections between neurons of the input and the output clusters in a memory block 2, using processes established by Equations (12) to (14), (5), and (6) respectively ("Global decoding").
  • There are ((c−1)l2)-bit SRAMs for "Local learning" and c′ll′-bit SRAMs for "Global learning". In "Local learning", a sub-input of the k-th learning address (mkj) and the subsequent one (mkj′) are converted into one-hot l-bit signals at the row and column decoders, respectively. Then, a connection (w(i,j)(i′,j′)) is stored in the memory block 1 if neither sub-input is a wildcard. In "Global learning", the last half of an input address is replaced by dummy information, using a dummy generator that includes (c/2) sub-dummy generators, wherever a sub-input address is a wildcard. The architecture of a sub-dummy generator is depicted by first circuit 800 in FIG. 8; the sub-dummy generator contains l 2-input XOR gates and multiplexors. Using the sub-input address with dummy information mdkj1 and the corresponding rule m′dkj2, a connection (w(i1,j1)(i2,j2)) is stored in the memory block 2.
  • In the retrieving process, an input address min is partitioned into c sub-messages and ((c−1)l) connections (w(i,j)(i′,j′)) are read from the memory block 1. The connections w(minj,j)(min(j+1),j+1) in Equation (10) are selected in a multiplexor of an input-selection module, the input-selection module being depicted by second circuit 850 in FIG. 8. Then, the last half of the input address is replaced by dummy information if the corresponding connections are not found. The output msin, which contains the first half of the input address and the generated last half, is sent to the memory block 2. The one-hot decoder transforms msin to v(ni1,j1). In the memory block 2, (c′l′) connections (w(i1,j1)(i2,j2)) are read by msin and are sent to a decoding module shown in FIG. 9. The decoding module contains (c′l′) global decoders, c′ max-function blocks and c′ ambiguity checkers, where c′ is set to 1 in FIG. 9. In each global decoder, (c/2) 2-input AND gates and a (c/2)-input AND gate generate an enable signal for a (c/2)-input adder, these circuits corresponding to (12)-(14). An l′-input max-function block decides an activated neuron v(ni2,j2)=1, and the ambiguity checker checks whether two or more neurons are activated simultaneously.
  • 4.2 Implementation 2
  • Referring to FIG. 10 there is depicted an overall structure of another implementation of an IP lookup engine. A memory block for local storing (MLS) stores the connections in (8) and one for global storing (MGS) stores the connections in (10). These memory modules are designed using SRAMs. A local decoding module in (3) is a one-hot decoder included in the memory blocks. An ambiguity elimination block includes a small TCAM that stores the ambiguous entries of the ILSW (MILSW), where an ambiguous entry is one that generates more than one output port. If the ILSW (MILSW) retrieves multiple output ports, the ambiguous entry is matched to the input search address; in this case, the output of the IP lookup engine is selected from the ambiguity elimination block in an output selector. Otherwise, it is selected from the ILSW (MILSW). The ILSW takes 2 clock cycles and the MILSW takes 3 clock cycles for the retrieving process; the ambiguity elimination block takes 2 clock cycles.
  • FIG. 11 shows that the MLS contains (c−1) l2-bit sub-memory blocks and the MGS contains c*c′ ll′-bit sub-memory blocks. Both the ILSW and the MILSW use the same memory blocks. In the storing process, suppose that the stored connections are preliminarily generated in an external processing unit (e.g. a central processing unit (CPU) or microprocessor). In Local storing, l bits of w(i,j)(i′,j′) are serially sent from the CPU and are stored in the SRAM shown in FIG. 11A, where it takes at least l clock cycles to store all w(i,j)(i′,j′), depending on the number of inputs/outputs (I/Os) of the implemented semiconductor circuit. In Global storing, it also takes at least l clock cycles to store all w(i1,j1)(i2,j2).
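  • As a quick plausibility check on these block counts, the short calculation below totals the MLS and MGS capacities for the evaluation parameters (l=512, c=16, l′=1024, c′=1); the 256 kb block granularity is taken from the Section 5.2 implementation, and the snippet is illustrative arithmetic only.

```python
# Illustrative sizing check for the MLS and MGS (assumed parameters).
l, c, l_out, c_out = 512, 16, 1024, 1
block = 256 * 1024                            # one 256 kb SRAM block

mls_bits = (c - 1) * l * l                    # (c-1) sub-blocks of l*l bits
mgs_bits = c * c_out * l * l_out              # c*c' sub-blocks of l*l' bits

print(mls_bits // block, mgs_bits // block)   # 15 and 32 blocks (Section 5.2)
print((mls_bits + mgs_bits) / 2**20)          # 11.75 Mb in total (Table 3)
```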
  • In the retrieving process for the MLS, an input address min is partitioned into c sub-addresses and the corresponding connections are read in (11). FIG. 12A depicts a circuit diagram of the input-replacement module based on the ILSW. The updated input address (msin) is generated using the stored connections read from the MLS in (12) at the first clock cycle. At the second clock cycle, a flip-flop is enabled to store msinj1 and transfers it to the MGS. FIG. 12B depicts a circuit diagram of the dummy generator for the MILSW. Accordingly, mainj1 is generated before msinj1 in (16) at the first clock cycle and is transferred to the MGS using the same path as msinj1 at the second clock cycle. At the 3rd clock cycle, msinj1 is transferred to the MGS. Hence, w(i1,j1)(i2,j2) are read twice from the MGS, using mainj1 and then msinj1.
  • Now referring to FIG. 13A there is depicted the decoding module based on the ILSW, which operates at the 2nd clock cycle. Using the connections read from the MGS, (13), (14) and (15) are calculated using a cb-input AND gate and a (c−cb)-input adder. Then, only the activated output neuron or neurons are selected using a max function in (5) and (6). FIG. 13B depicts the corresponding decoding module based on the MILSW, which operates at the 2nd and the 3rd clock cycles. As w(i1,j1)(i2,j2) is read twice from the MGS, the word selected by mainj1 is stored in registers and then is ORed with the word selected by msinj1.
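  • A minimal behavioural sketch of the ILSW decoding step is given below; the data layout (one list of connection bits per output neuron) and the function name are illustrative assumptions.

```python
def ilsw_decode(conn_words, c=16, c_b=8):
    # conn_words[i2]: the c connection bits read from the MGS for output
    # neuron i2; the first c_b bits come from never-wildcarded clusters.
    scores = []
    for word in conn_words:
        enable = all(word[:c_b])                         # c_b-input AND gate
        scores.append(sum(word[c_b:]) if enable else 0)  # (c-c_b)-input adder
    best = max(scores)                                   # max function, (5)-(6)
    winners = [i for i, s in enumerate(scores) if s == best and best > 0]
    return winners          # more than one winner means an ambiguous entry

# Two candidate output neurons; the second one fails the AND stage.
print(ilsw_decode([[1] * 16, [1] * 7 + [0] + [1] * 8]))  # -> [0]
```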
  • FIG. 14A depicts a circuit diagram of the ambiguity elimination block. This block is identical for both the ILSW and the MILSW and consists of q TCAM words (e.g. q=20) that contain the ambiguous entries (mc). The value of q is larger than or equal to the maximum number of ambiguous entries. In the storing process, suppose that mc and the corresponding output ports (m′c) are sent from a CPU and are stored in the TCAM and registers, respectively. In the retrieving process, min is searched in the TCAM and a matched word is found when the ILSW (MILSW) generates multiple output ports. The matched word selects its corresponding port from the registers through a one-hot encoder and a multiplexer, and then mout is transferred to an output selector shown in FIG. 14B. The signal (mismatch) is low when the matched word is found in the TCAM and high when it is not found. If the matched word is found, mout is selected as the output port in the output selector; otherwise, the output port is selected from the global decoding module.
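  • The match-and-select behaviour can be modelled in software as follows; the list-based TCAM model, the bit-string encoding and all names are illustrative assumptions rather than the circuit itself.

```python
def ternary_match(addr_bits, tcam_words):
    # Model a q-entry TCAM: each stored word is a (port, pattern) pair in
    # which the pattern uses '0', '1' and '*' (don't care) per bit.
    for port, pattern in tcam_words:
        if all(p == '*' or p == b for b, p in zip(addr_bits, pattern)):
            return port                          # mismatch signal low
    return None                                  # mismatch signal high

def output_select(ilsw_ports, addr_bits, tcam_words):
    # The ambiguity elimination block is consulted only when the ILSW
    # (MILSW) retrieves multiple output ports for the search address.
    if len(ilsw_ports) > 1:
        port = ternary_match(addr_bits, tcam_words)
        if port is not None:
            return port                          # mout from the TCAM side
    return ilsw_ports[0]                         # global decoding module

# An ambiguous lookup resolved by a stored ambiguous entry -> port 3.
print(output_select([2, 5], "00011001", [(3, "0001****")]))
```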
  • 5. Evaluation
  • 5.1. Number of Ambiguous Entries
  • Depicted in FIG. 15 are the average (Namb) and the maximum (max(Namb)) number of ambiguous entries versus the number of stored addresses (N) in the ILSW and the MILSW. Again it is noted that an ambiguous entry activates more than one output neuron. The same parameters as employed in FIG. 6 were used, with cb=8. These results were obtained using simulations, where each simulation point uses 12,000 trials to calculate Namb and max(Namb). Namb increases with increasing N because a larger N increases the density. The ILSW has a lower Namb than the MILSW; the reason is that the MILSW may "wrongly" activate additional dummy neurons in (16).
  • Referring to FIG. 16, max(Namb) versus N is plotted when the stored addresses are correlated. PCOR is the number of correlated bits divided by the total number of bits of the first two sub-addresses, expressed as a percentage. The simulation conditions are otherwise the same as in the previous simulation. A high PCOR increases the number of shared connections among stored entries, and max(Namb) increases as the number of shared connections increases.
  • 5.2 Hardware Results and Performance Comparisons of Embodiments of Invention
  • The proposed IP lookup engine described in respect of Section 4.2 Implementation 2 and FIGS. 10 to 14 was designed based upon the Taiwan Semiconductor Manufacturing Company Ltd. (TSMC) 65 nm CMOS technology. The MLS and the MGS exploit 15 and 32 SRAM blocks (256 kb each), respectively. The small TCAM in the compensation block has q=20 entries. For the purpose of comparison, a reference TCAM was also designed. The TCAM cell is designed using a NAND-type cell that consists of 16 transistors, as per Pagiamtzis et al in "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey" (IEEE J. Solid State Circuits, Vol. 41, No. 3, pp. 712-727). Each entry has 144 TCAM cells and is designed based on a hierarchical design style for high-speed matching operations, see for example Hanzawa et al in "A Large-Scale and Low-Power CAM Architecture featuring a One-Hotspot Block Code for IP-Address Lookup in a Network Router" (IEEE J. Solid State Circuits, Vol. 40, No. 4, pp. 853-861). Referring to Table 3, performance comparisons under HSPICE simulation of the ILSW and MILSW with the reference TCAM design and with a previous design of the inventors are presented, see Onizawa et al "Low-Power Area-Efficient Large-Scale IP Lookup Engine based on Binary Weighted Clustered Networks" (Proc. 50th IEEE Design Automation Conference, 2013, hereinafter Onizawa2). The reference TCAM is designed to store 100,000 144-bit entries and is divided into 20 sub-TCAMs that each have 5,000 entries in order to achieve a power reduction for the search lines. A priority encoder attached to the TCAM has 100,000 inputs and 17-bit outputs.
  • For the storing process, the previous implementation of the inventors (Onizawa2) and the IP lookup engines according to embodiments of the invention require l=512 clock cycles to store all tables in the MLS and the MGS. The previous method has a probability of generating ambiguous output ports (<10−8); the proposed methods according to embodiments of the invention remove this probability using the small TCAM. As the small TCAM can compensate for up to q=20 ambiguous entries, N is increased by 83.5% and 70.0% for the ILSW and the MILSW, respectively, compared to the previous method.
  • For the retrieving process, in the ILSW, the worst-case delay is 1.31 ns in the block that includes the max-function block. This delay is 89.1% of that of the previous method (Onizawa2), which includes an ambiguity checker after Global decoding; the delay of the max-function block is 65.8% of the whole delay. In the MILSW, as the max-function block is removed, the worst-case delay is 0.62 ns. As throughput may be defined by (address length)/(worst-case delay)/(retrieving clock cycles), the MILSW offers increased throughput compared to the previous method (Onizawa2) and the ILSW, as illustrated below.
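  • Applying that throughput definition to the reported worst-case delays and retrieving cycle counts reproduces the Table 3 figures; the short check below is illustrative arithmetic only.

```python
def throughput_gbps(address_bits, worst_delay_ns, cycles):
    # (address length) / (worst-case delay) / (retrieving clock cycles)
    return address_bits / (worst_delay_ns * cycles)

print(throughput_gbps(144, 1.31, 2))   # ILSW:  ~55 Gbps (Table 3: 54.8)
print(throughput_gbps(144, 0.62, 3))   # MILSW: ~77 Gbps (Table 3: 77.7)

# Lookup speed for 75-byte (600-bit) packets, one lookup per packet:
print(600 / (1.31 * 2), 600 / (0.62 * 3))   # ~229 and ~323 Gb/s
```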
  • The dynamic power dissipation of the MILSW is, however, increased compared to that of the ILSW because the MILSW reads connections twice from the MGS. Nevertheless, the energy dissipations of the ILSW and the MILSW are only 2.7% and 4.8% of that of the reference TCAM, which is significant. The main reason for the energy reduction is the use of SRAMs instead of the power-hungry TCAM based on a brute-force search. The lookup speeds of the ILSW and the MILSW can be 229 Gb/s and 323 Gb/s, well over 40 Gb/s (OC-768), where the packet size is supposed to be 75 bytes. In terms of die footprint a further reduction is achieved as each TCAM cell contains 16 transistors while each SRAM cell contains 6 transistors. Accordingly, the required memory size of a CMOS implementation of a circuit operating according to an embodiment of the invention is 11.75 Mb, which is 30.6% of the equivalent area of the reference TCAM design.
  • TABLE 3
    Performance Comparisons

                                     Reference TCAM   Onizawa2   ILSW           MILSW
    Number of Tables                 100,000          100,000    183,500        170,000
    Throughput (Gbps)                52.0             48.3       54.8           77.7
    Dynamic Power (W)                3.030            0.140      0.156          0.363
    Static Power (W)                 0.240            0.090      0.090          0.090
    Energy Metric (fJ/bit/search)    0.584            0.028      0.016          0.028
    Probability of Generating
      Ambiguous Outputs                               10−8       0              0
    q (number of entries in TCAM)                     0          20             20
    Memory (Mb)                      14.4 (TCAM)                 11.75 (SRAM)   11.75 (SRAM)
    Equivalent Size (SRAM) (Mb)      38.4                        11.75          11.75
    Number of Transistors            256M                        77M            77M
  • 5.3 Performance Comparison Relative to Prior Art
  • Referring to Table 4 there are presented performance comparisons of embodiments of the invention with related works within the prior art. The design of Gamache is a straightforward implementation using TCAMs, whilst Hayashi et al in "A 250-MHZ 18-Mb Full Ternary CAM with Low-Voltage Matchline Sensing Scheme in 65-nm CMOS" (IEEE J. Solid State Circuits, Vol. 48, No. 11, pp. 2671-2680) requires a very large (18 Mb) memory implementation under a 65 nm CMOS technology. In contrast, Noda employs eDRAMs to reduce the size of TCAM cells for low power dissipation; however, this tends to require a complex process. In Maurya, several entries are shared using special-purpose CAM cells to reduce the number of entries required, whilst Kuroda realizes the prefix match by reading candidates from eDRAMs, thereby yielding a very small energy metric. However, as the memory size is O(2^n) where n is the word length, this leads to an unacceptably large memory of 1,039 Mb for long words (e.g. 144 bits). The trie-based method of Bando (PC trie-4) realizes a memory-efficient IP lookup engine using a prefix-compressed trie and also uses a hash function to reduce memory accesses to off-chip memory in order to achieve high throughput. Hasan exploits a hash-based method which reduces power dissipation using a collision-free hash function compared with the TCAM in Gamache. Compared with these methods, the IP lookup engines ILSW and MILSW according to embodiments of the invention realize low energy metrics while dealing with large numbers of long entries, and also achieve high throughput and a small die footprint through a reasonable memory size.
  • It would be evident to one skilled in the art that small die footprints equate to reduced die costs, thereby reducing the cost of an IP lookup engine according to an embodiment of the invention. Beneficially, low-cost TCAMs and IP lookup engines implemented according to embodiments of the invention would allow for their deployment not only within a variety of applications where to date they have not been feasible due to cost, but also within others where their deployment had not previously been considered. For example, low-cost TCAMs would enable routers to perform additional functions beyond address lookups, including, but not limited to, virus detection and intrusion detection.
  • TABLE 4
    Performance Comparison of Embodiment of the Invention with Prior Art

                              TCAM1    TCAM2     DTCAM    IPCAM      eDRAM      Trie      Hash      MILSW
    Length (bits)             512      72        144      32 (128)   23         63        32        144
                                                          (Note 1)
    No. of Entries            21,504   262,144   32,768   65,536     16 × 10^6  318,043   524,288   170,000
                                                          (Note 2)
    Throughput (Gbps)         76.8     18        20.6     32         4.6        12.6      6.4       77.7
    Power (W)                 12.26    9.6       2.0      7.33       0.6                  5.5       0.363
    Energy Metric             5.53     1.98      2.96     0.159      0.007                1.64      0.028
    (fJ/bit/search)
    Memory (Mb)               10.5     18        4.5      2          432        31.19     60        11.75
    Equiv. Size SRAM (Mb)     28       48                 7.33
    Equiv. Size for 144-bit
    Length (Mb)               7.88     96        4.5      32.99      1039                           11.75
    Technology (nm)           130      65        130      65         40         FPGA      130       65

    (Note 1): Method can be extended to 128 bits for IPv6.
    (Note 2): 1.38M - one IPCAM word is approximately equivalent to 22 TCAM words.
    (Note 3): TCAM1 - Gamache, TCAM2 - Hayashi, DTCAM - Noda, IPCAM - Maurya, eDRAM - Kuroda, Trie - Bando, Hash - Hasan
  • Specific details are given in the above description to provide a thorough understanding of the embodiments. However, it is understood that the embodiments may be practiced without these specific details. For example, circuits may be shown in block diagrams in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
  • In addition to the IP lookup search engines and context-driven search engines discussed supra, other applications of embodiments of the invention include, but are not limited to, CPU fully associative cache controllers and translation lookaside buffers, database search engines, database engines, data compression, artificial neural networks, and electronic intrusion prevention systems.
  • Implementation of the techniques, blocks, steps and means described above may be done in various ways. For example, these techniques, blocks, steps and means may be implemented in hardware, software, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above and/or a combination thereof.
  • Also, it is noted that the embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but could have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
  • Furthermore, embodiments may be implemented by hardware, software, scripting languages, firmware, middleware, microcode, hardware description languages and/or any combination thereof. When implemented in software, firmware, middleware, scripting language and/or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine readable medium, such as a storage medium. A code segment or machine-executable instruction may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a script, a class, or any combination of instructions, data structures and/or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters and/or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
  • For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine-readable medium tangibly embodying instructions may be used in implementing the methodologies described herein. For example, software codes may be stored in a memory. Memory may be implemented within the processor or external to the processor and may vary in implementation where the memory is employed in storing software codes for subsequent execution to that when the memory is employed in executing the software codes. As used herein the term “memory” refers to any type of long term, short term, volatile, nonvolatile, or other storage medium and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
  • Moreover, as disclosed herein, the term “storage medium” may represent one or more devices for storing data, including read only memory (ROM), random access memory (RAM), magnetic RAM, core memory, magnetic disk storage mediums, optical storage mediums, flash memory devices and/or other machine readable mediums for storing information. The term “machine-readable medium” includes, but is not limited to portable or fixed storage devices, optical storage devices, wireless channels and/or various other mediums capable of storing, containing or carrying instruction(s) and/or data.
  • The methodologies described herein are, in one or more embodiments, performable by a machine which includes one or more processors that accept code segments containing instructions. For any of the methods described herein, when the instructions are executed by the machine, the machine performs the method. Any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine is included. Thus, a typical machine may be exemplified by a typical processing system that includes one or more processors. Each processor may include one or more of a CPU, a graphics-processing unit, and a programmable DSP unit. The processing system further may include a memory subsystem including main RAM and/or a static RAM, and/or ROM. A bus subsystem may be included for communicating between the components. If the processing system requires a display, such a display may be included, e.g., a liquid crystal display (LCD). If manual data entry is required, the processing system also includes an input device such as one or more of an alphanumeric input unit such as a keyboard, a pointing control device such as a mouse, and so forth.
  • The memory includes machine-readable code segments (e.g. software or software code) including instructions for performing, when executed by the processing system, one or more of the methods described herein. The software may reside entirely in the memory, or may also reside, completely or at least partially, within the RAM and/or within the processor during execution thereof by the computer system. Thus, the memory and the processor also constitute a system comprising machine-readable code.
  • In alternative embodiments, the machine operates as a standalone device or may be connected, e.g., networked, to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer or distributed network environment. The machine may be, for example, a computer, a server, a cluster of servers, a cluster of computers, a web appliance, a distributed computing environment, a cloud computing environment, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. The term "machine" may also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • The foregoing disclosure of the exemplary embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many variations and modifications of the embodiments described herein will be apparent to one of ordinary skill in the art in light of the above disclosure. The scope of the invention is to be defined only by the claims appended hereto, and by their equivalents.
  • Further, in describing representative embodiments of the present invention, the specification may have presented the method and/or process of the present invention as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. As one of ordinary skill in the art would appreciate, other sequences of steps may be possible. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. In addition, the claims directed to the method and/or process of the present invention should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the present invention.

Claims (9)

What is claimed is:
1. A device comprising:
a clustered neural network processing algorithms storing ternary data.
2. The device according to claim 1 wherein,
the ternary data comprises data having values of either 0, 1, or don't care.
3. The device according to claim 1;
wherein the device only stores data as binary weighted connections between clusters of the clustered neural network.
4. The device according to claim 1 wherein,
input data to the device comprises a network address of a plurality of network addresses according to a predetermined standard, each network address relating to a destination of a packet of data received at a system comprising at least the device; and
output data from the device comprises a rule for routing the packet of data; wherein the plurality of addresses and their associated rules are stored as associations within the device without the requirement for an additional rule specific memory.
5. A device comprising:
a first plurality of input clusters forming a first predetermined portion of a clustered neural network, each input cluster comprising a first predetermined number of input neurons; and
a second plurality of output clusters forming a second predetermined portion of the clustered neural network, each output cluster comprising a second predetermined number of output neurons; wherein,
the clustered neural network stores a plurality of network addresses and routing rules relating to the network addresses as associations.
6. The device according to claim 5 wherein,
the clustered neural network operates using ternary data, the ternary data comprising data having values of either 0, 1, or don't care.
7. The device according to claim 5;
wherein the device only stores data as binary weighted connections between clusters of the clustered neural network.
8. A method comprising:
providing an address lookup engine for a routing device employing a clustered neural network capable of processing ternary data;
teaching the address lookup engine about a plurality of addresses and their corresponding rules for routing data packets received by the routing device in dependence upon at least an address forming a predetermined portion of the data packet; and
routing data packets using the address lookup engine.
9. The method according to claim 8 wherein,
providing the address lookup engine comprises:
providing a first plurality of input clusters forming a first predetermined portion of the clustered neural network, each input cluster comprising a first predetermined number of input neurons; and
providing a second plurality of output clusters forming a second predetermined portion of the clustered neural network, each output cluster comprising a second predetermined number of output neurons; wherein,
the clustered neural network stores a plurality of network addresses and routing rules relating to the network addresses as associations.
US14/175,108 2013-02-07 2014-02-07 Methods and systems for network address lookup engines Abandoned US20140219279A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/175,108 US20140219279A1 (en) 2013-02-07 2014-02-07 Methods and systems for network address lookup engines
US15/211,335 US10469235B2 (en) 2013-02-07 2016-07-15 Methods and systems for network address lookup engines

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361761998P 2013-02-07 2013-02-07
US14/175,108 US20140219279A1 (en) 2013-02-07 2014-02-07 Methods and systems for network address lookup engines

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/211,335 Continuation US10469235B2 (en) 2013-02-07 2016-07-15 Methods and systems for network address lookup engines

Publications (1)

Publication Number Publication Date
US20140219279A1 true US20140219279A1 (en) 2014-08-07

Family

ID=51259171

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/175,108 Abandoned US20140219279A1 (en) 2013-02-07 2014-02-07 Methods and systems for network address lookup engines
US15/211,335 Active US10469235B2 (en) 2013-02-07 2016-07-15 Methods and systems for network address lookup engines

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/211,335 Active US10469235B2 (en) 2013-02-07 2016-07-15 Methods and systems for network address lookup engines

Country Status (2)

Country Link
US (2) US20140219279A1 (en)
CA (1) CA2842555A1 (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150089329A1 (en) * 2013-09-26 2015-03-26 International Business Machines Corporation Electronic circuit for fitting a virtual address range to a physical memory containing faulty address
CN106875011A (en) * 2017-01-12 2017-06-20 南京大学 The hardware structure and its calculation process of two-value weight convolutional neural networks accelerator
US10027739B1 (en) 2014-12-16 2018-07-17 Amazon Technologies, Inc. Performance-based content delivery
US10104009B2 (en) 2008-09-29 2018-10-16 Amazon Technologies, Inc. Managing resource consolidation configurations
US10148542B2 (en) 2008-09-29 2018-12-04 Amazon Technologies, Inc. Monitoring domain allocation performance
US10205644B2 (en) 2008-09-29 2019-02-12 Amazon Technologies, Inc. Managing network data display
US10225326B1 (en) 2015-03-23 2019-03-05 Amazon Technologies, Inc. Point of presence based data uploading
US10225365B1 (en) 2014-12-19 2019-03-05 Amazon Technologies, Inc. Machine learning based content delivery
US10284446B2 (en) 2008-09-29 2019-05-07 Amazon Technologies, Inc. Optimizing content management
US10311372B1 (en) * 2014-12-19 2019-06-04 Amazon Technologies, Inc. Machine learning based content delivery
US10311371B1 (en) 2014-12-19 2019-06-04 Amazon Technologies, Inc. Machine learning based content delivery
US10410085B2 (en) 2009-03-24 2019-09-10 Amazon Technologies, Inc. Monitoring web site content
US10462025B2 (en) 2008-09-29 2019-10-29 Amazon Technologies, Inc. Monitoring performance and operation of data exchanges
JP2020113010A (en) * 2019-01-10 2020-07-27 株式会社三菱Ufj銀行 Telegram delivery method and program
US10783153B2 (en) * 2017-06-30 2020-09-22 Cisco Technology, Inc. Efficient internet protocol prefix match support on No-SQL and/or non-relational databases
US10812358B2 (en) 2014-12-16 2020-10-20 Amazon Technologies, Inc. Performance-based content delivery
US10924381B2 (en) 2015-02-19 2021-02-16 Arista Networks, Inc. System and method of processing in-place adjacency updates
US20210075725A1 (en) * 2019-07-19 2021-03-11 Arista Networks, Inc. Avoiding recirculation of data packets in a network device
CN112636974A (en) * 2020-12-22 2021-04-09 安徽飞凯电子技术有限公司 Communication equipment intelligent supervision system based on big data
US11232038B2 (en) 2019-06-05 2022-01-25 Samsung Electronics Co., Ltd. Ternary content addressable memory and operating method thereof
US11263520B2 (en) * 2016-11-30 2022-03-01 Shanghai Cambricon Information Technology Co., Ltd. Instruction generation process multiplexing method and device
CN117440053A (en) * 2023-12-21 2024-01-23 沐曦集成电路(上海)有限公司 Multistage cross die access method and system

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
US10540588B2 (en) * 2015-06-29 2020-01-21 Microsoft Technology Licensing, Llc Deep neural network processing on hardware accelerators with stacked memory
CN116248590B (en) * 2022-12-16 2024-05-10 中国联合网络通信集团有限公司 Data forwarding method, device, equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
US20060167843A1 (en) * 2005-01-24 2006-07-27 3Com Corporation Tire search engines and ternary CAM used as pre-classifier
US7730086B1 (en) * 2002-02-11 2010-06-01 Louisiana Tech University Foundation, Inc. Data set request allocations to computers

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9877337B2 (en) * 2013-03-20 2018-01-23 Lg Electronics Inc. Method for transmitting and receiving signal using device-to-device communication in wireless communication system, and device for same

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
US7730086B1 (en) * 2002-02-11 2010-06-01 Louisiana Tech University Foundation, Inc. Data set request allocations to computers
US20060167843A1 (en) * 2005-01-24 2006-07-27 3Com Corporation Tire search engines and ternary CAM used as pre-classifier

Cited By (27)

Publication number Priority date Publication date Assignee Title
US10148542B2 (en) 2008-09-29 2018-12-04 Amazon Technologies, Inc. Monitoring domain allocation performance
US10462025B2 (en) 2008-09-29 2019-10-29 Amazon Technologies, Inc. Monitoring performance and operation of data exchanges
US10284446B2 (en) 2008-09-29 2019-05-07 Amazon Technologies, Inc. Optimizing content management
US10205644B2 (en) 2008-09-29 2019-02-12 Amazon Technologies, Inc. Managing network data display
US10104009B2 (en) 2008-09-29 2018-10-16 Amazon Technologies, Inc. Managing resource consolidation configurations
US10410085B2 (en) 2009-03-24 2019-09-10 Amazon Technologies, Inc. Monitoring web site content
US9343185B2 (en) * 2013-09-26 2016-05-17 International Business Machines Corporation Electronic circuit for fitting a virtual address range to a physical memory containing faulty address
US20150089329A1 (en) * 2013-09-26 2015-03-26 International Business Machines Corporation Electronic circuit for fitting a virtual address range to a physical memory containing faulty address
US10027739B1 (en) 2014-12-16 2018-07-17 Amazon Technologies, Inc. Performance-based content delivery
US10812358B2 (en) 2014-12-16 2020-10-20 Amazon Technologies, Inc. Performance-based content delivery
US10311371B1 (en) 2014-12-19 2019-06-04 Amazon Technologies, Inc. Machine learning based content delivery
US10225365B1 (en) 2014-12-19 2019-03-05 Amazon Technologies, Inc. Machine learning based content delivery
US11457078B2 (en) 2014-12-19 2022-09-27 Amazon Technologies, Inc. Machine learning based content delivery
US10311372B1 (en) * 2014-12-19 2019-06-04 Amazon Technologies, Inc. Machine learning based content delivery
US10924381B2 (en) 2015-02-19 2021-02-16 Arista Networks, Inc. System and method of processing in-place adjacency updates
US10225326B1 (en) 2015-03-23 2019-03-05 Amazon Technologies, Inc. Point of presence based data uploading
US11297140B2 (en) 2015-03-23 2022-04-05 Amazon Technologies, Inc. Point of presence based data uploading
US11263520B2 (en) * 2016-11-30 2022-03-01 Shanghai Cambricon Information Technology Co., Ltd. Instruction generation process multiplexing method and device
CN106875011A (en) * 2017-01-12 2017-06-20 南京大学 The hardware structure and its calculation process of two-value weight convolutional neural networks accelerator
US10783153B2 (en) * 2017-06-30 2020-09-22 Cisco Technology, Inc. Efficient internet protocol prefix match support on No-SQL and/or non-relational databases
JP2020113010A (en) * 2019-01-10 2020-07-27 株式会社三菱Ufj銀行 Telegram delivery method and program
JP7224188B2 (en) 2019-01-10 2023-02-17 株式会社三菱Ufj銀行 Message delivery method and program
US11232038B2 (en) 2019-06-05 2022-01-25 Samsung Electronics Co., Ltd. Ternary content addressable memory and operating method thereof
US20210075725A1 (en) * 2019-07-19 2021-03-11 Arista Networks, Inc. Avoiding recirculation of data packets in a network device
US11582151B2 (en) * 2019-07-19 2023-02-14 Arista Networks, Inc. Avoiding recirculation of data packets in a network device
CN112636974A (en) * 2020-12-22 2021-04-09 安徽飞凯电子技术有限公司 Communication equipment intelligent supervision system based on big data
CN117440053A (en) * 2023-12-21 2024-01-23 沐曦集成电路(上海)有限公司 Multistage cross die access method and system

Also Published As

Publication number Publication date
US20190222398A1 (en) 2019-07-18
CA2842555A1 (en) 2014-08-07
US10469235B2 (en) 2019-11-05

Similar Documents

Publication Publication Date Title
US10469235B2 (en) Methods and systems for network address lookup engines
US8688902B2 (en) Method and system for processing access control lists using an exclusive-or sum-of-products evaluator
Ullah et al. E-TCAM: An efficient SRAM-based architecture for TCAM
US8780926B2 (en) Updating prefix-compressed tries for IP route lookup
Banerjee et al. Tag-in-tag: Efficient flow table management in sdn switches
US20150127900A1 (en) Ternary content addressable memory utilizing common masks and hash lookups
Xie et al. Mousika: Enable general in-network intelligence in programmable switches by knowledge distillation
Le et al. Scalable tree-based architectures for IPv4/v6 lookup using prefix partitioning
Qian et al. Low power RAM-based hierarchical CAM on FPGA
CN110460529B (en) Data processing method and chip for forwarding information base storage structure of content router
Li et al. Memory‐efficient recursive scheme for multi‐field packet classification
Indira et al. A trie based IP lookup approach for high performance router/switch
US11431626B2 (en) Forwarding rules among lookup tables in a multi-stage packet processor
Veeramani et al. Efficient IP lookup using hybrid trie-based partitioning of TCAM-based open flow switches
Trinh et al. Algorithmic TCAM on FPGA with data collision approach
Sun et al. Using TCAM efficiently for IP route lookup
Thinh et al. Massively parallel cuckoo pattern matching applied for NIDS/NIPS
Nakahara et al. A memory-based IPv6 lookup architecture using parallel index generation units
Vijay et al. A memory-efficient adaptive optimal binary search tree architecture for IPV6 lookup address
Saxena et al. Scalable, high-speed on-chip-based NDN name forwarding using FPGA
Onizawa et al. Low-power area-efficient large-scale IP lookup engine based on binary-weighted clustered networks
Mahini et al. MLET: a power efficient approach for TCAM based, IP lookup engines in Internet routers
Erdem Pipelined hierarchical architecture for high performance packet classification
Srinivasavarma et al. Hardware-based multi-match packet classification in NIDS: an overview and novel extensions for improving the energy efficiency of TCAM-based classifiers
Hsiao et al. A high-throughput and high-capacity IPv6 routing lookup system

Legal Events

Date Code Title Description
AS Assignment

Owner name: THE ROYAL INSTITUTION FOR THE ADVANCEMENT OF LEARN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GROSS, WARREN;ONIZAWA, NAOYA;REEL/FRAME:032171/0012

Effective date: 20130228

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION