WO2000070821A2 - Packet classification state machine - Google Patents
- Publication number
- WO2000070821A2 PCT/CA2000/000580
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- state
- data
- state machine
- address
- memory
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/65—Updates
- G06F8/656—Updates while running
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/448—Execution paradigms, e.g. implementations of programming paradigms
- G06F9/4498—Finite state machines
Definitions
- the invention relates to programmable state machines and more particularly to programmable packet classification state machines for use in high-speed communication.
- non-procedural methods of programming the state machines have been regarded as having few advantages.
- Some of the "non-procedural" methods of programming a state machine focus on reuse of programming code and modularity of the code. Therefore, procedural elements within each state are not only permitted, but are viewed as somewhat essential. Different states, however, are commonly easily reused or assembled into a whole classification description.
- Reprogramming of a state machine having procedural programming is considered a straightforward task. State machine execution is paused, new programming is loaded into program memory and then the state machine is restarted. The process of pausing the state machine often involves halting data flow, which is undesirable. For use in firewalls and other security applications, a change in programming often results from a change in security procedures. As such, it is important to load the change as quickly as possible. Of course, it is evident to one skilled in the art that it is often impractical to shut down a system during reprogramming and, hence reprogramming of a system often only occurs at certain times. This is inconvenient.
- Another problem is encountered with current reprogramming techniques for programmable state machines when several state machines share a common program memory. When upgrading the programming for any of the state machines, all state machines are paused. This is a significant problem that is possibly avoided by duplicating program memory in order to allow all but one state machine to remain operative during a reprogramming operation. Another method of avoiding this problem is to provide each state machine with dedicated state machine memory.
- a current area of research in high-speed state machine design is the area of digital communications.
- data is grouped into packets, cells, frames, buffers, and so forth.
- the packets, cells, etc. contain data and classification information. It is important to classify packets, cells, etc. for routing and for correctly responding to data communications.
- An approach to classifying data of this type uses a state machine.
- For Gigabit Ethernet, it is essential that a state machine operate at very high speeds to process data in order to determine addressing and routing information as well as protocol-related information. Unfortunately, at those speeds, memory access is a significant bottleneck in implementing a state machine or any other type of real time data processor. This is driving researchers to search for innovative solutions to increase classification performance. An obvious solution is to implement a classification state machine completely in hardware. Non-programmable hardware state machines are known to have unsurpassed performance and are therefore well suited to these high data rates; however, the implementation of communication protocols is inherently flexible in nature. A common protocol today may be obsolete in a few months. Therefore, it is preferable that a state machine for use with Gigabit Ethernet is programmable. In the past, solutions for 10 Mbit and 100 Mbit Ethernet data networks required many memory access instructions per state in order to accommodate programmability. This effectively limits operating speeds of the prior art state machines.
- a programmable state machine for classification of data can be implemented entirely in software.
- software state machines are often much slower than their hardware equivalents.
- each operation is performed by a software instruction and state changes result in branch operations.
- to implement a high-speed state machine in software for packet classification requires many instructions per second - many more than a billion - requiring expensive parallel processors or technologies unknown at present.
- a severe limitation to performance is the speed of memory devices. For example, should a 7 ns memory device be used, less than one memory access per memory device is possible for each bit of a Gigabit Ethernet stream. Thus, if each byte - 8 bits - of data is processed in a single state, only one memory access operation is possible for each state. To implement such a system as a purely software solution is impractical and unlikely.
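The arithmetic behind this limit can be checked with a short calculation. The sketch below is illustrative only: the function name `accesses_per_state` and its parameters are assumptions introduced here, not part of the disclosure.

```python
import math

def accesses_per_state(bits_per_state, memory_access_ns, line_rate_bps):
    """Whole memory accesses that fit in the time one state's bits take to arrive."""
    bit_time_ns = 1e9 / line_rate_bps            # 1 ns per bit at 1 Gbit/s
    state_time_ns = bits_per_state * bit_time_ns
    return math.floor(state_time_ns / memory_access_ns)

# Gigabit Ethernet, one byte consumed per state, 7 ns memory device:
# 8 ns per state divided by 7 ns per access leaves room for one access.
print(accesses_per_state(8, 7.0, 1e9))
```

With one bit per state the same function returns zero, which matches the statement that less than one access per bit is possible.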
- a method of programming state machine memory comprising a plurality of storage locations including a first state address storage location, comprising the steps of: providing data corresponding to each new state, the data including data corresponding to all states preceding each new state including a new first state; storing the data within the program memory, data relating to each new state stored at a state address for said new state, the data stored in unoccupied storage locations; storing data relating to the new first state at the new first state address, the data stored in a storage location unoccupied by current state machine programming; and once the data corresponding to each new state is stored, replacing data within the first state address location with the new first state address.
- programming of the state machine is performed during execution of the state machine, and programming of the state machine is performed without interruption of execution of the state machine.
- the first state address location can be read at any time and will return proper data results.
- a method of achieving this is to store data within the first state address location atomically.
- Another method is to provide two registers for the first state address, an active register and an inactive register, activating the inactive register being performed atomically.
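The write-then-switch idea described above can be sketched in a few lines. This is a minimal model under stated assumptions, not the patented implementation: the class `ProgramMemory`, its dictionary-based storage, and the use of a single Python assignment to stand in for an atomic hardware write are all hypothetical.

```python
class ProgramMemory:
    """Hypothetical model: state tables keyed by address, plus one first-state slot."""

    def __init__(self, tables, first_state_addr):
        self.tables = dict(tables)                # address -> table of next states/actions
        self.first_state_addr = first_state_addr  # the first state address storage location

    def reprogram(self, new_tables, new_first_addr):
        # Step 1: new data may only occupy locations unused by current programming.
        for addr in new_tables:
            if addr in self.tables:
                raise ValueError(f"address {addr} is occupied")
        # Step 2: store every new state; the running machine cannot reach them yet.
        self.tables.update(new_tables)
        # Step 3: one assignment stands in for the atomic first-state-address write.
        self.first_state_addr = new_first_addr


mem = ProgramMemory({0: [1, 1], 1: ["A", "B"]}, first_state_addr=0)
mem.reprogram({2: [3, 1], 3: ["C", "A"]}, new_first_addr=2)
print(mem.first_state_addr)
```

The old tables at addresses 0 and 1 are left untouched, so a classification that started before the switch still completes against consistent data.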
- the state machine memory is for simultaneous use by a plurality of state machines.
- the programming of each of the plurality of state machines is optionally the same programming, or some state machines have different programming.
- programming of a state machine from the plurality of state machines is performed during execution of the plurality of state machines without interruption of execution of any other state machine from the plurality of state machines; programming of a state machine from the plurality of state machines is performed during execution of the plurality of state machines without interruption of execution of any state machine from the plurality of state machines; and/or during programming of a state machine from the plurality of state machines, data relating to another state machine from the plurality of state machines remains unaltered.
- a method of programming state machine memory comprising a plurality of locations and a first state address storage location, comprising the steps of: providing an image of current state machine memory on a computer; altering the state machine programming; determining states that are modified within the current state machine; determining states preceding the states that are modified including a new first state; determining unoccupied memory locations within the current state machine memory; compiling the modified states and the states preceding the states that are modified for storage in unoccupied memory locations within the current state machine memory to form reprogramming data; storing the reprogramming data within the unoccupied memory locations within the current state machine memory; once the reprogramming data is stored, replacing data within the first state address location with the new first state address; and, updating the image of the current state machine.
- the method also comprises the step of: determining an amount of unoccupied memory; determining if the reprogramming data will fit within the unoccupied memory; and, when the reprogramming data will not fit within the unoccupied memory, compiling a portion of the modified states and the states that precede them to form new reprogramming data and replacing the reprogramming data with the new reprogramming data.
- the program memory is only read by the state machine. This is achieved by storing a current state address in memory outside the program memory. In such an embodiment, only the reprogrammer writes data to the program memory. This is advantageous since, when a plurality of state machines operates from a same memory, a state machine is assured of state machine data integrity.
- a reprogrammable state machine memory for simultaneous use by a plurality of state machines, the reprogrammable state machine memory comprising: a program memory for storing data relating to states within each state machine, the data for each state stored at a state address; a plurality of initial state address memory locations, each initial state address memory location for storing a first state address of a state machine from the plurality of state machines, means for storing the data within the program memory, data relating to each new state stored at a state address for said new state and data relating to the new first state stored at the new first state address, the data stored in storage locations unoccupied by any of the plurality of state machines; and, means for once the data corresponding to each new state is stored, replacing data within the first state address location with the new first state address.
- the plurality of initial state address memory locations is a plurality of registers, one register for each state machine from the plurality of state machines.
- the invention provides a packet classification state machine for classifying data from a data stream.
- the state machine comprises: a) a programmable memory for storing information relating to states within the state machine, the states including a first group of states and a second group of states, the first group of states each represented by a table of data at a table address and including a first plurality of table elements addressable at an offset from the table address, each of the table elements indicative of a next state within the state machine, and the second group of states each represented by a table of data including a table element and occupying less memory than a table of data representing a state of the first group of states; and, b) a processor for determining a next state based on contents of a table element of a present state at an offset from the table address, the offset determined during a table address load portion of an instruction cycle in dependence upon the bits in the data stream, the processor also for switching the state machine into the next state so determined.
- the invention provides a packet classification state machine for classifying data from a data stream.
- the state machine comprises: a) a programmable memory for storing information relating to states within the state machine, the states including three groups of states; the first group of states each represented by a table of data including 2^n table elements addressable at an offset from the table address, the table elements indicative of a next state within the state machine, n bits in the data stream for determining the offset, the second group of states each represented by a table of data including 2^n table elements having a size smaller than that of table elements of the first group of states, the table elements addressable at an offset from the table address, the table elements indicative of a next state within the state machine, n bits in the data stream for determining the offset; and the third group of states each represented by data indicative of a single possible next state; and, c) a processor for storing for a table address of the first group a current state address and a plurality of bits from the data stream together to form an address, for a table
- a method of packet classification for classifying data from a data stream comprises the following steps: a) providing classification data comprising information relating to states within a state machine, the states including a first group of states and a second group of states, the first group of states each represented by a table of data at a table address including a first plurality of table elements addressable at an offset from the table address, each of the table elements indicative of a next state within the state machine, and the second group of states each represented by a table of data including a table element and occupying less memory than a table of data representing a state of the first group of states; b) providing a table address of a current state; c) selecting a table element, the table element selected in dependence upon the table address, the table contents, and an offset based on a plurality of bits in the data stream; d) determining a next state of the state machine based on the content of the selected element of a present state; and, e) switching the
- Figures 1a and 1b are simplified state diagrams for classification state machines according to the prior art
- Figure 1c is a simplified state diagram for classification state machines according to the invention.
- Figure 2 is a simplified flow diagram of a method of reprogramming a state machine during execution according to the invention
- Figure 3 is a simplified flow diagram of a method of reprogramming a plurality of state machines during execution according to the invention
- Figure 4 is a simplified state diagram of a greatly simplified state machine
- Figure 5 is a simplified state diagram of the greatly simplified state machine of Figure 4 with modifications thereto;
- Figure 6 is a simplified state diagram of the combined state machines of Figures 4 and 5;
- Figure 7 is a simplified flow diagram of a method of memory recovery according to the invention.
- Figure 8 is a simplified state diagram of a cyclic state machine for explanation of the present inventive method
- Figs. 9a and 9b are simplified state diagrams for classification state machines according to the prior art
- Fig. 10a is a simplified packet descriptor for classifying a packet as one of four classifications - A, B, C, or D
- Fig. 10b is a simplified diagram of a classification tree for the packet classifications of
- Fig. 10c is a classification tree equivalent to that of Fig. 10b shown using 2 bits per symbol
- Fig. 10d is a classification tree equivalent to that of Fig. 10b shown using 3 bits per symbol
- Fig. 11 is a table indicating memory usage and opcodes for an exemplary embodiment of a state machine according to the invention.
- Fig. 12a is a simplified memory diagram for a state machine memory according to the invention.
- Fig. 12b is an address table for a classification tree implemented for the classification tree of Fig. 10b;
- Fig. 12c is an address table for a classification tree implemented for the classification tree of Fig. 10c
- Fig. 12d is an address table for a classification tree implemented for the classification tree of Fig. 10d;
- Fig. 13 is a simplified block diagram of an integrated circuit implementation of an acyclic classification state machine according to the invention.
- Fig. 14 is a simplified diagram of a system according to the invention of implementing a plurality of state machines using a single same programmable classification data memory.
- data packet encompasses the terms buffer, frame, cell, packet, and so forth as used in data communications.
- a data packet is a grouping of data that is classifiable according to a predetermined classification. Classifications are commonly codified by standards bodies, which supervise communication standards.
- a state represented by a rectangle is a terminal state.
- States other than terminal states are represented by triangles.
- In Figure 1a, a simplified state diagram for a state machine supporting a single data access operation per data cycle is shown. Each state is in the form of a look-up table at a state address. The states are represented as blocks having four state transitions, each shown as a line to another state. Sometimes, several state transitions lead to a same next state. A predetermined number of bits (2 for the simple diagram of Fig. 1a) is loaded into the lowest order bits of the address, and data from the newly formed address is read. When the data comprises another state address, the next predetermined number of bits from the data stream is loaded into the lowest order bits. Otherwise an action such as filtering or packet classification is performed.
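The look-up described for Fig. 1a can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the `classify` function, the `("state", addr)` / `("action", label)` entry encoding, and the example table contents do not come from the patent. Table addresses are assumed to be aligned so that their two lowest-order bits are zero.

```python
def classify(memory, first_state_addr, symbols):
    """Follow look-up tables, consuming one 2-bit symbol per memory access."""
    addr = first_state_addr
    for sym in symbols:
        # Load the symbol into the lowest-order bits of the table address.
        kind, value = memory[addr | sym]
        if kind == "action":
            return value          # terminal: filtering or classification result
        addr = value              # otherwise the entry holds the next state address
    return None                   # data ran out before a terminal entry was reached

# Hypothetical contents: three 4-entry tables at addresses 0, 4 and 8.
memory = {
    0: ("state", 4), 1: ("state", 4), 2: ("action", "drop"), 3: ("state", 8),
    4: ("action", "A"), 5: ("action", "B"), 6: ("action", "A"), 7: ("action", "C"),
    8: ("action", "D"), 9: ("action", "D"), 10: ("action", "D"), 11: ("action", "D"),
}

print(classify(memory, 0, [0, 1]))
```

Each loop iteration performs exactly one read of `memory`, mirroring the single data access per data cycle that the figure assumes.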
- the address of block 1 is fixed.
- the address is 0.
- another fixed address is possible. Since writing a single block, for example block 1, often requires storing data in each of several locations, thereby requiring more than a single clock cycle, there is a time during reprogramming when the block is partially reprogrammed. If the state machine reaches the block while it is partially programmed, indeterminate results occur. This is undesirable.
- In Fig. 1b, three state machines stored within a single memory are shown. Each has completely separate blocks and, as such, each is independent of the others. In order to reprogram one of the state machines, it alone needs to be interrupted. Unfortunately, there is little benefit in storing programming of several state machines within a single memory when they are truly independent. In contrast, where a plurality of state machines share a single memory and share some programming, reprogramming of one of the plurality of state machines results in "downtime" for all the state machines that share the single memory.
- In Fig. 1c, a state diagram similar to that of Figure 1b is shown, in which the three state machines share a same program memory and similar blocks, for example 6, 6b, and 6c, are stored as a same block (6). As is evident from Figure 1c, each state machine executes a different programming. Presently, it is difficult to reprogram the state machine memory of Figure 1c without interrupting execution of all three state machines.
- a state machine executes a classification function.
- the classification function is an acyclic classification function, but this need not be so.
- the state machine is based on a table look-up for each state transition and information relating to each state transition is stored as a table of data at a state address.
- a first state address is read from a first state address storage location.
- the first state address is an address that is read at a start of state machine operation.
- the first state address indicates a first table.
- a programmer of the state machine comprises a processor for differentiating between storage locations that contain state machine data and those that do not. Commonly, this is performed by maintaining information relating to locations where current state machine information is stored.
- the programmer is provided with modifications to current state machine programming. For example, a table of data relating to a second state of the state machine is modified.
- the programmer writes any new information to the memory in storage locations that are unused by current state machines. Unused storage locations do not contain current state machine programming data.
- each state preceding any modified states is also written to program memory unused by current state machine programming.
- the newly written states form a start of the state machine programming from a beginning of state machine operation until a point in the state machine programming from which no further modifications are being made.
- the first state of the newly written data is a first state of the state machine.
- the newly written state data is used during a subsequent execution of the state machine - the modified state machine is executed.
- the update of the first state address is performed atomically. It is possible to ensure that no first state address location is read during a non-atomic first state address write operation; however, this results in a small amount of downtime for the state machine and is, therefore, undesirable.
- modified state denotes states that are modified as well as those states that are newly created. It will be apparent from the above description that reprogramming of the state machine memory is now possible during state machine operation absent pausing state machine execution or with minimal pause when other than an atomic operation is used to store the first state address.
- a subsequent execution of the state machine uses the programming of the modified state machine. However, until the first state address is updated, the unmodified programming of the state machine is executed.
- the first state address is stored in a first hardware register to enable fast access to the address.
- an atomic operation to write the first state address is typically used.
- the number of bits stored in a single clock cycle is equal to the number of bits within the register.
- a pseudo-atomic operation is used.
- the write operation writes a number of bits per available cycle and those bits are all simultaneously clocked into the register once the entire address is available.
- a second register is written in several clock cycles and a flag bit is then updated to cause the newly written register to be read as the first state address instead of the first hardware register.
- Other methods of making the changeover from the first hardware register with an old first state address to a hardware register with the new first state address atomically will be evident to those of skill in the art based on the above disclosure.
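The two-register variant with a flag bit can be sketched as follows. This is a software model only, under assumptions that are not in the source: the class name `FirstStateRegisters`, the 16-bit address width, and the 4-bits-per-cycle write granularity are all chosen for illustration.

```python
class FirstStateRegisters:
    """Hypothetical model of two first-state-address registers selected by a flag bit."""

    def __init__(self, initial_addr):
        self.regs = [initial_addr, 0]
        self.active = 0                      # flag bit selecting the register that is read

    def read(self):
        return self.regs[self.active]

    def write_new(self, addr, width=16, bits_per_cycle=4):
        inactive = self.active ^ 1
        self.regs[inactive] = 0
        mask = (1 << bits_per_cycle) - 1
        seen = []
        for shift in range(0, width, bits_per_cycle):
            # One partial write per cycle into the inactive register.
            self.regs[inactive] |= addr & (mask << shift)
            seen.append(self.read())         # what a concurrent reader would observe
        self.active = inactive               # single-bit update makes the switch
        return seen


regs = FirstStateRegisters(0x0100)
observed = regs.write_new(0xBEEF)
print(observed, hex(regs.read()))
```

During the multi-cycle write every read still returns the old address; only the final flag update changes what readers see, so no partially written address is ever exposed.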
- the first state address is stored at a fixed address within the program memory from which it is capable of being loaded and into which it is capable of being stored in an atomic or pseudo atomic fashion.
- the first state address is accessible in a single clock cycle.
- In Fig. 3, a method of reprogramming a state machine during execution and according to the invention is shown.
- the method shown is for state machine program memory in concurrent use by a plurality of state machines.
- Each state machine executes a classification function.
- the classification function is an acyclic classification function, but this need not be so.
- the state machines, similar to the state machine described with reference to Figure 2, are based on a table look-up for each state transition and information relating to each state transition is stored as a table of data at a state address.
- a first state address is read from a first state address storage location.
- the first state address is an address that is read at a start of state machine operation of each state machine.
- each state machine has an associated first state address.
- a programmer of the state machine comprises a processor for differentiating between storage locations that contain state machine data and those that do not.
- this is performed by maintaining information relating to locations where information of current state machines is stored.
- the programmer is provided with modifications to programming of current state machines. For example, a table of data relating to a second state of the first state machine is modified.
- the programmer writes any new information to the memory in storage locations that are unused by current state machine programming. Unused storage locations do not contain state machine programming data relating to any current state machine.
- each state preceding any modified states is also written to program memory unused by current state machine programming.
- the newly written states form a start of the state machines' programming from a beginning of state machines' operation until a point in the state machines' programming from which no further modifications are being made.
- the first states of the newly written data are first states of the state machines. Therefore, by modifying the information stored in the first state address locations the newly written state data is used during subsequent executions of the state machines, i.e., the modified state machines are executed.
- the modification of the first state address occurs in an atomic or pseudo-atomic fashion.
- state machine operation is paused during writing of the first state address. Otherwise, it is possible that the first state address is accessed during writing of the first state address and that the address loaded for starting execution of the state machine is nonsensical.
- a counter indicating a number of states referencing that state is provided.
- a counter associated with data at the address pointed to by the contents of the first state address register prior to the change is decremented since there is one less reference to that state data.
- When the counter is zero, memory locations associated with that state are recovered and counters associated with data referenced from the data in the recovered memory locations are decremented. The memory recovery operation proceeds recursively until no counters having a zero value remain.
- a timer is provided for providing a timing signal to pause state machine execution when the timer has expired. This is useful for programming of the programmable memory where a program must be stored by a certain time but also may be stored anytime prior.
- the program is entered and reprogramming is performed according to the above-described method. If reprogramming is not completed within the specified time, the state machine is paused between classification operations and the programming is completed.
- When a buffer is used to buffer data prior to its provision to the state machine, the timer may be implemented to determine an amount of processing time to devote to the reprogramming task. For example, when new programming is introduced, it may be desirable to provide 10% of the processing power of the state machine devoted to reprogramming. Of course, when the state machine requires less than 90% of its processing power, the reprogramming operations may consume the available processing time.
- Figure 4 is a state diagram of a current state machine having states 1 to 11. When it is desired to modify states 6, 10, and 11, prior art devices require pausing of state machine execution.
- the state diagram is shown with eleven states; however, in practice the number of states is only limited by the memory size of the state machine and the operation performed. Alterations are made to an image of the state machine of Figure 4 resulting in a state machine represented by the state diagram of Figure 5 having modified states 6n, 12, 13 and 14. As one skilled in the art will appreciate, it is possible to modify or add many new states, the number of which is only limited by the memory size of the state machine memory. Memory contents for the state machine illustrated in Figure 4 are shown in Figure 6 along with the modified states 1n, 2n and 6n. Data relating to the new and/or modified states, i.e., 1n, 2n and 6n, are written into memory storage locations unused by currently executing state machine programming.
- the state machine uses the newly stored data.
- Sufficient state machine memory is provided to allow use of either state machine programming depending on the selected start address. Thus, even if three state machines use identical programming, it is possible to modify the programming of one and not the others, given three separate first state address locations. It is important that state transitions are maintained during modification of state machine programming to prevent "downtime".
- the start address is that of state 1n
- the state machine diagram of Figure 5 is the current state machine diagram having modified states 1n, 2n and 6n. Memory locations associated with states that are not accessed by any state machine - in this case by the single state machine - are now "free" storage locations or storage locations unused by current state machine programming.
- the two banks of memory are program memory and the start addresses in each bank are identified by a current bank. Therefore, the indicator for identifying the current bank forms a start address storage location.
- the start address storage location need only accommodate one bit of data which is necessarily written in an atomic fashion.
- Though the start address is described above as stored in a location, it is possible to devote two locations to the first state's programming and to select between the two locations based on an indicator bit. In these cases, the indicator bit forms the start address - address 0 or address 1 - and the data may be stored in a flip-flop or in another conventional fashion.
- the step of loading the current state address at the beginning of state machine execution refers to loading the address from the start address location
- the address of the first state's programming is typically fixed except for selection of the appropriate bank. Therefore, the current state address is loaded with the fixed address though the memory location that the programming data is retrieved from is dependent upon the content of the data relating to the start address.
- the current state address register contains less than the necessary address data to determine the current state address and the start address location forms part of the current state address in so far as it determines which of the available fixed start addresses to use.
- In Fig. 7, a simplified flow diagram of a method of recovering memory unused by any currently executing state machine is shown.
- When new state data is stored in the program memory, the existing state data remains unchanged.
- Once the first state address of each modified state machine is updated to reflect a new first state address for that state machine, the replaced state machine data is identified and noted as unused state machine memory.
- a value is stored with state data for each state indicating a number of references to that state data.
- the state data referenced by the previous first state address is accessed and the value therein is decremented. If the value is 1 or greater, then there still exists a reference to that state data and it is not noted as unused memory. If the value is zero, then the state machine data at that location is not referenced and it is noted as unused. All state machine data referenced from that data is then accessed and the value associated therewith is decremented.
- the method proceeds, recursively, until there remains only state machine data with associated values of 1 or more. Of course, the method could also be implemented iteratively if so desired. The use of such a method makes memory recovery simple and allows for self contained state machine programming including all necessary information for programming and reprogramming of same. It is therefore advantageous.
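The reference-count recovery described above can be sketched iteratively. The memory layout used here (a dictionary with `refs` and `next` fields) and the function name `release` are assumptions made for illustration, not the patent's storage format.

```python
def release(memory, address, free_list):
    """Decrement the reference count at `address`; reclaim states that reach zero.

    `memory` maps an address to {"refs": int, "next": [addresses]}.
    Reclaimed addresses are appended to `free_list`.
    """
    pending = [address]
    while pending:
        addr = pending.pop()
        state = memory[addr]
        state["refs"] -= 1
        if state["refs"] == 0:
            # No remaining references: note as unused and release what it referenced.
            free_list.append(addr)
            pending.extend(state["next"])
```

A state that is still referenced by another state machine keeps a count of 1 or more and is left in place.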
- states are described as preceding other states or following other states, this terminology is most applicable to an acyclic state machine.
- the term "reachable from" is a more accurate statement of a state that follows another state.
- state A is reachable from state B is, for an acyclic state machine, equivalent to state A follows state B.
- a cyclic state machine is shown in Fig. 8 in which A is reachable from B. Incremental programming according to the invention is still useful. For modifying a single node all nodes from which the modified single node is reachable are then updated. If from a part of the graph the updated nodes are not reachable, that part of the graph need not be modified.
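Finding the nodes that must be updated when a single node changes amounts to a backward reachability search. The following sketch assumes the state machine is given as a dictionary of outgoing edges; the function name is hypothetical.

```python
from collections import deque

def nodes_reaching(edges, target):
    """Return every node from which `target` is reachable (including itself)."""
    # Build the reversed graph so the search can walk edges backwards.
    reverse = {}
    for src, dests in edges.items():
        for dst in dests:
            reverse.setdefault(dst, set()).add(src)
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen
```

Nodes outside the returned set cannot reach the modified node, so that part of the graph need not be touched. The `seen` set keeps the search finite when the graph contains cycles.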
- Fig. 9a a simplified state diagram of a typical classification state machine according to the prior art is shown. Transitions between states are represented by a series of operations shown as lines connecting states shown as circles.
- the state diagram is for an acyclic state machine and each state is followed by one of a number of possibilities. Such a state machine is easily implemented in either software or hardware. Unfortunately, as the speed of the state machine operation is increased, operations for each state transition must be executed concurrently in order to achieve necessary performance. This bottleneck has, heretofore, required dedicated non-programmable hardware state machine design.
- Fig. 9b a reduced state diagram is shown for the state machine of Fig. 9a. Here, terminal states of the classification A are combined into a single state 901a. Other states are similarly combined.
- Some states, such as state R, are terminal states, ACCEPT or REJECT. Restarting the state machine follows these states. The restart typically occurs before the beginning of the subsequent packet. Typically, there is a means external to the present invention to identify the start of each packet.
- Fig. 10a a simplified diagram of a greatly simplified protocol for packet classification is shown.
- the simplified protocol is used to facilitate understanding of the invention absent detailed knowledge of Ethernet or other communication protocols.
- Four bit patterns are shown, each representing a different classification. The bit patterns are similar. A first set of three bits must each be one or the data within the data stream remains unclassified. This is followed by 8 bits that are not important to the classification excepting that they occur. Three more bits must each be one and then eight more "don't care" bits. The final two bits are then used to distinguish between the four classifications.
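The simplified protocol can be written directly as a pattern check. This sketch assumes the input is a string of '0' and '1' characters and that the final two bits are read as a binary number to name the class; that numbering is an assumption, since the figure is not reproduced here.

```python
def classify(bits):
    """Classify a 24-character string of '0'/'1'; return class 0-3 or None."""
    if len(bits) < 24:
        return None
    if bits[0:3] != "111":    # first three bits must each be one
        return None
    # bits[3:11] are eight "don't care" bits
    if bits[11:14] != "111":  # three more bits must each be one
        return None
    # bits[14:22] are eight more "don't care" bits
    return int(bits[22:24], 2)  # final two bits select one of four classes
```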
- Fig. 10b a classification tree for implementing a programmable state machine is shown.
- a typical packet classification tree comprises data relating to a plurality of classification protocols each of which has many bits; the classification tree shown in Fig. 10b is simplified to facilitate explanation of the invention.
- Typical classification trees result in very large data structures that are, in many instances, too large to store in a single integrated memory device.
- For a gigabit Ethernet a single bit arrives in 1 ns. For a gigabit Ethernet packet classification state machine operating on three bits per state, the first three bits arrive in 3 ns. Once provided to the state machine, there is a lag of 3 ns until a further 3 bits arrive. During those 3 ns, all operations for a state transition are completed. Of course, when more time is required to prepare for a state transition, more bits are grouped together. Thus, depending on the speed of communications, a state machine for classification using two bits at a time is possible as is one using 8 bits or 16 bits at a time.
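The time available for one state transition follows from the line rate and the number of bits consumed per state. A small helper, with a hypothetical name, makes the arithmetic explicit.

```python
def time_per_state_ns(bits_per_state, line_rate_bits_per_s):
    """Nanoseconds available for one state transition at the given line rate."""
    return bits_per_state * 1e9 / line_rate_bits_per_s
```

At one gigabit per second, three bits per state gives 3 ns per transition, while grouping eight bits gives 8 ns.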
- Such a state is easily implemented in a look-up table having a number of addresses or in another form of address conversion.
- a look-up table implementation upon receiving the 8 data bits, only a single memory access is required for determining the next memory address for the look-up table.
- a current table address is loaded in the high order bits of a register. 8 bits from the data stream are loaded into the low order bits of the register and act as an offset of 0-255. Once loaded, data at the location indicated by the register is loaded into the higher order bits. It is checked for an accept or reject and the next 8 bits from the data stream are loaded into the low order bits to form a new address. This continues until a terminal state is reached.
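The register-based look-up can be sketched as a loop that concatenates the current table address with the next 8 data bits. In this illustration the memory is a sparse dictionary, missing entries are treated as REJECT, and terminal results are represented by strings; all three choices are assumptions.

```python
ACCEPT, REJECT = "ACCEPT", "REJECT"

def run_lookup(memory, start_table, data):
    """Walk 256-entry tables, one memory access per input byte."""
    table = start_table
    for byte in data:
        # High order bits hold the table address, low order 8 bits the offset 0-255.
        address = (table << 8) | byte
        entry = memory.get(address, REJECT)
        if entry in (ACCEPT, REJECT):
            return entry
        table = entry  # the entry is the next table address
    return None  # data ended before a terminal state was reached
```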
- look-up table storage becomes very large and results in increased costs and reduced performance by forcing the use of external memory devices. Therefore, it is evident that there is a practical limitation to the number of bits that may be processed in parallel in a programmable state machine. Also, since memory circuitry capable of supporting necessary speeds is currently limited to integrated memory circuitry, there is a limitation on the amount of memory that is available as classification data. Because of this limitation, memory optimization provides significant advantages.
- For packet classification state machines, the resulting state is of significance for determining an operation or a packet class. As such, packet classification state machines follow a tree structure. When each node of the tree relates to one bit as shown in Fig. 2b, the tree is very large and has many nodes, each node having two child nodes and one parent node. When each node relates to 8 bits, the tree has far fewer levels, but actually has a similar number of edges. In Figs. 10c and 10d two bits and three bits, respectively, relate to each node. As is evident to those of skill in the art, many nodes are of no consequence and, hence, optimization of those nodes which are useful in packet classification is an important aspect of a programmable state machine for this purpose.
- a preferred method of classification data optimization makes use of novel memory optimization techniques in order to provide reduced classification tree data having same information therein.
- Figs. 10b, 10c, and 10d alternative representations of the same classification tree are shown.
- nodes that require more or less information than others.
- This is used, according to the present invention, in order to optimize memory storage requirements. For example, storing each node having only one possible next node - each edge from the node is to a same destination node - as a single edge reduces the number of edges in Fig. 10b from 50 to 34. This is a significant reduction in overall storage requirements.
- using a different number of edges per node results in different savings.
- the same optimisation technique reduces the number of edges in the tree of Fig. 10c from 48 to 27 and in the tree of Fig. 10d from 64 to 36.
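The edge-count reduction comes from storing a node whose edges all lead to the same destination as a single edge. A sketch of the counting, assuming the tree is given as a dictionary of destination lists (a hypothetical representation):

```python
def stored_edges(tree):
    """Count stored edges when single-destination nodes collapse to one edge.

    `tree` maps a node name to the list of destinations of its outgoing edges.
    """
    total = 0
    for destinations in tree.values():
        if len(set(destinations)) == 1:
            total += 1                   # all edges identical: store one
        else:
            total += len(destinations)   # otherwise store every edge
    return total
```

Applied to a small three-node example with six raw edges, of which one node has two identical edges, the function reports five stored edges.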
- the preferred method also groups nodes into several categories each requiring a different amount of storage - full width nodes; half width nodes; full width don't care nodes, half width don't care nodes, and so forth.
- memory requirements are significantly reduced.
- the representation of data as one of the groups is easily distinguishable. Where four groups exist, two bits are sufficient to classify an element into a group. Memory optimization is preferred since, as disclosed above, once external memory is used performance is decreased. Of course, other forms of optimizing memory usage are also applicable in association with the present invention.
- every edge extending from a node - every address - is individually addressable whether it is a full width edge or a half width edge and whether it is word aligned or not.
- full width nodes are word aligned and require, for a 2^n way node, 2^n words.
- Half width nodes are either word aligned or half word aligned. This allows a consistent concatenation of bits from the data stream for full width and half width nodes where, for half width nodes, a bit selects between right half of a word and left half thereof - half word aligned and word aligned, respectively. Of course, for full width nodes, no extra bit is necessary.
- half width nodes are shown as 2^n half words aligned either left or right within a word.
- half width nodes are word aligned - for a two edge pair - and, for a 2^n way node, require 2^(n-1) words.
- each edge includes information distinguishing the group of the node to which the edge extends. This additional information includes the two bits described above and, for half width nodes, a bit indicative of right half word or left half word.
- a half word is referred to herein as a byte regardless of its size. Since half width nodes are byte aligned and, a word is read for each state transition, it is useful to select the appropriate byte for a current node. A preferred method of doing so is described below with reference to Fig. 12.
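Selecting the appropriate half word from a word that has been read can be sketched with a shift and a mask. This is not the method of Fig. 12; it is a minimal illustration that assumes the left (word aligned) half occupies the high order bits.

```python
def select_half(word, half_bits, right_half):
    """Return the left or right half word of `word`.

    `half_bits` is the width of a half word; `right_half` is the alignment bit
    (True for half word aligned, False for word aligned).
    """
    mask = (1 << half_bits) - 1
    if right_half:
        return word & mask
    return (word >> half_bits) & mask
```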
- a packet classifier with substantially optimized memory usage and capable of supporting high-speed packet classification
- the following edge contents of a classification tree are used: ACCEPT, REJECT, and JUMP.
- ACCEPT REJECT
- JUMP JUMP
- Other operations are possible. Some operations may require more or less data bits and therefore may necessitate other groups of nodes.
- ACCEPT indicates that a packet is classified - a leaf of the classification tree is reached - and that appropriate action for the classified packet is desired.
- Some forms of appropriate action include passing a classification tag to a system or user, passing the packet to a predetermined routine, pushing the packet onto an application stack for a known application, and so forth. Packet processing is well known in the art of Ethernet communication.
- REJECT indicates that a packet is not of a classified packet type. As such no "appropriate" action is performed, though a default action is desirable in many situations. For example, a reject tag is provided to a system or user to indicate that classification was unsuccessful.
- JUMP affects the contents of a current status register. In effect, this operation results in a change of the current node and therefore of the current state.
- a JUMP operation loads an address contained in the word with the JUMP operation into the status register. This results in a change of state - a change to a different node within the classification tree.
- a memory access is performed to retrieve the edge information and a further register to register transfer is performed to load new contents into the status register.
- Using a packet classification state machine designed specifically to implement the present invention these actions are easily performed within existing time constraints. The inclusion of a command and data required to complete the command within a same word of data enables this performance.
- a table shows memory word contents.
- Each word in a full width node comprises 2w+2 bits, where w is a sufficient number of bits to represent an address within the classification tree - within the state machine.
- other than the ACCEPT command all other commands fit within a half word.
- some memory addresses are represented by more than w bits and some JUMP operations are full width. This is significant for state machine optimization. As shown, the width need not include the lower order bits, which are inserted from the data stream being classified.
- an indication that a node is half width, full width, or don't care is registered and stored for use in the subsequent state.
- node information of a direct acyclic state machine is stored in memory.
- protocol descriptions are provided and during configuration, those protocols that are selected are compiled into a classification tree, optimised, and stored in programmable memory of the state machine device.
- Nodes of the state machine have two of the following formats resulting in four possible formats: full width/half width, and don't care/2^n way. Each of these node formats is related to a format for storing edge information extending from the nodes. Don't care nodes have one edge to a following state node. 2^n way nodes have 2^n edges.
- Full width nodes are implemented such that each edge is allocated a complete word, while half width nodes store information relating to each of two edges within a single word. Basically, some edge information requires a half word of information or less and others require more. When a single edge from a node requires more than a half word of information for implementation, then all edges from the node are full width. Otherwise, the edges are optimized to half width. Also, where all edges from a node are identical, storage is reduced to a single edge. Of course, other node types having other memory storage optimizations are possible within the scope of the invention.
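The choice among the four node formats can be sketched as two independent tests. The input representation (a list of edge contents paired with the number of bits each needs) and the function name are assumptions for illustration.

```python
def choose_format(edge_bits, half_word_bits):
    """Pick a storage format for a node.

    `edge_bits` lists, per outgoing edge, a (contents, bits_needed) pair.
    Returns a (width, fan_out) pair of format names.
    """
    contents = [c for c, _ in edge_bits]
    # All edges identical: storage is reduced to a single edge.
    fan_out = "don't care" if len(set(contents)) == 1 else "2^n way"
    # One edge needing more than a half word forces every edge to full width.
    needs_full = any(bits > half_word_bits for _, bits in edge_bits)
    width = "full width" if needs_full else "half width"
    return width, fan_out
```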
- a memory map of packet classification data memory is shown.
- the memory is divided into four areas.
- the areas support the four formats of nodes - don't care nodes, both full width and half width, and 2^n way nodes, both full width and half width.
- Each JUMP instruction has two additional bits to identify the type of the next node - don't care/2^n way and full/half width. For half width nodes, another bit indicates the alignment as word aligned or right aligned (half word aligned).
- the tree of Fig. 10b requires storage for 50 edges and the tree of Fig. 10d requires storage for 64 edges. Therefore, maintaining similar memory storage requirements for a tree having more edges per node is advantageous.
- A simplified block diagram of a classification state machine according to the invention is shown in Fig. 13.
- the state machine is useful for classifying data from a data stream and in particular for classifying data packets.
- the state machine, as shown, is integrated within a single integrated circuit 1100.
- the integrated circuit 1100 comprises a programmable memory 1110, a processor 1130, and a programmable memory arbiter 1140.
- the programmable memory 1110 is in the form of fast static random access memory (RAM)
- static RAM is capable of achieving performance speeds that enable support of high-speed operation.
- known DRAM circuitry is used to increase densities and decrease costs.
- the RAM 1110 is for storing information relating to classification of stream data - classification tree data.
- the RAM 1110 is for storing data relating to states within the state machine.
- the RAM 1110 data relating to each state machine node is stored.
- the data is stored in tables, each table having a table address and a table format.
- the RAM 1110 is accessible by the processor 1130 one time per state transition.
- the processor 1130 has a highest priority when accessing classification data within the RAM 1110. This ensures that no other RAM access operations affect state machine performance.
- the programmable memory arbiter 1140 controls access to the RAM 1110.
- the programmable memory arbiter 1140 ensures that reprogramming of the RAM 1110 does not affect performance of the state machine. Effectively, at low speeds the arbiter 1140 guarantees the processor a single access to the RAM 1110 for every state transition.
- the arbiter prevents RAM access operations other than by the processor 1130 while classification operations are underway. This essentially limits reprogramming of the programmable memory 1110 to times when the state machine is disabled to allow for reprogramming or when incoming data is part of an already classified packet, so that there are no packets requiring classification.
- a state machine is commonly disabled to allow reprogramming when it is used in security applications requiring changes in programmable memory contents due to security concerns.
- the processor 1130 is for retrieving information from the programmable memory 1110 one or fewer times per state transition. Preferably, a single access to programmable memory occurs for each state transition during classification. Of course, once data is classified, there is a pause in state machine operation until a next packet commences. During this pause, the RAM arbiter 1140 allows programming of the RAM 1110. Thus, even at very high speeds, the state machine is fully programmable.
- the processor retrieves data from the RAM 1110 and uses the data to perform an operation for determining a next state and then switches the state machine into the next state so determined.
- ACCEPT operation is provided with a small amount of bits for indicating a classification, this is optionally increased in any of several ways.
- an ACCEPT operation is implemented with a value that is an index to a classification, thereby allowing lengthy classification codes.
- ACCEPT operations are implemented using full width nodes providing an extra w bits for classification results.
- the two methods may be combined.
- the arbiter 1140 provides transparent access to the programmable memory 1110 for the processor 1130. This is achieved by blocking memory access operations for reprogramming the program memory 1110 when a memory access from the processor 1130 is a potential operation. Since the processor 1130 is provided highest priority, it is essential that it not be blocked in attempting to access the programmable memory 1110. Alternatively, other methods of arbitrating between ports are employed.
- since the processor does not access programmable memory 1110 more than one time per state transition, when more than a single RAM access is possible per state transition, the processor is allocated one access to programmable memory 1110 and reprogramming the programmable memory is performed during other accesses to the memory for a same state transition.
- When new packet data is detected, the processor is provided with a start address for beginning classification operations. The start address is programmable and is used in accordance with a method of reprogramming the programmable memory as described below.
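The allocation of memory accesses within one state transition can be sketched as a simple schedule: the processor always receives the first slot and any remaining slots are given to pending reprogramming writes. The function name and the list-based write queue are hypothetical.

```python
def schedule_accesses(slots_per_transition, pending_writes):
    """Return the RAM access plan for one state transition.

    The processor is always allocated one access; leftover slots, if any,
    are used for queued reprogramming writes, which are removed from the queue.
    """
    plan = ["processor read"]
    for _ in range(slots_per_transition - 1):
        if not pending_writes:
            break
        plan.append(("reprogram write", pending_writes.pop(0)))
    return plan
```

With a single slot per transition the plan contains only the processor read, which corresponds to reprogramming being deferred until classification pauses.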
- Clock sources for use according to the invention are selected based on an application and based on current knowledge within the art. Selection and implementation of clock sources and of methods of synchronizing clocks to network provided clock sources are well known in the art of communication hardware design.
- the present invention scales easily to gigabit rates.
- a CPU 10 times faster per port is not economical.
- required processing speed varies with complexity of classification criteria.
- when implementing a 10Mbit Ethernet packet classification state machine, several processors use a same programmable memory 1110. This is shown in Fig. 14.
- the resulting system has a plurality of processors 1130, and a single programmable memory 1110.
- Each state machine is simultaneously classifying data according to a different classification tree structure.
- several processors follow a same classification tree structure.
- classification data associated with each processor is independently programmable.
- Reprogramming of the program memory 1110 is accomplished by adding new table data relating to changes in the classification tree structure of one or more state machines from an existing node in the tree back toward the root of the tree. The root address is then provided as a start address to the processor executing that state machine.
- tables stored within the programmable memory 1110 are no longer used by any processor, those memory locations are reused during subsequent reprogramming. In this way, programming occurs during operation of the state machines without affecting existing programming or existing classification operations. Because of the above, it is important to automate, so much as possible, the classification data table generation and optimization process. Once automated, the specification of packet classification is unimportant in a procedural sense and becomes a process of pattern matching. Essentially, tree construction is a matter left to a packet classification program compiler. Of course, a similar system is applicable to filtering which is considered packet classification with two classes - accept and reject.
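Incremental reprogramming can be sketched as a copy-on-write update: new tables are added for the changed node and for each node on the path back to the root, existing tables are left untouched, and the new root address is returned for use as the start address. The table representation and the allocation of fresh addresses by `max(memory) + 1` are simplifying assumptions; a real allocator would also reuse reclaimed locations.

```python
def reprogram(memory, root, path, new_leaf_table):
    """Add new tables for a changed path; return the new root address.

    `memory` maps an integer address to {input symbol: next address or result}.
    `path` lists the input symbols leading from the root to the table being replaced.
    Existing tables are never modified.
    """
    # Collect the addresses of the existing tables along the path.
    addresses = [root]
    for symbol in path:
        addresses.append(memory[addresses[-1]][symbol])

    next_free = max(memory) + 1
    memory[next_free] = dict(new_leaf_table)
    child = next_free
    # Work from the modified node back toward the root.
    for addr, symbol in zip(reversed(addresses[:-1]), reversed(path)):
        next_free += 1
        copy = dict(memory[addr])
        copy[symbol] = child
        memory[next_free] = copy
        child = next_free
    return child
```

A processor still running from the old root sees an unchanged tree; only when it is handed the returned address does it follow the new tables.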
- compiler software maintains an image of the data within the programmable memory and determines memory in use and memory locations, which are no longer part of any state machine classification data. In this way, frequency of memory overflow is reduced by allowing reuse of memory once its contents are obsolete. Though this appears straightforward, because the present invention supports incremental programming and the program memory can support a plurality of processors, it is very useful to track memory allocation automatically.
- the invention may also be implemented where unnecessary accesses to the programmable memory are replaced with, for example, a counter.
- a count command is loaded with a count number, 4, and a next address. Data from the next address within programmable memory is retrieved once the counter has completed counting the specified number. For a single Don't care, this eliminates a single memory access.
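The effect of a count command can be sketched with a small interpreter that counts memory accesses. The edge encoding used here (an integer for a jump, a string for a terminal result, a `("COUNT", n, next_address)` tuple for the count command) is an assumption for illustration; the sketch also assumes the input stream is long enough.

```python
def run(memory, address, symbols):
    """Return (result, memory_accesses) for a stream of input symbols.

    `memory` maps a table address to {symbol: edge}.
    """
    accesses = 0
    stream = iter(symbols)
    while True:
        edge = memory[address][next(stream)]  # one memory access per table look-up
        accesses += 1
        if edge in ("ACCEPT", "REJECT"):
            return edge, accesses
        if isinstance(edge, tuple):           # ("COUNT", n, next_address)
            _, count, address = edge
            for _ in range(count):
                next(stream)                  # counted symbols cost no memory access
        else:
            address = edge                    # plain jump to the next table
```

In the example used for testing, four skipped symbols are handled by the counter, so the six-symbol stream is classified with two memory accesses.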
- the programmable memory is accessed for read purposes by the processor of the state machine.
- the state machine processor does not write to the programmable memory.
- the programming of the programmable memory only requires write operations. Therefore a memory having a write access and a separate read access - dual port - is applicable to the invention.
- true dual port memory is used and memory arbitration is unnecessary or is provided within the memory circuitry.
- programming and diagnostics uses read write access to the programmable memory while the processor has read access to the programmable memory.
- the invention is implemented within a custom IC such as an ASIC.
- the MAC stops DMA of the incoming frame into host memory or, if DMA has not started, flushes the frame from its queue. The same receive descriptor is then used without any involvement of the driver software, thereby reducing traffic on a host's bus.
- when implemented within an FPGA, the design is preferably optimized for implementation within the type of FPGA used.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00929186A EP1190314A2 (en) | 1999-05-18 | 2000-05-17 | Packet classification state machine |
AU47395/00A AU4739500A (en) | 1999-05-18 | 2000-05-17 | Packet classification state machine |
IL14653900A IL146539A0 (en) | 1999-05-18 | 2000-05-17 | Packet classification state machine |
JP2000619158A JP2003500709A (en) | 1999-05-18 | 2000-05-17 | Packet classification state machine |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/313,183 US6424934B2 (en) | 1998-05-18 | 1999-05-18 | Packet classification state machine having reduced memory storage requirements |
US09/313,182 US6349405B1 (en) | 1999-05-18 | 1999-05-18 | Packet classification state machine |
US09/313,182 | 1999-05-18 | ||
US09/313,183 | 1999-05-18 | ||
US45946099A | 1999-12-13 | 1999-12-13 | |
US09/459,460 | 1999-12-13 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2000070821A2 true WO2000070821A2 (en) | 2000-11-23 |
WO2000070821A3 WO2000070821A3 (en) | 2001-10-25 |
Family
ID=27405653
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CA2000/000580 WO2000070821A2 (en) | 1999-05-18 | 2000-05-17 | Packet classification state machine |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1190314A2 (en) |
JP (1) | JP2003500709A (en) |
AU (1) | AU4739500A (en) |
IL (1) | IL146539A0 (en) |
WO (1) | WO2000070821A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7802245B2 (en) * | 2006-04-27 | 2010-09-21 | Agere Systems Inc. | Methods and apparatus for performing in-service upgrade of software in network processor |
KR101448550B1 (en) | 2012-11-21 | 2014-10-13 | 서울대학교산학협력단 | Apparatus and Method for Traffic Classificaiton |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS59188702A (en) * | 1983-03-04 | 1984-10-26 | Mazda Motor Corp | Programmable controller |
EP0228053A2 (en) * | 1985-12-23 | 1987-07-08 | AT&T Corp. | Control of real-time systems utilizing a nonprocedural language |
EP0266505A2 (en) * | 1986-09-11 | 1988-05-11 | International Business Machines Corporation | Versioning of message formats in a 24-hour operating environment |
US5375248A (en) * | 1990-10-05 | 1994-12-20 | Bull Hn Information Systems Inc. | Method for organizing state machine by selectively grouping status signals as inputs and classifying commands to be executed into performance sensitive and nonsensitive categories |
US5826030A (en) * | 1995-11-30 | 1998-10-20 | Excel Switching Corporation | Telecommunication switch having a universal API with a single call processing message including user-definable data and response message each having a generic format |
-
2000
- 2000-05-17 JP JP2000619158A patent/JP2003500709A/en active Pending
- 2000-05-17 AU AU47395/00A patent/AU4739500A/en not_active Abandoned
- 2000-05-17 EP EP00929186A patent/EP1190314A2/en not_active Withdrawn
- 2000-05-17 WO PCT/CA2000/000580 patent/WO2000070821A2/en not_active Application Discontinuation
- 2000-05-17 IL IL14653900A patent/IL146539A0/en unknown
Non-Patent Citations (6)
Title |
---|
"TABLE UPDATE SERIALIZATION TECHNIQUE" IBM TECHNICAL DISCLOSURE BULLETIN, [Online] vol. 21, no. 3, August 1978 (1978-08), pages 1158-1162, XP002154926 New York, US Retrieved from the Internet: <URL:http://www.delphion.com/tdbs/tdb?&ord er=78A+04946> [retrieved on 2000-12-04] * |
ASHAR P., DEVADAS S., NEWTON A. R.: "OPTIMUM AND HEURISTIC ALGORITHMS FOR FINITE STATE MACHINE DECOMPOSITION AND PARTITIONING" INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, US, LOS ALAMITOS, IEEE COMPUTER SOCIETY PRESS, vol. CONF. 7, 5 November 1989 (1989-11-05), pages 216-219, XP000164239 ISBN: 0-8186-1986-4 * |
DEVADAS S., NEWTON A. R.: "DECOMPOSITION AND FACTORIZATION OF SEQUENTIAL FINITE STATE MACHINES" INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, US, WASHINGTON, IEEE COMPUTER SOCIETY PRESS, vol. CONF. 6, 7 November 1988 (1988-11-07), pages 148-151, XP000040350 * |
LAM K., DEVADAS S.: "PERFORMANCE-ORIENTED DECOMPOSITION OF SEQUENTIAL CIRCUITS" PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, US, NEW YORK, IEEE, vol. CONF. 23, 1 May 1990 (1990-05-01), pages 2642-2645, XP000164086 * |
PATENT ABSTRACTS OF JAPAN vol. 009, no. 050 (P-339), 5 March 1985 (1985-03-05) -& JP 59 188702 A (MAZDA KK;OTHERS: 01), 26 October 1984 (1984-10-26) * |
VILLA T., SANGIOVANNI-VINCENTELLI A.: "NOVA: STATE ASSIGNMENT OF FINITE STATE MACHINES FOR OPTIMAL TWO- LEVEL LOGIC IMPLEMENTATION" IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, US, IEEE INC., NEW YORK, vol. 9, no. 9, September 1990 (1990-09), pages 905-924, XP000159299 ISSN: 0278-0070 * |
Also Published As
Publication number | Publication date |
---|---|
JP2003500709A (en) | 2003-01-07 |
EP1190314A2 (en) | 2002-03-27 |
AU4739500A (en) | 2000-12-05 |
WO2000070821A3 (en) | 2001-10-25 |
IL146539A0 (en) | 2002-07-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
ENP | Entry into the national phase |
Ref country code: JP Ref document number: 2000 619158 Kind code of ref document: A Format of ref document f/p: F |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2000929186 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWP | Wipo information: published in national office |
Ref document number: 2000929186 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2000929186 Country of ref document: EP |