AU2005247023A1 - Network security packet memory allocation method - Google Patents

Network security packet memory allocation method

Info

Publication number
AU2005247023A1
Authority
AU
Australia
Prior art keywords
memory
packet
blocks
processor
allocated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2005247023A
Inventor
Ashley Partis
Lok Sun Nelson Tam
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2005247023A priority Critical patent/AU2005247023A1/en
Publication of AU2005247023A1 publication Critical patent/AU2005247023A1/en
Abandoned legal-status Critical Current


Description

S&FRef: 744400
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): Ashley Partis, Lok Sun Nelson Tam
Address for Service: Spruson & Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Network security packet memory allocation method

The following statement is a full description of this invention, including the best method of performing it known to me/us:

NETWORK SECURITY PACKET MEMORY ALLOCATION METHOD

FIELD OF THE INVENTION

The current invention relates to memory allocation and in particular to methods of efficiently allocating memory among multiple processes within hardware-accelerated implementations of cryptographic processing systems.
BACKGROUND
The data communication protocol that is generally used to communicate over the Internet, namely the Internet Protocol, is not secure. Consequently, third parties can potentially eavesdrop on transmitted packets, replay transmitted packets, or even divert packets off the network and replace the diverted packets with locally created or forged packets.
IPsec is a standard for securing IP communications by encrypting and authenticating IP packets. IPsec is a protocol suite. Examples of IPsec sub-protocols include the Authentication Header (AH) and the Encapsulating Security Payload (ESP).
The act of applying one or more sub-protocols to an IP packet is known as a transformation. A transformation typically comprises a combination of encryption, header attachment, hashing, and compression processes. These processes tend to be computation-intensive. For network nodes that process a large amount of secure network traffic, this produces a large computation load that is typically handled by a Central Processing Unit (CPU). A more efficient way is to offload this work to dedicated application-specific integrated circuits (ASICs), sometimes referred to as Security Processing Units (SPUs).
There are several benefits to using a hardware-implemented SPU. One of these is hardware's ability to process several network packets concurrently. Most CPUs can only transform one packet at a time due to their sequential design. Hardware-implemented SPUs, on the other hand, can be designed to contain several cryptographic processors, and are thereby able to transform several packets concurrently in parallel. This presents challenges in allocating memory to each of these concurrently operable processors on the SPU.
When each crypto-processor operates on a network packet, the packet resides at some location in a shared memory bank. Since the memory is shared, it is possible for packets to overlap one another, causing the cryptographic operation to fail. The term "packet overlap" in the present context means, for example, that if packet No. 1 is written to memory locations 1-5 inclusive and packet No. 2 is written to memory locations 4-7 inclusive, then the packets overlap at memory locations 4 and 5. This overlapping scenario must be prevented.
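The overlap condition described above can be expressed as a simple interval test. The following is a minimal illustrative sketch only; the function name and the inclusive-range representation are assumptions, not part of the disclosed arrangement:

```python
def regions_overlap(start_a, end_a, start_b, end_b):
    # Two inclusive ranges share at least one memory location exactly
    # when each range starts at or before the other range ends.
    return start_a <= end_b and start_b <= end_a

# Packet No. 1 at locations 1-5 and packet No. 2 at locations 4-7
# overlap at locations 4 and 5:
print(regions_overlap(1, 5, 4, 7))  # True
# Moving packet No. 2 to locations 6-9 removes the overlap:
print(regions_overlap(1, 5, 6, 9))  # False
```

A memory allocator that keeps this test false for every pair of resident packets prevents the failure scenario.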
It is not sufficient to merely place the packets in a non-overlapping manner before transformation. Since headers and padding are added to the packets during transformation, the packets are constantly growing in size. Therefore, even if two packets are initially placed away from each other in distinct memory locations, there is still a possibility of packets overlapping during transformation.
This task is further complicated by the fact that a packet may undergo multiple transformations, if the network security policy requires so. It is not possible to predict exactly how many transformations will be applied to a packet when it is received.
Consequently, the final size of the packet cannot be determined in advance, so it is impossible to pre-allocate a section of memory that is guaranteed to be sufficient for any particular packet.
For software implementations on typical computer systems, memory access by multiple concurrent processes is not an issue, since this matter is automatically managed by an operating system. Operating systems use virtual memory to give each concurrent process the illusion of owning its individual memory space, independent of all other processes. This approach prevents any packets overlapping.
Further, software systems run at higher clock speeds, and thus have more processing power to execute complex memory allocation schemes. As a result of these high clock speeds, memory allocation and data copying have little effect on the overall performance of software systems. Furthermore, since the aforementioned software systems have large (typically hundreds of megabytes) amounts of memory available, large exclusive regions of memory can be allocated to each concurrent processor to avoid packet overlapping.
The situation is very different in an Application Specific Integrated Circuit (ASIC) environment. Memory access has a greater impact on system performance due to lower clock speeds. Memory capacity itself is limited in ASICs because adding memory has a large impact on cost and performance of the ASIC. Furthermore, there is no operating system managing shared memory access, and it is up to the ASIC logic to distribute memory among all crypto-processors.
Some hardware implementations bypass this problem by not supporting multiple concurrent IPsec processes applicable to a packet stream, thereby limiting such systems to applying only a single IPsec transformation to a packet at a time. However, this provides lower performance than is achievable with concurrently operable processes, and does not conform to the full IPsec specification, as described in RFC 2401, "Security Architecture for the Internet Protocol".
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements, referred to as "progressive memory allocation" (PMA) arrangements, which seek to address the above problems by firstly allocating a small amount of memory to a processor processing an incoming packet, discarding the packet if the allocated amount is insufficient, and progressively increasing the allocation to subsequent retransmitted instances of the discarded packet. The disclosed method of allocating a memory resource between multiple processes working in parallel can be fast, typically utilises memory efficiently, and is usually cost-efficient to implement in hardware.
According to a first aspect of the present invention, there is provided a method of allocating memory in a data packet processing system comprising a plurality of concurrent processes, the method comprising the steps of:
dividing the memory into a plurality of blocks;
on a said process receiving an incoming packet, allocating, according to a rule, a corresponding number of the memory blocks to the process; and
if the allocated memory blocks are insufficient to allow completion of the process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.
According to another aspect of the present invention, there is provided a method of allocating memory among a plurality of concurrent processes, each conducted according to a protocol, said method comprising the steps of:
dividing said memory into a plurality of blocks;
on a process receiving an incoming packet, allocating an initial number of blocks to said process;
storing said incoming packet in said allocated blocks; and
if said allocated blocks are insufficient to hold said incoming packet during said process:
predicting a number of blocks required to complete said process on said incoming packet;
storing said predicted number in a history database indexed by said incoming packet; and
freeing said allocated blocks for use by another process.
According to another aspect of the present invention, there is provided a multi-processor chip for packet data processing, the chip comprising:
a plurality of concurrently operable processors, each for performing a corresponding process;
a memory;
means for dividing the memory into a plurality of blocks;
means for, on a said processor receiving an incoming packet, allocating, according to a rule stored in the memory, a corresponding number of the memory blocks to the processor; and
means for, if the allocated memory blocks are insufficient to allow completion of the corresponding process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.
According to another aspect of the present invention, there is provided a multi-processor chip for packet data processing, the chip comprising:
a plurality of concurrently operable processors, each for performing a corresponding process;
a memory for storing a memory allocation program; and
a control processor for executing the memory allocation program, said memory allocation program comprising:
code for dividing the memory into a plurality of blocks;
code for, on a said concurrently operable processor receiving an incoming packet, allocating, according to a rule stored in the memory, a corresponding number of the memory blocks to the concurrently operable processor; and
code for, if the allocated memory blocks are insufficient to allow completion of the corresponding process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.
According to another aspect of the present invention, there is provided a computer readable medium having recorded thereon a computer program for directing a control processor to execute a memory allocation method, said control processor being incorporated in a multi-processor chip for packet data processing, the chip comprising a memory for storing the program and a plurality of concurrently operable processors, each for performing a corresponding process, said program comprising:
code for dividing the memory into a plurality of blocks;
code for, on a said concurrently operable processor receiving an incoming packet, allocating, according to a rule stored in the memory, a corresponding number of the memory blocks to the concurrently operable processor; and
code for, if the allocated memory blocks are insufficient to allow completion of the corresponding process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.
Other aspects of the invention are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS

One or more embodiments of the present invention will now be described with reference to the following drawings, in which:
Fig. 1 is a block diagram showing an example structure of a communications network;
Fig. 2 is a functional block diagram of an interconnected computer system upon which the disclosed progressive memory allocation methods can be practiced;
Fig. 3 describes a process of memory allocation;
Fig. 4 shows a pictorial representation of how memory may be shared according to the arrangement of Fig. 3;
Fig. 5 describes an example of the disclosed progressive memory allocation arrangement;
Fig. 6 describes the process of IPsec cryptographic processing;
Fig. 7 shows an example of the structure of a shared memory system, according to the disclosed progressive memory allocation arrangement;
Fig. 8 shows the layout of the memory history database of Fig. 5; and
Fig. 9 shows how memory may be shared in the arrangement of Fig. 7.

DETAILED DESCRIPTION INCLUDING BEST MODE

Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Fig. 1 shows an example of a network 800 in which IPsec processing is implemented. Fig. 1 shows two nodes 802, 806 communicating via a network 810. The Source Node 802 sends packets which have undergone IPsec processing in an IPsec processing module 804 to the Destination Node 806. The Destination Node performs IPsec processing using an IPsec processing module 808 on all packets it receives.
The Source Node 802 determines, for each packet before sending the packet to the Destination Node 806, whether IPsec services need to be applied to the packet, whether the packet can bypass IPsec services, or whether the packet should be discarded and not sent to the Destination Node 806.
This determination is made by examining the correct Security Policy in the Security Policy Database. The correct Security Policy is found by comparing attributes of the packet with the selectors of each Security Policy in the Security Policy Database until an entry is found whose selectors all match the packet's attributes, thereby identifying a valid Security Policy.
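The first-match selector search can be sketched as follows. This is a simplified illustration only: the selector fields shown (destination address and protocol), the wildcard value "any", and the action names are assumptions for the example, not the actual Security Policy Database record layout:

```python
def lookup_policy(spd, packet):
    # Return the action of the first policy whose selectors all match the
    # packet's attributes; an unmatched packet is discarded.
    for policy in spd:
        if all(policy["selectors"].get(k) in (packet.get(k), "any")
               for k in policy["selectors"]):
            return policy["action"]
    return "DISCARD"

spd = [
    {"selectors": {"dst": "10.0.0.5", "protocol": "tcp"}, "action": "APPLY_IPSEC"},
    {"selectors": {"dst": "any"}, "action": "BYPASS"},
]
print(lookup_policy(spd, {"src": "10.0.0.1", "dst": "10.0.0.5",
                          "protocol": "tcp"}))  # APPLY_IPSEC
```

Because the search stops at the first matching entry, the ordering of policies in the database determines which policy governs a packet that could match several entries.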
Fig. 2 shows a general-purpose computer system 1200, wherein the disclosed progressive memory allocation arrangement of Fig. 5 may be implemented as an embedded IPsec co-processor module 1230 incorporated into the source node 802 and also into the destination node 806. The embedded IPsec module 1230 can take the form of a plug-in card, an Application Specific Integrated Circuit (ASIC), or other suitable form. The IPsec co-processor module 1230 has on-board processing hardware 1231 and a memory 1229. The on-board processing hardware 1231 comprises a plurality of concurrently operable processors, for performing, for example, concurrent IPsec processing of incoming packets. The on-board processing hardware 1231 can further include a control processor for executing progressive memory allocation code.
Alternately, the progressive memory allocation process can be implemented in a progressive memory allocation hardware module 1228.
An SPD 1226 and an SAD 1227 are accessible by the co-processor module 1230.
As noted, the disclosed progressive memory allocation arrangement can be implemented completely in hardware, or in a mixture of hardware and software, in which event the progressive memory allocation code, preferably stored in the memory 1229, operates in conjunction with the on-board processing hardware 1231.
In one arrangement, the steps implementing the progressive memory allocation process are effected by the hardware processing module 1231 operating in conjunction with instructions in the PMA software code. The PMA software code may be formed as one or more code modules, each for performing one or more particular tasks. Alternately, the steps implementing the progressive memory allocation process are effected by the hardware processing module 1228 alone.
The software modules may be stored in computer readable media, including the storage devices described below, for example. The software modules are loaded into the computers from the computer readable media, and then executed by the computers. A computer readable medium having such software or a computer program recorded on it is a computer program product. The use of the computer program products in the computers effects one arrangement for performing the progressive memory allocation processes.
The computer system 1200 is formed by the source machine 802 and the destination machine 806. The following description relates primarily to the source machine 802; however, the description applies equally to the destination node 806.
The source computer module 802 comprises input devices such as a keyboard 1202 and mouse 1203, output devices including a printer 1215, a display device 1214 and loudspeakers 1217. A Modulator-Demodulator (Modem) transceiver device 1216 is used by the source computer module 802 for communicating to and from the communications network 810, for example connectable via a telephone line 1221 or other functional medium. The modem 1216 can be used to obtain access to other network systems including the destination machine 806, via the Internet and Local Area Networks (LANs) or Wide Area Networks (WANs), and may be incorporated into the source computer module 802 in some implementations. The Security Policy Database 1226, and the Security Association Database 1227 store information items associated with Security policies and Security Associations respectively.
The source computer module 802 typically includes at least one processor unit 1205, and a memory unit 1206, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The destination computer module 806 typically includes at least one processor unit 1223, and a memory unit 1224, for example formed from semiconductor random access memory (RAM) and read only memory (ROM).
The source module 802 also includes a number of input/output interfaces, including an audio-video interface 1207 that couples to the video display 1214 and loudspeakers 1217, an I/O interface 1213 for the keyboard 1202 and mouse 1203 and optionally a joystick (not illustrated), and an interface 1208 for the modem 1216 and printer 1215. In some implementations, the modem 1216 may be incorporated within the computer module 802, for example within the interface 1208. A storage device 1209 is provided and typically includes a hard disk drive 1210 and a floppy disk drive 1211. A magnetic tape drive (not illustrated) may also be used. A CD-ROM drive 1212 is typically provided as a non-volatile source of data. The components 1205 to 1213 of the computer module 802 typically communicate via an interconnected bus 1204, and in a manner which results in a conventional mode of operation of the computer system 1200 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, or like computer systems evolved therefrom.
If a hybrid hardware/software architecture is used for the IPsec processing module 1230, the progressive memory allocation code is preferably resident in the memory 1229 and is read and controlled in execution by the control processor in the processor hardware 1231. In some instances, the progressive memory allocation application code may be supplied encoded on a CD-ROM or floppy disk and read via the corresponding drive 1212 or 1211, or alternatively may be read by the computer module 802 from the network 810 via the modem device 1216. Still further, the software can also be loaded into the computer system 1200 from other computer readable media.
The term "computer readable medium" as used herein refers to any storage or transmission medium that participates in providing instructions and/or data to the computer system 1200 for execution and/or processing. Examples of storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer modules 802 and 806. Examples of transmission media include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

In a preferred progressive memory allocation arrangement, the co-processor module 1230 is an ASIC implementing the IPsec network security protocol. The processes requesting memory are performed by cryptographic processors implementing the Authentication Header (AH) or Encapsulating Security Payload (ESP) sub-protocols, which are parts of the larger IPsec protocol. The crypto-processors preferably form part of the processing hardware 1231.
Memory capacity is limited in an ASIC environment, due to its major impact on both cost and performance. Therefore, when designing memory allocation in an ASIC, it is important to fully utilise the limited memory available.
A simple way to share memory is to evenly divide the memory bank 1229 among each of the concurrently operable crypto-processors. Each processor may only access the portion of memory assigned to it and may not use another processor's portion. This ensures that no IPsec packet will coincide with another one in the memory 1229, since each packet is kept within a well-defined memory boundary.
The disadvantage with this approach is that it leads to poor memory utilisation.
Not all of the concurrently operable processors are always active, and while they are idle they still occupy their allotted memory region, preventing other processors from using the region if needed. In this approach, no processor can ever handle a packet that is larger than its memory region size, even when the packet is smaller than the total memory size.
For example, an ASIC may have 64 kbytes of memory, divided between 4 concurrently operable crypto-processors with 16 kbytes each. With the aforementioned method, the ASIC can only transform packets up to 16 kbytes in size.
Memory allocation constitutes a potential performance bottleneck. The need to allocate memory to one of the concurrently operable processors receiving an incoming packet is a frequent task to be performed by the ASIC. Consequently, a slow allocation method significantly degrades the performance of the ASIC. To achieve good performance, the allocation process should be simple to implement, without consuming too many clock cycles and thus processing power. For example, storing a packet as a linked-list of packet fragments is not a preferred approach in an ASIC, because although it is memory-efficient, it is also expensive to maintain and search in an ASIC environment.
Fig. 3 and Fig. 4 respectively depict a simple memory allocation process 600, and a pictorial representation of how memory may be shared according to that process, which can be implemented using the arrangement in Fig. 2. The process 600 prevents packets from overlapping in memory.
One of the concurrently operable cryptographic processors (not shown) in the processing hardware module 1231 first receives an incoming packet at 601. The processor is then allocated a memory region at 602 in the memory 1229. The size and location of this memory region can be determined in several ways. Since the initial size of the incoming packet is known, and the described example only permits one transformation to be applied to the incoming packet, the maximum size of the packet after transformation can be determined. The Memory Allocator logic (not shown) in the processing hardware 1231 keeps track of the size and location of the memory regions allocated to each concurrently operable crypto-processor. The Memory Allocator logic uses this information to find the first location in the memory bank 1229 that has a free region of the required maximum size. That particular region will be allocated.
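The first-fit search described above can be sketched as follows. The flat list of (start, size) allocations over a single memory bank is an assumed representation for illustration, not the actual design of the Memory Allocator logic:

```python
def first_fit(allocations, bank_size, required):
    # Return the start of the first free gap of at least `required` bytes,
    # or None if no sufficiently large gap exists.
    cursor = 0
    for start, size in sorted(allocations):
        if start - cursor >= required:   # the gap before this allocation fits
            return cursor
        cursor = max(cursor, start + size)
    if bank_size - cursor >= required:   # the gap after the last allocation
        return cursor
    return None

allocs = [(0, 100), (300, 200)]          # two regions already in use
print(first_fit(allocs, 1024, 150))      # 100: the gap between the regions fits
print(first_fit(allocs, 1024, 600))      # None: no free gap of 600 bytes exists
```

The search cost grows only with the number of current allocations, which is bounded by the number of crypto-processors, keeping the method cheap enough for hardware.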
Fig. 4 shows a pictorial representation of how memory may be shared according (to the aforementioned process. Initially, each crypto-processor 721-724, these being incorporated into the processing hardware 1231 and being identical in functionality, is given an equal amount of a shared memory 700. The shared memory 700 is a partition in the memory 1229. The processors 721-724 are given respective memory regions 701-704.
Each crypto-processor 721-724 may only use its allocated region 701-704 and no other region. Input packets 711, 713 occupy respective parts of the regions 701 and 703.
The packets 711, 713 may grow, as a result of the processing they undergo by their respective processors 721 and 723, into other areas within their respective allocated regions 701, 703. There is, however, a possibility that, while the crypto-processor 723 fully consumes, and desires to exceed, its memory region 703, another crypto-processor 722 is left idle with an unused memory region 702. It is noted that since the processor 722 is idle, its associated memory region 702 could have been made available to the processor 723. However, in this simple arrangement, the memory region 702 can be used only by the idle crypto-processor 722, and not by the "starved" crypto-processor 723. This leads to poor memory utilisation.
Regardless of the method of allocation, the allocated memory is used for storing both the input packet and the transformed packet, as transformation can happen in-place.
Returning to Fig. 3, the incoming packet is loaded into the memory 1229 at 603. Processing instructions are fetched at 604 from a database 620 that contains these instructions, the database being located, in the described example, in the memory 1229. The aforementioned instructions are applied at 605. The transformation either succeeds or fails, according to whether the allocated memory region is respectively sufficient or insufficient to hold the transformed packet. If the transform fails due to inadequate memory having been allocated, the packet overflows.

Whether the packet has overflowed is tested at 606. This can be tested by checking whether the packet's boundaries have crossed over another packet's or region's boundaries. If the packet has overflowed, the error cannot be handled by the crypto-processors 721-724. The error is instead reported, at 607, to another entity that can handle it, for example driver software.
If the transformation is successful, the process 600 follows a NO arrow from the step 606 back to the step 604, and the database 620 is checked once again for any further instructions. If more instructions are found, the steps 605-606 are repeated. Otherwise, the packet is output at 608. The allocated memory is then returned to the pool for later allocation at 609. This usually consists of removing the appropriate entry in the allocation table.
When this is done, the crypto-processor remains idle until another input packet arrives at 601.
Fig. 5 shows a process 100 for memory allocation according to the preferred embodiment of the disclosed progressive memory allocation approach. In the process 100, which starts at a START step, the cryptographic processor in question, this being one of the concurrently operable processors in the processing hardware 1231, waits for an input packet at a following step 101. Thereafter, in a step 102, when an input packet arrives, the memory allocator compares the packet with information stored in a packet history database 121. The packet history database 121, preferably located in the memory 1229, maintains a list of packets that have not been successfully processed due to overflow during transformation. The packet history database 121 is further described below with reference to Fig. 8.
A following step 103 determines if the packet context, this being a unique identifier associated with the packet and/or the network session to which the packet belongs, matches any packet in the packet history database 121, which is indexed by packet context as described below. If this is the case, then the process 100 follows a YES arrow from the step 103 to a step 105, which determines the number of memory blocks to allocate to the incoming packet. In other words, the crypto-processor is allocated a number of memory blocks, at the step 104 or the step 105, depending on whether the current input packet matches an entry in the database 121, as tested at the decision point 103. As noted, if there is a match in the database 121, then the allocation amount is determined from data stored in the database 121, at the step 105. Otherwise the process 100 follows a NO arrow from the step 103 to a step 104, in which the input packet is allocated the minimum number of blocks required to hold it.
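The allocation decision at the steps 103-105 can be sketched as follows, under assumed names: `history` stands in for the packet history database 121, the context tuple is a hypothetical identifier, and BLOCK_SIZE is an illustrative block granularity, none of which are mandated by the arrangement:

```python
BLOCK_SIZE = 256  # bytes per memory block (assumed figure)

def blocks_to_allocate(history, context, packet_size):
    # Step 103: look the packet context up in the history database.
    # Step 105: a previously overflowed context gets its predicted count.
    # Step 104: a new packet gets the minimum blocks that hold it.
    if context in history:
        return history[context]
    return -(-packet_size // BLOCK_SIZE)  # ceiling division

history = {("10.0.0.5", 4500): 8}  # a session that previously overflowed
print(blocks_to_allocate(history, ("10.0.0.5", 4500), 1000))  # 8, from history
print(blocks_to_allocate(history, ("10.0.0.9", 4500), 1000))  # 4, minimum fit
```

A new packet thus starts with the smallest possible footprint, and only contexts with a recorded overflow receive a larger, history-driven allocation.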
After memory allocation is complete, the process 100 is directed from the step 104 or the step 105 to a step 106, in which the input packet is loaded into the memory blocks allocated to the transforming crypto-processor, in preparation for the actual processing. The allocated memory blocks serve not only for storing the input packet, but also the output packet.
This is because, in this example, during transformation the output packet immediately overwrites the input packet.
To begin the transformation, instructions need to be obtained from a processing instructions database 120, and thus the process 100 is directed from the step 106 to a step 107, which fetches the instructions from the processing instructions database 120. In a preferred arrangement, the processing instructions database 120 comprises the Security Policy Database (SPD) 1226 and the Security Association Database (SAD) 1227, as described in RFC 2401. The instructions may include encryption and decryption algorithms and keys, hash algorithms and keys, and compression and decompression instructions. The instructions may also specify whether the Authentication Header (AH) or Encapsulating Security Payload (ESP) sub-protocol is used, and how IPsec headers are applied to the input packet.
At the decision step 107, the processing instructions are fetched from the processing instructions database 120 if the instructions are present. The processing instructions, if found, are applied to the input packet at a following step 108. When applying these instructions, the input packet may progressively expand in size. Thus it is possible for the input packet to overflow the memory blocks allocated for its processing, even during the transformation.
The crypto-processor in question checks, in a subsequent decision step 109, whether packet overflow has occurred after each set of instructions is completely applied at the step 108. If this is not the case, then the process 100 is directed by a NO arrow back to the step 107. If memory overflow is detected, then the process 100 follows a YES arrow from the step 109 to a step 111, which stores the context of the packet in the packet history database 121.
It is important to ensure that, even in the case of overflow in the step 108, the overflowing packet does not overwrite another legitimate packet. This can be achieved by enforcing that crypto-processors have no access, or read-only access, to memory blocks that are not allocated to them, until the allocated memory blocks are freed for use. In an alternate arrangement, the concurrently operable processor in question may check for packet overflow after every intermediate step (i.e., within the step 108) of the processing instructions, to avoid overflowing packets overwriting another legitimate packet.
If the instructions were applied successfully and no overflow occurred, the process 100 follows the NO arrow from the step 109 back to the step 107 where further instructions are obtained from the processing instructions database 120. The steps 107-109 are iteratively executed until no more instructions are found for the input packet at the step 107, at which point the process 100 follows a NO arrow from the step 107 to a step 110 at which point the input packet that is resident in memory becomes the output packet.
The output packet is copied out to an external entity at the step 110.
The loop comprising the steps 107-109 is also broken if packet overflow occurred, and in this circumstance, the incoming packet in question is discarded. This overflow occurs if the number of memory blocks allocated to the crypto-processor transforming the packet is insufficient for completing all the required processing instructions obtained from the instructions database 120. In order to successfully and completely transform the packet, another attempt may be made, but more memory blocks must be made available on that next attempt, when a retransmitted instance of the packet in question is received.
To achieve this, the context of the current overflowing packet is stored in the packet history database 121 at the step 111. When the retransmitted instance of the same packet, or a further packet from the same session, is received at the step 101, the packet can be distinguished by its context from new packets at the steps 102-103, and therefore receives a non-default amount of memory allocation at the steps 104-105.
The number of memory blocks to allocate to a retransmitted packet can be determined at different steps of the process 100. In a preferred arrangement, the number of memory blocks to allocate is calculated at the step 111 just after the overflow has occurred. In an alternative arrangement, it may be calculated at the step 105 just before the next memory allocation.
There are a number of ways to determine the number of memory blocks to allocate for processing an overflowing packet in the next attempt. In a preferred arrangement, the number of memory blocks to allocate in the next attempt is twice the number of blocks currently allocated, the current number of blocks being insufficient and causing the overflow. For example, if a packet is allocated one block on the initial attempt, it will be allocated two blocks on the second attempt, four blocks on the third attempt, and so on.
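The doubling rule above can be expressed as a short sketch; the function name and the starting allocation of one block are illustrative assumptions, not part of the specification:

```python
def blocks_for_next_attempt(current_blocks):
    """Return the number of memory blocks to allocate on the retry
    attempt: twice the number that proved insufficient and caused
    the overflow (illustrative name; sketch of the doubling rule)."""
    return current_blocks * 2

# Successive attempts starting from one block: 1, 2, 4, 8, ...
```

Each retry thus at most doubles the memory reserved for the packet, so the number of retries grows only logarithmically with the final packet size.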
In an alternative arrangement, a separate and independent process (not shown) searches for all processing instructions in the database 120 to be applied to the input packet, while the packet is being transformed in the step 108. Instead of obtaining each set of instructions as they are applied, all instructions are obtained at once, in parallel to applying the instructions. With knowledge of all processing instructions, the maximum possible size of the output packet may then be calculated. Thus the minimum number of blocks required to hold this theoretical output packet of calculated size may be determined. On the initial attempt, the minimum number of blocks required to hold the input packet, or another number of memory blocks determined according to a default rule, is allocated. If that number is not sufficient, on the retry attempt, the minimum number of blocks required to hold the theoretical output packet is allocated. This guarantees that the second allocation attempt will always be successful.
To illustrate the aforementioned approach, assume that the memory block size is 1 kbyte, an input packet is 1.5 kbytes, and each transformation may add up to 120 bytes to the size of the packet. If, when looking up processing instructions from the database 120, five sets of instructions are found, these could add up to a total of 600 bytes maximum. Consequently, the theoretical maximum size of the output packet is 2.1 kbytes. A packet of this size requires three memory blocks to be allocated. Therefore on the retry attempt, three memory blocks will be allocated for processing the retransmitted instance of the input packet.
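The arithmetic of this example can be captured in a small sketch; the function name is an illustrative assumption, and "kbytes" is treated as 1000 bytes to match the figures in the text:

```python
import math

def blocks_for_worst_case(input_size, growth_per_transform, block_size):
    """Minimum number of fixed-size memory blocks needed to hold the
    theoretical maximum output packet, given the worst-case growth of
    every set of processing instructions found for the packet."""
    worst_case_size = input_size + sum(growth_per_transform)
    return math.ceil(worst_case_size / block_size)

# The example above: a 1.5 kbyte packet, five transformations adding
# up to 120 bytes each, and 1 kbyte blocks give a 2.1 kbyte worst
# case, which needs three blocks.
```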
Returning to the process 100, after the context information of an overflowing packet has been stored at the step 111, a failure code is sent to an external entity at a following step 112. The external entity may require this knowledge to initiate the retry of processing the overflowing packet. The packet in question is also discarded in the step 113.
After a successfully processed packet has been output at the step 110, or after a packet failure has been notified to an external entity at the step 112, a following step 113 disassociates the memory blocks from the crypto-processor using those blocks.
This frees up the memory blocks, making them available for reallocation for processing another input packet. When the blocks are reallocated for processing another input packet, the new packet can simply overwrite the existing memory contents. Finally, the process 100 follows an arrow back to the step 101, and the crypto-processor waits for another input packet to arrive.
In a further alternative embodiment of the invention, it is possible to not discard the packet at the step 113 and reuse the intermediate packet contents. However there must be a mechanism for accurately recording the extent of the current progress, otherwise the processing cannot be resumed properly and the output packet will be incorrect due to misapplied transformations.
Fig. 6 shows a process 108', being an instance of the process 108 in Fig. 5, for applying IPsec transformations to a packet, and how the transformation may increase input packet size.
The process 108' is made up of two paths: a send path and a receive path. The send path is used for packets to be sent via the network 810, and the receive path is used for packets received from the network 810. In other words, consider an example in which the process 100 is running on the node 802 in Fig. 2. The step 101 can, on the one hand, wait for a packet generated by an internal process in the node 802, where the packet is to be sent by the node 802 over the network 810 to the node 806. On the other hand, the step 101 can wait for a packet sent from the node 806 over the network 810 to the node 802.
The appropriate path is determined by reading the contents of the processing instructions at a step 201. If the input packet is a send packet, then the process 108' is directed from the step 201 via a YES arrow to a step 202, in which the packet is encrypted with a key and encryption algorithm, both being part of the processing instruction set from the database 120. The input packet may be compressed at a following step 203. The input packet is then hashed using the hash algorithm that is also part of the processing instruction set at a following step 204. The hash and some padding bytes are appended to the end of the input packet, while some header bytes are appended to the beginning and middle of the input packet. The details of how these data are appended are documented in RFC 2401. These appended data contribute to the growth of the packet on the send path. After this, the current set of processing instructions is complete at a following step 209, where the packet is marked as the output packet if no more instructions are found for the packet, and the process 108' proceeds to the step 109 in Fig. 5.
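As a rough illustration of how the send path grows a packet, the sketch below models only the size changes (a prepended header, appended padding and an appended hash). The field sizes are invented for the example; real ESP processing as documented in RFC 2401 is considerably more involved:

```python
def apply_send_transform(packet, header=b"ESPHDR__", hash_len=12, pad_to=4):
    """Grow a packet the way one send-path transformation does: header
    bytes are added at the front, and padding plus an authentication
    hash are appended at the end (all sizes illustrative only)."""
    pad_len = (-len(packet)) % pad_to  # pad payload to a multiple of pad_to
    return header + packet + b"\x00" * pad_len + b"H" * hash_len
```

Applying such a transform to a 10-byte payload with these illustrative sizes yields 8 header bytes, 2 padding bytes, and a 12-byte hash, growing the packet from 10 to 32 bytes.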
Returning to the step 201, if the input packet is a receive packet, roughly the reverse of the send path is applied. In this event, the process 108' is directed from the step 201 by a NO arrow to a step 205 in which some information is obtained from the input packet. The input packet may be decompressed at a following step 206 if compression was applied before the packet was sent by the node 806. Decompression contributes to the growth of the packet on the receive path. The input packet is then decrypted at a following step 207, again using keys and algorithms mandated in the processing instruction set retrieved from the database 120. Finally, input packet headers, the hash, and padding are removed from the input packet in a following step 208. After this, the current set of processing instructions is complete at the following step 209. The packet may be marked as the output packet if no more instructions are found for the packet, and the process 108' proceeds to the step 109 in Fig. 5.

Fig. 7 shows the functional components of an SPU 300 according to a preferred arrangement of the disclosed progressive memory allocation technique, this being a preferred hardware arrangement of the IPSec coprocessing module 1230 in Fig. 2.
Concurrently operable processors 301a-301d, forming part of the processing hardware 1231, communicate, via distinct connections, with an arbiter 302, which may be identified with the PMA hardware module 1228. The arbiter 302 comprises a memory address translator 303, a block allocator 304, and a memory accountant 305. The arbiter 302 communicates via a connection with a shared memory module 306, identifiable with the memory 1229.
The cryptographic processors 301a-d do not access the shared memory 306 directly. Rather, they access the memory 306 via the memory arbiter 302. Since the memory 306 typically can handle only one request at any point in time, the SPU 300 needs the memory arbiter 302 for managing multiple concurrent memory requests issued by the processors 301a-d. The arbiter 302 queues up the memory requests and sends the requests sequentially to the memory bank 306. Each crypto-processor 301a-d waits for the arbiter 302 to fulfil its request before proceeding to do other work.
The arbiter may be implemented using a first-in-first-out (FIFO) queue for memory requests, a FIFO queue for each crypto-processor 301a-d for buffering the data passing between the processors 301a-d and the memory 306, and a signal for each crypto-processor 301a-d for control purposes.
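The queuing behaviour just described might be sketched as follows; the class and method names are illustrative assumptions, not taken from the specification, and a real implementation would be hardware FIFOs rather than software:

```python
from collections import deque

class MemoryArbiter:
    """Serialises concurrent memory requests from the crypto-processors:
    requests are queued first-in-first-out and forwarded one at a time
    to the single-ported shared memory."""

    def __init__(self):
        self.pending = deque()   # FIFO queue of outstanding requests
        self.serviced = []       # order in which requests reached memory

    def request(self, processor_id, address):
        self.pending.append((processor_id, address))

    def service_one(self):
        # Forward the oldest pending request to the memory bank.
        if self.pending:
            self.serviced.append(self.pending.popleft())
```

Whatever order the processors issue requests in, the memory bank sees them strictly one at a time, in arrival order.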
The block allocator sub-module 304 is responsible for keeping track of memory block allocation state, and choosing the most suitable memory blocks for allocation when a request is received. Keeping the allocation state for each individual memory block is important, since a memory block that has already been allocated, for example, to the processor 301a, must not be allocated to any of the other crypto-processors 301b-d. In a preferred arrangement of the disclosed progressive memory allocation technique, packets are loaded into a contiguous region of the shared memory 306. It is therefore harder to fit a large packet into the shared memory 306 than a small packet, since it occupies more contiguous blocks. To accommodate large packets, free blocks should preferably be kept together. It is also desirable to design the block allocation algorithm such that free regions are grouped together as far as possible.
The block allocator 304 may be implemented using an array of bits for allocation state, with each bit denoting the allocation status of a block, and two registers for each crypto-processor 301a-d. One of the aforementioned two registers holds the initial block address in the memory 306 that is allocated to the respective processor 301a-d, and the other register holds the last block address in the memory 306 that is allocated to the same respective processor 301a-d. The allocation algorithm implemented by the block allocator 304 searches each of the free contiguous memory regions, and chooses the first free region that is large enough to satisfy the number of memory blocks requested.
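A first-fit allocator over a bit array, as described, might look like the following sketch. The Python representation is an illustrative assumption; the hardware would use a bit vector and per-processor registers:

```python
class BlockAllocator:
    """First-fit allocation over an array of per-block allocation bits."""

    def __init__(self, num_blocks):
        self.allocated = [False] * num_blocks

    def allocate(self, count):
        """Return the index of the first free contiguous run of `count`
        blocks and mark it allocated, or None if no run is large enough."""
        run_start, run_len = 0, 0
        for i, used in enumerate(self.allocated):
            if used:
                run_start, run_len = i + 1, 0  # run broken; restart after i
            else:
                run_len += 1
                if run_len == count:
                    for j in range(run_start, run_start + count):
                        self.allocated[j] = True
                    return run_start
        return None

    def free(self, start, count):
        for j in range(start, start + count):
            self.allocated[j] = False
```

Note that first-fit tends to pack allocations toward the start of memory, which helps keep free regions grouped together as the text recommends.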
The memory address translator sub-module 303 translates from memory addresses that the crypto-processors 301a-d understand, to memory addresses that the shared memory 306 understands. In a preferred arrangement of the disclosed progressive memory allocation technique, the crypto-processors 301a-d are aware of memory offsets of the input packets they are processing, or alternately, they are aware of the logical addresses of the input packets they are processing. However the processors 301a-d do not know the locations of the packets they are processing in the shared memory 306, or the real address of the packets they are processing in the memory 306.
The memory address translator 303 has knowledge of the shared memory blocks 306 allocated to each processor 301a-d and so it is able to translate logical addresses into real addresses. This is implemented by using the registers in the block allocator 304. The translation involves adding the memory offset of the input packet (logical address) to the base address of the initial block allocated to the particular crypto-processor 301a-d issuing the memory request in question. This produces the real physical address in the shared memory 306.
The memory address translator 303 also performs memory bounds checking, this being a mechanism for detecting packet overflow. Each crypto-processor 301a-d is allocated, when processing an incoming packet, a contiguous region of memory blocks. If any crypto-processor 301a-d issues a memory request with an address that, after translation, lies outside of its allocated region, then an overflow has occurred. The real physical address of a memory request must be larger than the base address of its allocated region, and smaller than the end address of its allocated region. This mechanism corresponds with the step 109 of the memory allocation process 100.
The memory accountant sub-module 305 is responsible for keeping track of memory block usage by input packets and IPsec sessions and thus determining the number of memory blocks that should be allocated on the retry attempt if a packet overflows. In a preferred arrangement of the disclosed progressive memory allocation technique, the memory accountant 305 is implemented using a packet history database 121, and an algorithm for determining the number of memory blocks to allocate on a retry attempt.
Two versions of the algorithm have been described in relation to the memory allocation process 100 of Fig. 5.

Fig. 8 shows a preferred arrangement of the packet history database 121 (see Fig. 5) that is part of the memory accountant sub-module 305 (see Fig. 7). A database entry is defined by a row in the database 400, one example being the combination of fields 411-413. In Fig. 8 there are shown three database entries in the database 400, namely 411-413, 421-423, and 431-433. There are three fields for each database 400 entry. The packet ID field 401 contains a unique identifier to distinguish individual packets within a session. In IPsec this may, for example, be the sequence number for the session. The session ID field 402 contains a unique identifier to distinguish individual network sessions. Each IPsec session has its own identification number. Depending on the arrangement and the choice of identifiers, a matching packet may be defined as one that matches both the packet ID 401 and the session ID 402, or alternately, one that matches the session ID 402 only. The identifier examples given for IPsec require both ID fields 401-402 to match.
An invalid database entry is identified by an invalid Packet ID 401 and/or Session ID 402. A zero value or minus one value are suitable invalid values for both fields. It is possible for a database entry to have an invalid Packet ID 401 yet have a valid Session ID 402. The Session ID 402 entry applies to all packets within the same network session, allowing memory block allocation to be done on a per-session basis. A session match is defined by an input packet with no exact match in any entry in the database 400, but matching an entry with the same session ID 402 and an invalid packet ID 401. The matching of packet and session ID with the packet history database 400 takes place at the step 102 of the process 100 in Fig. 5.

The third field, "blocks to allocate" 403, contains information about the previous memory allocation to the packet or session identified by the matching packet ID 401 and/or session ID 402. In a preferred arrangement, the "blocks to allocate" field contains the number of memory blocks that should be allocated to an input packet matching that database entry. This value is preferably calculated at the step 111 of the process 100 in Fig. 5. The value of the "blocks to allocate" field is used at the step 105 of the process 100 in Fig. 5.

Fields 411-413 represent an example of a database entry that is specific to a single packet. An input packet must have a packet ID of "P1" (ie 411), and a session ID of "S1" (ie 412) to match this entry. When an incoming packet does match this entry at the step 102 in Fig. 5, the packet will be allocated four blocks (ie 413) for processing at the step 105 in Fig. 5. The fields 421-423 represent an example of a database entry that applies to a network session. An input packet must have a session ID of "S1" (ie 422) to match this entry, unless it matches the packet ID 401 of another database entry, such as "P1" (ie 411), as well as the session ID 402 of that other entry. When a packet matches this entry, it will be allocated two memory blocks (ie 423). Fields 431-433 are an example of an "invalid" database entry. This entry must not match any valid input packet, and may be used for a new overflowing packet not previously registered in the database.
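The matching rules and the Fig. 8 entries can be sketched as a lookup function. The representation (tuples, a minus-one invalid sentinel, a default of one block) is an assumption made for illustration:

```python
INVALID = -1  # a suitable invalid identifier value, as suggested above

def blocks_to_allocate(database, packet_id, session_id, default=1):
    """Consult the packet history database: an exact packet-and-session
    match takes precedence; otherwise a session-wide entry (invalid
    packet ID, matching session ID) applies; otherwise the default
    allocation is used."""
    for pid, sid, blocks in database:
        if pid == packet_id and sid == session_id:
            return blocks
    for pid, sid, blocks in database:
        if pid == INVALID and sid == session_id:
            return blocks
    return default

# The three Fig. 8 entries: packet-specific, session-wide, and invalid.
history = [("P1", "S1", 4), (INVALID, "S1", 2), (INVALID, INVALID, 0)]
```

With these entries, packet P1 in session S1 receives four blocks, any other packet in session S1 receives two, and a packet from an unknown session falls back to the default allocation.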
Database entries may be deleted when a packet exactly matches the entry, and is successfully processed.
Fig. 9 shows at 500 two snapshots of the shared memory 306 using the block allocation process 100 according to the disclosed progressive memory allocation technique. A snapshot 510 is taken at an arbitrary point in time, and a snapshot 520 is taken at a later time than the snapshot 510. Every entity in Fig. 9 is identical across both the snapshots 510, 520, and hence numbered identically. The shared memory bank 500 is divided into multiple blocks 501-508 of identical size. When crypto-processors 521, 523 receive respective input packets 511, 513, the respective processors are allocated one or more corresponding memory blocks 501, 504. In the snapshot 510, the processor 521 is transforming the packet 511 residing in the block 501, and the processor 523 is transforming the packet 513 residing in the block 504. Processors 522, 524 are idle and are not allocated any of the memory blocks 501-508. The memory blocks 502-503, 505-508 are not allocated to any of the crypto-processors 521-524.
In the snapshot 520, the packet 511 is larger in size than the packet 511 in the snapshot 510, but the packet 511 has not overflowed its allocated memory block 501. Therefore the memory allocation to the crypto-processor 521 remains unchanged. On the other hand, the packet 513 has overflowed its previously allocated memory block 504, and now resides in the memory blocks 505-506, thus the processor 523 now has access to the blocks 505-506. The processors 522, 524 are still idle. It is noted that the packet 513 is a retransmission of the original packet 513 that has been allocated the two memory blocks 505-506. If the memory block 506 had previously been allocated to another processor, the memory block 506 would not have been made accessible for the packet 513.
The choice of memory block 501-508 size is significant to the performance of the memory allocation process 100. If the block size is too large, most of the memory block 501-508 is unused when processing small packets, and it is harder for a crypto-processor 521-524 to obtain a large number of memory blocks 501-508 at once, since there are fewer memory blocks 501-508 overall. If the memory block size is too small, packet overflow will occur more often, greatly reducing the efficiency of the memory allocation process 100, since the error-handling path is time-consuming.
The most suitable memory block size depends on the nature of the implementation, and its application. In particular, the memory block size should be large enough to accommodate most packets of typical size. Finding the typical size of network packets involves some sampling and statistical work on the particular packet traffic being considered.
In the preferred arrangement of the disclosed progressive memory allocation technique, the application is IPsec processing over an IPv4 network. Most IPsec packets are approximately 2 kbytes in size, and in most applications, only a few sets of IPsec transformations are applied to any one packet. It is in this case found that it is statistically uncommon for an IPsec packet to grow beyond 4 kbytes in size. Therefore, dividing the shared memory 500 into equal-sized memory blocks 501-508 of 4 kbytes in size would be a reasonable choice.
Industrial Applicability

It is apparent from the above that the arrangements described are applicable to the telecommunication and data processing industries.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (13)

1. A method of allocating memory in a data packet processing system comprising a plurality of concurrent processes, the method comprising the steps of:
dividing the memory into a plurality of blocks;
on a said process receiving an incoming packet, allocating, according to a rule, a corresponding number of the memory blocks to the process;
if the allocated memory blocks are insufficient to allow completion of the process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.
2. A method according to claim 1, further comprising a step of: preventing allocation of the allocated memory blocks to any of the other concurrent processes prior to the freeing step.
3. A method according to claim 1, wherein said rule is obtained from a database indexed by a context for said incoming packet.
4. A method according to claim 3, wherein said context comprises an identifier for said incoming packet.

5. A method according to claim 3 or claim 4, wherein said context comprises an identifier for a session to which said incoming packet belongs.
6. A method according to claim 1, wherein the updating step comprises the steps of:
predicting the number of memory blocks required to allow completion of the process; and
amending the rule to account for the predicted number.

7. A method according to claim 6, wherein the predicting step is dependent upon the size of the incoming packet.
8. A method according to claim 7, wherein the predicting step is further dependent upon the transformation to be applied to the incoming packet.
9. A method according to claim 3, wherein if a said rule cannot be obtained for said incoming packet, said corresponding number is the minimum number of memory blocks required to store the incoming packet.

10. A method according to claim 6, wherein the predicting step increases the number of allocated memory blocks by a predetermined factor.
11. A multi-processor chip for packet data processing, the chip comprising:
a plurality of concurrently operable processors each for performing a corresponding process;
a memory;
means for dividing the memory into a plurality of blocks;
means for, on a said processor receiving an incoming packet, allocating, according to a rule stored in the memory, a corresponding number of the memory blocks to the processor;
means for, if the allocated memory blocks are insufficient to allow completion of the corresponding process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.
12. A multi-processor chip according to claim 11, further comprising: means for preventing allocation of the allocated memory blocks to any of the other concurrent processors prior to the freeing step.
13. A multi-processor chip for packet data processing, the chip comprising:
a plurality of concurrently operable processors each for performing a corresponding process;
a memory for storing a memory allocation program;
a control processor for executing the memory allocation program, said memory allocation program comprising:
code for dividing the memory into a plurality of blocks;
code for, on a said concurrently operable processor receiving an incoming packet, allocating, according to a rule stored in the memory, a corresponding number of the memory blocks to the concurrently operable processor;
code for, if the allocated memory blocks are insufficient to allow completion of the corresponding process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.
14. A computer readable medium having recorded thereon a computer program for directing a control processor to execute a memory allocation method, said control processor being incorporated in a multi-processor chip for packet data processing, the chip comprising a memory for storing the program and a plurality of concurrently operable processors each for performing a corresponding process, said program comprising:
code for dividing the memory into a plurality of blocks;
code for, on a said concurrently operable processor receiving an incoming packet, allocating, according to a rule stored in the memory, a corresponding number of the memory blocks to the concurrently operable processor;
code for, if the allocated memory blocks are insufficient to allow completion of the corresponding process:
updating the rule;
freeing the allocated memory blocks; and
discarding the incoming packet.

15. A method of allocating memory in a data packet processing system substantially as described herein with reference to the accompanying drawings.
16. A multi-processor chip for packet data processing substantially as described herein with reference to the accompanying drawings.
17. A computer readable medium substantially as described herein with reference to the accompanying drawings.

DATED this 22nd Day of December 2005
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
AU2005247023A 2005-12-22 2005-12-22 Network security packet memory allocation method Abandoned AU2005247023A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2005247023A AU2005247023A1 (en) 2005-12-22 2005-12-22 Network security packet memory allocation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2005247023A AU2005247023A1 (en) 2005-12-22 2005-12-22 Network security packet memory allocation method

Publications (1)

Publication Number Publication Date
AU2005247023A1 true AU2005247023A1 (en) 2007-07-12

Family

ID=38264994

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2005247023A Abandoned AU2005247023A1 (en) 2005-12-22 2005-12-22 Network security packet memory allocation method

Country Status (1)

Country Link
AU (1) AU2005247023A1 (en)

Similar Documents

Publication Publication Date Title
EP1791060B1 (en) Apparatus performing network processing functions
US7814310B2 (en) IPsec performance optimization
US7924868B1 (en) Internet protocol (IP) router residing in a processor chipset
EP1192782B1 (en) Classification engine in a cryptography acceleration chip
US8094670B1 (en) Method and apparatus for performing network processing functions
EP1435582B1 (en) Hash algorithm and policy management
US7337314B2 (en) Apparatus and method for allocating resources within a security processor
US7661130B2 (en) Apparatus and method for allocating resources within a security processing architecture using multiple queuing mechanisms
CA2459750C (en) Techniques for offloading cryptographic processing for multiple network traffic streams
US7657933B2 (en) Apparatus and method for allocating resources within a security processing architecture using multiple groups
US8447898B2 (en) Task offload to a peripheral device
US7370348B1 (en) Technique and apparatus for processing cryptographic services of data in a network system
US20020188871A1 (en) System and method for managing security packet processing
US20080267177A1 (en) Method and system for virtualization of packet encryption offload and onload
US8175271B2 (en) Method and system for security protocol partitioning and virtualization
US20070101023A1 (en) Multiple task offload to a peripheral device
US7188250B1 (en) Method and apparatus for performing network processing functions
EP1435556A2 (en) Methods and apparatus for accessing security association information in a cryptography accelerator
US8316431B2 (en) Concurrent IPsec processing system and method
AU2005247023A1 (en) Network security packet memory allocation method
CN111031055B (en) IPsec acceleration device and implementation method
US20200177540A1 (en) In-line transmission control protocol processing engine using a systolic array
Ferrante et al. High-level Architecture of an IPSec-dedicated System on Chip
JP2004328066A (en) Vpn apparatus

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period