EP2834950A1 - A method and device for providing a dynamic scheduling of memory accesses to a data memory - Google Patents

A method and device for providing a dynamic scheduling of memory accesses to a data memory

Info

Publication number
EP2834950A1
Authority
EP
European Patent Office
Prior art keywords
data
memory
data memory
network device
size
Prior art date
Legal status
Withdrawn
Application number
EP12722332.9A
Other languages
German (de)
French (fr)
Inventor
Shlomo Reches
Yoram Gross
Nissim Dangur
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of EP2834950A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/90Buffering arrangements
    • H04L49/9047Buffering arrangements including multiple buffers, e.g. buffer pools

Definitions

  • if the buffer size is 4 times the minimum data packet size DPSmin, then in the worst-case scenario of the data network ¾ of the memory is wasted because no more than one data packet DP can be stored in a data buffer. Accordingly, the data memory is implemented with 4 times the actual traffic stored during the RTT time.
  • a slicing is performed, wherein a wide bus is divided into narrow busses that are independent from each other. This property is taken as an advantage in the case of slicing by choosing the slice in which to store the received data packet DP.
  • a dequeuing is performed wherein a data packet DP is read from the data memory 4 and forwarded to a data link within the data network. As opposed to the enqueuing operation, the dequeuing operation specifies precisely which data packet DP is to be sent.
  • the method and device according to the present invention, with the partitioning of the data bus and memory into slices, reduce the waste of bandwidth. Balancing of the slices can be performed by mechanisms comprising a static slicing by design or a dynamic slicing during operation. With static slicing the slice size SS is selected such that even the minimum data packet is spread over more than one slice. Data packets DP can be written such that the slices always toggle. With dynamic slicing it is possible to perform a write balance for read (W4R): when the mechanism detects that one slice is utilized more than the other, it inserts the write operations in an unbalanced manner but in the opposite direction. Further, it is possible to perform a read balance for read (R4R): when the method detects that one slice is utilized more than the other, it reads the next slices in an unbalanced manner but in the opposite direction. It does this by changing the read order as necessary.
  • W4R write balance for read
  • R4R read balance for read
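The balancing mechanisms above can be illustrated with a minimal sketch; the class and method names are hypothetical and not taken from the patent, and the model assumes two slices with simple per-slice utilization counters:

```python
class SliceBalancer:
    """Tracks per-slice utilization and steers bursts to the less-used slice.

    Illustrates the W4R idea: when one slice is utilized more than the
    other, subsequent writes are placed unevenly in the opposite
    direction so that utilization evens out.
    """

    def __init__(self, num_slices=2):
        self.num_slices = num_slices
        self.load = [0] * num_slices  # outstanding bursts per slice

    def choose_write_slice(self):
        # Write to the currently least-utilized slice (ties go to the
        # lowest-numbered slice).
        return self.load.index(min(self.load))

    def write_burst(self):
        s = self.choose_write_slice()
        self.load[s] += 1
        return s

    def read_burst(self, s):
        self.load[s] -= 1


bal = SliceBalancer()
# With equal loads, six writes toggle evenly between the two slices.
order = [bal.write_burst() for _ in range(6)]
```

Note that in the balanced case this degenerates to the static toggling described above; the counters only matter once dequeuing has skewed the slice utilization.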


Abstract

The present invention provides a network device (1) within a data network for providing a dynamic scheduling of memory accesses to a data memory (4) connected to said network device (1), wherein said data memory (4) is partitioned into data memory slices of a slice size corresponding to a minimum possible data packet size of a data packet, DP, transported in said data network, wherein a load balancer (5) of said network device (1) is adapted to write ingress data packets into data memory slices of said data memory (4) in write operations such that the ingress data packets, DP, are evenly distributed in the write operations between the different data memory slices of said data memory (4), wherein said load balancer (5) is adapted to read egress data packets to be transmitted from the data memory slices of said data memory (4) in read operations such that a misbalance between read operations from different data memory slices of said data memory (4) is minimized.

Description

TITLE
A method and device for providing a dynamic scheduling of memory accesses to a data memory
TECHNICAL BACKGROUND
The invention relates to a method and device within a data network for providing a dynamic scheduling of memory accesses and, in particular, to dynamic bandwidth allocation for a memory connected to a network device.
A parallel digital interface is a fundamental communication infrastructure and can be found at any level of integration, from microelectronic integrated circuits to larger data network systems or components. Such parallel digital interfaces in conventional systems waste bandwidth because, due to an inherent property, they cannot harness the full capabilities of the data infrastructure. When data of variable length is transferred or transported over a digital interface there is often a waste of bandwidth. For example, when data is transferred via the digital interface to a data memory, there is a minimum burst size. Even if the transported amount of data is less than the minimum burst size, the bandwidth consumed by the data transport will be the same as the burst size. This results in a waste of bandwidth which is a function of the relation between the data length and the minimum burst size. A conventional digital interface has the inherent characteristic of comprising a minimum block or minimum burst size. The reason for this inherent property is the use of parallel buses which the data traverses. Even if at an external interface the data is transmitted serially over a single line, inside a device the received data will be de-serialized. If the internal parallel interface uses a data bus with a data bus width of n bits there will be a waste of bandwidth if a chunk of data is sent whose size is not an integer multiple of the respective data bus width.
Fig. 1 shows a diagram for illustrating the problem underlying the present invention. The diagram illustrates the bandwidth efficiency of data packets of variable size which are sent through a conventional digital data interface. In the shown example the data bus is working at a frequency of 100 MHz and comprises a data bus width of 256 bits. The maximum raw data bandwidth is 25.6 Gbps, corresponding to 256 bits x 10^8 1/sec = 25.6 x 10^9 bits/sec. The inherent property of this exemplary digital interface is its minimum burst size of 256 bits. If the data packet size is an integer multiple of the minimum burst size then the bandwidth efficiency is 100% as shown in Fig. 1. For example, if the data packet size is 32 bytes corresponding to 256 bits, the bandwidth efficiency is 100%. Further, if the data packet size is double the burst size, equal to 2 x 256 bits or 64 bytes, then the bandwidth efficiency is also 100%. However, if the data packet size DPS of a data packet DP is only one byte more than the minimum burst size, for example 33 bytes or 65 bytes, the bandwidth efficiency degrades considerably as illustrated in Fig. 1. For example, if the data packet size is 33 bytes, the bandwidth efficiency drops to almost 50% as shown in Fig. 1 and increases gradually until a bandwidth efficiency of 100% is reached at a data packet size of 64 bytes. Accordingly, there is a need for an apparatus and a method for providing a dynamic scheduling and bandwidth allocation which increases the bandwidth efficiency for data packets with variable data packet size.
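The efficiency curve of Fig. 1 can be reproduced numerically: a packet always consumes a whole number of minimum-size bursts, so the efficiency is the packet size divided by the rounded-up burst allocation. The following is a small sketch of this calculation, not part of the patent itself:

```python
import math

BURST_BYTES = 32  # minimum burst size of the example interface (256 bits)

def bandwidth_efficiency(packet_bytes: int) -> float:
    """Fraction of the consumed bus bandwidth carrying useful payload."""
    bursts = math.ceil(packet_bytes / BURST_BYTES)
    return packet_bytes / (bursts * BURST_BYTES)

# 32 and 64 byte packets fill their bursts exactly (100% efficiency);
# a 33 byte packet occupies two bursts and wastes almost half of them.
```

For example, `bandwidth_efficiency(33)` yields 33/64, i.e. about 51.6%, matching the drop to almost 50% shown in Fig. 1.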
SUMMARY OF THE INVENTION
According to a first aspect of the present invention a network device within a data network for providing a dynamic scheduling of memory accesses to a data memory connected to said device is provided. According to a first possible implementation of the first aspect of the present invention a network device within a data network for providing a dynamic scheduling of memory accesses to a data memory is provided,
wherein the data memory is partitioned into data memory slices of a slice size corresponding to a minimum possible data packet size of a data packet transported in said data network,
wherein a load balancer of said network device is adapted to write ingress data packets received from a data source of said data network into data memory slices of the data memory in write operations such that the received data packets are evenly distributed in the write operations between the different data memory slices of said data memory,
wherein said network device is adapted to read egress data packets to be transmitted to a data link of said data network from the data memory slices of the data memory in read operations such that a misbalance between read operations from different data memory slices of said data memory is minimized.
In a further possible second implementation of the network device according to the first implementation of the first aspect of the present invention said network device is adapted to access said data memory by bursts.
In a further possible third implementation of the first or second implementation of the network device according to the first aspect of the present invention the bursts comprise read bursts where data is read from a predetermined number of consecutive memory addresses.
In a further possible fourth implementation of any of the first to third implementation of the network device according to the first aspect of the present invention the bursts comprise also write bursts where data is written to a predetermined number of consecutive memory addresses.
In a further possible fifth implementation of the network device according to any of the first to fourth implementation of the first aspect of the present invention the predetermined number of consecutive memory addresses is the burst length of a burst.
In a further possible sixth implementation of the network device according to any of the first to fifth implementation of the first aspect of the present invention the data memory consists of a predetermined number of memory devices each having a memory device width.
In a further possible seventh implementation of the sixth implementation of the network device according to the first aspect of the present invention the number of memory devices multiplied with the memory device width is equal to the memory width of the data memory. In a further possible eighth implementation of the sixth or seventh implementation of the network device according to the first aspect of the present invention the memory width is equal to the bus width of a data bus of a digital interface.
In a further possible ninth implementation of any of the second to eighth implementation of the network device according to the first aspect of the present invention the number of bits in a burst is the burst size of the burst and corresponds to a burst length multiplied with the slice size of a data memory slice of the data memory.
In a further possible tenth implementation of any of the third to ninth implementation of the network device according to the first aspect of the present invention the burst size of the burst is smaller than the minimum possible data packet size of a data packet transported in said data network.
In a further possible eleventh implementation of any of the sixth to tenth implementation of the network device according to the first aspect of the present invention the number of data memory slices multiplied with the slice size of the data memory slices corresponds to the memory width of the data memory.
In a further possible twelfth implementation of any of the sixth to eleventh implementation of the network device according to the first aspect of the present invention the memory devices of said data memory are DDR-SDRAM devices. In a further possible thirteenth implementation of any of the first to twelfth implementation of the network device according to the first aspect of the present invention the minimum possible data packet size of a data packet transported in the data network is 512 bits. In a further possible fourteenth implementation of any of the second to thirteenth implementation of the network device according to the first aspect of the present invention the burst length of a burst is eight. In a further possible fifteenth implementation of any of the tenth to fourteenth implementation of the network device according to the first aspect of the present invention the slice size of a data memory slice is equal to the minimum possible data packet size divided by the burst length.
In a further possible sixteenth implementation of the fifteenth implementation of the network device according to the first aspect of the present invention the slice size comprises 64 bits. In a further possible seventeenth implementation of any of the first to sixteenth implementation of the network device according to the first aspect of the present invention the data memory comprises a predetermined number of memory buffers.
In a further possible eighteenth implementation of the seventeenth implementation of the network device according to the first aspect of the present invention each memory buffer is adapted to store several bursts and comprises a memory buffer size which corresponds to a number of bursts stored in the respective memory buffer.
In a further possible nineteenth implementation of any of the seventeenth and eighteenth implementation of the network device according to the first aspect of the present invention for each memory buffer of the data memory a queue is provided having a memory buffer pointer managed by a queue manager of the apparatus.
In a further possible twentieth implementation of the eighteenth or nineteenth implementation of the network device according to the first aspect of the present invention the memory buffer size of a memory buffer is several times the burst length of a burst.
In a possible twenty-first implementation of the twentieth implementation of the network device according to the first aspect of the present invention the memory size of the data memory is equal to the memory buffer size multiplied by the number of memory buffers. The invention further provides a method for dynamic scheduling of memory accesses to a data memory as a second aspect of the present invention.
In a first possible implementation of the method for dynamic scheduling of memory accesses of a data memory partitioned into data memory slices of a slice size corresponding to a minimum possible data packet size of a data packet transported in said data network, said method comprises the steps of:
writing ingress data packets received from a data source of said data network into data memory slices of the data memory in write operations such that the received data packets are evenly distributed in the write operations between the different data memory slices of said data memory and
reading egress data packets to be transmitted to a data link of said data network from the data memory slices of the data memory in read operations such that a misbalance between read operations from different data memory slices of said data memory is minimized.
The invention further provides according to a third aspect a store and forward network device of a data network comprising a network device according to the first aspect of the present invention.
In a possible first implementation of the store and forward network device according to the third aspect of the present invention the store and forward network device is formed by a network router. In a possible second implementation of the store and forward network device according to the third aspect of the present invention the store and forward network device is formed by a network switch.
BRIEF DESCRIPTION OF FIGURES
Fig. 1 shows a diagram for illustrating the problem underlying the method and apparatus according to the present invention;
Fig. 2 shows a block diagram of a possible implementation of a network device according to the first aspect of the present invention;
Figs. 3A, 3B show diagrams for illustrating a bandwidth allocation which is performed by a method according to a possible implementation of the second aspect of the present invention in comparison with a conventional method;
Fig. 4 shows a diagram for illustrating an effect of an increased bandwidth efficiency which is provided by the method according to a possible implementation of the second aspect of the present invention in comparison with a conventional method;
Fig. 5 shows a further diagram for illustrating functionality of an apparatus and method according to a possible implementation of the first and second aspect of the present invention;
Fig. 6 shows possible data distributions of data bursts in a data buffer of a data memory;
Figs. 7, 8 and 9 show flow charts of a possible implementation of a method according to the second aspect of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
As can be seen in Fig. 2 a network device 1 within a data network can comprise a scheduling apparatus 2 connected via a digital interface 3A having a data bus and via a load balancer 5, burst reordering units 6, 7, 8 and slice memory controllers 9, 10 to a data memory 4. The scheduling apparatus 2 of the network device 1 comprises ingress ports and egress ports. The scheduling apparatus 2 is adapted to provide a dynamic scheduling of memory accesses to the data memory 4 and connected to a load balancer 5. The load balancer 5 is adapted to perform load balancing and can be integrated in the network device 1 . The network device 1 shown in Fig. 2 can form a store and forward network device according to a third aspect of the present invention. In a possible implementation the store and forward network device 1 is a network router. In a further possible implementation the store and forward network device 1 is formed by a data network switch. The scheduling apparatus 2 receives at its ingress ports ingress data packets as illustrated in Fig. 2. The scheduling apparatus 2 forwards egress data packets to data links of the data network via its egress data ports as illustrated in Fig. 2. The data memory 4 is partitioned into data memory slices (DMS) of a slice size (SS) corresponding to a minimum possible data packet size (DPSmin) of data packets (DPs) transported in the respective data network. In a possible implementation of the data network data packets DPs of different and variable data packet size DPS are transported.
In the block diagram of Fig. 2 it can be seen that a scheduling apparatus 2 is connected via a first digital interface 3A to a load balancer 5 and connected to reordering units 6, 7, 8 for reordering bursts. In the implementation shown in Fig. 2 the network device 1 comprises a lower slice memory controller 9 and an upper slice memory controller 10. Accordingly, in the shown implementation the data memory 4 is partitioned into two data memory slices comprising a lower memory slice and an upper memory slice. The number of slice memory controllers can vary in alternative implementations. The slice memory controllers 9, 10 are connected to a data memory 4 via a second digital interface 3B. In a possible implementation the data memory 4 can form an integral part of the data network device 1 as shown in Fig. 2. In an alternative implementation the data memory 4 can be connected to the data network device 1, for example, via a bus.
The data network device 1 as shown in Fig. 2 is adapted to write ingress data packets DPs received at its ingress data ports, from a data source of the data network into data memory slices DMS of the data memory 4 in write operations. Writing of ingress data packets DPs into data memory slices DMS of the data memory 4 is performed such that the received data packets DPs are evenly distributed in write operations between the different data memory slices DMS of the data memory 4.
The data network device 1 is further adapted to read egress data packets DPs to be transmitted to a data link of the data network from the data memory slices DMS of the data memory 4 in read operations. Reading of egress data packets is performed by the data network device 1 such that a misbalance between read operations from different data memory slices DMS of the data memory 4 is minimized. In a possible implementation of the network device 1 according to the first aspect of the present invention the apparatus 2 is adapted to access the data memory 4 by bursts. These bursts B can comprise read bursts (RB) and write bursts (WB). In read bursts RB data is read from a predetermined number of consecutive memory addresses of the data memory 4. In write bursts WB the data is written to a predetermined number of consecutive memory addresses within the data memory 4. The predetermined number of consecutive memory addresses to read or write data is the burst length (BL) of a burst (B). In a possible implementation of the network device 1 according to the first aspect of the present invention the connected data memory 4 consists of a predetermined number of memory devices (MD), each having a corresponding memory device width (MDW). Memory devices MD of the data memory 4 can be DDR-SDRAM devices. The number of memory devices MD or memory chips within the data memory 4 multiplied with the memory device width MDW is equal to a memory width (MW) of the data memory 4. The memory width MW is equal to a bus width (BW) of a data bus of the digital interface 3B which connects the data network device 1 with the data memory 4: MW = N x MDW = BW, (1) wherein MW is the memory width of the data memory 4,
N is the number of memory devices MD within the data memory 4,
MDW is the memory device width of a memory device MD and
BW is the bus width of the data bus of the digital interface 3B.
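As a concrete instance of equation (1), the following small check uses an assumed configuration (the patent does not fix these numbers): eight DDR-SDRAM devices of 32 bits each yield the 256-bit memory width matching the example bus width:

```python
N = 8      # number of memory devices MD (assumed for illustration)
MDW = 32   # memory device width in bits (assumed)
BW = 256   # bus width of the digital interface 3B in bits

# Equation (1): the device widths add up to the memory width,
# which in turn matches the bus width.
MW = N * MDW
assert MW == BW
```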
The number of bits in a data burst is the burst size BS of the data burst B and corresponds to a burst length (BL) multiplied with the slice size SS of a data memory slice DMS of the data memory 4:
BS = BL x SS. (2)
In a possible implementation of the apparatus according to the first aspect of the present invention the burst size BS of the data burst B is equal to or smaller than the minimum possible data packet size DPSmin:
BS ≤ DPSmin, (3) wherein BS is the burst size of a data burst and DPSmin is the minimum possible data packet size of a data packet DP transported within the data network.
Accordingly, the burst length BL multiplied with the slice size SS is also equal to or smaller than the minimum data packet size DPSmin:
BL x SS ≤ DPSmin. (4)
Consequently, the slice size SS of a data memory slice DMS is equal to or smaller than the minimum possible data packet size DPSmin divided by the burst length BL:
SS ≤ DPSmin / BL. (5)
Further, a number K of data memory slices DMS multiplied with the slice size SS of the data memory slices DMS corresponds to the memory width MW of the data memory 4:
K x SS = MW. (6) In a possible implementation of the network device 1 according to the first aspect of the present invention the minimum possible data packet size DPSmin of a data packet DP transported in the data network is 512 bits. In a further possible implementation of the device according to the first aspect of the present invention the burst length BL of a data burst B is eight (BL = 8). In a possible implementation of the device according to the first aspect of the present invention the slice size SS of a data memory slice DMS is equal to the minimum possible data packet size DPSmin divided by the burst length BL and comprises 64 bits or 8 bytes.
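The numeric example in the paragraph above can be checked directly against equations (2) to (6); the memory width of 256 bits is an assumption taken from the example interface, not stated in this paragraph:

```python
DPS_MIN = 512  # minimum possible data packet size in bits
BL = 8         # burst length (consecutive addresses per burst)
MW = 256       # memory width in bits (assumed, matching the example bus)

SS = DPS_MIN // BL  # slice size, equation (5) with equality: 64 bits
BS = BL * SS        # burst size, equation (2): 512 bits
K = MW // SS        # number of data memory slices, equation (6)

assert BS <= DPS_MIN  # equation (3) holds
```

With these values a burst exactly fills one minimum-size packet, and the 256-bit bus decomposes into four 64-bit slices.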
In a possible implementation of the network device 1 according to the first aspect of the present invention the data memory 4 comprises a predetermined number M of memory buffers MB. Each memory buffer MB can be adapted to store several bursts B and comprises a memory buffer size MBS which corresponds to a number of bursts B stored in the respective memory buffer MB. In a possible implementation of the network device 1 according to the first aspect of the present invention for each memory buffer MB of the data memory DM a queue (Q) is provided having a memory buffer pointer (MBP) which can be managed by a queue manager of the network device 1 .
The memory buffer size MBS of a memory buffer MB can be several times the burst length BL of a data burst B: MBS = n x BL. (7)
In a further possible implementation of the network device 1 the memory size of the data memory 4 is equal to the memory buffer size MBS multiplied by the number M of memory buffers MB:
MS = M x MBS, (8)
wherein MS is the memory size of the data memory 4,
M is the number of memory buffers MB and
MBS is the memory buffer size of a memory buffer.
With the method and apparatus according to the first and second aspect of the present invention the digital interface bus of the digital interface 3B is divided into slices. A data packet DP is transferred by the digital interface 3B and uses as many slices as required. Accordingly, the wasted data now forms part of a slice, which is smaller than the minimum data burst. With the method according to the second aspect of the present invention a control mechanism is provided that fits the slices of bandwidth together to avoid bandwidth waste. An example is shown in the diagrams of Figs. 3A, 3B. In the given example three data packets with different data packet sizes of 10, 50 and 100 bytes traverse the digital interface. In the diagram of Fig. 3A the transport of the data packets DPs is performed without slicing whereas Fig. 3B shows a data transport of the data packets DPs with slicing as performed by the apparatus and method according to the first and second aspect of the present invention. As can be seen in Fig. 3A, three data packets DP1, DP2, DP3 are transported via the digital interface, wherein the first data packet comprises 100 bytes corresponding to 800 bits so that 4 x 256 bits are transported consecutively via the data bus width of 256 bits. The second data packet DP2 comprises 50 bytes corresponding to 400 bits so that 2 x 256 bits have to be transported via the data bus width of 256 bits. The last data packet DP3 comprises only 10 bytes corresponding to 80 bits so that the data can be transported in a single burst of 1 x 256 bits over the data bus width of 256 bits. Accordingly, the three data packets DP1, DP2, DP3 require in the shown example of Fig. 3A seven bursts transported via the digital interface bus.
In contrast, in the example of Fig. 3B the same three data packets DP1, DP2, DP3 undergo a slicing procedure as performed by the device and method according to the first and second aspect of the present invention. In the shown example, slicing is performed in two times 128 bits. As can be seen in Fig. 3B, the first data packet DP1 having 800 bits is transported as 7 x 128 bit bursts, while the second data packet DP2 is transported in 4 x 128 bit bursts and the last data packet DP3 is transported in a single 1 x 128 bit burst. Note that the seventh 128 bit burst of the first data packet DP1 can be transported at the same time as the first 128 bit burst of the second data packet DP2 so that the bandwidth of the data bus of the digital interface is completely used and waste is avoided. In the same manner, the 128 bit burst of the third data packet DP3 can be transported in parallel with the last 128 bit wide burst of the second data packet DP2, thus avoiding a waste of bandwidth as well. Fig. 4 illustrates the bandwidth efficiency of one and two slices as shown in Figs. 3A, 3B for data packets with different data packet size DPS. As can be seen, by performing the slicing according to the method of the present invention the bandwidth efficiency is significantly increased. The scheduling apparatus 2 as shown in Fig. 2 can form part of a store and forward network device 1 such as a switch or router. In an application with queuing, like switches or routers or image processors or other applications, there can be a memory queue. The network device 1 comprises a load balancer 5 which ensures that the writing of received data packets DPs into the data memory 4 is balanced. The queue is usually implemented by a linked list. Such received ingress data packets DPs can be stored anywhere in the free memory space of the data memory 4 without restrictions.
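The burst counts of Figs. 3A and 3B can be recomputed as follows: without slicing each packet occupies a whole number of 256-bit bursts, while with two independent 128-bit slices the slice bursts of consecutive packets can be packed back-to-back across the two lanes. This is a sketch of the counting only, not of the hardware:

```python
import math

packets_bits = [800, 400, 80]  # the 100, 50 and 10 byte packets of Fig. 3

# Fig. 3A: one 256-bit bus, whole bursts per packet (4 + 2 + 1).
bursts_unsliced = sum(math.ceil(p / 256) for p in packets_bits)

# Fig. 3B: two 128-bit slices (7 + 4 + 1 slice bursts); bursts from
# consecutive packets can share a bus cycle, so the cycle count is the
# packed total over the 2 lanes.
slice_bursts = sum(math.ceil(p / 128) for p in packets_bits)
cycles_sliced = math.ceil(slice_bursts / 2)
```

The unsliced transfer needs seven bus cycles, the sliced transfer only six, which is the bandwidth gain illustrated in Fig. 4.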
However, egress data packets DPs to be transmitted to a data link of the data network often do not offer such a freedom of choice. Accordingly, a mechanism of the network device 1 decides which data packet DP should be sent, and this decision can lead to a misbalance between the data memory slices DMS of the data memory 4.
The data memory 4 consists of a predetermined number of memory devices MD such as DDR-SDRAM chips, each having a predetermined memory device width MDW. Further, the data memory 4 can comprise a predetermined number M of memory buffers MB. Usually, if the buffer size is set to the minimum packet size, there are too many buffers and corresponding memory buffer pointers MBPs to be managed. Hence, in a possible embodiment a larger buffer size is selected that is several times the burst size BS. If, for example, the memory buffer size of a memory buffer MB comprises four slices, then the space image of the data memory 4 can look like the patterns illustrated in Fig. 5. The data memory of Fig. 5 is partitioned into an upper and a lower slice (U, L). The memory buffer MB is a block of memory within the data memory 4 that is handled by a queue manager. In the implementation of Fig. 5 each memory buffer MB has memory space for up to 4 bursts which can be stored in different patterns as shown in Fig. 6. Each memory buffer MB has a pointer that the queue manager QM manages in queues Q implemented as linked lists. The memory buffer MB can hold several bursts B and its size is chosen to limit the number of pointers. The choice of the buffer size BS is a compromise between a very small buffer size causing too many pointers and a larger buffer size causing a waste of memory. Memory waste is caused because no more than one data packet DP is stored in a memory buffer MB, as this is more convenient and easier to handle. Yet, one data packet DP can occupy more than one memory buffer MB.
Fig. 6 shows different possible data storage patterns of bursts within a memory buffer MB of 4 times the minimum packet size or burst size. Each data buffer or memory buffer MB is spread over two slices of the data memory 4 as shown in Fig. 6: an upper slice and a lower slice. Accordingly, the data memory 4 is partitioned in the shown example of Fig. 6 into two slices, i.e. an upper slice and a lower slice, and comprises a predetermined number of memory buffers MB. Each memory buffer in the shown implementation is adapted to store several bursts, i.e. up to a maximum of 4 bursts. In the implementation shown in Fig. 6 up to 4 bursts can be stored in the same memory buffer MB. As can be seen in Fig. 6, there are eight different possible patterns or combinations for storing bursts in a single memory buffer MB. Consequently, a data buffer descriptor can comprise a 3 bit code describing the structure or pattern of the bursts stored in the respective memory buffer MB. This 3 bit code can be put into the data buffer descriptor after the load balancer 5 has decided about the structure in which the bursts are written into the data memory 4. Accordingly, in a read operation the 3 bit code of the data buffer descriptor indicates where to find valid bursts of data within the data memory 4. The data buffer size can be, for example, 4 times the minimum data packet size.
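The 3 bit descriptor code can be sketched as a simple lookup table. The concrete eight patterns of Fig. 6 are not reproduced here, so the mapping below is hypothetical; only the fact that 3 bits suffice for eight burst patterns in a four-burst buffer is taken from the text:

```python
# Hypothetical mapping from a 3 bit data buffer descriptor code to the
# burst slots (1..4) holding valid data inside one memory buffer MB.
# The real eight patterns are those of Fig. 6; any fixed set of eight
# patterns can be named by a 3 bit code.
PATTERNS = {
    0b000: (1,),
    0b001: (2,),
    0b010: (1, 2),
    0b011: (3, 4),
    0b100: (1, 2, 3),
    0b101: (2, 3, 4),
    0b110: (1, 2, 3, 4),   # buffer completely filled, in-order
    0b111: (2, 1, 4, 3),   # filled, burst series started at a lower slice
}

def valid_burst_slots(code: int) -> tuple:
    """Read side: the 3 bit code of the descriptor tells where the valid
    bursts are found inside the memory buffer MB."""
    return PATTERNS[code]
```

The write side (load balancer 5) would store the chosen code in the descriptor; the read side only has to look it up.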
A memory slice or data slice refers to a physical data interface. Data buffers and memory buffers MB, on the other hand, are logical and not physical and define how the memory addresses are handled. For example, assuming that the data memory 4 comprises four memory devices MD of 1 GB each, the system memory comprises 4 GB. This 4 GB memory range can be divided logically into data buffers of, for example, 1 KB each, leading to 4 M data buffers. These data buffers have data buffer pointers that are managed by a data buffer manager. This can be one layer beneath the queue manager. A queue manager QM of the scheduling apparatus 2 manages the data packets. A data packet DP can occupy one or several data buffers. There are many possible implementations to define the mechanism of queues and data buffers, with a trade-off between performance and overhead. A possible implementation is that when a data packet DP is received it is stored in as many data buffers as it requires. For example, a 7.5 KB long data packet can be stored in 8 x 1 KB data buffers. The data buffer manager can link the eight data buffers by pointers. In a possible implementation the queue manager QM stores only the pointer to the first data buffer of the data packet DP. Accordingly, data buffers form memory locations while data memory slices are physical interfaces. Accordingly, data buffers can be split over all memories. In order to access a data buffer, a read or write operation must then be applied to all slices. Alternatively, one can define the data buffers to be completely contained inside memories of the same slice. Accordingly, in this implementation a data buffer involves only one of the slices. In this implementation the data buffer pointer needs more bits: part of the data buffer pointer can indicate the memory address, while another part of said pointer indicates which slice is used and in which structure the bursts are stored inside the data buffer.
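The buffer accounting in the example above (4 GB of memory, 1 KB data buffers, a 7.5 KB packet occupying eight linked buffers) can be sketched as follows; the function and variable names are illustrative, not from the disclosure:

```python
import math

MEMORY_BYTES = 4 * 2**30      # four 1 GB memory devices MD
BUFFER_BYTES = 1024           # logical data buffer size (1 KB)

total_buffers = MEMORY_BYTES // BUFFER_BYTES   # 4 M data buffer pointers

def buffers_for_packet(packet_bytes: int) -> int:
    """A data packet DP occupies as many data buffers as it requires."""
    return math.ceil(packet_bytes / BUFFER_BYTES)

def link_buffers(first_free: int, count: int) -> list:
    """Sketch: the data buffer manager links buffers by pointers; here
    simply consecutive indices stand in for the pointer chain."""
    return [first_free + i for i in range(count)]

n = buffers_for_packet(int(7.5 * 1024))   # 7.5 KB packet -> 8 buffers
chain = link_buffers(0, n)
head_pointer = chain[0]                   # all the queue manager QM stores
```

The trade-off is visible in the numbers: 4 M pointers must be held somewhere fast, which is why a larger buffer size is attractive despite the memory waste.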
In a possible implementation the memory bandwidth which is consumed can be counted by the number of accesses or data bursts.
In a possible implementation the memory accesses are toggled between slices to provide a balanced and efficient bandwidth. The data packet DP can be represented by a sequence of data memory accesses or bursts. Accordingly, a data packet DP can be formed by a sequence of accesses, and the order is kept inside the data memory. The accesses or bursts can be applied to a corresponding sequence of memory locations. For example, if a data packet DP comprises a sequence of accesses or bursts 1, 2, 3, one can write in a possible implementation out of order, first the second access to memory location 2, then the third access to memory location 3 and finally the first access to memory location 1. When the memory is read in a read operation, the sequence can in a possible implementation always be read from the lower address to the higher address, i.e. in the order of the memory locations 1, 2, 3, so that the data will be forwarded in the correct order. Another possible implementation is to read out of order into a temporary buffer in an internal memory and then to reorder. Writing out of order modifies the access sequence of the slices and thereby provides balancing.
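The out-of-order write with an in-order read described above can be sketched as follows, assuming, as in the example, that the memory location number equals the access number:

```python
# Bursts 1, 2, 3 of a packet are written to their memory locations in
# the order 2, 3, 1 (to balance slice utilization), then read back in
# ascending address order, which restores the original packet order.
memory = {}

payload = {1: "burst-1", 2: "burst-2", 3: "burst-3"}  # access -> data
write_order = [2, 3, 1]                               # modified sequence

for access in write_order:
    memory[access] = payload[access]   # location number == access number

# Read side: always from the lower to the higher address, so the packet
# leaves the memory in order 1, 2, 3 without an extra reordering step.
read_back = [memory[addr] for addr in sorted(memory)]
```

The key point is that the write order is free (and can be chosen for balancing) as long as each burst lands at the address that encodes its position in the packet.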
The data buffer size can be derived from two factors. On the one hand, the data buffer should be as small as possible, close to the minimum data packet size, so that the waste of memory is kept to a minimum. On the other hand, too many data buffers mean too many data buffer pointers to handle. The data buffer pointers are usually held in a faster and more expensive memory such as an SRAM or an ASIC-internal memory along with the controller. Consequently, the number of data buffer pointers should also be minimized. In a possible embodiment the data buffer size is between 1 and 4 times the minimum data packet size DPSmin.
A data network has a round trip time (RTT), which is the time it takes for a router A to send a data packet DP to a neighbouring router B and to get back the data packet that acknowledges the reception of the data packet DP. This RTT can be around 100 ms for an Internet data network topology. The round trip time RTT defines for how long a data packet DP should be stored in a data memory. Consequently, the maximum input bandwidth multiplied by the RTT is the amount of memory that has to be provided by the data memory 4. In a data network a worst case scenario is that all data traffic comprises minimum sized data packets back-to-back. If, for example, the buffer size is 4 times the minimum data packet size DPSmin, then in this worst case scenario ¾ of the memory is wasted, because no more than one data packet DP can be stored in a data buffer. Accordingly, the data memory is implemented with 4 times the actual traffic stored during the RTT. With the method according to the present invention a slicing is performed, wherein a wide bus is divided into narrow busses that are independent from each other. This property is taken as an advantage in the case of slicing by choosing the slice where the received data packet DP is stored. With the egress data packets a dequeuing is performed, wherein a data packet DP is read from the data memory 4 and forwarded to a data link within the data network. As opposed to the enqueuing operation, the dequeuing operation specifies precisely which data packet DP is to be sent.
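The memory sizing rule above (maximum input bandwidth multiplied by the RTT, times four for the worst case buffer waste) can be illustrated numerically; the 100 Gbit/s line rate is an assumed figure for illustration only, while the 100 ms RTT and the factor 4 come from the text:

```python
RTT_MS = 100                  # ~100 ms round trip time
LINE_RATE_BPS = 100 * 10**9   # assumed maximum input bandwidth, bit/s
WASTE_FACTOR = 4              # buffer size = 4 x minimum data packet size

# traffic that must be held during one RTT, in bytes (integer arithmetic)
traffic_bytes = LINE_RATE_BPS * RTT_MS // 1000 // 8     # 1.25 GB
# worst case: minimum sized packets waste 3/4 of every buffer -> 4x memory
memory_bytes = WASTE_FACTOR * traffic_bytes             # 5 GB
```

At these assumed rates, 1.25 GB of actual RTT traffic forces a 5 GB data memory, which shows why reducing the waste factor matters.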
The method and device according to the present invention with the partitioning of the data bus and memory into slices reduce the waste of bandwidth. Balancing of the slices can be performed by mechanisms comprising a static slicing by design or a dynamic slicing during operation. With static slicing the slice size SS is selected such that even the minimum data packet is spread over more than one slice. Data packets DP can then be written such that the slices always toggle. With dynamic slicing it is possible to perform a write balance for read (W4R): when the mechanism detects that one slice is utilized more than the other, it inserts the write operations in an unbalanced manner but in the opposite direction. Further, it is possible to perform a read balance for read (R4R): when the method detects that one slice is utilized more than the other, it reads the next slices in an unbalanced manner but in the opposite direction. It does this by changing the read order as explained above.
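The write balance for read (W4R) idea can be sketched with per-slice access counters; the class and method names below are illustrative assumptions, not terms from the disclosure:

```python
# Sketch of dynamic W4R balancing: per-slice access counters are kept,
# and when one slice is utilized more than the other, the next write is
# steered to the opposite (less loaded) slice.
class SliceBalancer:
    def __init__(self):
        self.accesses = {"upper": 0, "lower": 0}

    def note_read(self, slice_name: str) -> None:
        """Record a read that the dequeuing logic forced onto a slice."""
        self.accesses[slice_name] += 1

    def pick_write_slice(self) -> str:
        """Unbalance the writes in the opposite direction of the
        observed utilization, as described for W4R."""
        upper, lower = self.accesses["upper"], self.accesses["lower"]
        choice = "lower" if upper > lower else "upper"
        self.accesses[choice] += 1
        return choice

b = SliceBalancer()
for _ in range(3):
    b.note_read("upper")             # reads happen to load the upper slice
first_write = b.pick_write_slice()   # steered to the lower slice
```

R4R would apply the same counter logic to the read order instead of the write placement.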
Figs. 7, 8 show flow charts for illustrating the operation of the method for dynamic scheduling of memory accesses to a data memory according to the second aspect of the present invention. Fig. 7 shows the handling of a pending read packet request applied to the load balancer 5 of the network device 1. The load balancer 5 first decides whether the packet size of the data packet to be read is bigger than the burst size. If this is not the case, one burst is read as specified from one of the memory buffers MB being partitioned into, for example, four slices each having the size of a burst. As can be seen, for example, in Figs. 5, 6, in the shown implementation the data memory 4 is partitioned into an upper and a lower memory slice and comprises a plurality of memory buffers MB each having four data memory slices DMS, wherein two of the four data memory slices DMS belong to the upper slice U and two of the data memory slices DMS belong to the lower slice L. If the packet size is bigger than the burst size, the load balancer 5 decides in a further step whether the last memory access has been to an upper slice of the data memory 4 or to a lower slice of the data memory. If the last slice was an upper slice, the lower slice memory controller 9 reads a burst series starting at a lower slice as shown in the flow chart of Fig. 7. On the contrary, if the last slice was not an upper slice, i.e. when the last slice was a lower slice, the upper slice memory controller 10 reads a burst series starting from an upper slice. For example, if the memory block MB contains four slices, i.e. when it is completely filled with bursts and comprises the code as shown in Fig. 6, the operation may lead to a sequence of slices read from the memory block MB starting with a lower slice and reading the slices in the order 2, 1, 4, 3.
In this example the read-out bursts are out of order and the reordering of the bursts is performed by the reordering units 7, 8 of the network device 1. After the slices have been read from the memory buffers MB, the respective memory buffers are released as shown in the flow chart of Fig. 7.
Fig. 8 shows the operation when the load balancer 5 receives a pending write packet request. The buffer manager first allocates data buffers, wherein the number of data buffers is given by rounding up [data packet size / (burst size BS x 4)] for a data memory 4 partitioned into memory blocks of four bursts each, as shown for example in Figs. 5, 6. After allocation of the data buffers, the load balancer 5 checks whether the last slice that was accessed for a previous data packet is an upper slice U of the data memory 4 or a lower slice L. If the last slice has been an upper slice, a memory controller writes a burst series starting at a lower slice into a single memory buffer MB. It can write up to four slices into a memory buffer MB as shown, for example, in Figs. 5, 6. On the contrary, if the last slice has not been an upper slice, i.e. it was a lower slice, a memory controller writes a burst series starting with an upper slice into the data memory 4. Fig. 9 shows a flow chart of a possible implementation of a method for providing a dynamic scheduling of memory accesses to a data memory 4 according to the second aspect of the present invention. The data memory 4 is partitioned into data memory slices DMS of a slice size SS corresponding to a minimum possible data packet size of a data packet DP transported in the data network.
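The buffer allocation and slice toggling of the pending-write handling can be sketched as follows; the 64 byte burst size BS is an assumed value, while the rounding formula and the upper/lower toggle follow the text:

```python
import math

BURST_SIZE_BYTES = 64      # assumed burst size BS
BURSTS_PER_BUFFER = 4      # memory blocks of four bursts each (Figs. 5, 6)

def buffers_to_allocate(packet_bytes: int) -> int:
    """Round up [data packet size / (burst size BS x 4)], as in Fig. 8."""
    return math.ceil(packet_bytes / (BURST_SIZE_BYTES * BURSTS_PER_BUFFER))

def start_slice(last_slice: str) -> str:
    """If the previous packet last accessed the upper slice U, start the
    new burst series at the lower slice L, and vice versa."""
    return "L" if last_slice == "U" else "U"
```

The read handling of Fig. 7 applies the same toggle when choosing which slice memory controller starts the burst series.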
In a first step S1 ingress data packets received from a data source of the data network are written into data memory slices DMS of the data memory 4 in write operations. This is performed such that the received data packets DPs are evenly distributed in the write operations between the different data memory slices DMS of the data memory 4.
In a second step S2 the egress data packets to be transmitted to a data link of the data network are read from the data memory slices DMS of the data memory 4 in read operations. This is performed such that a misbalance between read operations from different data memory slices DMS of the data memory 4 is minimized.
The method and device according to the first and second aspect of the present invention as well as the store and forward network device 1 according to the third aspect of the present invention can be used in any data network transporting data in data packets. They are especially suited for data networks transporting data in data packets DPs of variable size. The method and device according to the present invention increase the digital interface performance as well as the memory performance of the data memory 4. The store and forward device 1 is a device where data packets DPs are stored and then forwarded. In the store and forward technique, information or data is sent to an intermediate station or node where it is kept and sent at a later time to a final destination of the data network or to another intermediate station or network device. In a possible implementation of the network device 1 as shown in Fig. 2 the respective node can also verify the integrity of a message or data packet DP before forwarding it to a data link of a data network. In the embodiment shown in Fig. 2 the load balancer 5 is connected to a single data memory 4. In a further possible embodiment the load balancer can be connected to several data memories 4 via parallel digital interfaces.

Claims
1. A network device (1) within a data network for providing a dynamic scheduling of memory accesses to a data memory (4) connected to said network device (1), wherein said data memory (4) is partitioned into data memory slices of a slice size corresponding to a minimum possible data packet size of a data packet, DP, transported in said data network,
wherein a load balancer (5) of said network device (1) is adapted to write ingress data packets into data memory slices of said data memory (4) in write operations such that the ingress data packets, DP, are evenly distributed in the write operations between the different data memory slices of said data memory (4),
wherein said load balancer (5) is adapted to read egress data packets to be transmitted from the data memory slices of said data memory (4) in read operations such that a misbalance between read operations from different data memory slices of said data memory (4) is minimized.
2. The network device according to claim 1,
wherein said network device (1) is adapted to access said data memory (4) by bursts comprising
read bursts where data is read from a predetermined number of consecutive memory addresses and
write bursts where data is written to a predetermined number of consecutive memory addresses,
wherein said predetermined number of consecutive memory addresses is a burst length of a burst.
3. The network device according to claim 2,
wherein said data memory (4) consists of a predetermined number of memory devices each having a memory device width,
wherein the number of memory devices multiplied with the memory device width is equal to a memory width of said data memory (4),
wherein the memory width is equal to a bus width of a data bus of a digital interface (3).
4. The network device according to one of the preceding claims 2 or 3, wherein the number of bits in a burst is a burst size of said burst and corresponds to a burst length multiplied with the slice size of a data memory slice of said data memory (4).
5. The network device according to one of the preceding claims 2 to 4,
wherein the burst size of the burst is smaller than the minimum possible data packet size.
6. The network device according to one of the preceding claims 3 to 5,
wherein the number of data memory slices multiplied with the slice size of the data memory slices corresponds to the memory width of said data memory (4).
7. The network device according to one of the preceding claims 3 to 6, wherein the memory devices of said data memory (4) are DDR SDRAM devices.
8. The network device according to one of the preceding claims 1 to 7,
wherein the minimum possible data packet size of a data packet transported in said data network is 512 bits.
9. The network device according to one of the preceding claims 2 to 8,
wherein the burst length of a burst is eight.
10. The network device according to one of the preceding claims 5 to 9,
wherein the slice size of a data memory slice, being equal to the minimum possible data packet size divided by the burst length, is 64 bits.
11. The network device according to one of the preceding claims 1 to 10,
wherein the data memory (4) comprises a predetermined number of memory buffers, wherein each memory buffer is adapted to store a plurality of bursts and comprises a memory buffer size which corresponds to a number of bursts stored in the respective memory buffer.
12. The network device according to claim 11,
wherein for each memory buffer of said data memory (4) a queue is provided having a memory buffer pointer managed by a queue manager of said apparatus (2).
13. The network device according to claim 11,
wherein the memory buffer size of a memory buffer is a multiple of the burst length of a burst.
14. The network device according to claim 13,
wherein the memory size of said data memory (4) is equal to the memory buffer size multiplied by the number of memory buffers.
15. A method for dynamic scheduling of memory accesses to a data memory (4) being partitioned into data memory slices of a slice size corresponding to a minimum possible data packet size of a data packet transported in a data network, said method comprising:
writing (S1) ingress data packets into data memory slices of the data memory (4) in write operations such that the ingress data packets are evenly distributed in write operations between the different data memory slices of the data memory (4); and
reading (S2) egress data packets to be transmitted from the data memory slices of the data memory (4) in read operations such that a misbalance between read operations from different data memory slices of said data memory is minimized.
EP12722332.9A 2012-05-15 2012-05-15 A method and device for providing a dynamic scheduling of memory accesses to a data memory Withdrawn EP2834950A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/059035 WO2013170886A1 (en) 2012-05-15 2012-05-15 A method and device for providing a dynamic scheduling of memory accesses to a data memory

Publications (1)

Publication Number Publication Date
EP2834950A1 true EP2834950A1 (en) 2015-02-11

Family

ID=46146846

Family Applications (1)

Application Number Title Priority Date Filing Date
EP12722332.9A Withdrawn EP2834950A1 (en) 2012-05-15 2012-05-15 A method and device for providing a dynamic scheduling of memory accesses to a data memory

Country Status (3)

Country Link
EP (1) EP2834950A1 (en)
CN (1) CN104272682A (en)
WO (1) WO2013170886A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08316968A (en) * 1995-05-23 1996-11-29 Toshiba Corp Atm switch
US6987760B2 (en) * 2001-03-05 2006-01-17 International Business Machines Corporation High speed network processor
DE10317370B4 (en) * 2003-04-15 2010-05-12 Infineon Technologies Ag Scheduler for reporting an expiry time
CN1965548B (en) * 2004-04-12 2012-08-22 联合设备技术公司 Method and apparatus for forwarding bursty data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2013170886A1 *

Also Published As

Publication number Publication date
CN104272682A (en) 2015-01-07
WO2013170886A1 (en) 2013-11-21

Similar Documents

Publication Publication Date Title
US11916781B2 (en) System and method for facilitating efficient utilization of an output buffer in a network interface controller (NIC)
CN106489136B (en) System and method for regulating packet transmission in an extensible memory system protocol
US7558270B1 (en) Architecture for high speed class of service enabled linecard
US7782849B2 (en) Data switch and switch fabric
US7760726B2 (en) Compact packet switching node storage architecture employing double data rate synchronous dynamic RAM
WO2010075201A1 (en) Packet aggregation and fragmentation at layer-2 over a managed network
US20150215226A1 (en) Device and Method for Packet Processing with Memories Having Different Latencies
US11700209B2 (en) Multi-path packet descriptor delivery scheme
US8223788B1 (en) Method and system for queuing descriptors
CN109861931B (en) Storage redundancy system of high-speed Ethernet switching chip
US8514700B2 (en) MLPPP occupancy based round robin
US7124231B1 (en) Split transaction reordering circuit
US9274586B2 (en) Intelligent memory interface
WO2011085934A1 (en) A packet buffer comprising a data section and a data description section
US8549216B2 (en) Memory management using packet segmenting and forwarding
US20160212070A1 (en) Packet processing apparatus utilizing ingress drop queue manager circuit to instruct buffer manager circuit to perform cell release of ingress packet and associated packet processing method
US20040131055A1 (en) Memory management free pointer pool
US7822051B1 (en) Method and system for transmitting packets
EP1508225B1 (en) Method for data storage in external and on-chip memory in a packet switch
WO2023202294A1 (en) Data stream order-preserving method, data exchange device, and network
EP1362464A2 (en) Network adapter
US8572349B2 (en) Processor with programmable configuration of logical-to-physical address translation on a per-client basis
EP2834950A1 (en) A method and device for providing a dynamic scheduling of memory accesses to a data memory
Kabra et al. Fast buffer memory with deterministic packet departures
CN113347112B (en) Data packet forwarding method and device based on multi-level cache

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20141107

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20171201