CN116389390A - Switching system for high-speed high-capacity aggregation type cross node output joint queuing - Google Patents


Info

Publication number
CN116389390A
CN116389390A (application CN202310308728.XA)
Authority
CN
China
Prior art keywords: packet, module, frame, data, capacity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310308728.XA
Other languages
Chinese (zh)
Inventor
潘伟涛
石廷澳
邱智亮
高一鸣
李晓旺
孙伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Xidian University
Priority to CN202310308728.XA
Publication of CN116389390A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/15 Interconnection of switching modules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00 Packet switching elements
    • H04L 49/10 Packet switching elements characterised by the switching fabric construction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a switching system for high-speed, high-capacity aggregation-type cross-node and output joint queuing, which consists of an input processing module, a high-capacity switching processing unit (Crossbar), and output queue management. The input processing module adopts a pipelined structure, so that data can be transferred on every clock cycle and more ports can be aggregated, which reduces both the buffer occupied by the Crossbar of the high-capacity switching processing unit and the number of output queues. The cross nodes in the high-capacity switching processing unit take the form of FIFOs, which improves buffer utilization. A buffer isolates the high-capacity switching processing unit from output queue management, decoupling the two for higher processing efficiency; managing queues at the output greatly reduces the number of queues.

Description

Switching system for high-speed high-capacity aggregation type cross node output joint queuing
Technical Field
The invention belongs to the technical field of packet switching architecture, and particularly relates to a switching system for high-speed high-capacity aggregation type cross node output joint queuing.
Background
Current switching fabrics can be categorized, according to queuing policy, into Input Queued (IQ), Output Queued (OQ), Combined Input and Output Queued (CIOQ), and Combined Input and Crosspoint Queued (CICQ) structures.
The input queuing structure (IQ) places a buffer at each input port in front of the Crossbar switching network: input packets are stored in the buffer, scheduled for dequeue by a queue scheduling algorithm, and transferred to an output port through the Crossbar. Because the output rate of the port attached to each Crossbar output channel is limited, and no queuing buffers exist at the output ports or at the Crossbar nodes, each input queue can dequeue a packet only when its output port is idle, so a single Crossbar channel runs at the same rate as the output port. The speed-up ratio of this structure (the ratio of the internal rate of the switching fabric to the port rate) is therefore 1; no excessive internal rate or buffer read/write speed is required, so scalability is good. However, when each input buffer is a single First-In-First-Out (FIFO) queue, the structure suffers from Head-of-Line (HOL) blocking, and switching capacity degrades severely as the port count grows: under Bernoulli traffic uniformly distributed over all output ports, throughput is only 58.6%. To address this, virtual output queue (Virtual Output Queue, VOQ) techniques set up one queue per output port in each input buffer, distinguishing packets by direction. But once multiple queues are distinguished, as in a shared-buffer structure, fairness requires allocating some buffer to every queue, so increasing the port count, and with it the number of queues, drives up the buffer requirement at every input port.
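The VOQ idea described above can be sketched as a small data structure (a behavioral sketch in Python, not the patent's implementation; names are illustrative):

```python
from collections import deque

class VirtualOutputQueues:
    """Per-input buffer split into one FIFO per output port (VOQ).

    A packet destined for a busy output no longer blocks packets
    behind it that target idle outputs, avoiding HOL blocking.
    """
    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, dest_port, packet):
        self.queues[dest_port].append(packet)

    def dequeue_for(self, dest_port):
        # Called when the given output port is idle.
        q = self.queues[dest_port]
        return q.popleft() if q else None
```

With a single FIFO, the packet for port 2 would block the later packet for port 0; with VOQs each direction drains independently.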
The output queuing structure (OQ) places a queuing buffer at each output port; an input packet is switched through the Crossbar to its output port before entering the buffer to queue. Because there are no queuing buffers at the inputs or inside the Crossbar, when all N input ports simultaneously receive packets destined for the same output port, avoiding packet loss requires the receiving bandwidth at that output queue to reach at least N times the input-port line rate (assuming equal input-port rates); that is, the speed-up ratio must reach N. In essence this structure is also a shared buffer, merely splitting the single shared-buffer channel into multiple channels, and such distributed shared-buffer switching naturally inherits the drawbacks of the shared buffer.
The combined input-output queuing structure (CIOQ) is a compromise between input and output queuing: buffers are placed at both the input and output ports, and the Crossbar switching network is given a speed-up ratio s with 1 &lt; s &lt; N. The input buffers remove the need for the N-times speed-up of the pure output-queued structure, while the sped-up switching network lets multiple input ports be aggregated, easing the heavy buffer requirement of the input-queued VOQ mechanism. It has been shown that a CIOQ structure achieves 100% throughput with a speed-up ratio of 2. However, its scheduling algorithm must handle the input and output queues simultaneously, which is highly complex and ill-suited to FPGA implementation.
The combined input-crosspoint queuing structure (CICQ) combines input queuing (IQ) with crosspoint-buffered queuing (Crosspoint Queued, CQ); it differs from pure input queuing mainly in that a certain amount of queuing buffer is placed at each cross node. This effectively isolates the input side from the output side. In the further-developed combined input-crosspoint-output queuing structure in particular, the cross-node buffers separate input-side and output-side queue scheduling, so the two sides can use different scheduling algorithms, reducing the complexity of each; its distributed structure also suits joint implementation across multiple FPGAs. However, as the port count N grows, the structure's dependence on buffer resources rises as O(N²).
Through the above analysis, the problems and defects of the prior art are as follows:
(1) Existing switching architectures that are easy to implement on an FPGA, such as the input queuing structure (IQ), the combined input-crosspoint queuing structure (CICQ), and the output queuing structure (OQ), depend ever more heavily on buffer resources as the port count, port rate, and port bit width increase.
(2) The combined input-output queuing structure (CIOQ) can relieve the heavy buffer requirement of the input-queued VOQ mechanism, but its scheduling algorithm must handle the input and output queues simultaneously, which is complex and ill-suited to FPGA implementation.
(3) In FPGA-friendly architectures such as the output queuing structure (OQ), when all N input ports simultaneously receive packets destined for the same output port, avoiding packet loss requires the receiving bandwidth at the output queue to reach at least N times the input-port line rate, i.e. a speed-up ratio of N.
(4) Existing FPGA-friendly switching architectures suit only low rates, narrow bit widths, and small switching capacities, and are of limited practical use for high-speed, high-capacity switching.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a switching system for high-speed high-capacity aggregation type cross node output joint queuing. The technical problems to be solved by the invention are realized by the following technical scheme:
the invention provides a switching system for high-speed high-capacity aggregation type cross node output joint queuing, which comprises: the system comprises an input processing module, a high-capacity exchange processing unit and an output processing module;
the input module is used for carrying out aggregation polling on all the input Ethernet frame data, searching and learning frame information matched with the Ethernet frame data, and writing the Ethernet frame data into corresponding cross nodes in the high-capacity exchange processing unit;
the high-capacity exchange processing unit is used for arbitrating and scheduling the written Ethernet frame data according to the priority so as to generate an enqueue request;
the output processing module is used for determining whether the scheduled Ethernet frame data is enqueued or discarded according to the enqueue request, and dequeuing when the Ethernet frame data is queued.
The invention provides a switching system for high-speed, high-capacity aggregation-type cross-node and output joint queuing, which consists of input processing, the cross nodes in the high-capacity switching processing unit Crossbar, and output queue management. The input processing module adopts a pipelined structure, so that data can be transferred on every clock cycle and more ports can be aggregated, which reduces both the buffer occupied by the Crossbar and the number of output queues. The cross nodes in the high-capacity switching processing unit take the form of FIFOs, which improves buffer utilization. A buffer isolates the high-capacity switching processing unit from output queue management, decoupling the two for higher processing efficiency; managing queues at the output greatly reduces the number of queues.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a block diagram of a switching system with high-speed high-capacity aggregate cross-node output joint queuing provided by an embodiment of the present invention;
FIG. 2 is a flow chart of aggregate polling of multiple packets provided by an embodiment of the present invention;
FIG. 3 is a 4-stage pipeline structure diagram of a learning look-up table module in input processing according to an embodiment of the present invention;
FIG. 4 is a 4-stage pipeline flow chart of a learning look-up table module in input processing according to an embodiment of the present invention;
FIG. 5 is a 3-stage pipeline structure diagram of the frame moving module in input processing provided by an embodiment of the present invention;
FIG. 6 is a 3-stage pipeline flow chart of the frame moving module in input processing provided by an embodiment of the present invention;
FIG. 7 is a block diagram of a cross node in a Crossbar of a high-capacity switching processing unit according to an embodiment of the present invention;
FIG. 8 is a flowchart of decoupling the cross nodes in the high-capacity switching processing unit Crossbar from output queue management, provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
As shown in fig. 1, the present invention provides a switching system for high-speed high-capacity aggregation type cross node output joint queuing, comprising: the system comprises an input processing module, a high-capacity exchange processing unit and an output processing module;
the input processing module is used for performing aggregation polling over all incoming Ethernet frame data, looking up and learning the frame information matching the Ethernet frame data, and writing the Ethernet frame data into the corresponding cross node of the high-capacity switching processing unit;
It is worth noting that aggregation here means combining multiple ports onto one bus, which reduces the buffer resources occupied by the Crossbar cross nodes of the high-capacity switching processing unit and raises bus utilization. The aggregated ports use the LocalLink format: when a port has a packet, it raises a send_data_flag signal in advance to indicate that a packet is ready to enter input processing. If the polled output bus is idle and only one port's send_data_flag is high, that port's packets occupy the output bus, and the occupation does not change while the bus is busy. If the bus is idle and several send_data_flag signals are high, the LocalLink ready signal is raised toward the port to which the polled output bus is granted. For fairness, round-robin (RR) polling is used.
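The RR poll over the per-port send_data_flag signals can be modeled behaviorally as follows (a Python sketch, not the RTL; the grant-pointer update is an assumption matching a standard round-robin arbiter):

```python
class RoundRobinArbiter:
    """Round-robin poll over per-port request flags (send_data_flag).

    The search starts one past the last granted port, so every
    requesting port is eventually served (fairness).
    """
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.last = num_ports - 1   # first search begins at port 0

    def grant(self, request_flags):
        # request_flags: one boolean per aggregated port.
        for i in range(1, self.num_ports + 1):
            port = (self.last + i) % self.num_ports
            if request_flags[port]:
                self.last = port
                return port
        return None                 # no requests: bus stays idle
```

With ports 0 and 2 both requesting, successive grants alternate 0, 2, 0, ..., so neither port starves the other.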
The high-capacity exchange processing unit is used for arbitrating and scheduling the written Ethernet frame data according to the priority so as to generate an enqueue request;
the output processing module is used for determining whether the scheduled Ethernet frame data is enqueued or discarded according to the enqueue request, and dequeuing when the Ethernet frame data is queued.
Referring to fig. 1, the input processing module in the present invention includes an input polling module, a frame extraction module, a learning search module, and a frame moving module; the output processing module comprises a pre-enqueuing module and an output queue management module;
the input polling module is used for receiving Ethernet frame data over multiple parallel high-speed ports, each path's Ethernet frame data constituting a packet; the parallel packets are aggregated by polling into a single serial stream of packets and stored in the RAM of the frame moving module;
Referring to fig. 2, fig. 2 shows the details of the high-speed ports, packets, Ethernet frame data, and aggregation polling in the present invention; it can be seen that the multiple parallel packets finally form a single serial packet stream. A polled packet is stored in the RAM of the frame moving module; the RAM is divided into 9 segments, each holding one longest frame. Once the destination port number has been looked up, the packet enters the corresponding node of the high-capacity switching processing unit Crossbar according to that destination port number.
The frame extraction module is used for extracting the source MAC, destination MAC, and source port number of a packet for the learning and lookup module to learn and search;
the learning and lookup module is used for searching and learning in a four-stage pipelined manner using the source MAC, destination MAC, and source port number, so as to find the matching frame information from the lookup table and learning table;
the frame information comprises a destination port, a packet priority, a frame length and a frame type;
the frame moving module is used for reading out the corresponding packet from the RAM of the frame moving module and writing the packet into the cross node of the high-capacity exchange processing unit according to the frame information;
the high-capacity switching processing unit is used for arbitrating, by priority, the packets written into its own cross nodes, thereby determining whether each packet is stored in the node's unicast buffer FIFO or multicast buffer FIFO;
the pre-enqueuing module is used for buffering packets after cross-node column arbitration, isolating the high-capacity switching processing unit from output queue management, and moving packets onto the bus;
and the output queue management module is used for reading packets from the bus into a shared buffer and performing at least one of logical enqueue, physical enqueue, logical dequeue, and physical dequeue on each packet.
Compared with input-side queue management, this aggregated joint cross-node and output queuing switching architecture greatly reduces both the number of queues and the buffering needed for queue management, reducing the dependence on buffers.
The whole input processing module adopts a pipelined, decoupled architecture, better suiting high-speed, high-capacity requirements: it ensures the moving bus can transfer data on every clock cycle and supports a higher port aggregation degree. High aggregation reduces the number of buses, and since the number of cross nodes is the square of the number of buses, fewer buses mean fewer cross nodes and hence lower buffer occupation.
A threshold is set in the cross nodes of the high-capacity switching processing unit Crossbar, guaranteeing QoS for high-priority traffic; column arbitration uses a multi-stage scheme, avoiding routing congestion. In output processing, the pre-enqueuing module isolates output queue management from the cross nodes in the Crossbar, decoupling the two to reduce scheduling complexity and improve scheduling efficiency.
As shown in FIG. 3, an embodiment of the present invention provides a schematic diagram of the four-stage pipelined learning and lookup process under high speed and large capacity. As shown in fig. 4, the left diagram is the lookup four-stage pipeline flow and the right diagram is the learning four-stage pipeline flow. The first stage of the four-stage pipeline is hashing, the second is the table read, the third is result processing, and the fourth is write-back.
The step of searching by using a four-stage pipelining mode in the learning searching module comprises the following steps:
when a valid source MAC address is received, the pipeline starts to work;
the first stage of the pipeline is used for carrying out hash operation on the source MAC and the destination MAC;
the second stage of the pipeline is used for reading the lookup table at the hash of the destination MAC (the source-MAC hash is carried along for the write-back stage);
the third stage of the pipeline is used for processing the entry read from the lookup table: the stored MAC address is compared with the current destination MAC address; if they match, the destination port is the source port number recorded in the entry; if not, the packet is broadcast to every port;
the fourth stage of the pipeline is used for taking the source MAC as a query address, taking the source MAC, the source port number and the current time as data, and writing the data into a lookup table according to the query address.
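Functionally, the four lookup stages amount to the following (a behavioral Python sketch, not cycle-accurate; the hash function and table layout are illustrative assumptions, not the patent's hardware hash):

```python
BROADCAST = -1   # stand-in for "flood to all ports"

def mac_hash(mac, table_size=256):
    # Stage 1: hash (the real hardware hash function is not specified here)
    return sum(mac.encode()) % table_size

def lookup_pipeline(table, src_mac, src_port, dst_mac, now):
    # Stage 1: hash source and destination MAC
    h_src, h_dst = mac_hash(src_mac), mac_hash(dst_mac)
    # Stage 2: read the lookup table (entry addressed by destination-MAC hash)
    entry = table.get(h_dst)
    # Stage 3: process the result -- compare stored MAC with destination MAC
    if entry is not None and entry["mac"] == dst_mac:
        dest_port = entry["port"]
    else:
        dest_port = BROADCAST      # unknown destination: broadcast
    # Stage 4: write back the source entry at the source-MAC hash
    table[h_src] = {"mac": src_mac, "port": src_port, "time": now}
    return dest_port
```

The first frame toward an unknown MAC floods, and the write-back of its source address lets later traffic in the reverse direction be switched directly.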
The learning and searching module performs learning by using a four-stage pipelining mode, and comprises the following steps:
when a valid MAC address is received, the pipeline starts to work;
the first stage of the pipeline is used for carrying out hash operation on the source MAC and the destination MAC;
the second stage of the pipeline is used for reading the learning table by utilizing the hash result of the source MAC;
the third stage of the pipeline is used for processing the result read from the learning table: the source port number is compared with the source port number in the entry; if they differ and conflict detection is enabled, learning fails, the conflicting port and source MAC are recorded, and this pass of the pipeline ends; otherwise the pipeline proceeds to the fourth stage;
the fourth stage of the pipeline is used for taking the source MAC as the learning-table address, taking the source MAC, source port number, and current time as data, and writing them into the learning table at that address.
The four-stage learning/lookup pipeline has two bypasses: an adjacent-stage bypass and a skip-stage bypass. The adjacent-stage bypass applies when the current table-read address equals the previous read's address: the value being processed in the result-processing stage is used as the current read's result. The skip-stage bypass applies when the write-back address of the stage further ahead equals the current read address: the value being written back is used as the current read's result.
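The two bypasses are standard read-after-write forwarding; a minimal sketch, assuming one-entry visibility into the result-processing and write-back stages (names are illustrative):

```python
def bypassed_read(addr, s3_addr, s3_entry, wb_addr, wb_entry, ram):
    """Return the effective table entry for `addr` with hazard forwarding.

    s3_*  : address/entry currently in the result-processing stage
            (adjacent-stage bypass);
    wb_*  : address/entry currently being written back
            (skip-stage bypass);
    ram   : dict standing in for the table RAM.
    """
    if s3_addr == addr:
        return s3_entry        # adjacent-stage bypass
    if wb_addr == addr:
        return wb_entry        # skip-stage bypass
    return ram.get(addr)       # no hazard: use the RAM read data
```

The nearer stage wins when both match, since it holds the most recent value for that address.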
As shown in fig. 5, an embodiment of the present invention provides a schematic diagram of the three-stage pipelined frame moving module under high speed and large capacity. As shown in fig. 6, the frame moving module of the present invention adopts a three-stage pipeline, as follows:
when the grouping carries out frame information extraction and learning searching, the grouping polled by the polling module firstly enters the frame moving module for buffering. The buffer memory is a RAM of 4KB, which is divided into 10 sections, and each section is marked by a bit or not, so that a ram_seg_valid signal of 10 bits is maintained, when each bit corresponds to the section being idle, the corresponding bit is set to 1, otherwise, the corresponding bit is set to 0. When one of the segments of data is shifted away by one clk, the bit position 1 corresponding to ram_seg_valid can be set. Looking for the first 1 is combinational logic, similar to division 2, when data is received it can be known to which segment the data is stored. When the destination MAC is found, the source port number, the output port number, the multicast flag, the frame priority, which section of the data frame exists, the frame length, and the frame type are written into the frame information fifo. When fifo is not empty and the moving bus is idle, framing tag is removed according to frame information fifo data: destination port, frame priority, frame length, frame type. And meanwhile, reading the data frame according to the frame length and the number of segments, and adding the frame tag to the head of the corresponding frame. When the frame is being transmitted, a clock group tag reads data before the frame is moved, so that each clock can transmit data under the high-speed port line speed condition, and the bandwidth utilization rate is improved. If a frame is stored in a RAM, for example 256 bits wide, supporting a 1522B length, a RAM requires 4 36 kbits. If each RAM is separated independently, the resource waste is 90.8%. The memory RAM needs to be taken together with one RAM and then the address is segmented. 256 bits wide and 512 depths can support 10 longest frames, namely can be divided into 10 segments, so that the resource waste is 7.11%. 
So segmenting the RAM in this way can improve resource utilization.
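The buffer-waste figures quoted above can be checked with a short computation (a sketch; the per-frame case assumes four 36 Kbit blocks ganged to 256 bits wide, which lands near, not exactly on, the quoted 90.8 %):

```python
def wasted_fraction(frame_bytes, width_bits, depth_words, frames_stored):
    """Fraction of RAM bits left unused for a given packing."""
    total_bits = width_bits * depth_words
    used_bits = frames_stored * frame_bytes * 8
    return 1 - used_bits / total_bits

# One 1522 B maximum frame per RAM built from 4 x 36 Kbit blocks:
per_frame_waste = wasted_fraction(1522, 256, 4 * 36 * 1024 // 256, 1)
# Ten maximum frames sharing one 256-wide x 512-deep segmented RAM:
shared_waste = wasted_fraction(1522, 256, 512, 10)
```

The shared layout comes out at roughly 7.1 % waste versus over 90 % for one frame per RAM, matching the text's conclusion.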
When receiving the destination port number of the learning table look-up module, taking the destination port number, the source port number, the multicast mark, the priority, the storage section number, the frame length and the frame type as frame information; the frame information fifo adopts a First-word-Fall-Through form;
storing the frame information into own frame information fifo;
when the frame information fifo is not empty, calculating a starting address and a frame length stored in a packet according to the frame length in the frame information fifo and a section corresponding to a packet RAM, and simultaneously reading the frame information fifo to obtain frame information of a next packet;
reading the RAM of the self according to the initial address and frame length of the packet to obtain the packet;
setting tags for the read packets;
wherein, the label tag includes: destination port number, priority, frame length, frame type;
and dividing the packet carrying the tag to a corresponding crossover node.
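The segment allocation (find-first-one over ram_seg_valid) and the tag prepended during the move can be sketched as follows (field names are illustrative, not from the patent):

```python
def first_free_segment(seg_valid, num_segments=10):
    """Priority-encoder behavior: lowest set bit of ram_seg_valid.

    Bit i set means segment i is free; returns None when the RAM is full.
    """
    for i in range(num_segments):
        if (seg_valid >> i) & 1:
            return i
    return None

def move_frame(segments, info):
    """Read the stored frame and prepend its tag for the moving bus."""
    tag = (info["dest_port"], info["priority"],
           info["frame_length"], info["frame_type"])
    data = segments[info["segment"]][:info["frame_length"]]
    return (tag, data)
```

In hardware the find-first-one is a small combinational priority encoder, so the target segment is known in the same cycle the data arrives.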
The invention concatenates the tag and the data, placing the tag in front of the frame, and moves them onto the bus together. After arbitration in the high-capacity switching processing unit Crossbar, the tag information is used to generate the enqueue frame information with which enqueuing is requested.
The whole of input processing adopts a pipelined, decoupled architecture, better suiting high-speed, high-capacity requirements: it ensures the moving bus can transfer data on every clock cycle and allows a higher port aggregation degree. High aggregation reduces the number of buses, and since the number of cross nodes is the square of the number of buses, fewer buses mean fewer cross nodes and hence lower buffer occupation.
As shown in fig. 7, an embodiment of the present invention provides a schematic diagram of a cross node in the high-capacity switching processing unit Crossbar under high-speed, high-capacity conditions. The corresponding column arbitration is multi-stage, which spreads routing resources more evenly and relieves routing congestion at high speed and wide bit width. Each cross node contains a unicast and a multicast buffer FIFO, each 16 KB. The high-capacity switching processing unit comprises multiple rows and columns; each column has a column arbitration module, and each row-column intersection forms a cross node.
To reduce the dependence on buffering at high speed and large capacity, the cross nodes do not maintain per-priority queues. To still provide good QoS, a threshold is set at each cross node: when its occupancy exceeds the threshold, only high-priority packets are admitted and low-priority packets are discarded; below the threshold, both priorities are admitted, giving high-priority traffic better QoS. When a cross node in a column of the Crossbar holds data, column arbitration over that column is performed by RR scheduling. To reduce place-and-route congestion at high speed, large capacity, and wide bit width, a multi-stage arbitration scheme distributes routing resources more evenly.
Each crossover node for:
after receiving a packet, judging whether it is unicast or multicast; if unicast, judging whether the amount of data stored in the unicast buffer FIFO exceeds the set threshold: if it does, only high-priority packets are stored, according to the priority in the tag; if it does not, both high- and low-priority packets are allowed into the unicast buffer FIFO;
if the packet is multicast, storing it directly in the multicast FIFO; as packets enter a buffer FIFO, maintaining the remaining capacity of that FIFO according to the frame length; when a buffer FIFO holds data, sending a request to the column arbitration module indicating that the cross node has data to forward;
the column arbitration module is used for RR polling over the requests from the cross nodes of its column; when a node is scheduled, an enqueue request is generated from the packet's tag and sent to output queue management, and the packet, with its tag stripped, is sent to the pre-enqueuing module for buffering.
The invention sets the threshold in the unicast buffer FIFO, guaranteeing QoS for high-priority packets under high speed, high capacity, and small buffering; meanwhile the multi-stage column arbitration spreads routing resources more evenly, relieving routing congestion at high speed and wide bit width.
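The threshold-based admission at a cross node can be modeled as follows (a sketch; the 16 KB capacity comes from the text, while the threshold value and field names are assumed parameters):

```python
from collections import deque

class CrosspointNode:
    """Cross node with unicast/multicast FIFOs and a unicast threshold.

    Below the threshold both priorities enter the unicast FIFO; above
    it only high-priority frames are admitted, preserving their QoS
    with little buffering.
    """
    def __init__(self, capacity=16 * 1024, threshold=12 * 1024):
        self.unicast, self.multicast = deque(), deque()
        self.unicast_bytes = 0
        self.capacity, self.threshold = capacity, threshold

    def admit(self, frame_len, high_priority, multicast):
        if multicast:
            self.multicast.append(frame_len)   # multicast stored directly
            return True
        if self.unicast_bytes + frame_len > self.capacity:
            return False                       # FIFO full: drop
        if self.unicast_bytes > self.threshold and not high_priority:
            return False                       # over threshold: drop low prio
        self.unicast.append(frame_len)
        self.unicast_bytes += frame_len
        return True

    def has_request(self):
        # A request is raised toward column arbitration when data waits.
        return bool(self.unicast or self.multicast)
```

The key property is that once the buffer passes the threshold, low-priority arrivals are shed first, so a small FIFO still protects high-priority QoS.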
As shown in fig. 8, an embodiment of the present invention provides a processing schematic of the pre-enqueuing module under high speed and high capacity. The pre-enqueuing module isolates the column-arbitration scheduling of the high-capacity switching processing unit Crossbar from the enqueue scheduling; packet enqueuing, moving, and discarding are handled inside this module, so they cannot disturb the execution of Crossbar column arbitration or enqueue scheduling, reducing scheduling complexity and improving scheduling efficiency. The specific steps by which the pre-enqueuing module decouples the Crossbar from output queue management under high-speed, high-capacity conditions are as follows:
when receiving a packet after column arbitration, writing the packet into its own data cache fifo, and simultaneously writing the frame length of the packet into its own frame length fifo;
the frame length is used as an offset when reading the data cache or when discarding a packet;
when an enqueue ready signal is received, reading the data cache fifo according to the frame length; once the corresponding packet has been read out, reading the frame length fifo to obtain the frame length of the next packet; the frame length fifo operates in First-Word Fall-Through (FWFT) mode, which saves the fifo read latency; the read packet is then moved onto the bus;
when an enqueue failure signal is received, offsetting within the data cache according to the frame length, thereby discarding the corresponding packet.
The pre-enqueuing module isolates column arbitration from enqueue scheduling, so the two can run decoupled; the enqueue result does not affect the processing bandwidth of column arbitration, and packet processing efficiency is higher.
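The pre-enqueue steps above can be modeled with a pair of FIFOs: the frame-length FIFO tells the module how many words to read out on success, or how far to skip on failure. The class and method names below are illustrative assumptions.

```python
# Illustrative model of the pre-enqueuing buffer: packets and their frame
# lengths are written to paired FIFOs. On an enqueue-ready signal the packet
# is read out by frame length and moved onto the bus; on an enqueue-failure
# signal the read pointer simply advances by the frame length, discarding it.
from collections import deque

class PreEnqueue:
    def __init__(self):
        self.data = deque()        # data cache FIFO (one word per entry)
        self.frame_len = deque()   # frame-length FIFO (FWFT in hardware)

    def write(self, words):
        self.frame_len.append(len(words))
        self.data.extend(words)

    def on_ready(self):
        """Enqueue succeeded: pop one frame and move it onto the bus."""
        n = self.frame_len.popleft()
        return [self.data.popleft() for _ in range(n)]

    def on_fail(self):
        """Enqueue refused: advance past the frame, i.e. discard it."""
        n = self.frame_len.popleft()
        for _ in range(n):
            self.data.popleft()

pe = PreEnqueue()
pe.write(["A0", "A1"])
pe.write(["B0"])
pe.on_fail()                      # frame A is discarded
assert pe.on_ready() == ["B0"]    # frame B reaches the bus
```

Because discard is just a pointer advance, a failed enqueue costs no more than a successful one, which is why the enqueue result never throttles column arbitration.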
Enqueuing in the queue management of the present invention takes the form of shared BDs (buffer descriptors). Enqueuing is divided into logical enqueuing and physical enqueuing. Logical enqueuing means reading the related queue information according to the enqueue frame information and then making an enqueue judgment against the thresholds: whether the number of node BDs in use plus the number of BDs currently required exceeds the node maximum threshold; whether the number of BDs in use by the queue plus the number currently required exceeds the queue maximum threshold; if the number of node BDs in use exceeds the node minimum threshold, whether the shared buffer would overflow; if the number of node BDs in use is below the node minimum threshold, the shared buffer cannot overflow. If these judgments allow enqueuing, the corresponding BD numbers are applied for; otherwise the enqueue is refused. Physical enqueuing means writing the corresponding packet into the corresponding cache according to the BD information applied for.
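The BD-threshold admission check just described can be written as a single predicate. The parameter names and the shape of the overflow test are assumptions for illustration; the patent does not fix concrete values.

```python
# A sketch of the logical-enqueue admission check with shared buffer
# descriptors (BDs). A request is refused if it would exceed the node or
# queue ceiling; past the node's guaranteed minimum share it must also fit
# in the shared region.
def logical_enqueue(node_used, queue_used, need,
                    node_max, queue_max, node_min,
                    shared_free):
    """Return True if `need` BDs may be granted."""
    if node_used + need > node_max:      # node ceiling exceeded
        return False
    if queue_used + need > queue_max:    # queue ceiling exceeded
        return False
    if node_used > node_min:             # past guaranteed share:
        return shared_free >= need       # must fit the shared region
    return True                          # within guaranteed share: always fits

assert logical_enqueue(10, 2, 1, node_max=16, queue_max=8,
                       node_min=4, shared_free=0) is False
assert logical_enqueue(2, 2, 1, node_max=16, queue_max=8,
                       node_min=4, shared_free=0) is True
```

Only after this predicate passes are BD numbers applied for and the packet physically written; a refusal costs no buffer space at all.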
Dequeuing in queue management likewise takes the form of shared BDs. Dequeuing is divided into logical dequeuing and physical dequeuing. Logical dequeuing means obtaining the related queue information according to the queue number and generating a dequeue schedule. Physical dequeuing means obtaining the storage address of the packet from the dequeue scheduling information, fetching the data at that address and placing it on the bus for output.
The application of the system of the invention can be realized by executing the following steps:
S101, the input polling module aggregates a plurality of ports and performs RR fair polling;
S102, the frame extraction module extracts the source MAC, destination MAC and source port number of the packet;
S103, the learning and searching module performs learning and lookup in pipelined form;
S104, the frame moving module reads data according to the frame length, sets the tag, and moves the data to the corresponding cross node according to the destination port number;
S105, the column arbitration module performs RR scheduling according to the cache state of the cross nodes of its column, and generates an enqueue request from the tag header of the scheduled packet;
S106, the scheduled packet enters the pre-enqueuing module and waits for the enqueue result; if enqueuing is allowed, physical enqueuing is performed, otherwise the packet is discarded;
S107, when a queue holds data, logical dequeuing and physical dequeuing are performed.
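The steps S101 to S107 above can be condensed into a toy software pass, with each hardware stage reduced to a line or two. Every function and field name here is an illustrative stand-in, and the arbitration/queueing stages are collapsed into a pass-through; this is not the invention's implementation.

```python
# Toy end-to-end pass: polling, extraction, learning/lookup and tagging,
# mirroring steps S101-S104 of the flow above.
def forward(packets, mac_table):
    out = []
    for pkt in packets:                                  # S101: RR over aggregated ports
        src, dst, sport = pkt["src"], pkt["dst"], pkt["sport"]  # S102: extraction
        mac_table[src] = sport                           # S103: learning
        dport = mac_table.get(dst, "broadcast")          # S103: lookup (miss -> broadcast)
        tag = {"dport": dport, "len": len(pkt["data"])}  # S104: tag for the cross node
        # S105-S107 (arbitration, enqueue, dequeue) collapsed into a pass-through
        out.append({**pkt, "tag": tag})
    return out

table = {}
res = forward([{"src": "M1", "dst": "M2", "sport": 0, "data": b"hi"}], table)
assert table["M1"] == 0                        # source was learned
assert res[0]["tag"]["dport"] == "broadcast"   # M2 not yet learned
```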
The above describes the details of the switching system for high-speed high-capacity aggregation type cross node output joint queuing provided by the invention. Packet forwarding in the system of the invention is based on an aggregated cross-node output joint queuing switching architecture for data interaction with devices in the switching network. The invention is divided into three parts: input processing, the cross nodes in the high-capacity exchange processing unit Crossbar, and output queue management. To sustain the rate of high-speed port aggregation, input processing is pipelined throughout: the learning and searching module is a four-stage pipeline, namely hash, table lookup, result processing and write-back; the frame moving module is a three-stage pipeline, respectively obtaining frame information, attaching packet tags, and reading the data RAM and moving data onto the bus; the learning and searching module and the moving module run decoupled from each other. With the input processing module designed this way, data can be transferred on every clock cycle, more ports can be aggregated, and both the cache occupancy of the high-capacity exchange processing unit Crossbar and the number of output queues are reduced.
The invention also aims to reduce cache while guaranteeing the QoS of high-priority packets at high speed and high capacity, and, through multi-stage column arbitration, to spread wiring resources more uniformly and relieve wiring congestion at high speed and high bit width. When the cache of a cross node in the high-capacity exchange processing unit Crossbar exceeds the threshold, only high-priority packets are allowed into the cross node cache and low-priority packets may be dropped; below the threshold, both high and low priority may enter the cross node, so the QoS of high priority is guaranteed.
Another object of the present invention is to replace input queue management with output queue management, which greatly reduces the number of queues: let the number of buses be B, the number of priorities be F and the number of queue output ports be N, then the total number of queues Q = B × F × N. Taking six buses with two ports aggregated per bus as an example, the number of input-queue-management queues = 6 × 8 × 12 = 576, while the number of output-queue-management queues = 6 × 8 × 2 = 96, a six-fold reduction. A cache isolates the high-capacity exchange processing unit Crossbar from output queue management; with the two decoupled, packet processing efficiency is higher.
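The queue-count arithmetic in the paragraph above is easy to verify directly from Q = B × F × N: six buses at two ports each give N = 12 ports, so per-output-port input queuing needs 6 × 8 × 12 queues, while output-side queuing only needs 6 × 8 × 2.

```python
# Checking the queue-count arithmetic for Q = B * F * N with the example
# figures from the text: six buses, eight priorities, two ports per bus.
B, F = 6, 8
ports_per_bus = 2
N = B * ports_per_bus               # 12 ports in total
q_input = B * F * N                 # queues under input queue management
q_output = B * F * ports_per_bus    # queues under output queue management
assert q_input == 576
assert q_output == 96
assert q_input // q_output == 6     # output queuing uses one sixth the queues
```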
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Although the present application has been described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the figures, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the "a" or "an" does not exclude a plurality.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (7)

1. A high-speed high-capacity aggregate cross-node output joint queuing switching system, comprising: the system comprises an input processing module, a high-capacity exchange processing unit and an output processing module;
the input processing module is used for carrying out aggregation polling on all the input Ethernet frame data, searching and learning frame information matched with the Ethernet frame data, and writing the Ethernet frame data into corresponding cross nodes in the high-capacity exchange processing unit;
the high-capacity exchange processing unit is used for arbitrating and scheduling the written Ethernet frame data according to the priority so as to generate an enqueue request;
the output processing module is used for determining, according to the enqueue request, whether the scheduled Ethernet frame data is enqueued or discarded, and for dequeuing the Ethernet frame data once it has been enqueued.
2. The switching system of high-speed high-capacity aggregate cross node output joint queuing as claimed in claim 1, wherein said input processing module comprises an input polling module, a frame extraction module, a learning search module and a frame moving module; the output processing module comprises a pre-enqueuing module and an output queue management module;
the input polling module is used for receiving Ethernet frame data through multiple paths of parallel high-speed ports, wherein the Ethernet frame data of each path represents a packet, and the multiple paths of parallel packets are subjected to aggregation polling to form a path of serial packets and then stored into the RAM of the frame moving module;
the frame extraction module is used for extracting source MAC, destination MAC and source port number of the packet for the learning and searching module to learn and search;
the learning and searching module is used for performing lookup and learning in four-stage pipelined form according to the source MAC, the destination MAC and the source port number, so as to find the matching frame information from the lookup table and the learning table;
the frame information comprises a destination port, a packet priority, a frame length and a frame type;
the frame moving module is used for reading out the corresponding packet from the RAM of the frame moving module and writing the packet into the cross node of the high-capacity exchange processing unit according to the frame information;
the high-capacity exchange processing unit is used for arbitrating, according to priority, the packets written into its own cross nodes, so as to determine whether a packet is stored in the node's unicast cache fifo or its multicast cache fifo;
the pre-enqueuing module is used for caching the packets after the cross node column arbitration, isolating the high-capacity exchange processing unit from the output queue management, and moving the packets to the bus;
and the output queue management module is used for reading the packets from the bus in the form of shared cache and caching the packets, and performing at least one operation of logical enqueuing, physical enqueuing, logical dequeuing and physical dequeuing on the packets.
3. The high-speed high-capacity aggregate cross-node output joint queuing switching system of claim 2, wherein the step of searching in the learning search module by means of four-stage pipelining comprises:
when a valid source MAC address is received, the pipeline starts to work;
the first stage of the pipeline is used for carrying out hash operation on the source MAC and the destination MAC;
the second stage of the pipeline is used for reading the lookup table according to the hash result of the destination MAC;
the third stage of the pipeline is used for processing the result read from the lookup table, the processing comprising: comparing whether the MAC address in the result read from the lookup table is the same as the current destination MAC address; if so, the destination port is the port number in the result read from the lookup table; if not, the packet is broadcast to every port;
the fourth stage of the pipeline is used for taking the source MAC as a query address, taking the source MAC, the source port number and the current time as data, and writing the data into a lookup table according to the query address.
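The four lookup stages of claim 3 above map naturally onto software: hash, table read, compare, and write-back. The hash function, bucket count and record layout below are assumptions for illustration only.

```python
# A software model of the four-stage lookup pipeline: hash the destination
# MAC, read the bucket, compare the stored MAC, and (for learning) write the
# source MAC, port and timestamp back into the table.
def lookup(table, dst_mac, n_buckets=256):
    h = hash(dst_mac) % n_buckets            # stage 1: hash
    entry = table.get(h)                     # stage 2: read lookup table
    if entry and entry["mac"] == dst_mac:    # stage 3: compare stored MAC
        return entry["port"]                 # hit: forward to the learned port
    return "broadcast"                       # miss: broadcast to all ports

def write_back(table, src_mac, src_port, now, n_buckets=256):
    h = hash(src_mac) % n_buckets            # stage 4: (MAC, port, time) write-back
    table[h] = {"mac": src_mac, "port": src_port, "time": now}

t = {}
write_back(t, "aa:bb", 3, now=100)
assert lookup(t, "aa:bb") == 3
assert lookup(t, "cc:dd") == "broadcast"
```

In hardware each stage occupies one clock, so a new MAC can enter the pipeline every cycle even though a full lookup takes four.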
4. The high-speed high-capacity aggregate cross-node output joint queuing switching system of claim 2, wherein the learning search module learns by means of four-stage pipelining, comprising:
when a valid MAC address is received, the pipeline starts to work;
the first stage of the pipeline is used for carrying out hash operation on the source MAC and the destination MAC;
the second stage of the pipeline is used for reading the learning table by utilizing the hash result of the source MAC;
the third stage of the pipeline is used for processing the result read from the learning table, the processing comprising: comparing whether the source port number is the same as the source port number in the result read from the learning table; if they differ and conflict detection is enabled, learning fails, and the conflicting port, the source MAC and the pipeline result are recorded; if they are the same, or if they differ but conflict detection is not enabled, the fourth stage of the pipeline is entered;
the fourth stage of the pipeline is used for taking the source MAC as the query address of the learning table, taking the source MAC, the source port number and the current time as data, and writing the data into the learning table at that address.
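The learning path of claim 4 differs from lookup mainly in its conflict check: if the hashed bucket already holds the same MAC on a different port and conflict detection is enabled, learning fails and the conflict is recorded. The table layout and names below are assumptions.

```python
# Sketch of the four-stage learning pipeline with conflict detection.
# Conflicts (same MAC seen on a new port) are recorded and block the
# write-back; otherwise stage 4 writes (MAC, port, time) into the table.
def learn(table, conflicts, src_mac, src_port, now,
          conflict_detect=True, n_buckets=256):
    h = hash(src_mac) % n_buckets                 # stage 1: hash
    entry = table.get(h)                          # stage 2: read learning table
    if (entry and entry["mac"] == src_mac
            and entry["port"] != src_port and conflict_detect):
        conflicts.append((src_mac, entry["port"], src_port))  # stage 3: record
        return False                              # learning fails
    table[h] = {"mac": src_mac, "port": src_port, "time": now}  # stage 4
    return True

t, c = {}, []
assert learn(t, c, "aa:bb", 1, now=0) is True
assert learn(t, c, "aa:bb", 2, now=1) is False   # same MAC, new port: conflict
assert c == [("aa:bb", 1, 2)]
```

With detection disabled, the same event would simply overwrite the entry, i.e. the MAC "moves" to the new port, which matches the claim's escape into stage four.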
5. The switching system of high-speed high-capacity aggregate cross-node output joint queuing as claimed in claim 2, wherein said frame shifting module is specifically configured to:
when receiving the destination port number from the learning and searching module, taking the destination port number, the source port number, the multicast mark, the priority, the storage section number, the frame length and the frame type as frame information;
storing the frame information into own frame information fifo;
when the frame information fifo is not empty, calculating the start address of the packet in storage according to the frame length in the frame information fifo and the section corresponding to the packet RAM, while reading the frame information fifo to obtain the frame information of the next packet;
reading its own RAM according to the start address and frame length of the packet, so as to obtain the packet;
setting tags for the read packets;
wherein, the label tag includes: destination port number, priority, frame length, frame type;
and dividing the packet carrying the tag to a corresponding crossover node.
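The frame-move sequence of claim 5 (queue frame info, locate the packet in the RAM by section and frame length, read it out, attach the tag, dispatch to the cross node) can be sketched as follows. The sectioned addressing scheme and all names are assumptions for illustration.

```python
# Illustrative model of the frame-move path: each frame-info record locates
# its packet in the RAM (section * SECTION_SIZE), the packet is read out,
# tagged, and appended to the cross node for its destination port.
from collections import deque

SECTION_SIZE = 2048  # bytes per RAM section (assumed)

def move_frames(ram, frame_info_fifo, cross_nodes):
    while frame_info_fifo:
        info = frame_info_fifo.popleft()
        start = info["section"] * SECTION_SIZE           # start address from section
        data = ram[start:start + info["frame_len"]]      # read packet from the RAM
        tag = {k: info[k] for k in ("dport", "prio", "frame_len", "frame_type")}
        cross_nodes[info["dport"]].append({"tag": tag, "data": data})

ram = bytearray(4096)
ram[0:4] = b"PKT0"
fifo = deque([{"section": 0, "frame_len": 4, "dport": 1,
               "prio": 7, "frame_type": "unicast"}])
nodes = {1: []}
move_frames(ram, fifo, nodes)
assert nodes[1][0]["data"] == b"PKT0"
assert nodes[1][0]["tag"]["dport"] == 1
```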
6. The high-speed high-capacity aggregate cross-node output joint queuing switching system of claim 2, wherein the high-capacity switching processing unit comprises a plurality of rows and a plurality of columns, each column having a column arbitration module, the rows and columns crossing to form cross-nodes;
each crossover node for:
after receiving a packet, judging whether the packet is unicast or multicast; if unicast, judging whether the amount of data stored in the unicast cache fifo exceeds a set threshold; if the threshold is exceeded, only high-priority packets are stored, according to the priority in the tag; if the threshold is not exceeded, both high and low priority packets are allowed into the unicast cache fifo;
if the packet is multicast, it is stored directly in the multicast cache fifo; as each packet enters the cache fifo, the remaining capacity of the multicast cache fifo is maintained according to the frame length; whenever the multicast cache fifo holds data, a request is sent to the column arbitration module, the request indicating that the cross node has data to be forwarded;
the column arbitration module is used for performing RR polling scheduling according to the requests sent by the cross nodes of its column, generating an enqueue request from the tag of the scheduled packet and sending the enqueue request to output queue management; the packets, with their tags stripped, are sent to the pre-enqueuing module for caching.
7. The switching system of high-speed high-capacity aggregate cross-node output joint queuing as claimed in claim 2, wherein the pre-enqueuing module is specifically configured to:
when receiving a packet after column arbitration, writing the packet into its own data cache fifo, and simultaneously writing the frame length of the packet into its own frame length fifo;
the frame length is used as an offset when reading the data cache or when discarding a packet;
when an enqueue ready signal is received, reading the data cache fifo according to the frame length; once the corresponding packet has been read out, reading the frame length fifo to obtain the frame length of the next packet; the read packet is then moved onto the bus;
when an enqueue failure signal is received, offsetting within the data cache according to the frame length, thereby discarding the corresponding packet.

Publications (1)

Publication Number: CN116389390A; Publication Date: 2023-07-04

Family

ID=86972505



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination