CN115379316B - Pipelined BENES network route solving hardware accelerating device - Google Patents
- Publication number
- CN115379316B (application number CN202210927907.7A)
- Authority
- CN
- China
- Prior art keywords
- pipeline
- output port
- dyeing
- solving
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q11/00—Selecting arrangements for multiplex systems
- H04Q11/0001—Selecting arrangements for multiplex systems using optical switching
- H04Q11/0005—Switch and router aspects
- H04Q2011/0007—Construction
- H04Q2011/0035—Construction using miscellaneous components, e.g. circulator, polarisation, acousto/thermo optical
- H04Q2011/0052—Interconnection of switches
- H04Q11/0062—Network aspects
- H04Q2011/0073—Provisions for forwarding or routing, e.g. lookup tables
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides a pipelined hardware accelerator for Benes network route solving, comprising a pipeline solver, a prediction enumerator and a cache. The pipeline solver performs pipelined hardware solving on the route-solving input and stores its output in the cache. The prediction enumerator enumerates different routing states derived from the route-solving input and feeds them to the pipeline solver for solving. The cache compares the route-solving input against the previously computed results it holds; if a matching result exists it is output directly, otherwise the cache waits for the result from the pipeline solver. The invention reduces the solving time of the best existing route-solving hardware to tens of nanoseconds or less, making Benes optical networks practical for production use.
Description
Technical Field
The invention relates to the technical field of route solving, and in particular to a pipelined hardware accelerator for Benes network route solving.
Background
In optical interconnects, the Benes network is markedly better than the crossbar network: it needs far fewer basic components and its path loss is highly uniform. Counting 2x2 switches, an 8x8 Benes network requires 20, a 16x16 network 56, and a 32x32 network 144, whereas a crossbar requires 64 for 8x8, 256 for 16x16 and 1024 for 32x32. Moreover, any two paths through a Benes network pass through the same number of switch components and therefore have almost the same loss, while a path through an NxN crossbar passes through anywhere from 1 to 2N-1 switch components.
However, route computation for a Benes network is much slower than for a crossbar. In a crossbar, connecting any two nodes only requires locating the optical switch at the intersection of the corresponding lines from the node numbers and changing its state; the time complexity is O(1) and the space complexity is negligible. For a Benes network, caching every result and performing a table lookup is impractical because the solution space is too large (the uncompressed solutions of a 16x16 Benes network can occupy several GB of storage), so a route-solving algorithm is generally used. The parallel algorithms common in academic research are trial-and-rollback algorithms, which cannot guarantee an upper bound on time complexity; variants based on matrix or Boolean operations do not change this trial-and-rollback nature and additionally increase space complexity, which hinders hardware implementation. The crossbar, for its part, has two fatal drawbacks that make it unsuitable for optical interconnects: first, its path loss is severely non-uniform, which places excessive demands on the loss of each optical switch; second, it requires too many optical switches. Together, these make it difficult for a crossbar of even moderate scale to guarantee usable paths.
The prior art provides a hardware-friendly Benes network solver that implements a solving algorithm with deterministic latency and reduces the latency of solving a 16x16 network to the order of 200 ns (compared with the order of ms at the slowest). However, the solving process is still full of data dependencies, so it can only be made faster by raising the clock frequency, which cannot be raised without limit; the solving time therefore cannot be brought below 100 ns. State-of-the-art high-performance electro-optical switches can change state on a picosecond time scale, and no existing route-solving algorithm can keep up with them, so their performance cannot be fully exploited.
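The switch counts quoted above can be reproduced with a short calculation. The sketch below assumes the standard Benes construction for N a power of two, with 2·log2(N) − 1 columns of N/2 switches each, and an N×N crossbar with one switch per crosspoint; the function names are illustrative and do not come from the patent.

```python
def benes_switch_count(n: int) -> int:
    """Number of 2x2 switches in an n x n Benes network (n a power of two)."""
    log2_n = n.bit_length() - 1          # exact log2 for powers of two
    columns = 2 * log2_n - 1             # switch columns from input to output
    return columns * (n // 2)            # n/2 switches in every column


def crossbar_switch_count(n: int) -> int:
    """Number of crosspoint switches in an n x n crossbar."""
    return n * n


for size in (8, 16, 32):
    print(size, benes_switch_count(size), crossbar_switch_count(size))
```

For 8, 16 and 32 ports this gives 20, 56 and 144 Benes switches against 64, 256 and 1024 crossbar switches, matching the figures in the text.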
Disclosure of Invention
The invention provides a pipelined hardware accelerator for Benes network route solving that effectively hides computation latency and reduces response time.
To solve the above technical problems, the technical scheme of the invention is as follows:
A pipelined hardware accelerator for Benes network route solving comprises a pipeline solver, a prediction enumerator and a cache, wherein
the pipeline solver performs pipelined hardware solving on the route-solving input, and its output is stored in the cache;
the prediction enumerator enumerates different routing states derived from the route-solving input and feeds them to the pipeline solver for solving;
the cache compares the route-solving input against the previously computed results it holds, outputs the corresponding result directly if it exists, and otherwise waits for the result from the pipeline solver.
This scheme uses a pipelined hardware solving circuit with a large surplus of computing power to accelerate solving, so that a large number of solving tasks can be completed in a short time. Using the caching mechanism, predicted routes are solved in advance and the results are stored so that they can be looked up quickly when an actual request arrives, which hides the computation latency. The prediction enumerator uses its high-hit-rate prediction strategy to generate a large number of predicted routes, and combined with the pipelined hardware solver this further improves the hit rate.
Preferably, the route-solving input is a forward mapping vector, which is used to quickly look up the output port corresponding to an input port; a reverse mapping vector can be computed from the forward mapping vector and is used to quickly look up the input port corresponding to an output port.
Preferably, the pipeline solver solves in pipelined hardware. The Benes network is divided into several levels; at each level the input ports belong to left edge nodes and the output ports to right edge nodes, and the left and right edge nodes are connected through the next-level Benes networks. Solving each level consists of two phases: generating the reverse mapping vector, and dyeing. Generating the reverse mapping value of each input or output takes only 1 pipeline cycle. In dyeing, each input or output port dyed costs 1 cycle, and because of data dependencies these steps cannot be performed in parallel; an NxN Benes network has N inputs and N outputs, so dyeing takes 2N cycles. After each level is solved, its result is passed to a result collector, which is also pipelined; the result collector merges the node states solved at every level, outputs them in the same cycle, and sends them together to the cache.
Preferably, the forward mapping vector is represented as a sequence in which the index of each element is an input port number and the value of the element is the number of the output port connected to that input port; the reverse mapping vector is calculated as follows:
if b[i] denotes the element of the reverse mapping vector with index i, and a[j] denotes the element of the forward mapping vector with index j, then:
b[a[j]] = j.
Preferably, dyeing is the process of marking each input and output port. When the upper input or output port of a node is dyed the first state color, the node is set to the cross state; when the upper input or output port is dyed the second state color, the node is set to the parallel state. Each pipeline stage obtains from the previous stage the forward mapping vector, the reverse mapping vector, the port dyed by the previous stage, and the color used by the previous stage.
Preferably, the specific process of dyeing is as follows:
the following steps are repeated in the order right-left, left-left, left-right, right-right until all edge node links are dyed:
right-left: select any output port of a right edge node and set its dyeing state; look up the reverse mapping vector to obtain the corresponding input port, and set that input port to the same dyeing state as the selected output port;
left-left: find the other input port of the left edge node containing the input port obtained in the right-left step, and set it to the dyeing state different from that of the input port obtained in the right-left step;
left-right: look up the forward mapping vector to obtain the output port corresponding to the input port found in the left-left step, and set that output port to the same dyeing state as the input port found in the left-left step;
right-right: find the other output port of the right edge node containing the output port obtained in the left-right step, and set it to the dyeing state different from that of the output port obtained in the left-right step.
Preferably, the specific process of dyeing is as follows:
the following steps are repeated in the order right-right, right-left, left-left, left-right until all edge node links are dyed:
right-right: select any output port of a right edge node and set its dyeing state; find the other output port of the same right edge node and set it to the dyeing state different from that of the selected output port;
right-left: look up the reverse mapping vector to obtain the input port corresponding to the other output port found in the right-right step, and set that input port to the same dyeing state as that output port;
left-left: find the other input port of the left edge node containing the input port obtained in the right-left step, and set it to the dyeing state different from that of the input port obtained in the right-left step;
left-right: look up the forward mapping vector to obtain the output port corresponding to the other input port found in the left-left step, and set that output port to the same dyeing state as that input port.
Preferably, if a round of dyeing reaches a node that has already been visited, that is, a link already dyed in an earlier round, that node is skipped and a node that has not yet been dyed is selected to continue dyeing.
Preferably, for a given routing state s, the prediction enumerator generates a set of routing states, each obtained from s by exchanging the values of exactly two output ports in s. The prediction enumerator generates one different routing state per cycle and sends it to the pipeline solver for solving, until all such routing states have been generated.
Preferably, the cache stores the results computed by the pipeline solver and is queried in key-value fashion, with the forward mapping vector of a routing state as the key and the set of switch states of the Benes switch array as the value.
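As an illustration of the enumeration rule, the following software sketch yields every routing state that differs from a given state s by one exchange of two values. It is a model under stated assumptions, not the hardware design: the generator name is hypothetical, the state is taken to be a forward mapping vector, and the one-state-per-cycle behaviour is represented only by yielding one state per iteration.

```python
from itertools import combinations


def enumerate_predictions(state):
    """Yield each routing state obtained from `state` by swapping exactly two values."""
    for i, j in combinations(range(len(state)), 2):
        candidate = list(state)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        yield tuple(candidate)       # tuples are hashable, so usable as cache keys


s = (7, 2, 8, 4, 3, 1, 6, 5)
predictions = list(enumerate_predictions(s))
print(len(predictions))
```

For an N-port state this produces N(N−1)/2 candidates, which is 28 for the 8-port example.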
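The key-value behaviour described here can be modelled in a few lines. In this sketch the class name `RouteCache`, the hit and miss counters, and the `fake_solver` stand-in for the pipeline solver are all hypothetical; only the choice of the forward mapping vector as key and the switch state set as value comes from the text.

```python
class RouteCache:
    """Key-value store: forward mapping vector -> switch state set."""

    def __init__(self, solve):
        self._solve = solve          # stand-in for the pipeline solver (assumption)
        self._table = {}
        self.hits = 0
        self.misses = 0

    def store(self, forward, switch_states):
        """Record a result, for example one computed ahead of time from a prediction."""
        self._table[tuple(forward)] = switch_states

    def lookup(self, forward):
        """Return the cached result, or fall back to the solver on a miss."""
        key = tuple(forward)
        if key in self._table:
            self.hits += 1
            return self._table[key]
        self.misses += 1
        result = self._solve(key)    # on a miss, wait for the solver output
        self._table[key] = result
        return result


def fake_solver(forward):
    # Placeholder result; a real solver would return the Benes switch states.
    return ("switch states for", forward)


cache = RouteCache(fake_solver)
first = cache.lookup([2, 1])         # miss: falls through to the solver
second = cache.lookup([2, 1])        # hit: served from the table
```

A dictionary is used only for clarity; the patent does not state how the hardware cache is organised.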
Compared with the prior art, the technical scheme of the invention has the following beneficial effect:
the invention reduces the solving time of the best existing route-solving hardware to tens of nanoseconds or less, making Benes optical networks practical for production use.
Drawings
FIG. 1 is a schematic view of the apparatus of the present invention.
FIG. 2 is a schematic diagram of a 2x2 optical switch and its switching states according to an embodiment.
FIG. 3 is a schematic diagram of an 8x8 Benes network according to an embodiment.
FIG. 4 is a schematic diagram of edge solving for an 8x8 Benes network according to an embodiment.
FIG. 5 is a flow chart of edge-solving dyeing for an 8x8 Benes network according to an embodiment.
FIG. 6 is a schematic diagram of confirming the node switch states of an 8x8 Benes network according to an embodiment.
FIG. 7 is a schematic diagram of the solving pipeline of an 8x8 Benes network according to an embodiment.
FIG. 8 is a timing diagram of the solving pipeline of an 8x8 Benes network according to an embodiment.
FIG. 9 shows the optical switch state confirmation method for a left edge node.
FIG. 10 shows the optical switch state confirmation method for a right edge node.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
This embodiment provides a pipelined hardware accelerator for Benes network route solving which, as shown in FIG. 1, comprises a pipeline solver, a prediction enumerator and a cache, wherein
the pipeline solver performs pipelined hardware solving on the route-solving input, and its output is stored in the cache;
the prediction enumerator enumerates different routing states derived from the route-solving input and feeds them to the pipeline solver for solving;
the cache compares the route-solving input against the previously computed results it holds, outputs the corresponding result directly if it exists, and otherwise waits for the result from the pipeline solver.
Example 2
On the basis of Example 1, this embodiment further discloses the following:
The route-solving input is a forward mapping vector, which is used to quickly look up the output port corresponding to an input port; a reverse mapping vector can be computed from the forward mapping vector and is used to quickly look up the input port corresponding to an output port.
The pipeline solver solves in pipelined hardware. The Benes network is divided into several levels; at each level the input ports belong to left edge nodes and the output ports to right edge nodes, and the left and right edge nodes are connected through the next-level Benes networks. Solving each level consists of two phases: generating the reverse mapping vector, and dyeing. Generating the reverse mapping value of each input or output takes only 1 pipeline cycle. In dyeing, each input or output port dyed costs 1 cycle, and because of data dependencies these steps cannot be performed in parallel; an NxN Benes network has N inputs and N outputs, so dyeing takes 2N cycles. After each level is solved, its result is passed to a result collector, which is also pipelined and stores the result in a pipeline queue; the node states solved at every level are merged, output in the same cycle, and sent together to the cache.
The forward mapping vector is represented as a sequence in which the index of each element is an input port number and the value of the element is the number of the output port connected to that input port; the reverse mapping vector is calculated as follows:
if b[i] denotes the element of the reverse mapping vector with index i, and a[j] denotes the element of the forward mapping vector with index j, then:
b[a[j]] = j.
Dyeing is the process of marking each input and output port. When the upper input or output port of a node is dyed the first state color, the node is set to the cross state; when the upper input or output port is dyed the second state color, the node is set to the parallel state. Each pipeline stage obtains from the previous stage the forward mapping vector, the reverse mapping vector, the port dyed by the previous stage, and the color used by the previous stage. In this embodiment, the first state color is black and the second state color is white.
The specific process of dyeing is as follows:
the following steps are repeated in the order right-left, left-left, left-right, right-right until all edge node links are dyed:
right-left: select any output port of a right edge node and set its dyeing state; look up the reverse mapping vector to obtain the corresponding input port, and set that input port to the same dyeing state as the selected output port;
left-left: find the other input port of the left edge node containing the input port obtained in the right-left step, and set it to the dyeing state different from that of the input port obtained in the right-left step;
left-right: look up the forward mapping vector to obtain the output port corresponding to the input port found in the left-left step, and set that output port to the same dyeing state as the input port found in the left-left step;
right-right: find the other output port of the right edge node containing the output port obtained in the left-right step, and set it to the dyeing state different from that of the output port obtained in the left-right step.
In another embodiment, the specific process of dyeing is as follows:
the following steps are repeated in the order right-right, right-left, left-left, left-right until all edge node links are dyed:
right-right: select any output port of a right edge node and set its dyeing state; find the other output port of the same right edge node and set it to the dyeing state different from that of the selected output port;
right-left: look up the reverse mapping vector to obtain the input port corresponding to the other output port found in the right-right step, and set that input port to the same dyeing state as that output port;
left-left: find the other input port of the left edge node containing the input port obtained in the right-left step, and set it to the dyeing state different from that of the input port obtained in the right-left step;
left-right: look up the forward mapping vector to obtain the output port corresponding to the other input port found in the left-left step, and set that output port to the same dyeing state as that input port.
If a round of dyeing reaches a node that has already been visited, that is, a link already dyed in an earlier round, that node is skipped and a node that has not yet been dyed is selected to continue dyeing.
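The first dyeing order (right-left, left-left, left-right, right-right) together with the skip rule can be sketched in software as follows. This is a sequential model under stated assumptions rather than the pipelined circuit: ports are numbered from 0, ports 2k and 2k+1 are assumed to belong to edge node k (so the partner of port p is p ^ 1), black and white are encoded as 1 and 0, and each new chain starts by dyeing the lowest undyed output port black. The function name `dye_edges` is illustrative.

```python
BLACK, WHITE = 1, 0  # assumed encoding of the two dyeing colors


def dye_edges(forward):
    """Dye all ports of one level; forward[i] is the output port of input port i."""
    n = len(forward)
    reverse = [0] * n
    for in_port, out_port in enumerate(forward):
        reverse[out_port] = in_port
    in_color = [None] * n
    out_color = [None] * n
    for start in range(n):
        if out_color[start] is not None:
            continue                      # link already dyed: skip it
        out_port, color = start, BLACK
        while out_color[out_port] is None:
            # right-left: dye the output port and its connected input port alike
            out_color[out_port] = color
            in_port = reverse[out_port]
            in_color[in_port] = color
            # left-left: the other input of the same left node gets the other color
            other_in = in_port ^ 1
            in_color[other_in] = 1 - color
            # left-right: the output connected to that input shares its color
            other_out = forward[other_in]
            out_color[other_out] = 1 - color
            # right-right: move to the other output of that right node, which
            # must differ from other_out and so receives `color` next round
            out_port = other_out ^ 1
    return in_color, out_color


forward = [6, 1, 7, 3, 2, 0, 5, 4]        # 0-indexed form of [7,2,8,4,3,1,6,5]
in_color, out_color = dye_edges(forward)
```

After the call, the two ports of every edge node carry different colors and every input port has the same color as the output port it is routed to, which are the two properties the dyeing is meant to establish.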
In a specific embodiment, as shown in FIG. 3, solving an 8x8 Benes network is divided into three levels and solving a 16x16 network into four levels, and each level is divided into two phases: generating the reverse mapping vector, and dyeing. The reverse mapping vector is generated so that the subsequent dyeing can look it up quickly; dyeing traverses all edge nodes in turn according to a specific mapping rule until every port is dyed. The state of each corresponding 2x2 optical switch can then be determined from the dyeing result.
In FIG. 3, for the 8x8 Benes network, the nodes enclosed by dashed box 1 are edge nodes. Referring to FIG. 4, all edge nodes are in effect connected to two 4x4 Benes networks. Note that the two ports on the inner side of each edge node are connected to the two different 4x4 Benes networks, and each 4x4 Benes network has exactly one connection to each edge node. A path from any edge node to a designated edge node on the opposite side therefore has two and only two alternatives: through the upper 4x4 Benes network or through the lower one.
Note also that for a left edge node, whose two inputs are routed to two outputs on the right, the choices of 4x4 Benes network are mutually exclusive: if one path takes the upper sub-network, the other must take the lower one. The Benes network has been proven to be a rearrangeable non-blocking fully interconnected network, that is, any input port can be connected to any output port under a suitable configuration, and an 8x8 Benes network supports 8 data paths running in parallel. Therefore, once the switch state of any one edge node is fixed, the switch states of all other edge nodes can be derived in a chain using the two properties above (connection and mutual exclusion), and the input-output requirements of the two internal 4x4 Benes networks are obtained along the way.
After the states of the edge nodes are determined, the states of the edge nodes of the internal sub-networks are determined. The whole solving process is clearly recursive, as shown in FIG. 7 and FIG. 8. Each edge node has only two input ports and two output ports, which must be marked so that the edge node satisfies the mutual-exclusion property of the switch. The process of marking each input and output port is called dyeing. A node switch has only two states, so there are two dyeing colors: black and white. The convention is: if the upper input or output port of a node is dyed black, the node is set to the cross state; if it is dyed white, the node is set to the parallel state, as shown in FIG. 6, FIG. 9 and FIG. 10.
In practice, the first output port is usually chosen as the starting point and dyed black, and the remaining dyeing proceeds from there.
Associating left edge nodes with right edge nodes requires the forward and reverse mapping vectors. The forward mapping vector is represented as a sequence in which the index of each element is an input port number and the value is the number of the output port connected to it. For an 8x8 Benes network, the forward mapping vector [7,2,8,4,3,1,6,5] means that input port 1 is connected to output port 7, input port 2 to output port 2, ..., and input port 8 to output port 5. Indexing the vector by a known left edge port number thus gives its mapping to the right edge directly.
Accordingly, the forward mapping vector [7,2,8,4,3,1,6,5] corresponds to the reverse mapping vector [6,2,5,4,8,7,1,3], meaning that output port 1 is connected to input port 6, output port 2 to input port 2, ..., and output port 8 to input port 3.
The dyeing flow is always fixed, because dyeing always cycles through the steps right-left, left-left, left-right and right-right until all edge node links are dyed.
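The relation b[a[j]] = j can be checked against this worked example. The sketch below keeps the 1-based port numbers used in the text; the function name `reverse_mapping` is illustrative, and the loop models in software what the hardware is described as doing for all ports in parallel.

```python
def reverse_mapping(forward):
    """Build b from a so that b[a[j]] = j, using 1-based port numbers."""
    reverse = [0] * len(forward)
    for j, out_port in enumerate(forward, start=1):
        reverse[out_port - 1] = j    # list storage is 0-based, ports are 1-based
    return reverse


a = [7, 2, 8, 4, 3, 1, 6, 5]
b = reverse_mapping(a)
print(b)  # [6, 2, 5, 4, 8, 7, 1, 3]
```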
After all nodes are dyed, the switch state (parallel or cross) of an edge node is determined simply by reading the dyeing states of its two ports, as shown in FIG. 9 and FIG. 10. From the determined switch states, the input and output indices at which each link enters and leaves the upper and lower sub-networks can also be determined. The known conditions for solving the edge nodes of the sub-networks are then complete, and solving can continue.
When solving reaches the innermost 2x2 node, dyeing its edge nodes is dyeing the node itself. Once this is done, recursion terminates because there is no further Benes sub-network, and the earlier results are collected and merged into the set of switch states of all optical switches.
In the hardware implementation, generating the reverse mapping value of each input or output takes only 1 pipeline cycle, and the values for different ports can be generated in parallel, so generating the whole reverse mapping vector takes only 1 cycle.
In the dyeing implementation, each port dyed costs 1 cycle, and because of data dependencies these steps cannot run in parallel; an NxN Benes network has N inputs and N outputs, so dyeing takes 2N cycles.
To keep the pipeline design simple, and because the dyeing flow is fixed, each dyeing step depends only on the current step number and on the port and color dyed in the previous step; the information about the dyed node must be passed on to the next dyeing step. Each pipeline stage can therefore complete its dyeing given only the following information from the previous stage:
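How a finished dyeing turns into switch states and sub-network inputs can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: ports are 0-indexed, ports 2k and 2k+1 belong to edge node k with 2k as the upper port, white links travel through the upper sub-network and black links through the lower one, and edge node k attaches to port k of each sub-network. The function name `split_level` and the sample coloring are illustrative.

```python
BLACK, WHITE = 1, 0  # assumed encoding of the two dyeing colors


def split_level(forward, in_color, out_color):
    """Derive edge switch states and the forward vectors of both sub-networks."""
    half = len(forward) // 2
    # Convention from the text: upper port black -> cross, white -> parallel.
    left_cross = [in_color[2 * k] == BLACK for k in range(half)]
    right_cross = [out_color[2 * k] == BLACK for k in range(half)]
    # Assumed wiring: white links use the upper sub-network, black the lower.
    upper = [None] * half
    lower = [None] * half
    for in_port, out_port in enumerate(forward):
        sub = upper if in_color[in_port] == WHITE else lower
        sub[in_port // 2] = out_port // 2   # node index = sub-network port index
    return left_cross, right_cross, upper, lower


forward = [6, 1, 7, 3, 2, 0, 5, 4]          # 0-indexed form of [7,2,8,4,3,1,6,5]
in_color = [1, 0, 0, 1, 0, 1, 0, 1]         # one valid dyeing of the inputs
out_color = [1, 0, 0, 1, 1, 0, 1, 0]        # matching dyeing of the outputs
left_cross, right_cross, upper, lower = split_level(forward, in_color, out_color)
```

Here `upper` and `lower` are themselves forward mapping vectors for the two 4x4 sub-networks, so the same dyeing step can be applied to each of them.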
1. forward mapping vectors;
2. reverse mapping vector;
3. ports dyed by the previous stage pipeline;
4. the color of the upper-stage pipeline dyeing.
To facilitate the transfer of the dyeing information, the information 4 is expanded into two vectors, the first vector marks the dyeing information of the corresponding port, and the second vector marks whether the corresponding port has been dyed.
The two steps of determining the input and output sequence numbers connected to the upper sub-network and the lower sub-network and determining the switch state of the edge node are all combinational logic, and can be additionally performed after the last dyeing process without occupying a flowing period.
Note that the recursive solution of the subnets, there is no data dependency between different subnets and therefore parallel solutions are possible. The flow chart is shown in fig. 7.
The generation of the reverse mapping vector can be changed into a 1-stage pipeline through hardware parallel optimization, and the dyeing process needs 2-stage pipelines per one node. In order to ensure that the dyeing process only has data dependence between adjacent pipelines, the pipelines can transmit the current traversed node position, the traversed state of all nodes, the traversed result of all nodes, the forward mapping vector and the reverse mapping vector information to the next stage pipeline so as to ensure that the current pipeline can independently calculate the state of the next traversed node to be placed and transmit the increment of the state to the next stage pipeline. Thus, when the last stage pipeline finishes the work, the dyeing process is completely finished, and the stage pipeline outputs a complete dyeing result, namely a state vector of the edge node. From the state vector, it can be determined what state (parallel or cross) each edge's 2x2 optical switch needs to be placed.
Since the 2Nx2N Benes network is built from two NxN Benes networks (see FIG. 4), once the states of the edge optical switches are determined, the problem reduces to solving the two inner NxN Benes networks. The same method is applied recursively until the state of the innermost 2x2 optical switches has also been solved.
During solving, the result of each layer is fed into a result collector as soon as that layer has been solved. The collector is also designed as a pipeline and stores the results in pipeline queues. With correctly chosen queue lengths, the node states of the outer, middle and innermost layers are output in the same cycle. For example, the solution result of the outermost 8x8 layer must be delayed by a certain number of cycles so that it is output in the same cycle in which the innermost 2x2 layer is solved. Solving the edge nodes of the 4x4 layer takes 1+2x4=9 cycles and solving the 2x2 layer takes 1+2x2=5 cycles, so the queue that temporarily stores the 8x8 result should have length 9+5=14; that is, a result entering the head of the queue leaves the tail 14 cycles later. Similarly, the queue that temporarily stores the 4x4 result should have length 5. See FIG. 7.
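The queue lengths follow a simple rule: each layer's result waits for as many cycles as all inner layers still need. A small sketch of that arithmetic, using the 1+2xN cycle count from the text (function names are illustrative):

```python
def layer_cycles(n):
    """Cycles to solve the edge nodes of an n x n layer: 1 for the reverse
    mapping vector plus 2 per port for dyeing."""
    return 1 + 2 * n


def collector_delays(size):
    """Map each layer size to the queue length that aligns its result with
    the completion of the innermost 2x2 layer."""
    layers = []
    n = size
    while n >= 2:
        layers.append(n)
        n //= 2
    return {n: sum(layer_cycles(m) for m in layers[idx + 1:])
            for idx, n in enumerate(layers)}


print(collector_delays(8))  # {8: 14, 4: 5, 2: 0}
```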
In another embodiment, the dyeing order is adjusted from "right-left", "left-left", "left-right", "right-right" to "right-right", "right-left", "left-left", "left-right". Because the calculation logic of dyeing is very simple, the original two pipeline stages can be merged into one (two ports are dyed at a time instead of one, and both ports belong to the same edge node; see FIG. 5. This is equivalent to completing dyeing steps (4) and (5) simultaneously in a single pipeline stage). The merged stage still takes less time than the stage that generates the reverse mapping vector, so fewer pipeline stages and a shorter solving time are achieved. With the new pipeline flow, the pipeline length for solving the 8x8 edge nodes is shortened from 1+2x8=17 stages to 1+8=9 stages. The number of pipeline stages required for a complete 16x16 solution likewise drops from 64 to 34.
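The stage counts quoted above can be checked by summing the per-layer pipeline lengths over all layers of the recursion. A short sketch, where `factor` is 2 for the original scheme and 1 for the merged scheme (names are illustrative):

```python
def layer_stages(n, factor):
    """Pipeline stages for one n x n layer: 1 + factor * n."""
    return 1 + factor * n


def total_stages(size, factor):
    """Sum the stages of every layer from size x size down to 2x2."""
    total = 0
    n = size
    while n >= 2:
        total += layer_stages(n, factor)
        n //= 2
    return total


print(layer_stages(8, 2), layer_stages(8, 1))   # 17 9
print(total_stages(16, 2), total_stages(16, 1)) # 64 34
```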
The original 16x16 pipeline solver without any optimization has 64 pipeline stages and can reach a clock frequency of 350 MHz on the Xilinx VC707 FPGA, i.e., a clock period of about 3 ns per stage. It follows that calculating a single route takes 192 ns and calculating 65 routes consecutively takes 384 ns, which leaves a large surplus of computing power.
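These figures follow from the usual pipeline timing: the first result appears after all stages have been traversed, and every further result follows one cycle later. A minimal check using the 64 stages and the rounded 3 ns period from the text (the function name is illustrative):

```python
def pipeline_latency_ns(routes, stages=64, clock_ns=3):
    """Time until the last of `routes` back-to-back requests leaves the pipeline."""
    return (stages + routes - 1) * clock_ns


print(pipeline_latency_ns(1))   # 192
print(pipeline_latency_ns(65))  # 384
```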
Assuming that the response time of each route solution has been successfully shortened to tens of ns, the route solving request that occurs most frequently is typically the following: two outlet ports of the BENES optical switching device are selected and their data outputs are exchanged. For example, if originally inlet 1→outlet 5 and inlet 2→outlet 8, exchanging the data outputs of outlet 5 and outlet 8 means that the expected result is inlet 1→outlet 8 and inlet 2→outlet 5.
Obviously, if only the data outputs of two outlets are exchanged, the solution space is greatly reduced. For a 16x16 BENES optical switching device, the solution space of this operation contains only 16x15/2=120 states. With a pipelined hardware solver, enumerating the entire solution space therefore takes less than 600 ns.
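Both numbers can be reproduced with a few lines, assuming the unoptimized 64-stage pipeline and the rounded 3 ns clock period quoted earlier in the text:

```python
import math

PORTS = 16
STAGES = 64      # unoptimized 16x16 pipeline depth from the text
CLOCK_NS = 3     # rounded clock period from the text

swap_states = math.comb(PORTS, 2)                     # unordered pairs of outlets
enumeration_ns = (STAGES + swap_states - 1) * CLOCK_NS

print(swap_states)     # 120
print(enumeration_ns)  # 549
```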
In transmission scenarios that use optical interconnects, data volumes tend to be large, which means that an established link is typically not reconfigured into a new link quickly. The pipelined hardware solver therefore always has sufficient time to enumerate the solution space, which guarantees the hit rate.
Example 3
On the basis of Examples 1 and 2, this example further discloses the following:
The prediction enumerator is used to generate a set for a given routing state s. Every element of the set is derived from s by exchanging the values of exactly two output ports in s. In each cycle the prediction enumerator generates a different routing state and passes it to the pipeline solver for solving, until all routing states have been generated.
The cache component is used to cache the results calculated by the pipeline solver. It uses key-value lookup, with the forward mapping vector corresponding to the routing state as the key and the switch state set of the BENES switch array as the value.
To cooperate with the pipeline solver and make full use of its surplus computing power, the prediction enumerator and the cache component need to be implemented together with it.
The prediction enumerator generates a set for a given routing state s, where every element of the set is derived from s by exchanging the values of exactly two output ports in s. For a 16x16 BENES network there are 16x15/2=120 such exchanges; after exclusion, 119 remain. A hardware prediction enumerator is constructed that, in each of the following 119 clock cycles, generates one such different routing state and passes it to the pipeline solver for solving.
The cache component caches the results computed by the pipeline solver. It uses key-value lookup, with the forward mapping vector corresponding to the routing state as the key and the switch state set of the BENES switch array as the value.
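A software analogue of the prediction enumerator is a generator that yields one swapped routing state per iteration. The sketch below assumes that `forward[i]` holds the output port of input port `i`, so exchanging two outputs means swapping two entries; the function name is hypothetical. The text mentions 119 cycles without stating which state is left out, so this sketch simply yields all 120 pairwise swaps for a 16-port state.

```python
from itertools import combinations


def predict_states(forward):
    """Yield every routing state obtained from `forward` by swapping the
    outputs of exactly two input ports."""
    for i, j in combinations(range(len(forward)), 2):
        candidate = list(forward)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        yield tuple(candidate)


current = tuple(range(16))
predicted = list(predict_states(current))
print(len(predicted))  # 120
```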
The overall structure is shown in FIG. 1. The route solving input (a forward mapping vector) is fed simultaneously into the pipeline solver, the prediction enumerator and the cache. The cache first compares the input against the previously calculated results it has stored. If a matching result exists, it is output directly, which avoids the long wait for the solver (the cache needs only a few nanoseconds to look up and return a result). If no match is found, the cache returns a "miss" message, and the system waits for the pipeline solver to solve and return the corresponding result. Since the pipeline solver can already accept a new request in the clock cycle after the route solving input arrives, the prediction enumerator outputs a predicted forward mapping vector to the pipeline solver in that cycle and in every subsequent cycle until the complete prediction sequence has been enumerated. All results solved by the pipeline solver are also written to the cache for subsequent fast matching.
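The interplay of the three components can be mimicked in software with a dictionary as the cache. This is a sequential sketch under stated assumptions: the class name and the `solve` callback are hypothetical stand-ins for the hardware pipeline solver, and the concurrency of the real device is not modeled.

```python
from itertools import combinations


class RouteAccelerator:
    """Sequential model of the cache / solver / prediction-enumerator flow."""

    def __init__(self, solve):
        self.solve = solve   # stand-in for the pipeline solver (hypothetical)
        self.cache = {}      # key: forward mapping vector, value: switch states
        self.hits = 0
        self.misses = 0

    def _solve_and_store(self, key):
        if key not in self.cache:
            self.cache[key] = self.solve(key)
        return self.cache[key]

    def request(self, forward):
        key = tuple(forward)
        if key in self.cache:
            self.hits += 1       # answered directly from the cache
        else:
            self.misses += 1     # must wait for the solver
        result = self._solve_and_store(key)
        # Speculatively solve every state that differs by one output swap.
        for i, j in combinations(range(len(key)), 2):
            neighbour = list(key)
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            self._solve_and_store(tuple(neighbour))
        return result
```

After the first request for a routing state, any follow-up request that exchanges two of its outputs is served from the cache in this model.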
The same or similar reference numerals correspond to the same or similar components;
The terms describing positional relationships in the drawings are merely illustrative and are not to be construed as limiting the present patent;
It is to be understood that the above examples of the present invention are provided by way of illustration only and do not limit the embodiments of the present invention. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement that falls within the spirit and principles of the invention is intended to be protected by the following claims.
Claims (9)
1. A pipelined BENES network route solving hardware accelerating device is characterized by comprising a pipeline solver, a prediction enumerator and a cache, wherein
The pipeline solver carries out pipeline hardware solving according to the route solving input or different route state inputs generated by the prediction enumerator, and the output of the pipeline solver is stored in the cache;
the prediction enumerator enumerates and generates different route states according to route solving input and inputs the different route states into the pipeline solver for solving;
the cache compares the route solving input with the previously calculated result stored by the cache, if the calculated result exists, the corresponding result is directly output, and if the calculated result does not exist, the output result of the pipeline solver is waited;
the pipeline solver adopts pipeline hardware to solve, the BENES network is divided into a plurality of stages, the input port of each stage of BENES network is a left side edge node, the output port is a right side edge node, the left side edge node and the right side edge node are connected by the next stage of BENES network, each stage is divided into two stages, namely, generation of reverse mapping vectors and dyeing, the result is connected into a result collector after the completion of the solution of each stage, the result collector is also designed in a pipeline form, the result is stored into a pipeline queue, the result collector combines and outputs node states of each stage obtained by the solution in the same period and outputs the node states to a cache together, and the reverse mapping vectors are used for quickly searching the output port to obtain the corresponding input port.
2. The pipelined BENES network route solving hardware accelerating device of claim 1, wherein the route solving input is a forward mapping vector used for quickly querying an input port to obtain the corresponding output port; the reverse mapping vector can be calculated from the forward mapping vector.
3. The pipelined BENES network route solving hardware accelerating device of claim 2, wherein the forward mapping vector is represented as a sequence, the sequence number of each element in the sequence represents the corresponding output port number, and the value of each element represents the input port number to which the output port corresponding to that sequence number is connected; the reverse mapping vector is calculated as follows:
if b[i] denotes the item of the reverse mapping vector with sequence number i, and a[j] denotes the item of the forward mapping vector with sequence number j, then:
b[a[j]] = j.
4. The pipelined BENES network route solving hardware accelerating device of claim 3, wherein the dyeing is a process of marking each input/output port: when the upper input port or upper output port of a node is dyed with a first state color, the node is set to the cross state; when the upper input port or upper output port is dyed with a second state color, the node is set to the parallel state; and each pipeline stage obtains from the previous pipeline stage the forward mapping vector, the reverse mapping vector, the ports dyed by the previous pipeline stage and the color dyed by the previous pipeline stage.
5. The pipelined BENES network route solving hardware accelerating device of claim 4, wherein the specific process of dyeing is:
the process is carried out cyclically in the order "right-left", "left-left", "left-right", "right-right" until all edge node links are dyed:
right-left: selecting any one of the output ports of the right edge node, setting the dyeing state of the output port, and inquiring the reverse mapping vector to obtain a corresponding input port, so that the dyeing state of the input port is consistent with the dyeing state of the selected output port;
left-left: searching another path of input port of the left edge node where the input port inquired in the right-left step is located, and setting the input port to have a dyeing state different from that of the input port inquired in the right-left step;
left-right: querying a forward mapping vector to obtain an output port corresponding to the input port found in the left-left step, and setting the output port to have a dyeing state consistent with the input port found in the left-left step;
right-right: searching another path of output port of the right edge node where the output port inquired in the left-right step is located, and setting the output port to have a dyeing state different from that of the output port inquired in the left-right step.
6. The pipelined BENES network route solving hardware accelerating device of claim 4, wherein the specific process of dyeing is:
the process is carried out cyclically in the order "right-right", "right-left", "left-left", "left-right" until all edge node links are dyed:
right-right: selecting any one output port of the right edge node, setting a dyeing state of the output port, searching another output port of the right edge node where the output port is located, and setting the output port to have a dyeing state different from that of the output port;
right-left: inquiring the reverse mapping vector to obtain an input port corresponding to the other output port found in the right-right step, so that the dyeing state of the input port is consistent with the dyeing state of the other output port found in the right-right step;
left-left: searching another path of input port of the left edge node where the input port inquired in the right-left step is located, and setting the input port to have a dyeing state different from that of the input port inquired in the right-left step;
left-right: querying the forward mapping vector obtains an output port corresponding to the other input port found in the "left-left" step, and sets the output port to have a dyeing state consistent with the other input port found in the "left-left" step.
7. The pipelined BENES network route solving hardware accelerating device of claim 5 or 6, wherein if a round of the dyeing process reaches a node that has already been visited, i.e., the link has been dyed in a previous dyeing process, the node is skipped and a node that has not been dyed is selected to continue the dyeing process.
8. The pipelined BENES network route solving hardware accelerating device of claim 7, wherein the prediction enumerator is configured to generate a set for a given routing state s, where every element of the set is derived from s by exchanging only the corresponding input ports of any two output ports in s, and in each cycle the prediction enumerator generates a different routing state and passes it to the pipeline solver for solving until all routing states have been generated.
9. The pipelined BENES network route solving hardware accelerating device of claim 2, wherein the cache is configured to cache the results calculated by the pipeline solver and uses key-value lookup, with the forward mapping vector corresponding to the routing state as the key and the switch state set of the BENES switch array as the value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210927907.7A CN115379316B (en) | 2022-08-03 | 2022-08-03 | Pipelined BENES network route solving hardware accelerating device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115379316A (en) | 2022-11-22 |
CN115379316B (en) | 2024-04-05 |
Family
ID=84062982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210927907.7A Active CN115379316B (en) | 2022-08-03 | 2022-08-03 | Pipelined BENES network route solving hardware accelerating device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115379316B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||