CN113783806B

CN113783806B - Shunt route jump method, device, medium, equipment and multi-core system applied by same

Info

Publication number: CN113783806B
Application number: CN202111014229.7A
Authority: CN
Inventors: 陈克林; 袁抗; 吕正祥; 杨力邝; 陈旭; 梁龙飞
Original assignee: Shanghai New Helium Brain Intelligence Technology Co ltd
Current assignee: Shanghai New Helium Brain Intelligence Technology Co ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2023-10-17
Anticipated expiration: 2041-08-31
Also published as: CN113783806A

Abstract

The application provides a splitter route-hopping method, apparatus, medium, and device, and a multi-core system to which they are applied. The method comprises: receiving a data packet to be forwarded; performing fault monitoring on all output ports of the splitter, and screening out, from the fault-free output ports, a number of output ports that have not been used recently; and selecting, from the screened output ports, the output port closest to the target address of the data packet, and using it to forward the data packet to the target address. The application implements a splitter routing algorithm with high fault tolerance; the algorithm keeps the bandwidth approximately uniformly distributed across the output ports while also keeping the distance from the data packet to its target address short.

Description

Shunt route jump method, device, medium, equipment and multi-core system applied by same

Technical Field

The application relates to the technical field of routing algorithms, and in particular to a splitter route-hopping method, apparatus, medium, and device, and a multi-core system to which they are applied.

Background

Research on deep neural networks (DNNs) has developed rapidly in recent years and has begun to see practical application. To improve classification accuracy, deep neural networks have become increasingly complex, and networks with more than 1000 layers have emerged. When the performance of a single chip is insufficient, a task is completed by multiple chips operating in parallel.

The mesh array is a typical inter-core interconnection mode within a chip: multiple cores form a rectangle, each core can communicate directly with its four neighboring cores to the east, south, west, and north, and each core can communicate with any other core in the array through forwarding by the cores in the array. The mesh in a chip is often connected to an aggregator, and data is sent out of the chip after passing through the aggregator. Data input to the chip is often connected to a splitter, and is sent to the mesh in the chip after passing through the splitter. After the splitter receives a data packet input to the chip, it must forward the data packet to one of its output ports according to a routing algorithm.

A typical splitter routing algorithm is the round-robin algorithm: the first output port forwards a packet, then the second output port forwards a packet, and so on until the last output port forwards a packet, after which the first output port forwards again and the cycle repeats. However, this general splitter algorithm considers neither whether each output port is available nor the delay of the packet on the mesh.

Accordingly, there is a need in the art for a splitter routing algorithm that combines fault tolerance, bandwidth balancing, and low latency.

Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present application is to provide a splitter route-hopping method, apparatus, medium, and device, and a multi-core system to which they are applied, to solve the technical problem that current splitter routing algorithms cannot simultaneously achieve fault tolerance, bandwidth balance, and low delay.

To achieve the above and other related objects, a first aspect of the present application provides a splitter route-hopping method in a multi-core system, including: receiving a data packet to be forwarded; performing fault monitoring on all output ports of the splitter, and screening out, from the fault-free output ports, a number of output ports that have not been used recently; and selecting, from the screened output ports, the output port closest to the target address of the data packet to be forwarded, and using it to forward the data packet to the target address.

In some embodiments of the first aspect of the present application, the recently unused output ports include output ports that were not used in the most recent packet transmission, or output ports that were not used in the most recent several packet transmissions.

In some embodiments of the first aspect of the present application, an out_hist register set of length N is used to record N output-port routing results, a cur_out_hist_index pointer is used to point to the out_hist register to be updated next, and the following steps are performed: assigning the Neff fault-free ports to a corresponding number of out_hist registers in a certain order, and assigning the cur_out_hist_index pointer a preset initial value; the output ports used by the most recent M forwarded packets having been stored in registers out_hist((cur_out_hist_index-1) % Neff) to out_hist((cur_out_hist_index-M+1) % Neff), respectively; removing the output ports stored in these M registers, as well as the faulty ports, from the N output ports to obtain (Neff-M) output ports; calculating the distance between each of the (Neff-M) output ports and the target address of the data packet to be forwarded, and selecting the closest output port j to forward the data packet; and assigning j to the register pointed to by the cur_out_hist_index pointer, and updating the cur_out_hist_index pointer to (cur_out_hist_index+1) % Neff; where the % operation denotes the remainder operation.

To achieve the above and other related objects, a second aspect of the present application provides a splitter route-hopping apparatus in a multi-core system, including: a data receiving module, configured to receive a data packet to be forwarded; a port monitoring module, configured to perform fault monitoring on all output ports of the splitter; a port screening module, configured to screen out, from the fault-free output ports, a number of output ports that have not been used recently; and a port selection module, configured to select, from the screened output ports, the output port closest to the target address of the data packet to be forwarded, so as to forward the data packet to the target address.

To achieve the above and other related objects, a third aspect of the present application provides a multi-core system, comprising: a mesh array comprising a plurality of rows and columns of chip cores, each chip core being in communication connection with its adjacent chip cores; and a splitter connected to the mesh array. The splitter comprises a plurality of output ports. After the splitter receives a data packet to be forwarded that is input to the chip, fault monitoring is performed on all output ports of the splitter, and a number of output ports that have not been used recently are screened out from the fault-free output ports; the output port closest to the target address of the data packet to be forwarded is then selected from the screened output ports and used to forward the data packet to the target address.

To achieve the above and other related objects, a fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the splitter route-hopping method in the multi-core system.

To achieve the above and other related objects, a fifth aspect of the present application provides a control apparatus comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the control apparatus executes the splitter route-hopping method in the multi-core system.

As described above, the splitter route-hopping method, apparatus, medium, and device of the present application, and the multi-core system to which they are applied, have the following beneficial effects: (1) The application implements a splitter routing algorithm with high fault tolerance. (2) The splitter routing algorithm of the application keeps the bandwidth approximately uniformly distributed across the output ports while also keeping the distance from the data packet to the target address short.

Drawings

FIG. 1 is a schematic diagram of a prior-art splitter.

FIG. 2 is a flow chart of a splitter routing algorithm in a multi-core system according to an embodiment of the application.

FIG. 3 is a schematic diagram of a splitter route-hopping apparatus in a multi-core system according to an embodiment of the application.

FIG. 4 is a schematic structural diagram of a control device according to an embodiment of the present application.

Detailed Description

Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present application by way of specific examples. The application may also be practiced or carried out in other embodiments that differ in their specific details, and the details of the present description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.

In the following description, reference is made to the accompanying drawings, which illustrate several embodiments of the application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. Spatially relative terms, such as "upper," "lower," "left," "right," and the like, may be used herein to facilitate the description of the relationship of one element or feature to another element or feature as illustrated in the figures.

In the present application, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," "held," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.

Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. The terms "first," "second," "third," "fourth" and the like in the description, in the claims, and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, operations, elements, components, items, categories, and/or groups, but do not preclude the presence, occurrence, or addition of one or more other features, operations, elements, components, items, categories, and/or groups. It will be further understood that the terms "or" and "and/or" as used herein are to be interpreted as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions or operations is in some way inherently mutually exclusive.

In order to make the objects, technical solutions and advantages of the present application more apparent, further detailed description of the technical solutions in the embodiments of the present application will be given by the following examples with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

FIG. 1 shows a schematic structural diagram of a prior-art splitter. After the splitter receives a data packet input to the chip, it forwards the data packet to one of its output ports using a routing algorithm. The typical current splitter routing algorithm is the round-robin algorithm, which works as follows: the first output port forwards a data packet, then the second output port forwards a data packet, and so on until the last output port forwards a data packet, which completes one round; as soon as one round completes, the next round starts with the first output port forwarding a data packet, and the cycle repeats.
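The round-robin baseline described above can be sketched in a few lines. This is only an illustration: the port count of 6 is taken from the later worked example, and the function name `round_robin_ports` is hypothetical.

```python
from itertools import cycle


def round_robin_ports(num_ports):
    """Return an endless iterator over port indices 0..num_ports-1, wrapping around."""
    return cycle(range(num_ports))


ports = round_robin_ports(6)
# Each incoming packet simply takes the next port, regardless of its
# destination and regardless of whether that port is healthy.
first_eight = [next(ports) for _ in range(8)]
print(first_eight)  # [0, 1, 2, 3, 4, 5, 0, 1]
```

Note that nothing in this baseline looks at the packet's target address or at port health, which is the gap the application addresses.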

For example, suppose the destination address of a chip input packet is core (1, 3). If the round-robin algorithm is used and the output port whose turn it currently is happens to be the out5 port, 6 hops are required to transmit the packet from the out5 port to core (1, 3). The transmission path may be, for example: core (5, 5) - core (4, 5) - core (3, 5) - core (2, 5) - core (1, 5) - core (1, 4) - core (1, 3), or core (5, 5) - core (5, 4) - core (5, 3) - core (4, 3) - core (3, 3) - core (2, 3) - core (1, 3), and so on. It should be appreciated that there are multiple hop paths from core (5, 5) to core (1, 3), and they are not limited to the examples described above.
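The hop count in this example is the Manhattan distance on the mesh, since each core only links to its four neighbors. A minimal sketch, assuming core coordinates are plain integer pairs (the helper name `mesh_hops` is hypothetical):

```python
def mesh_hops(src, dst):
    """Minimum number of hops between two cores of a 2D mesh (Manhattan distance)."""
    (src_a, src_b), (dst_a, dst_b) = src, dst
    return abs(src_a - dst_a) + abs(src_b - dst_b)


# Entry core of the out5 port in the example above, and the target core.
print(mesh_hops((5, 5), (1, 3)))  # 6
```

Every shortest path between the two cores has this length, which is why both example paths take 6 hops.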

The round-robin algorithm has the advantage of balanced bandwidth: each output port carries a share of the traffic, which keeps the traffic uniform and avoids mesh congestion. However, this routing algorithm considers neither whether each output port is available nor the delay of a data packet on the mesh. In the example above, the out5 port is not actually the port closest to the target address core (1, 3); it is chosen simply because the round-robin algorithm happens to have rotated to it, which results in a large number of hops and thus a significant increase in latency.

In view of this, the present application provides an efficient splitting algorithm for multi-core systems that balances fault tolerance, bandwidth balance, and low delay. The splitting algorithm provided by the present application is explained further below.

FIG. 2 is a flow diagram of a splitter routing algorithm in a multi-core system in accordance with one embodiment of the present application. It should be appreciated that the splitter routing algorithm of this embodiment may be applied to controllers such as an ARM (Advanced RISC Machines) controller, an FPGA (Field Programmable Gate Array) controller, an SoC (System on Chip) controller, a DSP (Digital Signal Processor) controller, or an MCU (Microcontroller Unit) controller; it may also be applied to devices such as smartphones, tablet computers, desktop computers, notebook computers, smart bracelets, and smart helmets; and it may also be applied to servers, which may be deployed on one or more physical servers depending on factors such as function and load, or may be formed by a distributed or centralized server cluster.

In this embodiment, the splitter routing algorithm in the multi-core system mainly includes steps S21 to S23, and the implementation process and the principle of each step will be explained and described in detail below.

Step S21: receive the data packet to be forwarded. For example, the splitter receives a data packet from a master device connected to the splitter and learns the destination address of the data packet.

Step S22: perform fault monitoring on all output ports of the splitter, and screen out, from the fault-free output ports, a number of output ports that have not been used recently.

In some examples, if an output port is detected to be a failed port, the master device stores the fault information obtained from testing for the splitter to use, so that the routing algorithm avoids forwarding data packets through the failed port. One way to monitor for a failed port is, for example, to first send test data to the output port being monitored; if no feedback data is received from the output port within a preset time period, or the feedback data is incorrect, the output port can be determined to be faulty. It should be understood that the above examples are provided for illustrative purposes and should not be construed as limiting; in fact, any prior-art technical solution that can be used for fault monitoring of ports can be applied in the present application.
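The probe-and-timeout check described above might look like the following sketch. Everything here is an assumption made for illustration: the text does not specify how test data is sent or how feedback arrives, so `send_test` is a hypothetical callable and the feedback channel is modeled as a `queue.Queue`.

```python
import queue


def port_is_faulty(send_test, replies, expected, timeout_s=0.01):
    """Return True if the port gives no feedback in time or the feedback is wrong.

    send_test: hypothetical callable that transmits test data to the port.
    replies:   queue.Queue on which the port's feedback data would arrive.
    expected:  the feedback a healthy port is assumed to return.
    """
    send_test()
    try:
        feedback = replies.get(timeout=timeout_s)
    except queue.Empty:
        return True  # no feedback within the preset time period
    return feedback != expected  # incorrect feedback also counts as a fault


healthy = queue.Queue()
silent = queue.Queue()
print(port_is_faulty(lambda: healthy.put(b"ping"), healthy, b"ping"))  # False
print(port_is_faulty(lambda: None, silent, b"ping"))  # True
```

The set of ports for which this check returns True is the fault information that the routing step would then exclude.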

In some examples, the output ports screened out from the fault-free output ports as not recently used may be output ports that were not used in the most recent packet transmission, or output ports that were not used in the most recent several packet transmissions.

Step S23: select, from the screened output ports, the output port closest to the target address of the data packet to be forwarded, and use it to forward the data packet to the target address.

For example, assume that the splitter has N output ports and that, after fault monitoring of the N output ports, the master device finds that K output ports are faulty (K < N). The number of valid, available output ports is then Neff = N - K. In addition, M output ports were used in the most recent packet forwarding operations; after these used output ports are removed, the number of output ports that are valid and not recently used is Neff - M. The port with the shortest distance to the target address is selected from these (Neff - M) valid, unused ports as the forwarding port for the data packet.
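The counting argument can be checked with a small sketch. The concrete values (N = 6, port 2 faulty, ports 3, 4, 5 recently used) are borrowed from the worked example later in the description; the helper name `candidate_ports` is hypothetical, and the sketch assumes the M recently used ports are distinct and fault-free.

```python
def candidate_ports(num_ports, faulty, recently_used):
    """Ports that are neither faulty nor among the recently used ones."""
    return [p for p in range(num_ports)
            if p not in faulty and p not in recently_used]


N = 6
faulty = {2}                # K = 1
recently_used = {3, 4, 5}   # M = 3
n_eff = N - len(faulty)     # Neff = N - K
candidates = candidate_ports(N, faulty, recently_used)
print(n_eff, candidates)    # 5 [0, 1]
```

Here `len(candidates)` equals Neff - M = 2; the distance comparison is then made only among these candidates.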

In some examples, the implementation of the splitter routing algorithm in the multi-core system is as follows: the out_hist register set of length N is used to record N output port routing results. Using the cur_out_hist_index pointer to point to the out_hist register currently to be updated, the following two steps are then performed:

Step 1: initialize the out_hist registers and the cur_out_hist_index pointer. Specifically, all fault-free ports are assigned to a corresponding number of out_hist registers in a certain order, and the cur_out_hist_index pointer is assigned a preset initial value. For example, all fault-free ports may be assigned to the first Neff out_hist registers in ascending order, and the cur_out_hist_index pointer initialized to 0.

Step 2: forward the data packet using the routing algorithm.

The registers out_hist((cur_out_hist_index-1) % Neff), out_hist((cur_out_hist_index-2) % Neff), …, out_hist((cur_out_hist_index-M+1) % Neff) store the output ports used by the most recent M forwarded packets. The ports stored in these M registers are removed from the N output ports, and the faulty ports are removed as well, which yields a candidate port list containing (Neff-M) ports. The distance between each of these ports and the target address in the data packet is computed, and the port j with the shortest distance is selected as the current output port. At the same time, j is assigned to the out_hist register pointed to by cur_out_hist_index, and cur_out_hist_index is updated to (cur_out_hist_index+1) % Neff. Here % denotes the remainder (modulo) operation.

For ease of understanding, refer to FIG. 1 and assume N = 6, K = 1, M = 3, with output port 2 being the failed port. First, the out_hist registers and the cur_out_hist_index pointer are initialized: out_hist(0 …) = [0,1,3,4,5], cur_out_hist_index=0. Second, data packets are forwarded using the routing algorithm.

The 1st packet arrives; assume its destination address is (dest_y=2, dest_x=4). out_hist(2..4) holds the M most recently used ports, so output ports 3, 4, and 5 are not candidates; removing the failed port 2 from the remaining ports leaves only output ports 0 and 1 as candidates. Since output port 1 is closer to the destination address (dest_y=2, dest_x=4) than output port 0, the routing result is output port 1, and the 1st packet is forwarded from the out1 port to the destination address. The register set out_hist and the pointer cur_out_hist_index are then updated, with the following results: out_hist(0 …) = [1,1,3,4,5], cur_out_hist_index=1.

The 2nd packet arrives; assume its destination address is (dest_y=3, dest_x=2). out_hist(3..4) and out_hist(0) hold the M most recently used ports, so output ports 4, 5, and 1 are not candidates; removing the failed port 2 from the remaining ports leaves only ports 0 and 3 as candidates. Since output port 3 is closer to the destination address (dest_y=3, dest_x=2) than output port 0, the routing result is output port 3, and the 2nd packet is forwarded from the out3 port to the destination address. The register set out_hist and the pointer cur_out_hist_index are then updated, with the following results: out_hist(0 …) = [1,3,3,4,5], cur_out_hist_index=2.

The 3rd packet arrives; assume its destination address is (dest_y=0, dest_x=5). out_hist(4) and out_hist(0 … 1) hold the M most recently used ports, so output ports 5, 1, and 3 are not candidates; removing the failed port 2 from the remaining ports leaves only ports 0 and 4 as candidates. Since output port 0 is closer to the destination address (dest_y=0, dest_x=5) than output port 4, the routing result is output port 0, and the 3rd packet is forwarded from the out0 port to the destination address. The register set out_hist and the pointer cur_out_hist_index are then updated, with the following results: out_hist(0 …) = [1,3,0,4,5], cur_out_hist_index=3.
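The three-packet walkthrough can be replayed with the sketch below, which is one possible reading of the two steps rather than the patented implementation. Several things are assumptions: FIG. 1 is not reproduced here, so output port i is taken to enter the mesh at core (row i, column 5), which is consistent with the out5 example and with every distance comparison in the walkthrough; distance is Manhattan distance; ties go to the lower port number; and out_hist is a Python list of length Neff instead of N registers. The index formula in the text ends at (cur_out_hist_index-M+1), whereas the walkthrough excludes M = 3 registers, i.e. offsets 1 through M behind the pointer; the sketch follows the walkthrough. The class name `SplitterRouter` is hypothetical.

```python
class SplitterRouter:
    """History-register routing sketch; assumes history_len (M) < Neff."""

    def __init__(self, num_ports, faulty, history_len, port_pos):
        self.port_pos = port_pos      # assumed: port -> (row, col) of its entry core
        self.m = history_len          # M
        self.good = [p for p in range(num_ports) if p not in faulty]
        self.out_hist = list(self.good)   # step 1: fault-free ports, ascending
        self.cur = 0                      # cur_out_hist_index

    def _dist(self, port, dest):
        row, col = self.port_pos[port]
        return abs(row - dest[0]) + abs(col - dest[1])

    def route(self, dest):
        """dest is (dest_y, dest_x); returns the chosen output port j."""
        n_eff = len(self.out_hist)
        # Ports held in the M registers just behind the pointer are excluded.
        recent = {self.out_hist[(self.cur - k) % n_eff]
                  for k in range(1, self.m + 1)}
        cands = [p for p in self.good if p not in recent]
        j = min(cands, key=lambda p: (self._dist(p, dest), p))
        self.out_hist[self.cur] = j
        self.cur = (self.cur + 1) % n_eff
        return j


router = SplitterRouter(6, {2}, 3, {p: (p, 5) for p in range(6)})
trace = []
for dest in [(2, 4), (3, 2), (0, 5)]:
    j = router.route(dest)
    trace.append((j, list(router.out_hist), router.cur))
    print(j, router.out_hist, router.cur)
# 1 [1, 1, 3, 4, 5] 1
# 3 [1, 3, 3, 4, 5] 2
# 0 [1, 3, 0, 4, 5] 3
```

Under these assumptions the chosen ports, register contents, and pointer values match the walkthrough for all three packets.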

In some examples, when exactly one output port is fault-free and not recently used, i.e. when M = Neff - 1, only one output port is actually selectable, and the routing algorithm described above degenerates into a round-robin algorithm with high fault tolerance. When M = 0, all Neff output ports take part in the distance calculation, and the routing algorithm degenerates into a shortest-path (minimum-delay) algorithm with high fault tolerance.
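Both limiting cases can be observed with a compact function form of the routing step. As before, this is a sketch under assumptions: port i is taken to enter the mesh at core (row i, column 5), distance is Manhattan distance, ties go to the lower port number, and the function name `route_sequence` is hypothetical.

```python
def route_sequence(dests, good_ports, m, port_pos):
    """Route each destination in turn and return the list of chosen ports."""
    out_hist = list(good_ports)
    n_eff = len(out_hist)
    cur = 0
    chosen = []
    for dest in dests:
        recent = {out_hist[(cur - k) % n_eff] for k in range(1, m + 1)}
        cands = [p for p in good_ports if p not in recent]
        j = min(cands, key=lambda p: (abs(port_pos[p][0] - dest[0])
                                      + abs(port_pos[p][1] - dest[1]), p))
        out_hist[cur] = j
        cur = (cur + 1) % n_eff
        chosen.append(j)
    return chosen


good = [0, 1, 3, 4, 5]              # port 2 is faulty, so Neff = 5
pos = {p: (p, 5) for p in range(6)}

# M = Neff - 1: every packet targets the core next to port 0, yet the
# ports are used strictly in turn, skipping the faulty port.
as_round_robin = route_sequence([(0, 5)] * 6, good, 4, pos)
print(as_round_robin)   # [0, 1, 3, 4, 5, 0]

# M = 0: no history is excluded, so each packet takes its nearest healthy port.
as_nearest = route_sequence([(4, 0), (1, 2), (5, 5)], good, 0, pos)
print(as_nearest)       # [4, 1, 5]
```

Values of M between these two extremes trade path length against how evenly traffic is spread over the healthy ports.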

By using the splitter routing algorithm in the multi-core system described above, the application implements a splitter routing algorithm with high fault tolerance; the algorithm keeps the bandwidth approximately uniformly distributed across the output ports while also keeping the distance from the data packet to the target address short.

FIG. 3 shows a schematic diagram of a splitter route-hopping apparatus in a multi-core system according to an embodiment of the present application. The splitter route-hopping apparatus 300 in this embodiment includes: a data receiving module 301, a port monitoring module 302, a port screening module 303, and a port selection module 304.

The data receiving module 301 is configured to receive a data packet to be forwarded; the port monitoring module 302 is configured to perform fault monitoring on all output ports of the splitter; the port screening module 303 is configured to screen a number of output ports that are not used recently from the output ports that have no fault; the port selection module 304 is configured to select, from the screened output ports, an output port closest to a destination address of the data packet to be forwarded, for forwarding the data packet to be forwarded to the destination address.

It should be understood that the division of the above apparatus into modules is merely a division of logical functions; in an actual implementation, the modules may be fully or partially integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. For example, the port monitoring module may be a separately established processing element, may be integrated in a chip of the above apparatus, or may be stored in a memory of the above apparatus in the form of program code and invoked by a processing element of the above apparatus to perform the functions of the port monitoring module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be carried out by an integrated logic circuit of hardware in a processor element or by instructions in software form.

For example, the modules above may be one or more integrated circuits configured to implement the methods above, such as one or more application-specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), one or more digital signal processors (Digital Signal Processor, abbreviated as DSP), or one or more field programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGA). As another example, when one of the modules above is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor that can invoke the program code. As a further example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

FIG. 4 shows a schematic structural diagram of a control device in an embodiment of the present application. The control device provided in this example includes: a processor 41, a memory 42, and a communicator 43. The memory 42 is connected to the processor 41 and the communicator 43 via a system bus, over which they communicate with one another; the memory 42 is used for storing a computer program, the communicator 43 is used for communicating with other devices, and the processor 41 is used for running the computer program to cause the control device to execute the steps of the splitter route-hopping method in the multi-core system described above.

The system bus mentioned above may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The system bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus is drawn as a single bold line in the figure, which does not mean that there is only one bus or one type of bus. The communication interface is used to enable communication between the database access apparatus and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may comprise random access memory (Random Access Memory, RAM) and may also comprise non-volatile memory (non-volatile memory), such as at least one disk memory.

The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the splitter route-hopping method in the multi-core system.

Those of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments described above may be performed by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above; the storage medium includes various media that can store program code, such as ROM, RAM, and magnetic or optical disks.

In the embodiments provided herein, the computer-readable storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, U-disk, removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable and data storage media do not include connections, carrier waves, signals, or other transitory media, but are intended to be directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.

The application also provides a multi-core system, which comprises a mesh array and a splitter. The mesh array comprises a plurality of rows and columns of chip cores, and each chip core is in communication connection with its adjacent chip cores; the splitter is connected to the mesh array and comprises a plurality of output ports. After the splitter receives a data packet to be forwarded that is input to the chip, fault monitoring is performed on all output ports of the splitter, and a number of output ports that have not been used recently are screened out from the fault-free output ports; the output port closest to the target address of the data packet to be forwarded is then selected from the screened output ports and used to forward the data packet to the target address. Since the implementation of this embodiment is similar to that of the embodiments above, a detailed description is omitted.

In summary, the present application provides a diverter route hopping method, apparatus, medium, and device, and a multi-core system using the same, which implement a diverter routing algorithm with high fault tolerance. The diverter routing algorithm of the application not only distributes bandwidth approximately uniformly across the output ports, but also keeps the distance from the data packet to its target address short. Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.

The above embodiments merely illustrate the principles of the present application and its effects, and are not intended to limit the application. Those skilled in the art may modify or vary the above-described embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present application.

Claims

1. A method for diverter route hopping in a multi-core system, comprising:

receiving a data packet to be forwarded;

performing fault monitoring on all output ports of the diverter, and screening out a plurality of recently unused output ports from the fault-free output ports;

selecting an output port closest to a target address of the data packet to be forwarded from the screened output ports, and forwarding the data packet to be forwarded to the target address;

wherein an out_hist register set of length N is used to record N output port routing results, a cur_out_hist_index pointer is used to point to the out_hist register currently to be updated, and the following steps are executed: assigning the Neff fault-free ports to a corresponding number of out_hist registers in a certain order, and assigning the cur_out_hist_index pointer a preset initial value; storing the output port information used by the most recent M forwarded data packets in registers out_hist[(cur_out_hist_index-1) % Neff] to out_hist[(cur_out_hist_index-M+1) % Neff], respectively; removing the output ports stored in the M registers from the N output ports, and removing the faulty ports, to obtain (Neff-M) output ports; calculating the distance between each of the (Neff-M) output ports and the target address of the data packet to be forwarded, and selecting the output port j closest to the target address to forward the data packet; and assigning j to the register pointed to by the cur_out_hist_index pointer, and updating the cur_out_hist_index pointer to (cur_out_hist_index+1) % Neff; wherein the % operation refers to a remainder operation.

2. The method of claim 1, wherein the plurality of recently unused output ports comprises output ports not used during the most recent data packet forwarding, or output ports not used during the most recent several data packet forwardings.

3. A diverter route hopping apparatus in a multi-core system, comprising:

the data receiving module is used for receiving the data packet to be forwarded;

the port monitoring module is used for carrying out fault monitoring on all output ports of the diverter;

the port screening module is used for screening a plurality of output ports which are not used recently from the output ports without faults;

the port selection module is used for selecting an output port closest to a target address of the data packet to be forwarded from the screened output ports so as to forward the data packet to be forwarded to the target address; wherein an out_hist register set of length N is used to record N output port routing results, a cur_out_hist_index pointer is used to point to the out_hist register currently to be updated, and the following steps are executed: assigning the Neff fault-free ports to a corresponding number of out_hist registers in a certain order, and assigning the cur_out_hist_index pointer a preset initial value; storing the output port information used by the most recent M forwarded data packets in registers out_hist[(cur_out_hist_index-1) % Neff] to out_hist[(cur_out_hist_index-M+1) % Neff], respectively; removing the output ports stored in the M registers from the N output ports, and removing the faulty ports, to obtain (Neff-M) output ports; calculating the distance between each of the (Neff-M) output ports and the target address of the data packet to be forwarded, and selecting the output port j closest to the target address to forward the data packet; and assigning j to the register pointed to by the cur_out_hist_index pointer, and updating the cur_out_hist_index pointer to (cur_out_hist_index+1) % Neff; wherein the % operation refers to a remainder operation.

4. A multi-core system, comprising:

a grid array comprising a plurality of rows and columns of chip cores, each chip core being communicatively connected to its adjacent chip cores;

a diverter connected to the grid array;

the diverter comprises a plurality of output ports, wherein after the diverter receives a data packet to be forwarded from a chip, fault monitoring is carried out on all the output ports of the diverter, and a plurality of recently unused output ports are screened out from the fault-free output ports; an output port closest to a target address of the data packet to be forwarded is selected from the screened output ports, and the data packet to be forwarded is forwarded to the target address; wherein an out_hist register set of length N is used to record N output port routing results, a cur_out_hist_index pointer is used to point to the out_hist register currently to be updated, and the following steps are executed: assigning the Neff fault-free ports to a corresponding number of out_hist registers in a certain order, and assigning the cur_out_hist_index pointer a preset initial value; storing the output port information used by the most recent M forwarded data packets in registers out_hist[(cur_out_hist_index-1) % Neff] to out_hist[(cur_out_hist_index-M+1) % Neff], respectively; removing the output ports stored in the M registers from the N output ports, and removing the faulty ports, to obtain (Neff-M) output ports; calculating the distance between each of the (Neff-M) output ports and the target address of the data packet to be forwarded, and selecting the output port j closest to the target address to forward the data packet; and assigning j to the register pointed to by the cur_out_hist_index pointer, and updating the cur_out_hist_index pointer to (cur_out_hist_index+1) % Neff; wherein the % operation refers to a remainder operation.

5. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for diverter route hopping in a multi-core system of claim 1 or 2.

6. A control apparatus, characterized by comprising: a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to execute the computer program stored in the memory, so that the control apparatus performs the method for diverter route hopping in a multi-core system according to claim 1 or 2.