WO2014183530A1 - Task allocation method, task allocation apparatus, and network on chip - Google Patents
Task allocation method, task allocation apparatus, and network on chip
- Publication number
- WO2014183530A1 (PCT/CN2014/075655)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- chip
- idle
- rectangular area
- threads
- processor cores
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7825—Globally asynchronous, locally synchronous, e.g. network on chip
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
Definitions
- Embodiments of the present invention relate to on-chip multi-core network technologies, and in particular, to a task allocation method, a task distribution device, and an on-chip network.
- NoC Network-on-Chip
- communication between the processor cores running different threads of the same task is affected by the data flows of other tasks, so Quality of Service (QoS) cannot be guaranteed.
- QoS Quality of Service
- to address this, subnetting is usually adopted; that is, the data flows belonging to the same task are confined to a specific region of the NoC.
- FIG. 1 is a schematic diagram of a prior-art task allocation method based on a routing algorithm. As shown in FIG. 1, if other on-chip routers in task five need to communicate with Dest, they must traverse the same link, which may cause link congestion and reduce network throughput.
- the embodiments of the present invention provide a task allocation method, a task distribution apparatus, and an on-chip network, which are used to solve the problems of high hardware overhead and low network throughput of the task allocation method based on the routing algorithm in the prior art.
- an embodiment of the present invention provides a task allocation method, including:
- the threads of the task to be processed are allocated to the idle processor cores, wherein each idle processor core is allocated one thread.
- the rectangular area expanded from the non-rectangular area is the smallest rectangular area in the on-chip network that contains the non-rectangular area.
- after determining, in the on-chip network formed by the multi-core processor, a plurality of consecutive idle processor cores whose number matches the number of threads, the method further includes:
- the threads of the task to be processed are respectively allocated to the idle processor cores, wherein each processor core is allocated one thread.
- the on-chip network includes multiple processor cores arranged in rows and columns;
- determining, in the on-chip network formed by the multi-core processor, a plurality of idle processor cores matching the number of threads includes:
- a plurality of idle processor cores that match the number of threads are determined in an on-chip network formed by a multi-core processor.
- the searching for and determining the rectangular area expanded from the non-rectangular area comprises: sequentially determining, along the adjacent on-chip routers in the same row as the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores whose number matches the number of threads;
- continuing along the column where the on-chip router connected to the initial idle processor core is located, sequentially determining a continuous second free area such that the sum of the number of processor cores in the first free area and the number of processor cores in the second free area equals the number of threads.
- the searching for and determining the rectangular area expanded from the non-rectangular area comprises: sequentially determining, along the adjacent on-chip routers in the same column as the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores whose number matches the number of threads;
- continuing along the row where the on-chip router connected to the initial idle processor core is located, sequentially determining a continuous fourth free area such that the sum of the number of processor cores in the third free area and the number of processor cores in the fourth free area equals the number of threads.
- an embodiment of the present invention provides a task distribution apparatus, including:
- a first determining module configured to determine a number of threads included in the task to be processed
- a second determining module configured to determine, in an on-chip network formed by the multi-core processor, a plurality of idle processor cores that are equal in number to the number of threads, wherein each of the idle processor cores is connected to an on-chip router;
- a third determining module configured to: when the second determining module determines that the area formed by the on-chip routers connected to the idle processor cores is a non-rectangular area, search for and determine, in the on-chip network, a rectangular area expanded from the non-rectangular area;
- an allocation module configured to: if the predicted traffic of each on-chip router connected to a non-idle processor core in the rectangular area determined by the third determining module does not exceed a preset threshold, allocate the threads of the task to be processed to the idle processor cores, wherein each idle processor core is allocated one thread.
- the third determining module is specifically configured to:
- determining that the rectangular area expanded from the non-rectangular area is the smallest rectangular area in the on-chip network that contains the non-rectangular area.
- the allocating module is further configured to:
- if the on-chip routers connected to the plurality of idle processor cores determined by the second determining module form a rectangular area, respectively allocate the threads of the task to be processed to the idle processor cores, wherein each processor core is allocated one thread.
- the second determining module is specifically configured to:
- a plurality of idle processor cores that match the number of threads are determined in an on-chip network formed by a multi-core processor.
- the second determining module is specifically configured to: sequentially determine, along the adjacent on-chip routers in the same row as the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores that match the number of threads;
- continuing along the column where the on-chip router connected to the initial idle processor core is located, sequentially determine a continuous second free area such that the sum of the number of processor cores in the first free area and the number of processor cores in the second free area equals the number of threads.
- the second determining module is specifically configured to:
- sequentially determine, along the adjacent on-chip routers in the same column as the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores that match the number of threads;
- continuing along the row where the on-chip router connected to the initial idle processor core is located, sequentially determine a continuous fourth free area such that the sum of the number of processor cores in the third free area and the number of processor cores in the fourth free area equals the number of threads.
- the task distribution apparatus further includes:
- a prediction module configured to predict, according to the historical traffic information of the on-chip routers connected to the non-idle processor cores in the rectangular area, the traffic of those on-chip routers to obtain the predicted traffic.
- an embodiment of the present invention further provides an on-chip network, including a plurality of processor cores, an on-chip router, and an interconnect, and the task distribution apparatus of any of the above.
- the task allocation method, the task distribution device, and the on-chip network first determine the number of threads included in the task to be processed and then determine, in the on-chip network, a plurality of idle processor cores that match the required number of threads.
- if the area formed by the on-chip routers connected to those idle processor cores is non-rectangular, border on-chip routers adjacent to the non-rectangular area are borrowed so that the on-chip routers connected to the idle processor cores in the non-rectangular area form a regular rectangular area. It is then determined whether the traffic of the on-chip routers connected to the non-idle processor cores in the rectangular area, that is, the borrowed border on-chip routers, exceeds a preset threshold. If not, the task to be processed is allocated to the processor cores of the free area.
- with the task allocation method provided by the embodiments of the present invention, when the idle processor core resources in the on-chip network equal or exceed the processor cores required by the task to be processed but no regular rectangular area is available for allocating the task, the non-rectangular area is expanded into a regular rectangular area by borrowing border routers and the task is allocated there. Within the rectangular area, no routing table is needed to determine how a data packet travels from the source on-chip router to the destination on-chip router; instead, XY routing delivers the packets, which avoids network congestion and increases network throughput.
- FIG. 1 is a schematic diagram of a task allocation method based on a routing subnet in the prior art
- FIG. 2 is a flowchart of Embodiment 1 of a task assignment method according to the present invention;
- FIG. 3 is a schematic diagram of an on-chip network according to Embodiment 2 of the task assignment method of the present invention.
- FIG. 4A is a schematic diagram of an on-chip network according to Embodiment 3 of the task assignment method of the present invention;
- FIG. 4B is a schematic diagram of re-searching for a rectangular area in FIG. 4A;
- FIG. 5A is a schematic diagram of analyzing the task allocation method of the present invention and the task allocation method based on a routing subnet by using a random uniform traffic model;
- FIG. 5B is a schematic diagram of analyzing a task allocation method of the present invention and a task allocation method based on a routing subnet by using a bit comparison traffic model;
- FIG. 5C is a schematic diagram of analyzing the task assignment method of the present invention and the task assignment method based on a routing subnet by using a tornado traffic model;
- FIG. 6 is a schematic structural diagram of Embodiment 1 of a task distribution device according to the present invention;
- FIG. 7 is a schematic structural diagram of Embodiment 2 of a task distribution device according to the present invention;
- FIG. 8 is a schematic structural diagram of Embodiment 3 of the task distribution device of the present invention.
- FIG. 2 is a flowchart of Embodiment 1 of a task assignment method according to the present invention.
- the execution subject of this embodiment is a task assignment device, which can be integrated in an on-chip network composed of a multi-core processor and can be, for example, any processor in the on-chip network.
- This embodiment is applicable to a scenario in which the idle processor core resources in the on-chip network are equal to or more than the processor cores required for the task to be processed.
- the embodiment includes the following steps:
- the task assignment device determines the number of threads included in the task to be processed. In general, the number of threads a task contains is the same as the number of processor cores needed to process the task. For example, if a task contains 9 threads, then 9 processor cores are needed to process the task.
- 102. A plurality of idle processor cores equal in number to the number of threads are determined in the on-chip network formed by the multi-core processor, wherein each idle processor core is connected to an on-chip router.
- the on-chip network features simultaneous access, high reliability, and high reusability. It consists of multiple processor cores, on-chip routers, and interconnects (channels), where the interconnects include the internal interconnects between an on-chip router and its processor core and the external interconnects between on-chip routers. Each processor core is connected to an on-chip router, and the on-chip routers are interconnected into a mesh structure (Mesh Topology, hereinafter referred to as mesh). In this step, after the number of threads included in the task to be processed is determined, the task distribution device determines, according to that number, a plurality of idle processor cores equal in number to the number of threads in the on-chip network formed by the multi-core processor, together with the corresponding on-chip routers.
- mesh Mesh Topology
- the task distribution device determines a number of consecutive idle processor cores equal to the number of threads. If the area formed by the on-chip routers connected to these idle processor cores is a rectangular area, the threads included in the task to be processed are directly allocated to the idle processor cores, one thread per core; otherwise, if the area formed by those on-chip routers is a non-rectangular area, the device searches for and determines the rectangular area expanded from the non-rectangular area, which is the smallest rectangular area in the on-chip network that contains the non-rectangular area. For example, suppose the NoC is a 5 x 5 mesh structure and the task to be processed includes five threads.
- if the area formed by the on-chip routers connected to 5 consecutive idle processor cores is rectangular, the 5 threads included in the task to be processed are allocated to those 5 consecutive idle processor cores, one thread per core. If instead the area they form is non-rectangular, that is, an irregularly shaped area, the task assignment device determines a rectangular area containing the non-rectangular area, so that the non-idle processor cores already assigned tasks and the idle processor cores in the non-rectangular area together form a regular rectangular area.
- for example, if the five idle processor cores consist of the first three processor cores of the first row and the first two of the second row, the on-chip router connected to the third processor core of the second row is borrowed as a border on-chip router.
- the task distribution device then determines the rectangular area formed by the on-chip routers connected to the five processor cores together with the border on-chip router.
- consecutive idle processor cores may have different combinations
- the non-rectangular regions formed by the on-chip routers connected to the processor cores may also take various forms, such as L-shaped, E-shaped, F-shaped, or I-shaped regions.
- rectangular areas containing the non-rectangular area also have various possible forms.
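The expansion step described above can be sketched in a few lines of illustrative Python (the patent specifies no code; the function names, the `(row, col)` coordinate convention, and the example region are assumptions): the expanded rectangle is simply the smallest rectangle in the mesh that encloses the idle cores, and the region is already regular when it fills that rectangle exactly.

```python
def bounding_rectangle(cores):
    """Smallest rectangle ((min_row, min_col), (max_row, max_col)) enclosing all cores."""
    rows = [r for r, _ in cores]
    cols = [c for _, c in cores]
    return (min(rows), min(cols)), (max(rows), max(cols))

def is_rectangular(cores):
    """True when the cores already fill their bounding rectangle exactly (a regular area)."""
    (r0, c0), (r1, c1) = bounding_rectangle(cores)
    return len(set(cores)) == (r1 - r0 + 1) * (c1 - c0 + 1)

# An L-shaped region: three cores of row 0 plus the first two cores of row 1.
l_shape = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
print(is_rectangular(l_shape))      # → False: the region is irregular
print(bounding_rectangle(l_shape))  # → ((0, 0), (1, 2)): the expanded rectangle
```

The cells of the bounding rectangle not covered by idle cores correspond to the borrowed border on-chip routers whose traffic must later pass the threshold check.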
- within the rectangular area, no routing table is required to determine the routing of a data packet from the source on-chip router to the destination on-chip router; instead, the data packet is transmitted by XY routing. That is, after the source on-chip router and the destination on-chip router are determined, the data packet is first transmitted horizontally to the intermediate on-chip router in the column where the destination on-chip router is located and then transmitted vertically to the destination on-chip router; or it is first transmitted vertically to the intermediate on-chip router in the row where the destination on-chip router is located and then transmitted horizontally to the destination on-chip router.
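The XY delivery rule just described is deterministic enough to sketch directly. The following is an illustrative Python sketch rather than the patent's implementation; the `(row, col)` coordinates and the horizontal-first variant are assumptions.

```python
def xy_route(src, dst):
    """Return the list of (row, col) hops from src to dst, inclusive, X phase first."""
    r, c = src
    path = [src]
    step = 1 if dst[1] > c else -1
    while c != dst[1]:            # X phase: move horizontally to the destination column
        c += step
        path.append((r, c))
    step = 1 if dst[0] > r else -1
    while r != dst[0]:            # Y phase: then move vertically to the destination row
        r += step
        path.append((r, c))
    return path

print(xy_route((2, 0), (0, 3)))
# → [(2, 0), (2, 1), (2, 2), (2, 3), (1, 3), (0, 3)]
```

Because the path is a pure function of source and destination coordinates, no per-router routing table is needed, which is the hardware saving the text claims for routing inside a regular rectangle.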
- the task assignment device predicts the traffic of the on-chip routers connected to the non-idle processor cores in the rectangular area according to their historical traffic information, obtains the predicted traffic, and determines whether the predicted traffic exceeds a preset threshold. If the predicted traffic does not exceed the preset threshold, the threads included in the task to be processed are respectively allocated to the idle processor cores.
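The threshold test on the borrowed routers can be sketched as follows. The patent does not fix a particular predictor, so a simple moving average over historical samples stands in for the prediction step; every name and number here is illustrative, not from the patent.

```python
def predict_traffic(history, window=4):
    """Predict a router's traffic as the average of its most recent samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def can_allocate(border_histories, extra_load, threshold):
    """True if every borrowed (non-idle) border router stays under the threshold."""
    return all(predict_traffic(h) + extra_load <= threshold
               for h in border_histories)

# Two borrowed routers with per-period traffic samples (arbitrary units).
histories = [[10, 12, 11, 13], [30, 28, 31, 29]]
print(can_allocate(histories, extra_load=5, threshold=40))   # → True: allocate
print(can_allocate(histories, extra_load=15, threshold=40))  # → False: search again
```

A single over-threshold router is enough to reject the rectangle, matching the "any one of them exceeds the preset threshold" rule in the embodiment below.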
- the task allocation method provided by this embodiment of the present invention determines the number of threads included in the task to be processed and then determines, in the on-chip network, a non-rectangular area formed by a plurality of idle processor cores equal in number to the required threads.
- the border routers adjacent to the non-rectangular area form a regular rectangular area together with the on-chip routers in the non-rectangular area; it is then determined whether the traffic of the on-chip routers connected to the non-idle processor cores in the rectangular area, that is, the borrowed border on-chip routers, exceeds a preset threshold. If not, the pending task is assigned to the processor cores of the free area.
- thus, when the idle processor core resources in the on-chip network equal or exceed the processor cores required by the task to be processed and no regular rectangular area is available for the task, the non-rectangular area is expanded into a regular rectangular area by borrowing border routers and the task's threads are allocated there. Within that rectangular area, no routing table is needed to determine the routing of data packets from the source on-chip router to the destination on-chip router; routing the packets by XY routing avoids the high hardware overhead, low network throughput, and low system utilization of other task allocation methods.
- the NoC consists of processor cores, on-chip routers, and interconnects (channels); each processor core is connected to an on-chip router, and the number of threads included in a task corresponds one-to-one to the number of processor cores required to process the task. Therefore, the number of threads included in the task to be processed, the number of processor cores required, and the number of on-chip routers connected to those cores are all equal. A processor core is in the same state as its on-chip router, either idle or assigned a task, so searching for an idle on-chip router is equivalent to searching for an idle processor core.
- the on-chip routers are shown in the on-chip networks illustrated in the following figures.
- the on-chip network includes multiple processor cores arranged in rows and columns, such as a 5 x 5 on-chip network, including 5 rows and 5 columns of 25 processor cores and 25 on-chip routers.
- an initial idle processor core may be determined in the on-chip network formed by the multi-core processor, and along the adjacent on-chip routers of the on-chip router connected to the initial idle processor core, it is sequentially determined whether there are consecutive idle processor cores matching the number of threads.
- FIG. 3 is a schematic diagram of an on-chip network according to Embodiment 2 of the task assignment method of the present invention. As shown in FIG. 3, in this embodiment the NoC is a 5 x 5 NoC and the multiple processor cores are arranged in rows and columns.
- task 1 (4) is waiting in the task queue, indicating that task one includes 4 threads and needs 4 processor cores allocated to process it. The processor core connected to on-chip router R1.1 is randomly determined to be the initial idle processor core, and four consecutive idle on-chip routers are determined in sequence along the adjacent on-chip routers in the same row as R1.1, namely R1.1, R1.2, R1.3, and R1.4.
- these four on-chip routers form a first free area, which is a regular rectangular area, so the 4 threads included in task one are allocated directly.
- FIG. 4A is a schematic diagram of an on-chip network according to Embodiment 3 of the task assignment method of the present invention. As shown in FIG. 4A, in this embodiment the NoC is a 5 x 5 NoC, and the multiple processor cores are arranged in rows and columns.
- in the figure, differently shaded markers denote high-load, low-load, and idle on-chip routers: R1.1-R1.3, R2.1-R2.4, and R3.1-R3.4 are high-load on-chip routers, Rs0.1-Rs0.6 are low-load on-chip routers, and the rest are idle on-chip routers.
- the criterion for determining the load of an on-chip router can be set as required; for example, when the traffic carried by an on-chip router is greater than a preset threshold, it is determined to be a high-load on-chip router.
- task 2 (5) is to be processed in the task queue, indicating that task 2 includes 5 threads, and 5 processor cores need to be allocated to process the task.
- the first free area of four idle on-chip routers does not match the number of threads of the task; that is, the number of processor cores included in the first free area does not satisfy the number of processor cores required by the task.
- the search therefore continues for a second free area along the column where R5.0 is located, until the sum of the number of processor cores in the second free area and the number of processor cores in the first free area equals 5; that is, after R5.4 is found for the second free area, the number of idle on-chip routers equals the number of threads, and R5.0-R5.4, Rs0.1, Rs0.2, and the high-load on-chip router R1.1 constitute a regular rectangular area.
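The two-phase search just walked through (first along the initial core's row, then down its column until the combined free areas match the thread count) can be outlined as an illustrative Python sketch. The `idle` map, the scan directions, and all names are assumptions standing in for the patent's description, not its specification.

```python
def search_free_areas(idle, start, n_threads, n_rows, n_cols):
    """Find consecutive idle cores: row scan first, then column scan; None if no match."""
    r0, c0 = start
    first = []
    for c in range(c0, n_cols):          # phase 1: consecutive idle routers in the row
        if not idle.get((r0, c), False):
            break
        first.append((r0, c))
        if len(first) == n_threads:
            return first                 # the row alone already matches
    second = []
    for r in range(r0 + 1, n_rows):      # phase 2: continue down the start column
        if not idle.get((r, c0), False):
            break
        second.append((r, c0))
        if len(first) + len(second) == n_threads:
            return first + second
    return None                          # no matching free area from this start

# 5x5 mesh: row 0 idle in columns 0-3, column 0 idle in rows 1-2.
idle = {(0, c): True for c in range(4)}
idle.update({(1, 0): True, (2, 0): True})
print(search_free_areas(idle, (0, 0), 5, 5, 5))
# → [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]
```

The returned L-shaped set is exactly the kind of non-rectangular region that the bounding-rectangle expansion and traffic check then operate on.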
- it is then determined whether the traffic of the on-chip routers connected to the non-idle processor cores in the rectangular area exceeds a preset threshold.
- traffic of the on-chip router connected to the non-idle processor core in the rectangular area may be predicted according to historical traffic information of the on-chip router connected to the non-idle processor core in the rectangular area.
- taking R1.1 as an example, if task 2 is assigned to the regular rectangular area determined by the first free area and the second free area, the traffic originally carried by R1.1 is as indicated by thick black arrow 1 in the figure, and the traffic added after task 2 is assigned is as shown by thick black arrow 2. If the sum of the two does not exceed the preset threshold, R1.1 is considered shareable, and it is determined that task 2 can be assigned to the processor cores included in the rectangular area, as shown by the dotted line in the figure. In this case the data packets stay within the rectangular area and are transmitted by XY routing.
- for example, if R5.2 is the source on-chip router and R5.4 is the destination on-chip router, XY routing can carry the packet from R5.2 through R5.1 and R5.0 to R5.4, or from R5.2 through Rs0.2 and Rs0.1 to R5.4. Otherwise, if assigning task 2 to the rectangular area defined by the first free area and the second free area would push the traffic carried by R1.1 above the preset threshold, R1.1 is considered not shareable, and it is determined that task 2 cannot be assigned to the processor cores of that rectangular area.
- FIG. 4B is a schematic diagram of re-searching for a rectangular area in FIG. 4A.
- the third free area is searched from R5.0, that is, along the first column where R5.0 is located, and three idle on-chip routers including R5.0, R5.4, and R5.5 are found.
- this third free area of three idle on-chip routers does not match the number of threads of the task; that is, the number of processor cores included in the third free area does not satisfy the number of processor cores required by the task.
- the search therefore continues along the row for a fourth free area, until the sum of the number of processor cores in the fourth free area and the number of processor cores in the third free area equals 5; that is, after R5.1 and R5.2 are found for the fourth free area, the number of idle on-chip routers equals the number of threads, and R5.0, R5.4, R5.5, R5.1, R5.2, together with the four low-load on-chip routers Rs0.1, Rs0.2, Rs0.3, and Rs0.4 of the second and fourth rows, form a regular rectangular area.
- if task 2 is assigned to the regular rectangular area determined by the third free area and the fourth free area, the check is whether the original traffic shared by Rs0.1, Rs0.2, Rs0.3, and Rs0.4 plus the traffic added after task 2 is assigned exceeds the preset threshold. If the traffic of none of the four shared on-chip routers exceeds the preset threshold, task 2 is assigned to the processor cores included in the rectangular area, as shown by the dashed box in FIG. 4B; otherwise, if the traffic carried by any one of them exceeds the preset threshold, task 2 cannot be assigned to that rectangular area, and the processor cores must be searched for again. In the above embodiment, if none of the irregular areas passes the traffic prediction, task 2 simply waits in the waiting queue for the next task scheduling; if more processor cores are released after other tasks complete, the processor cores required by the task are searched for again.
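Putting the embodiment's steps together, the overall decision flow (search for idle cores, rectangularity test, traffic check on the borrowed routers, otherwise wait in the queue) can be outlined as a hedged sketch. The helper callables stand in for the steps defined elsewhere in the text; none of the names come from the patent.

```python
def try_allocate(n_threads, search_free_cores, is_rectangular, traffic_ok):
    """Return the cores to allocate, or None to leave the task in the wait queue."""
    cores = search_free_cores(n_threads)
    if cores is None:
        return None               # not enough idle cores: wait for the next scheduling
    if is_rectangular(cores):
        return cores              # regular rectangular area: allocate directly
    if traffic_ok(cores):
        return cores              # borrowed border routers can absorb the extra load
    return None                   # traffic prediction failed: search again or wait

# Toy usage with stub helpers standing in for the real search and checks.
alloc = try_allocate(
    4,
    search_free_cores=lambda n: [(0, c) for c in range(n)],
    is_rectangular=lambda cores: True,
    traffic_ok=lambda cores: True,
)
print(alloc)  # → [(0, 0), (0, 1), (0, 2), (0, 3)]
```

Returning None in both failure branches mirrors the text: the task stays in the waiting queue and the search restarts once other tasks release cores.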
- the first free area, the second free area, the third free area, and the fourth free area may be regular rectangular areas or irregular areas; for example, as shown in FIG. 4A:
- the consecutive idle on-chip routers searched from R5.0 may include R5.0, R5.1, R5.2, R5.4, and Rs0.1;
- alternatively, if Rs0.2 is an idle on-chip router, the consecutive idle on-chip routers searched from R5.0 may include R5.0, R5.1, R5.2, Rs0.2, and R5.3.
- FIG. 5A is a schematic diagram of analyzing a task assignment method and a route subnet-based task assignment method according to the present invention by using a random uniform traffic model.
- the abscissa is the injection rate, which can be understood as the utilization rate of the processor cores of the on-chip network, and the ordinate is the delay time; one curve represents the injection rate versus delay time for the present invention, and curve A represents the injection rate versus delay time for the task allocation method based on the routing subnet.
- when the injection rate is in the range 0 to 6x10^-3, the utilization rate of the processor cores of the entire on-chip network is not high, and, as shown in FIG. 5A, the delay times of the technical solution of the present invention and of the existing technical solution are substantially equal.
- beyond that range the two solutions diverge at the same delay duration: for the existing technical solution, the larger the injection rate, the larger the delay time, meaning the on-chip network performs worse, that is, the delay increases significantly and the network throughput is low; for the solution of the present invention, the delay time rises only modestly as the injection rate grows, indicating that the on-chip network performs well, that is, the delay rise is not obvious and the network throughput is high.
- FIG. 5B is a schematic diagram of analyzing the task assignment method of the present invention and the task assignment method based on a routing subnet by using a bit comparison traffic model.
- the abscissa is the injection rate, which can be understood as the utilization of the processor core of the on-chip network;
- the ordinate is the delay time; one curve represents the injection rate versus delay time for the present invention, and curve A represents the injection rate versus delay time for the task allocation method based on the routing subnet.
- FIG. 5C is a schematic diagram of analyzing a task assignment method and a route subnet-based task assignment method according to the tornado flow model.
- the abscissa is the injection rate, which can be understood as the utilization rate of the processor core of the on-chip network;
- the ordinate is the delay time; one curve represents the injection rate versus delay duration for the present invention, and the other represents the injection rate versus delay duration for the task allocation method based on the routing subnet.
- when the injection rate exceeds 4x10^-3, the beneficial effects of the present invention can be clearly observed.
- Table 1 compares the system utilization under the rectangular subnet division method with the system utilization under the router sharing method of the present invention for network load ratios from 0.5 to 1.
- a network load ratio of 0.5 to 1 indicates the ratio of the processor cores required to the processor cores the system can actually provide.
- the columns in Table 1 indicate that when the ratio of the required processor cores to the processor cores the system can actually provide is 0.5, the system is in an unsaturated state.
- in that case, the system utilization under the rectangular subnet division method is 0.478033, while under the router sharing method of the present invention it is 0.465374; the difference between the two is small.
- the system utilization rate is 0.810507, and the difference between the two is nearly 10%.
- in one implementation, an idle processor core of the on-chip network is randomly selected as the initial idle processor core, and each time processor cores need to be searched for again, the search starts from that initial idle processor core.
- the initial idle processor core may also be selected according to a preset rule, and may be different each time a search is performed.
- for example, an idle processor core within a given region can be randomly determined as the initial idle processor core.
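The random selection of an initial idle processor core described above can be sketched as follows. The row-major grid of booleans (True meaning the core is idle) is a hypothetical representation chosen for illustration; the patent does not prescribe any particular data structure:

```python
import random

def pick_initial_idle_core(mesh, seed=None):
    """Pick a random idle core from a row-major mesh grid.

    `mesh` is a list of rows; each entry is True if the processor
    core at that (row, column) position is idle. Returns a (row,
    column) pair, or None if no core is idle.
    """
    rng = random.Random(seed)
    idle = [(r, c) for r, row in enumerate(mesh)
            for c, is_idle in enumerate(row) if is_idle]
    if not idle:
        return None  # no idle core available in the on-chip network
    return rng.choice(idle)
```

A preset-rule variant would simply replace `rng.choice` with a deterministic selection over the same `idle` list.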
- FIG. 6 is a schematic structural diagram of Embodiment 1 of a task distribution device according to the present invention.
- the task allocation apparatus provided in this embodiment can perform the steps of the task allocation method in any method embodiment of the present invention; the specific implementation process is not described herein again.
- the task distribution apparatus provided in this embodiment specifically includes:
- the first determining module 11 is configured to determine the number of threads included in the task to be processed;
- a second determining module 12 configured to determine, in an on-chip network formed by the multi-core processor, a plurality of idle processor cores that are equal in number to the number of threads, wherein each idle processor core is connected to an on-chip router;
- the third determining module 13 is configured to: when the area formed by the on-chip routers connected to the determined idle processor cores is a non-rectangular area, search for and determine, in the on-chip network, a rectangular area expanded from the non-rectangular area;
- the allocating module 14 is configured to allocate the threads of the task to be processed to the idle processor cores if the predicted traffic of each on-chip router connected to a non-idle processor core in the rectangular area determined by the third determining module 13 does not exceed a preset threshold, wherein each idle processor core is allocated one thread.
- in the task allocation apparatus, the first determining module determines the number of threads included in the task to be processed, and the second determining module determines, in the on-chip network, a number of idle processor cores equal to the number of threads required.
- if the area formed by the on-chip routers connected to these idle processor cores is non-rectangular, the border on-chip routers adjacent to the non-rectangular area are borrowed so that, together with the on-chip routers connected to the idle processor cores in the non-rectangular area, they form a regular rectangular area; the third determining module then determines whether the predicted traffic of the on-chip routers in the rectangular area that connect to non-idle processor cores,
- that is, the border on-chip routers, exceeds the preset threshold. If not, the task to be processed is allocated by the allocating module to the processor cores of the free area.
- with the task allocation method provided by the embodiments of the present invention, when the idle processor core resources in the on-chip network equal or exceed the processor cores required by the task to be processed, but no regular rectangular area is available for allocating the task, the non-rectangular area is expanded into a regular rectangular area by borrowing border routers, and the task is then allocated. Within the rectangular area, no routing table is needed to determine how a data packet travels from the source on-chip router to the target on-chip router; data packets are instead transmitted by XY routing. The resulting hardware saving avoids the problems of the routing-subnet-based task allocation method, namely large hardware overhead, low network throughput, and low system utilization.
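The XY routing mechanism that makes routing tables unnecessary inside the rectangular area can be illustrated with a short sketch: dimension-ordered XY routing moves a packet fully along the X dimension first, then along Y. The coordinate convention and function name below are illustrative assumptions, not taken from the claims:

```python
def xy_route(src, dst):
    """Hop sequence from src to dst under dimension-ordered XY routing.

    Coordinates are (x, y) positions of on-chip routers in the mesh.
    Each hop is computed purely from the current and destination
    coordinates, so no routing table is consulted.
    """
    x, y = src
    path = [src]
    while x != dst[0]:          # first traverse the X dimension
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then traverse the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because every router can derive the next hop locally from the two coordinate pairs, the per-router storage that a routing-subnet scheme would spend on tables is saved, which is the hardware benefit the paragraph above refers to.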
- further, the third determining module 13 is specifically configured to:
- determine, as the rectangular area expanded from the non-rectangular area, the smallest rectangular area in the on-chip network that contains the non-rectangular area.
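The smallest rectangular area containing a non-rectangular idle region is simply its axis-aligned bounding box over router coordinates. The sketch below assumes cores are addressed by (row, column) pairs, an illustrative convention; it also lists the non-idle positions whose border routers would be borrowed:

```python
def smallest_enclosing_rectangle(idle_cores):
    """Smallest axis-aligned rectangle containing every idle core.

    Returns ((row_min, col_min), (row_max, col_max)) over the
    (row, column) coordinates of the idle cores.
    """
    rows = [r for r, _ in idle_cores]
    cols = [c for _, c in idle_cores]
    return (min(rows), min(cols)), (max(rows), max(cols))

def borrowed_cores(idle_cores):
    """Non-idle positions inside the expanded rectangle.

    These are the cores whose on-chip routers the method borrows to
    turn the non-rectangular region into a regular rectangle.
    """
    (r0, c0), (r1, c1) = smallest_enclosing_rectangle(idle_cores)
    region = {(r, c) for r in range(r0, r1 + 1)
              for c in range(c0, c1 + 1)}
    return region - set(idle_cores)
```

For an L-shaped idle region such as {(0,0), (1,0), (1,1)}, the enclosing rectangle spans rows 0..1 and columns 0..1, and the single borrowed position is (0,1).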
- the allocating module 14 is further configured to:
- allocate the threads of the task to be processed respectively to the idle processor cores, wherein each processor core is allocated one thread.
- the second determining module 12 is specifically configured to:
- determine an initial idle processor core in the on-chip network formed by the multi-core processor, the on-chip network including a plurality of processor cores arranged in rows and columns; and
- determine, starting from the initial idle processor core, a plurality of idle processor cores that match the number of threads in the on-chip network.
- the second determining module 12 is specifically configured to: determine in turn, along the on-chip routers adjacent to the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores that match the number of threads;
- if not, determine a first free area in turn along the adjacent on-chip routers in the same row as the on-chip router connected to the initial idle processor core; and
- determine a second free area such that the sum of the number of processor cores in the first free area and the number of processor cores in the second free area is equal to the number of threads.
- alternatively, the second determining module 12 is specifically configured to: determine in turn, along the adjacent on-chip routers in the same column as the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores that match the number of threads;
- if not, determine a third free area in turn along the adjacent on-chip routers of the on-chip router connected to the initial idle processor core; and
- determine a fourth free area such that the sum of the number of processor cores in the third free area and the number of processor cores in the fourth free area is equal to the number of threads.
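A simplified, hypothetical version of the row search (the column search is symmetric) might look like the following. The patent's first/second free-area split is reduced here to a single-row run check, and the mesh representation is the same assumed grid of booleans as above:

```python
def contiguous_idle_in_row(mesh, start, needed):
    """Scan left and right from `start` along its row for a run of
    consecutive idle processor cores of length `needed`.

    `mesh` is a list of rows of booleans (True = idle); `start` is a
    (row, column) pair. Returns `needed` core positions from the run,
    or None if the row cannot supply enough consecutive idle cores.
    """
    r, c = start
    row = mesh[r]
    # extend the run of idle cores around the start position
    lo = c
    while lo - 1 >= 0 and row[lo - 1]:
        lo -= 1
    hi = c
    while hi + 1 < len(row) and row[hi + 1]:
        hi += 1
    run = [(r, col) for col in range(lo, hi + 1)]
    return run[:needed] if len(run) >= needed else None
```

When the run falls short, the module described above would continue with a second (or fourth) free area in an adjacent row or column so that the combined core count equals the number of threads; that continuation is omitted here for brevity.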
- FIG. 7 is a schematic structural diagram of Embodiment 2 of a task distribution device according to the present invention.
- the task distribution apparatus provided in this embodiment is based on the apparatus shown in FIG. 6.
- it further includes a prediction module 15, configured to predict, according to the historical traffic information of the on-chip routers connected to non-idle processor cores in the rectangular area, the traffic of those on-chip routers, so as to obtain the predicted traffic.
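The patent does not specify the prediction function, only that it operates on each border router's historical traffic. As one plausible stand-in, a moving average over recent samples could serve, as sketched below; the window size and sample format are assumptions, not the claimed method:

```python
def predict_traffic(history, window=4):
    """Predict a router's next-interval traffic as the mean of its
    most recent `window` samples.

    `history` is a non-empty list of traffic measurements for one
    on-chip router, oldest first. A moving average is only one
    possible predictor consistent with the text.
    """
    recent = history[-window:]
    return sum(recent) / len(recent)
```

The prediction module would evaluate this for every on-chip router in the rectangular area that connects to a non-idle processor core, then hand the results to the allocating module's threshold check.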
- FIG. 8 is a schematic structural diagram of Embodiment 3 of the task distribution device of the present invention.
- the task distribution apparatus 800 of this embodiment may include a processor 81 and a memory 82.
- the task distribution apparatus 800 may also include a transmitter 83 and a receiver 84, both of which can be coupled to the processor 81.
- the memory 82 stores execution instructions. When the task distribution apparatus 800 runs, the processor 81 communicates with the memory 82 and calls the execution instructions in the memory 82 to perform the following operations:
- determine the number of threads included in the task to be processed;
- determine, in the on-chip network formed by the multi-core processor, a plurality of idle processor cores equal in number to the number of threads, each idle processor core being connected to an on-chip router;
- if the area formed by the on-chip routers connected to the determined idle processor cores is a non-rectangular area, search for and determine, in the on-chip network, a rectangular area expanded from the non-rectangular area;
- if the predicted traffic of each on-chip router connected to a non-idle processor core in the rectangular area does not exceed the preset threshold, allocate the threads of the task to be processed to the idle processor cores, where each idle processor core is allocated one thread.
- the rectangular area expanded by the non-rectangular area is the smallest rectangular area in the network on the chip that includes the non-rectangular area.
- the method further includes:
- the threads of the task to be processed are respectively allocated to the idle processor cores, wherein each processor core is allocated one thread.
- the on-chip network includes multiple processor cores arranged in rows and columns;
- determining a plurality of idle processor cores that match the number of threads in the on-chip network formed by the multi-core processor includes:
- determining an initial idle processor core in the on-chip network, and determining, starting from the initial idle processor core, a plurality of idle processor cores that match the number of threads.
- searching for and determining a rectangular area expanded by the non-rectangular area includes:
- determining in turn, along the on-chip routers adjacent to the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores that match the number of threads;
- if not, determining a first free area in turn along the adjacent on-chip routers in the same row as the on-chip router connected to the initial idle processor core; and
- determining a second free area such that the sum of the number of processor cores in the first free area and the number of processor cores in the second free area is equal to the number of threads.
- searching for and determining a rectangular area expanded by the non-rectangular area includes:
- determining in turn, along the adjacent on-chip routers in the same column as the on-chip router connected to the initial idle processor core, whether there are consecutive idle processor cores that match the number of threads;
- if not, determining a third free area in turn along the adjacent on-chip routers of the on-chip router connected to the initial idle processor core; and
- determining a fourth free area such that the sum of the number of processor cores in the third free area and the number of processor cores in the fourth free area is equal to the number of threads.
- optionally, before the threads included in the task to be processed are respectively allocated to the idle processor cores, the method further includes:
- predicting, according to the historical traffic information of the on-chip routers connected to non-idle processor cores in the rectangular area, the traffic of those on-chip routers, so as to obtain the predicted traffic.
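Putting the threshold check and the one-thread-per-core allocation together, a hypothetical end-to-end decision step might look like the following. The mean-based predictor and all names are illustrative assumptions, not the claimed method:

```python
def allocate_if_safe(border_history, threads, idle_cores, threshold):
    """Allocate one thread per idle core only if every border router's
    predicted traffic stays at or below `threshold`.

    `border_history` maps each border (non-idle-core) on-chip router
    to its historical traffic samples; the predictor here is a simple
    mean, an assumption. Returns a thread->core mapping, or None if
    the traffic check or the core count fails.
    """
    for samples in border_history.values():
        if sum(samples) / len(samples) > threshold:
            return None  # a border router is predicted to be too busy
    if len(idle_cores) < threads:
        return None      # not enough idle processor cores
    return {t: core for t, core in zip(range(threads), idle_cores)}
```

On failure the method would fall back to searching for a different free area rather than overloading the borrowed border routers.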
- an embodiment of the present invention further provides an on-chip network, including a plurality of processor cores, on-chip routers, interconnection lines, and any task distribution apparatus as shown in FIG. 6 or FIG. 7. Based on the task allocation method and the task distribution apparatus,
- the technical solutions of any of the method embodiments in FIG. 2 to FIG. 4A may be performed, and details are not described herein again.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14797851.4A EP2988215B1 (en) | 2013-05-14 | 2014-04-18 | Task assigning method, task assigning apparatus, and network-on-chip |
KR1020157035119A KR101729596B1 (ko) | 2013-05-14 | 2014-04-18 | 작업 할당 방법, 작업 할당 장치, 및 네트워크 온 칩 |
JP2016513212A JP6094005B2 (ja) | 2013-05-14 | 2014-04-18 | タスク割り当て方法、タスク割り当て装置、およびネットワークオンチップ |
US14/940,577 US9965335B2 (en) | 2013-05-14 | 2015-11-13 | Allocating threads on a non-rectangular area on a NoC based on predicted traffic of a smallest rectangular area |
US15/943,370 US10671447B2 (en) | 2013-05-14 | 2018-04-02 | Method, apparatus, and network-on-chip for task allocation based on predicted traffic in an extended area |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310177172.1A CN104156267B (zh) | 2013-05-14 | 2013-05-14 | 任务分配方法、任务分配装置及片上网络 |
CN201310177172.1 | 2013-05-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/940,577 Continuation US9965335B2 (en) | 2013-05-14 | 2015-11-13 | Allocating threads on a non-rectangular area on a NoC based on predicted traffic of a smallest rectangular area |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014183530A1 true WO2014183530A1 (zh) | 2014-11-20 |
Family
ID=51881772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/075655 WO2014183530A1 (zh) | 2013-05-14 | 2014-04-18 | 任务分配方法、任务分配装置及片上网络 |
Country Status (6)
Country | Link |
---|---|
US (2) | US9965335B2 (zh) |
EP (1) | EP2988215B1 (zh) |
JP (1) | JP6094005B2 (zh) |
KR (1) | KR101729596B1 (zh) |
CN (1) | CN104156267B (zh) |
WO (1) | WO2014183530A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017067215A1 (zh) * | 2015-10-21 | 2017-04-27 | 深圳市中兴微电子技术有限公司 | 众核网络处理器及其微引擎的报文调度方法、系统、存储介质 |
JP2017539180A (ja) * | 2014-12-18 | 2017-12-28 | 華為技術有限公司Huawei Technologies Co.,Ltd. | 光ネットワークオンチップ、光ルータ、および信号伝送方法 |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102285481B1 (ko) * | 2015-04-09 | 2021-08-02 | 에스케이하이닉스 주식회사 | NoC 반도체 장치의 태스크 매핑 방법 |
CN105718318B (zh) * | 2016-01-27 | 2019-12-13 | 戴西(上海)软件有限公司 | 一种基于辅助工程设计软件的集合式调度优化方法 |
CN105721342B (zh) * | 2016-02-24 | 2017-08-25 | 腾讯科技(深圳)有限公司 | 多进程设备的网络连接方法和系统 |
CN107329822B (zh) * | 2017-01-15 | 2022-01-28 | 齐德昱 | 面向多源多核系统的基于超任务网的多核调度方法 |
CN107632594B (zh) * | 2017-11-06 | 2024-02-06 | 苏州科技大学 | 一种基于无线网络的电器集中控制系统和控制方法 |
CN108694156B (zh) * | 2018-04-16 | 2021-12-21 | 东南大学 | 一种基于缓存一致性行为的片上网络流量合成方法 |
KR102026970B1 (ko) * | 2018-10-08 | 2019-09-30 | 성균관대학교산학협력단 | 네트워크 온 칩의 다중 라우팅 경로 설정 방법 및 장치 |
US11334392B2 (en) | 2018-12-21 | 2022-05-17 | Bull Sas | Method for deployment of a task in a supercomputer, method for implementing a task in a supercomputer, corresponding computer program and supercomputer |
FR3091775A1 (fr) * | 2018-12-21 | 2020-07-17 | Bull Sas | Execution/Isolation d’application par allocation de ressources réseau au travers du mécanisme de routage |
FR3091773A1 (fr) * | 2018-12-21 | 2020-07-17 | Bull Sas | Execution/Isolation d’application par allocation de ressources réseau au travers du mécanisme de routage |
US11327796B2 (en) | 2018-12-21 | 2022-05-10 | Bull Sas | Method for deploying a task in a supercomputer by searching switches for interconnecting nodes |
FR3091771A1 (fr) * | 2018-12-21 | 2020-07-17 | Bull Sas | Execution/Isolation d’application par allocation de ressources réseau au travers du mécanisme de routage |
EP3671455A1 (fr) * | 2018-12-21 | 2020-06-24 | Bull SAS | Procédé de déploiement d'une tâche dans un supercalculateur, procédé de mise en oeuvre d'une tâche dans un supercalculateur, programme d'ordinateur correspondant et supercalculateur |
CN111382115B (zh) | 2018-12-28 | 2022-04-15 | 北京灵汐科技有限公司 | 一种用于片上网络的路径创建方法、装置及电子设备 |
KR102059548B1 (ko) * | 2019-02-13 | 2019-12-27 | 성균관대학교산학협력단 | Vfi 네트워크 온칩에 대한 구역간 라우팅 방법, vfi 네트워크 온칩에 대한 구역내 라우팅 방법, vfi 네트워크 온칩에 대한 구역내 및 구역간 라우팅 방법 및 이를 실행하기 위한 프로그램이 기록된 기록매체 |
CN109995652B (zh) * | 2019-04-15 | 2021-03-19 | 中北大学 | 一种基于冗余通道构筑的片上网络感知预警路由方法 |
CN110471777B (zh) * | 2019-06-27 | 2022-04-15 | 中国科学院计算机网络信息中心 | 一种Python-Web环境中多用户共享使用Spark集群的实现方法和系统 |
US11134030B2 (en) * | 2019-08-16 | 2021-09-28 | Intel Corporation | Device, system and method for coupling a network-on-chip with PHY circuitry |
CN112612605A (zh) * | 2020-12-16 | 2021-04-06 | 平安消费金融有限公司 | 线程分配方法、装置、计算机设备和可读存储介质 |
CN115686800B (zh) * | 2022-12-30 | 2023-03-21 | 摩尔线程智能科技(北京)有限责任公司 | 用于多核系统的动态核心调度方法和装置 |
CN116405555B (zh) * | 2023-03-08 | 2024-01-09 | 阿里巴巴(中国)有限公司 | 数据传输方法、路由节点、处理单元和片上系统 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101403982A (zh) * | 2008-11-03 | 2009-04-08 | 华为技术有限公司 | 一种多核处理器的任务分配方法、系统及设备 |
US20090328047A1 (en) * | 2008-06-30 | 2009-12-31 | Wenlong Li | Device, system, and method of executing multithreaded applications |
CN102193779A (zh) * | 2011-05-16 | 2011-09-21 | 武汉科技大学 | 一种面向MPSoC的多线程调度方法 |
CN102541633A (zh) * | 2011-12-16 | 2012-07-04 | 汉柏科技有限公司 | 基于多核cpu的数据平面和控制平面部署系统及方法 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5007050B2 (ja) * | 2006-02-01 | 2012-08-22 | 株式会社野村総合研究所 | 格子型コンピュータシステム、タスク割り当てプログラム |
JP2008191949A (ja) * | 2007-02-05 | 2008-08-21 | Nec Corp | マルチコアシステムおよびマルチコアシステムの負荷分散方法 |
JP5429382B2 (ja) | 2010-08-10 | 2014-02-26 | 富士通株式会社 | ジョブ管理装置及びジョブ管理方法 |
KR101770587B1 (ko) | 2011-02-21 | 2017-08-24 | 삼성전자주식회사 | 멀티코어 프로세서의 핫 플러깅 방법 및 멀티코어 프로세서 시스템 |
JP5724626B2 (ja) | 2011-05-23 | 2015-05-27 | 富士通株式会社 | プロセス配置装置、プロセス配置方法及びプロセス配置プログラム |
- 2013
  - 2013-05-14 CN CN201310177172.1A patent/CN104156267B/zh active Active
- 2014
  - 2014-04-18 KR KR1020157035119A patent/KR101729596B1/ko active IP Right Grant
  - 2014-04-18 EP EP14797851.4A patent/EP2988215B1/en active Active
  - 2014-04-18 WO PCT/CN2014/075655 patent/WO2014183530A1/zh active Application Filing
  - 2014-04-18 JP JP2016513212A patent/JP6094005B2/ja active Active
- 2015
  - 2015-11-13 US US14/940,577 patent/US9965335B2/en active Active
- 2018
  - 2018-04-02 US US15/943,370 patent/US10671447B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090328047A1 (en) * | 2008-06-30 | 2009-12-31 | Wenlong Li | Device, system, and method of executing multithreaded applications |
CN101403982A (zh) * | 2008-11-03 | 2009-04-08 | 华为技术有限公司 | 一种多核处理器的任务分配方法、系统及设备 |
CN102193779A (zh) * | 2011-05-16 | 2011-09-21 | 武汉科技大学 | 一种面向MPSoC的多线程调度方法 |
CN102541633A (zh) * | 2011-12-16 | 2012-07-04 | 汉柏科技有限公司 | 基于多核cpu的数据平面和控制平面部署系统及方法 |
Non-Patent Citations (1)
Title |
---|
See also references of EP2988215A4 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017539180A (ja) * | 2014-12-18 | 2017-12-28 | 華為技術有限公司Huawei Technologies Co.,Ltd. | 光ネットワークオンチップ、光ルータ、および信号伝送方法 |
US10250958B2 (en) | 2014-12-18 | 2019-04-02 | Huawei Technologies Co., Ltd | Optical network-on-chip, optical router, and signal transmission method |
WO2017067215A1 (zh) * | 2015-10-21 | 2017-04-27 | 深圳市中兴微电子技术有限公司 | 众核网络处理器及其微引擎的报文调度方法、系统、存储介质 |
CN106612236A (zh) * | 2015-10-21 | 2017-05-03 | 深圳市中兴微电子技术有限公司 | 众核网络处理器及其微引擎的报文调度方法、系统 |
Also Published As
Publication number | Publication date |
---|---|
EP2988215A4 (en) | 2016-04-27 |
US10671447B2 (en) | 2020-06-02 |
US20180225156A1 (en) | 2018-08-09 |
JP6094005B2 (ja) | 2017-03-15 |
KR101729596B1 (ko) | 2017-05-11 |
EP2988215B1 (en) | 2021-09-08 |
CN104156267B (zh) | 2017-10-10 |
CN104156267A (zh) | 2014-11-19 |
KR20160007606A (ko) | 2016-01-20 |
JP2016522488A (ja) | 2016-07-28 |
US20160070603A1 (en) | 2016-03-10 |
EP2988215A1 (en) | 2016-02-24 |
US9965335B2 (en) | 2018-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2014183530A1 (zh) | 任务分配方法、任务分配装置及片上网络 | |
US11516146B2 (en) | Method and system to allocate bandwidth based on task deadline in cloud computing networks | |
EP3422646B1 (en) | Method and device for multi-flow transmission in sdn network | |
US9503394B2 (en) | Clustered dispersion of resource use in shared computing environments | |
CN107454017B (zh) | 一种云数据中心网络中混合数据流协同调度方法 | |
CN109614215B (zh) | 基于深度强化学习的流调度方法、装置、设备及介质 | |
US11595315B2 (en) | Quality of service in virtual service networks | |
US20190042314A1 (en) | Resource allocation | |
Guo et al. | Oversubscription bounded multicast scheduling in fat-tree data center networks | |
Zhang et al. | Load balancing with deadline-driven parallel data transmission in data center networks | |
Moreno et al. | Arbitration and routing impact on NoC design | |
KR20120121146A (ko) | 가상네트워크 환경에서의 자원 할당 방법 및 장치 | |
Alvarez-Horcajo et al. | Improving multipath routing of TCP flows by network exploration | |
WO2012113224A1 (zh) | 多节点计算系统下选择共享内存所在节点的方法和装置 | |
CN114996199A (zh) | 众核的路由映射方法、装置、设备及介质 | |
Szymanski | Low latency energy efficient communications in global-scale cloud computing systems | |
Guo et al. | A QoS aware multicore hash scheduler for network applications | |
Li et al. | Congestion‐free routing strategy in software defined data center networks | |
González et al. | Traffic Injection Regulation Protocol based on free time-slots requests | |
US20240028881A1 (en) | Deep neural network (dnn) compute loading and traffic-aware power management for multi-core artificial intelligence (ai) processing system | |
EP2939382B1 (en) | Distributed data processing system | |
WO2024021990A1 (zh) | 一种路径确定的方法及相关设备 | |
Fan et al. | The QoS mechanism for NoC router by dynamic virtual channel allocation and dual-net infrastructure | |
US10165598B2 (en) | Wireless medium clearing | |
Das et al. | Regulating Degree of Adaptiveness for Performance-Centric NoC Routing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14797851 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016513212 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2014797851 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 20157035119 Country of ref document: KR Kind code of ref document: A |