CN114356580B - Heterogeneous multi-core system task allocation method and device based on shared resource access - Google Patents
- Publication number
- CN114356580B CN202210029768.6A
- Authority
- CN
- China
- Prior art keywords
- task
- processor core
- processor
- worst
- energy density
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Multi Processors (AREA)
Abstract
The invention discloses a task allocation method for heterogeneous multi-core systems based on shared resource access. The method comprises: calculating the worst-case execution time and the actual execution time of each task on each processor core; calculating the energy density of each task on each processor core and the energy density difference of each task; and repeatedly selecting the unallocated task with the largest energy density difference and allocating it to the processor core, among the selectable processor cores, whose resource similarity to the task is greatest. The invention also discloses a corresponding task allocation device for heterogeneous multi-core systems based on shared resource access. Because tasks are allocated in descending order of their energy density difference, the energy consumption of the processor cores of the heterogeneous multi-core system can be effectively reduced.
Description
Technical Field
The invention belongs to the field of computer architecture, and particularly relates to a heterogeneous multi-core system task allocation method and device based on shared resource access.
Background
With the rapid development of computer technology, embedded devices are used more and more widely, and consumer electronics in particular are growing rapidly. Heterogeneous multi-core processors are increasingly popular because they meet the processing demands that embedded devices place on different kinds of tasks. However, as computing power increases, so does power consumption, which shortens the operating time of embedded devices, generates excess heat, and degrades the user experience. Reducing the power consumption of heterogeneous multi-core embedded devices is therefore an urgent problem in heterogeneous multi-core system-on-chip technology.
Prior-art task allocation schemes for heterogeneous multi-core systems based on shared resource access generally use heuristic algorithms, chiefly worst-fit decreasing (WFD) and synchronization-aware worst-fit decreasing (SA-WFD).
The SA-WFD heuristic task allocation algorithm was first applied to the extended multiprocessor stack resource policy (MSRP). To ensure that the real-time schedulability of tasks is still met in the worst case, SA-WFD uses the worst estimated utilization of each task. It first calculates the worst estimated utilization of each task τ_i, sorts the tasks in descending order of that value, and then allocates them in that order:
1. Select the task with the largest worst estimated utilization;
2. From the remaining processor cores, select the one with the greatest resource similarity to the task (resource similarity is the number of shared resources accessed both by the task to be allocated and by the tasks already on the processor core); if several processor cores have the same resource similarity, select the one with the smallest total worst estimated utilization;
Here j is the processor core index, EU_j is the current total worst estimated utilization of processor core π_j, and ψ_j is the set of tasks allocated to π_j; EU_j is obtained by summing, over every task τ_i in ψ_j, the worst estimated utilization of τ_i on π_j;
3. Calculate whether the total worst estimated utilization of the selected processor core would be greater than 1 once the task is assigned to it; if so, return to step 2, otherwise go to step 4;
4. Assign the task to the selected processor core;
5. Calculate and update the current EU_j of processor core π_j;
These steps are repeated until all tasks have been allocated.
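The prior-art steps above can be sketched roughly as follows. This is only an illustrative reading, not the reference SA-WFD implementation: the names `Core`, `sa_wfd`, `worst_util` and `resources` are hypothetical, the worst estimated utilizations are taken as given inputs, and sorting by each task's largest per-core value is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    tasks: list = field(default_factory=list)  # psi_j: indices of tasks on this core
    eu: float = 0.0                            # EU_j: total worst estimated utilization


def sa_wfd(worst_util, resources, n_cores):
    """worst_util[i][j]: worst estimated utilization of task i on core j (given).
    resources[i]: set of shared resources accessed by task i."""
    cores = [Core() for _ in range(n_cores)]
    # Step 1: descending order of worst estimated utilization
    # (assumption: use each task's largest per-core value as the sort key).
    order = sorted(range(len(worst_util)),
                   key=lambda i: max(worst_util[i]), reverse=True)
    for i in order:
        def similarity(j):
            # Shared resources in common with the tasks already on core j.
            return sum(len(resources[i] & resources[q]) for q in cores[j].tasks)

        # Step 2: most similar core first, ties broken by smallest EU_j.
        candidates = sorted(range(n_cores),
                            key=lambda j: (-similarity(j), cores[j].eu))
        for j in candidates:
            # Step 3: skip a core whose total would exceed 1.
            if cores[j].eu + worst_util[i][j] <= 1.0:
                cores[j].tasks.append(i)          # Step 4
                cores[j].eu += worst_util[i][j]   # Step 5
                break
        else:
            raise ValueError(f"task {i} does not fit on any core")
    return cores
```

Note that the ordering key here is the utilization alone; energy plays no role in the allocation.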
The drawback of this technical scheme is that a large amount of idle time remains after allocation is finished, and the effect of the allocation on system energy consumption is not considered during allocation, so the power consumption of the device is not reduced.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a heterogeneous multi-core system task allocation method and device based on shared resource access, so as to reduce the dynamic energy consumption of a processor.
The heterogeneous multi-core system task allocation method based on shared resource access comprises the following steps:
Calculating the worst-case execution time and the actual execution time T_i^j of each task on each processor core;
Both are computed from the following quantities: the number of critical sections of task τ_i; χ, the critical section index; the worst-case execution time of the χ-th critical section of τ_i, in which it accesses a shared resource; the worst-case execution time of the part of τ_i that does not access shared resources; and the current execution frequency of processor core π_j. Here i is the task index and j is the processor core index;
Calculating the energy density of each task on each processor core and the energy density difference DD_i of each task;
where DD_i is the difference between the highest and the lowest energy density of task τ_i across the processor cores; the energy density is computed from the energy consumption of τ_i on processor core π_j, β_j is the architecture coefficient of π_j, and p_i is the execution period of τ_i;
Repeatedly selecting the unallocated task τ_i with the largest energy density difference; if, among the processor cores with the greatest resource similarity to τ_i, there is one whose total worst estimated utilization with τ_i added is not greater than 1, allocating τ_i to such a processor core; otherwise allocating τ_i to the processor core with the smallest EU_j;
Updating the EU_j of the processor core;
where EU_j is the current total worst estimated utilization of processor core π_j, and the bound of 1 is evaluated from the worst estimated utilization of task τ_i on that processor core together with the worst estimated utilization of the tasks assigned to that processor core before τ_i.
Further, allocating τ_i to a processor core that has the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1 comprises:
among the processor cores that have the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1, selecting the one for which […] is smallest, and allocating task τ_i to it.
Further, updating the EU_j of the processor core comprises:
updating the worst estimated utilization of τ_i on the processor core,
where SW_{i,χ} is the sum of the longest times of all tasks that enter the wait queue of shared resource R(z_{i,χ}) after task τ_i accesses that resource, each taken on the processor core where the task resides; χ is the critical section index of τ_i and ranges over the number of critical sections of τ_i; BW_i^j is the sum of the actual global waiting times of τ_i for accessing its subset of shared resources;
and calculating the total worst estimated utilization EU_j of the processor core.
Further, the method also includes setting a final execution frequency for each processor core:
Calculating the utilization U_j of processor core π_j;
where τ_n is a task allocated to processor core π_j. The calculation uses the actual execution time of τ_n on π_j, the local waiting time B_i of a task τ_i on its assigned processor core π_j, and the sum of the actual global waiting times of τ_n for accessing its subset of shared resources; the maximum is taken over the utilization of the processor core when executing each task.
If the utilization of processor core π_j is not greater than 1, the execution frequency of the processor core is lowered by one level and the utilization is recalculated;
If the utilization of processor core π_j is greater than 1, the execution frequency of the processor core is raised by one level and taken as the final execution frequency of the processor core.
The heterogeneous multi-core system task allocation device based on shared resource access comprises:
the task execution time calculation module is used for calculating the actual execution time of each task on each processor core;
where T_i^j is the actual execution time of task τ_i on processor core π_j, computed from: the sum of the worst-case execution times of τ_i for accessing its subset of shared resources; the number of critical sections of τ_i; the worst-case execution time of the χ-th critical section of τ_i, in which it accesses a shared resource; and the worst-case execution time of the part of τ_i that does not access shared resources;
The energy density difference calculation module is used for calculating the energy density of each task on each processor core and the energy density difference DD_i of each task;
The task selecting module is used for selecting the task τ_i with the largest energy density difference from the unallocated tasks and sending it to the task allocation module;
and the task allocation module is used for allocating the task τ_i to the processor core, among the selectable processor cores, with the greatest resource similarity to the task.
Further, the energy density difference calculation module includes:
an energy consumption calculation unit for calculating the energy consumption of each task τ_i on each processor core π_j;
An energy density calculation unit for calculating the energy density of τ_i on each processor core;
The energy density difference calculation unit is used for calculating the energy density difference DD_i of the task τ_i;
where DD_i is the difference between the highest and the lowest energy density of τ_i across the processor cores;
further, the task allocation module includes:
A task worst estimated utilization calculation unit for calculating the worst estimated utilization of task τ_i on each processor core;
where BW_i^max is the sum of the worst-case longest waiting times of task τ_i for accessing the subset γ_i of shared resources that it accesses;
a task allocation unit for allocating a processor core to the task τ_i according to the resource similarity between τ_i and each processor core and the total worst estimated utilization EU_j of each processor core;
The specific allocation method is: if, among the processor cores with the greatest resource similarity to τ_i, there is one whose total worst estimated utilization with τ_i added is not greater than 1, τ_i is allocated to such a processor core; otherwise τ_i is allocated to the processor core with the smallest EU_j;
The total worst estimated utilization calculation unit is used for updating the total worst estimated utilization of each processor core;
further, if, among the processor cores with the greatest resource similarity to τ_i, there are processor cores whose total worst estimated utilization with τ_i added is not greater than 1, the task allocation unit allocates τ_i to the one among them for which […] is smallest.
Further, the task allocation module further includes:
A task worst utilization updating unit for updating, after a task is allocated to a processor core, the worst estimated utilization of the task on that processor core, and sending the updated value to the total worst estimated utilization calculation unit, which calculates the total worst estimated utilization of the processor core;
where SW_{i,χ} is the sum of the longest times of all tasks that enter the wait queue of shared resource R(z_{i,χ}) after task τ_i accesses that resource, each taken on the processor core where the task resides; χ is the critical section index of τ_i and ranges over the number of critical sections of τ_i; a critical section is a program segment in which a task accesses a shared resource; BW_i^j is the sum of the actual global waiting times of τ_i for accessing its subset of shared resources.
Further, the device further comprises:
a final execution frequency setting module for setting the final execution frequency of each processor core, comprising:
the utilization calculation unit, used for calculating the utilization U_j of each processor core;
where τ_n is a task allocated to processor core π_j. The calculation uses the actual execution time of τ_n on π_j, the local waiting time B_i of a task τ_i on its assigned processor core π_j, and the sum of the actual global waiting times of τ_n for accessing its subset of shared resources; the maximum is taken over the utilization of the processor core when executing each task.
A processor core frequency setting unit, configured to set the final execution frequency of each processor core according to the utilization of that processor core;
If the utilization of the processor core is not greater than 1, the execution frequency of the processor core is lowered by one level, and the updated execution frequency is sent to the utilization calculation unit to recalculate the utilization of the processor core;
if the utilization of the processor core is greater than 1, the execution frequency of the processor core is raised by one level and taken as the final execution frequency of the processor core.
Drawings
FIG. 1 is a flow chart of a task allocation method of a heterogeneous multi-core system based on shared resource access according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a task allocation device of a heterogeneous multi-core system based on shared resource access according to embodiment 2 of the present invention;
FIG. 3 is a schematic diagram of an energy density difference calculation module according to embodiment 2 of the present invention;
FIG. 4 is a schematic diagram of a task allocation module according to embodiment 2 of the present invention;
FIG. 5 is a schematic diagram of a final execution frequency setting module according to embodiment 2 of the present invention;
FIG. 6 is a graph comparing the schedulable rate of the present invention with that of the prior-art solution at each task critical-section ratio;
FIG. 7 is a graph comparing the energy consumption of the present invention with that of prior-art solutions;
Detailed Description
In order to better explain the technical scheme of the invention, the following detailed description of the specific embodiments of the invention is given with reference to the accompanying drawings.
In the following specific embodiments of the present invention, the system has J processor cores π = {π_1, π_2, ..., π_J}, where J is the total number of processor cores. The task set is τ = {τ_1, τ_2, …, τ_I}, where i is the task index, i ∈ [1, I], and I is the total number of tasks. Each task τ_i has an execution deadline D_i and an execution period p_i; the tasks are independent of one another, and all tasks are first released at time 0. Each processor core π_j, j ∈ [1, J], where j is the processor core index, has a set of available execution frequencies arranged in ascending order: f_1^j is the lowest selectable execution frequency of π_j, and the last element of the set is its highest selectable execution frequency.
The model also uses the worst-case execution time of task τ_i on processor core π_j and the execution efficiency of τ_i on π_j. β_j is the architecture coefficient of processor core π_j, a constant determined by the design and manufacturing process of the processor. The total execution time of a processor core is the sum of the execution times, on that core, of the tasks assigned to it.
In the following specific embodiments of the present invention, tasks access shared resources under a suspension-based resource access protocol. For the specific rules, see Wu Xiaodong's 2012 doctoral dissertation, "Research on synchronous task energy saving scheduling policy in multi-core real-time system based on voltage island" (university of science and technology in China; published online by Wanfang Data on 16 May 2013).
The shared resource set of the system is γ = {R_1, R_2, …, R_r}; it contains r shared resources, all of which can be accessed by every task. The task set is τ = {τ_1, τ_2, …, τ_I} and the processor core set is π = {π_1, π_2, …, π_J}. Z_{i,χ} denotes the χ-th critical section of task τ_i (a critical section is a program segment in which the task accesses a shared resource), and R(Z_{i,χ}) denotes the shared resource corresponding to critical section Z_{i,χ}. Tasks are scheduled by partitioned earliest deadline first (EDF). Under the protocol rules, task τ_i can be blocked on processor core π_j in two cases:
First, when task τ_i executes critical section Z_{i,χ} to access resource R(Z_{i,χ}) and that resource is being accessed by a task on another core, τ_i is added to the FIFO queue of R(Z_{i,χ}); the resulting waiting time is called the global waiting time for accessing R(Z_{i,χ}) (abbreviated BW_{i,χ});
Second, when task τ_i on processor core π_j wants to access a resource while a task on the same core with a lower priority than τ_i is accessing another resource, or while that lower-priority task has requested a resource that is being accessed by a task on another core and has therefore been placed in that resource's FIFO queue, τ_i is blocked by the lower-priority task; this waiting time is called the local waiting time (abbreviated B_i).
At any moment, at most one task on a processor core is accessing a resource or waiting in a resource FIFO queue for the resource to be released; any task is blocked at most once by a lower-priority task on the same processor core; and the upper limit of a task's local waiting time is the maximum time for which a lower-priority task on the same processor core accesses a resource (this time includes the global waiting time of the lower-priority task for that resource).
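The upper limit on the local waiting time stated above can be illustrated with a small sketch. It assumes, purely for illustration, that each task is a dictionary carrying a numeric priority (larger means higher) and, per critical section, a resource access time and a global waiting time already expressed in the same time unit; `local_wait_bound` and the field names are hypothetical.

```python
def local_wait_bound(task, same_core_tasks):
    """Upper bound on the local waiting time B_i of `task`.

    Each task is a dict with:
      "priority": larger number means higher priority (assumed encoding)
      "sections": list of (access_time, global_wait) pairs, one per critical section
    """
    worst = 0.0
    for other in same_core_tasks:
        if other is task or other["priority"] >= task["priority"]:
            continue  # only lower-priority tasks on the same core block locally
        for access_time, global_wait in other["sections"]:
            # A blocking access lasts for the access itself plus the global
            # waiting time the lower-priority task spends in the FIFO queue.
            worst = max(worst, access_time + global_wait)
    return worst
```

Because a task is blocked at most once by a lower-priority task, the bound is a maximum over the lower-priority critical sections rather than a sum.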
Example 1
The embodiment is a preferred implementation mode of the task allocation method of the heterogeneous multi-core system based on shared resource access.
As shown in FIG. 1, the method of this embodiment includes:
S101, powering up and starting the system, and submitting a periodic task set τ = {τ_1, τ_2, …, τ_I} to the system;
S102, setting the execution frequency of each processor core;
S103, calculating the actual execution time and the worst-case execution time of each task on each processor core;
where T_i^j is the actual execution time of task τ_i on processor core π_j, computed from: the sum of the worst-case execution times of τ_i for accessing its subset of shared resources; the number of critical sections of τ_i; the worst-case execution time of the χ-th critical section of τ_i, in which it accesses a shared resource; and the worst-case execution time of the part of τ_i that does not access shared resources;
This step is repeated to obtain the actual execution time of the task on every processor core;
S104, calculating the energy density difference of each task;
This step further comprises:
S1041, calculating the energy consumption of task τ_i on processor core π_j;
S1042, repeating step S1041 to obtain the energy consumption of task τ_i on all processor cores;
S1043, calculating the energy density of τ_i on each processor core;
S1044, calculating the energy density difference DD_i of task τ_i;
where DD_i is the difference between the highest and the lowest energy density of τ_i across the processor cores;
Steps S1041 to S1044 are executed for each task to obtain the energy density difference of every task;
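Steps S1043 and S1044, together with the descending ordering used for task selection, might be sketched as follows. The energy consumption values are taken as given inputs, and computing the energy density as energy divided by the task period p_i is an assumption made for illustration; the function names are hypothetical.

```python
def energy_density(energy, period):
    """Energy density of one task on one core.
    Assumption for this sketch: energy consumption divided by the period p_i."""
    return energy / period


def energy_density_differences(energy, periods):
    """energy[i][j]: energy consumption of task i on core j (given).
    periods[i]: execution period p_i of task i.
    Returns DD_i = highest density - lowest density for every task i."""
    diffs = []
    for task_energy, p in zip(energy, periods):
        densities = [energy_density(e, p) for e in task_energy]
        diffs.append(max(densities) - min(densities))
    return diffs


def allocation_order(diffs):
    """Task indices sorted by descending energy density difference."""
    return sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
```

A task with a large DD_i is one whose placement matters most for energy, which is why it is placed while the most cores are still available.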
S105, arranging the tasks in descending order of their energy density difference to obtain a first task list;
S106, selecting the unallocated task τ_i with the largest energy density difference in the first task list;
S107, allocating τ_i to the processor core, among the selectable processor cores, with the greatest resource similarity to the task;
This step may further comprise:
S1071, calculating the resource similarity between the task and each processor core;
where q is the index of a task already assigned to processor core π_j and τ_q is that task; ψ_j is the set of tasks allocated to processor core π_j; Θ_{i,q} is the number of shared resources of the same type accessed by both task τ_i and task τ_q;
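As a rough illustration of step S1071, the sketch below represents each task's shared resources as a set and takes the resource similarity of a task to a core as the sum of Θ_{i,q} over the tasks τ_q already on that core. That aggregation, and the function names, are assumptions for illustration.

```python
def shared_count(res_a, res_b):
    """Theta_{i,q}: number of shared resources accessed by both tasks."""
    return len(set(res_a) & set(res_b))


def resource_similarity(task_resources, core_task_resources):
    """Similarity of a task to a core.

    task_resources: shared resources accessed by the task to be allocated.
    core_task_resources: one resource collection per task already on the core (psi_j).
    Assumption: the similarity is the sum of Theta_{i,q} over those tasks.
    """
    return sum(shared_count(task_resources, r) for r in core_task_resources)
```

An empty core has similarity 0 to every task, so the first tasks are placed by the utilization rules alone.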
S1072, calculating the worst estimated utilization of task τ_i on each processor core;
where BW_i^max is the sum of the worst-case longest waiting times of task τ_i for accessing the subset γ_i of shared resources that it accesses;
S1073, determining whether, among the processor cores with the greatest resource similarity, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1; if yes, executing step S1074, otherwise executing step S1075;
where EU_j is the current total worst estimated utilization of processor core π_j, which sums the worst estimated utilization of every task τ_q already allocated to π_j;
S1074, assigning task τ_i to a processor core that has the greatest resource similarity and whose total worst estimated utilization with τ_i added is not greater than 1; then executing step S108;
As a preferred implementation of this embodiment, this step may further include:
among the processor cores that have the greatest resource similarity and whose EU_j is not greater than 1, selecting the one for which […] is smallest, and allocating task τ_i to it;
S1075, assigning task τ_i to the processor core with the smallest EU_j;
S108, updating the total worst estimated utilization of the processor core;
Steps S107 to S108 are repeated until all tasks have been allocated.
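The selection and allocation loop of steps S105 to S108 could look roughly like the sketch below. It is a simplified reading under stated assumptions: the energy density differences and worst estimated utilizations are given inputs, `similarity` is a caller-supplied hypothetical callback, ties in step S1074 are broken by the smallest EU_j (the quantity used in the source is not recoverable here), and EU_j is updated by simple addition without the SW- and BW-based refinement of step S1081.

```python
def allocate(dd, worst_util, similarity):
    """dd[i]: energy density difference of task i.
    worst_util[i][j]: worst estimated utilization of task i on core j.
    similarity(i, j, assigned): resource similarity of task i to core j,
        where assigned[j] lists the tasks already on core j.
    Returns (assigned, eu)."""
    n_cores = len(worst_util[0])
    assigned = [[] for _ in range(n_cores)]
    eu = [0.0] * n_cores
    # S105-S106: take tasks in descending order of energy density difference.
    for i in sorted(range(len(dd)), key=lambda t: dd[t], reverse=True):
        sims = [similarity(i, j, assigned) for j in range(n_cores)]
        best = max(sims)
        # S1073: most-similar cores that stay within the bound of 1.
        fits = [j for j in range(n_cores)
                if sims[j] == best and eu[j] + worst_util[i][j] <= 1.0]
        if fits:
            j = min(fits, key=lambda c: eu[c])            # S1074 (tie-break assumed)
        else:
            j = min(range(n_cores), key=lambda c: eu[c])  # S1075
        assigned[j].append(i)
        eu[j] += worst_util[i][j]                         # S108, simplified
    return assigned, eu
```

The fallback in S1075 does not itself check the bound of 1, so a caller that needs a schedulability guarantee would still have to verify the resulting EU_j values.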
As a preferred implementation of this embodiment, step S108 may include:
S1081, updating the worst estimated utilization of each task on the processor core π_j to which it is allocated;
where SW_{i,χ} is the sum of the longest times of all tasks that enter the wait queue of shared resource R(z_{i,χ}) after task τ_i accesses that resource, each taken on the processor core where the task resides; χ is the critical section index of τ_i and ranges over the number of critical sections of τ_i; a critical section is a program segment in which a task accesses a shared resource; BW_i^j is the sum of the actual global waiting times of task τ_i for accessing its subset of shared resources;
S1082, calculating the total worst estimated utilization of the processor core;
As a preferred implementation of this embodiment, this embodiment may further include step S109.
S109, setting the final execution frequency of each processor core;
S1091, calculating the local waiting time B_i of task τ_i on its assigned processor core π_j;
where the calculation uses: the worst-case execution time of a task τ_m, m ≠ i, in its critical section Z_{m,χ}, where Z_{m,χ} is the χ-th critical section of τ_m; the execution efficiency of τ_m on π_j; and the worst-case global waiting time of τ_m when accessing resource R(Z_{m,χ}); max{ } denotes taking the maximum value.
S1092, calculating the utilization U_j of processor core π_j;
where τ_n is a task allocated to processor core π_j. The calculation uses the actual execution time of τ_n on π_j, the local waiting time B_i of a task τ_i on its assigned processor core π_j, and the sum of the actual global waiting times of τ_n for accessing its subset of shared resources; the maximum is taken over the utilization of the processor core when executing each task.
S1093, determining whether U_j of processor core π_j is not greater than 1; if yes, executing step S1094, otherwise executing step S1095;
S1094, lowering the execution frequency of the processor core by one level, then returning to step S1091;
S1095, raising the execution frequency of the processor core by one level and taking it as the final execution frequency of the processor core;
Step S109 is repeated until the frequency of every processor core has been set.
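The frequency search of steps S1093 to S1095 can be sketched as a short loop. In this sketch the utilization calculation is a caller-supplied function `utilization_at` (hypothetical), the search is assumed to start from the highest frequency level, and instead of stepping down and then back up it looks one level ahead, which gives the same final level whenever the starting level is feasible.

```python
def final_frequency_index(levels, utilization_at):
    """levels: available execution frequencies in ascending order.
    utilization_at(f): utilization U_j of the core when it runs at frequency f.
    Returns the index of the final execution frequency."""
    k = len(levels) - 1  # assumed starting point: highest frequency
    # Keep lowering while the next lower level still gives U_j <= 1
    # (equivalent to S1094 followed by the one-level raise of S1095).
    while k > 0 and utilization_at(levels[k - 1]) <= 1.0:
        k -= 1
    return k
```

If even the highest level yields a utilization above 1, this sketch simply returns the highest level, since there is no higher level to raise to.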
Example 2
The embodiment is a preferred implementation mode of the task allocation device of the heterogeneous multi-core system based on shared resource access.
As shown in FIG. 2, the apparatus of this embodiment includes:
the task execution time calculation module is used for calculating the actual execution time of each task on each processor core;
where T_i^j is the actual execution time of task τ_i on processor core π_j, computed from: the sum of the worst-case execution times of τ_i for accessing its subset of shared resources; the number of critical sections of τ_i; the worst-case execution time of the χ-th critical section of τ_i, in which it accesses a shared resource; and the worst-case execution time of the part of τ_i that does not access shared resources;
The energy density difference calculation module is used for calculating the energy density difference DD_i of each task;
As shown in FIG. 3, this module further includes:
an energy consumption calculation unit for calculating the energy consumption of each task τ_i on each processor core π_j;
An energy density calculation unit for calculating the energy density of τ_i on each processor core;
The energy density difference calculation unit is used for calculating the energy density difference DD_i of the task τ_i;
where DD_i is the difference between the highest and the lowest energy density of τ_i across the processor cores;
The task selecting module is used for selecting the task τ_i with the largest energy density difference from the unallocated tasks and sending it to the task allocation module;
The task allocation module is used for allocating the task τ_i to the processor core, among the selectable processor cores, with the greatest resource similarity to the task;
As shown in FIG. 4, this module further includes:
A resource similarity calculation unit for calculating the resource similarity between the task τ_i and each processor core;
where q is the index of a task already assigned to processor core π_j and τ_q is that task; ψ_j is the set of tasks allocated to processor core π_j; Θ_{i,q} is the number of shared resources of the same type accessed by both task τ_i and task τ_q;
A task worst estimated utilization calculation unit for calculating the worst estimated utilization of task τ_i on each processor core;
where BW_i^max is the sum of the worst-case longest waiting times of task τ_i for accessing the subset γ_i of shared resources that it accesses;
a task allocation unit for allocating a processor core to the task τ_i according to the above resource similarity and the total worst estimated utilization EU_j of each processor core, wherein the specific allocation method is as follows:
determining whether, among the processor cores with the greatest resource similarity, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1; if so, assigning the task τ_i to a processor core that has the greatest resource similarity and whose EU_j is not greater than 1; otherwise, assigning the task τ_i to the processor core with the smallest EU_j;
where the total uses the worst estimated utilization of every task τ_q already allocated to processor core π_j;
as a preferred implementation of this embodiment, if, among the processor cores with the greatest resource similarity, there are processor cores whose EU_j is not greater than 1, the task allocation unit allocates the task τ_i to the one among them for which […] is smallest;
The total worst estimated utilization calculation unit is used for updating the total worst estimated utilization of each processor core;
As a preferred implementation of this embodiment, the task allocation module may further include:
A task worst utilization updating unit for updating, after a task is allocated to a processor core, the worst estimated utilization of the task on that processor core, and sending the updated value to the total worst estimated utilization calculation unit, which calculates the total worst estimated utilization of the processor core;
where SW_{i,χ} is the sum of the longest times of all tasks that enter the wait queue of shared resource R(z_{i,χ}) after task τ_i accesses that resource, each taken on the processor core where the task resides; χ is the critical section index of τ_i and ranges over the number of critical sections of τ_i; a critical section is a program segment in which a task accesses a shared resource; BW_i^j is the sum of the actual global waiting times of task τ_i for accessing its subset of shared resources;
As a preferred implementation of this embodiment, the apparatus may further include:
a final execution frequency setting module for setting final execution frequency of each processor core Referring to fig. 5, as shown in fig. 5, the final execution frequency setting module includes:
A task local latency calculation unit, configured to calculate a local latency B i of each task on its allocated processor core;
Wherein, Representing the worst-case execution time of task τ m, q+.i in critical section Z m,χ; z m,χ represents the χ critical region of task τ m; /(I)Representing the execution efficiency of task τ m in pi j,/>Representing a worst-case global latency for task τ m when accessing resource R (z m,χ); max { } represents taking the maximum value.
A utilization calculation unit, configured to calculate the utilization U j of each processor core;
Where τ n is a task allocated to processor core π j; the calculation uses the actual execution time of τ n on π j; B i is the local latency of task τ i on its assigned processor core π j; the calculation also uses the sum of the actual global latencies of τ n for accessing its subset of shared resources; max{ } denotes the maximum of the utilization of the processor core over the tasks it executes.
A processor core frequency setting unit, configured to set a final execution frequency of each processor core according to a utilization rate of each processor core;
If the utilization of the processor core is not greater than 1, the execution frequency of the processor core is reduced by one level, and the updated execution frequency is sent to the utilization calculation unit to recalculate the utilization of the processor core;
If the utilization of the processor core is greater than 1, the execution frequency of the processor core is raised by one level and taken as the final execution frequency of the processor core.
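The frequency-setting procedure of the processor core frequency setting unit can be illustrated by the following minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the invention: the function name `final_frequency`, the ascending list `levels` of discrete frequency levels, and the callable `utilization` (standing in for the utilization calculation unit, whose formula is not reproduced here) are all hypothetical. Clamping at the lowest and highest levels is also an assumption, since the text does not say what happens at the ends of the frequency range.

```python
from typing import Callable, Sequence


def final_frequency(
    levels: Sequence[float],
    start: int,
    utilization: Callable[[float], float],
) -> float:
    """Return the final execution frequency of one processor core.

    levels      -- available frequency levels in ascending order (assumed)
    start       -- index of the core's current frequency level
    utilization -- hypothetical stand-in for the utilization calculation
                   unit: maps a frequency to the core utilization U_j
    """
    idx = start
    # While the core is still schedulable (U_j <= 1), lower the frequency
    # by one level and recompute the utilization.
    while utilization(levels[idx]) <= 1.0:
        if idx == 0:
            # Assumption: the lowest level is kept if it is still feasible.
            return levels[0]
        idx -= 1
    # U_j > 1 at levels[idx]: raise the frequency by one level and take it
    # as the final frequency (clamped to the highest level by assumption).
    return levels[min(idx + 1, len(levels) - 1)]
```

For example, with the levels `[0.4, 0.6, 0.8, 1.0]` and a toy utilization model `lambda f: 0.5 / f`, the loop lowers the frequency until the utilization exceeds 1 at 0.4 and then steps back up to 0.6.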
In the technical scheme of the embodiment of the invention, task allocation takes into account the difference in the energy consumption of each task on the different cores of the heterogeneous processor: the order of task allocation is set according to the energy density difference of each task, and tasks with a large energy density difference are allocated first. Because system energy consumption is already considered in the preparation stage of task allocation, the energy consumption of the system after task allocation can be effectively reduced compared with the prior art. In a preferred implementation of this embodiment, after task allocation is completed, the worst estimated utilization of each task is updated according to the actual global latency of the task in accessing shared resources, so that the worst estimated utilization and the processor core utilization used in the task allocation process are calculated more accurately, which increases the scheduling success rate of a task set compared with the prior art; as shown in Fig. 6, the scheduling success rate of the technical scheme of the invention is about 8% higher than that of the prior art at each critical section ratio. In another preferred implementation of this embodiment, after task allocation is completed, dynamic voltage and frequency scaling is used, according to the utilization of each processor core and subject to guaranteeing the real-time performance of task execution, to repeatedly try to lower the execution frequency of the processor core, so that the utilization of the system is as close to 1 as possible, the idle time of the processor core is reduced, and the power consumption of the processor cores of the system is further reduced. As shown in Fig. 7, compared with the prior art, the technical scheme of the invention improves energy consumption by about 6.8% for short periods in the period range [50,100], by about 7.8% in the period range [100,200], and by about 8.4% for long periods in the period range [200,400].
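The allocation order and core selection summarized above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the claimed method itself: energy densities and worst estimated utilizations are supplied as inputs rather than computed; resource similarity is approximated as the number of shared resources a task has in common with the tasks already placed on a core; the not-greater-than-1 test is taken to be EU j plus the task's worst estimated utilization on that core; and ties are broken by the smallest such sum. The names `Task`, `density_difference`, and `allocate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    density: dict[str, float]  # energy density of the task on each core
    util: dict[str, float]     # worst estimated utilization on each core
    resources: frozenset[str] = frozenset()  # shared resources accessed


def density_difference(task: Task) -> float:
    """DD_i: highest minus lowest energy density over the cores."""
    return max(task.density.values()) - min(task.density.values())


def allocate(tasks: list[Task], cores: list[str]):
    eu = {c: 0.0 for c in cores}          # total worst estimated utilization
    core_res = {c: set() for c in cores}  # resources used by placed tasks
    placement = {}
    # Tasks with the largest energy density difference are allocated first.
    for t in sorted(tasks, key=density_difference, reverse=True):
        # Assumed similarity measure: number of shared resources in common.
        sim = {c: len(t.resources & core_res[c]) for c in cores}
        best = max(sim.values())
        # Assumed feasibility test: EU_j plus the task's utilization <= 1.
        fits = [c for c in cores
                if sim[c] == best and eu[c] + t.util[c] <= 1.0]
        if fits:
            chosen = min(fits, key=lambda c: eu[c] + t.util[c])
        else:
            # Otherwise fall back to the core with the smallest EU_j.
            chosen = min(cores, key=lambda c: eu[c])
        placement[t.name] = chosen
        eu[chosen] += t.util[chosen]      # update EU_j of the chosen core
        core_res[chosen] |= t.resources
    return placement, eu
```

In this sketch the energy density difference only determines the order in which tasks are considered; the choice of core depends on resource similarity and utilization, as in the allocation step described above.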
Claims (10)
1. A heterogeneous multi-core system task allocation method based on shared resource access, characterized by comprising the following steps:
Calculating the worst-case execution time and the actual execution time T i j of each task on each processor core;
Wherein the calculation uses the number of critical sections of task τ i, the critical section index χ, the worst-case execution time of task τ i in its χ-th critical section, in which it accesses a shared resource, the worst-case execution time of the part of task τ i that does not access shared resources, and the current execution frequency of processor core π j; i is the task number and j is the processor core number;
Calculating the energy density of each task on each processor core and the energy density difference DD i of each task;
Wherein the calculation uses the highest and the lowest energy density of task τ i over the processor cores and the energy consumption of task τ i on processor core π j; β j is the architecture coefficient of processor core π j, and p i is the execution period of task τ i;
Selecting, in turn, the unallocated task τ i with the greatest energy density difference; if, among the processor cores with the greatest resource similarity to task τ i, there is a processor core that satisfies the not-greater-than-1 condition, allocating τ i to a processor core that has the greatest resource similarity to the task and satisfies that condition; otherwise, allocating τ i to the processor core with the smallest EU j;
Updating EU j of the processor core;
Wherein EU j is the current total worst estimated utilization of processor core π j, and the update uses the worst estimated utilization of task τ i on that processor core and the worst estimated utilizations of the tasks allocated to that processor core before task τ i.
2. The method of claim 1, wherein said allocating τ i to a processor core that has the greatest resource similarity to the task and satisfies the not-greater-than-1 condition comprises:
Among the processor cores that have the greatest resource similarity to the task and satisfy that condition, selecting the processor core that is smallest under the selection measure and allocating task τ i to it.
3. The method of claim 1, wherein the updating EU j of the processor core comprises:
Updating the worst estimated utilization of τ i on the processor core;
Calculating the total worst estimated utilization EU j of the processor core;
Wherein SW i,χ is the sum of the longest shared-resource access times of all tasks on the processor cores that host the tasks entering the wait queue of shared resource R(z i,χ) after task τ i; the update also uses the number of critical sections of τ i, where χ is the critical section index of τ i, and the sum of the actual global latencies of task τ i for accessing its subset of shared resources.
4. A method according to any one of claims 1 to 3, further comprising setting a final execution frequency of each processor core:
Calculating the utilization U j of processor core π j;
If the utilization of processor core π j is not greater than 1, reducing the execution frequency of the processor core by one level and recalculating the utilization;
If the utilization of processor core π j is greater than 1, raising the execution frequency of the processor core by one level and taking it as the final execution frequency of the processor core;
Where τ n is a task allocated to processor core π j; the calculation uses the actual execution time of τ n on π j; B i is the local latency of task τ i on its assigned processor core π j; the calculation also uses the sum of the actual global latencies of τ n for accessing its subset of shared resources; max{ } denotes the maximum of the utilization of the processor core over the tasks it executes.
5. A heterogeneous multi-core system task allocation device based on shared resource access, comprising:
A task execution time calculation module, configured to calculate the worst-case execution time and the actual execution time T i j of each task on each processor core;
Wherein the calculation uses the number of critical sections of task τ i, the critical section index χ, the worst-case execution time of task τ i in its χ-th critical section, in which it accesses a shared resource, the worst-case execution time of the part of task τ i that does not access shared resources, and the current execution frequency of processor core π j; i is the task number and j is the processor core number;
An energy density difference calculation module, configured to calculate the energy density of each task on each processor core and the energy density difference DD i of each task;
Wherein the calculation uses the highest and the lowest energy density of task τ i over the processor cores and the energy consumption of task τ i on processor core π j; β j is the architecture coefficient of processor core π j, and p i is the execution period of task τ i;
A task selecting module, configured to select the task τ i with the greatest energy density difference from the unallocated tasks and send it to the task allocation module;
A task allocation module, configured to allocate task τ i to the processor core with the greatest resource similarity to the task among the selectable processor cores;
Wherein the unallocated task τ i with the greatest energy density difference is selected in turn; if, among the processor cores with the greatest resource similarity to task τ i, there is a processor core that satisfies the not-greater-than-1 condition, τ i is allocated to a processor core that has the greatest resource similarity to the task and satisfies that condition; otherwise, τ i is allocated to the processor core with the smallest EU j;
EU j of the processor core is then updated;
Wherein EU j is the current total worst estimated utilization of processor core π j, and the update uses the worst estimated utilization of task τ i on that processor core and the worst estimated utilizations of the tasks allocated to that processor core before task τ i.
6. The apparatus of claim 5, wherein the energy density difference calculation module comprises:
An energy consumption calculation unit, configured to calculate the energy consumption of each task τ i on each processor core π j;
An energy density calculation unit, configured to calculate the energy density of τ i on each processor core;
An energy density difference calculation unit, configured to calculate the energy density difference DD i of task τ i;
Wherein the calculation uses the highest and the lowest energy density of task τ i over the processor cores.
7. The apparatus of claim 5, wherein the task allocation module comprises:
A task worst estimated utilization calculation unit, configured to calculate the worst estimated utilization of task τ i on each processor core;
Wherein the calculation uses the sum of the worst-case longest latencies of task τ i for accessing its shared resource subset γ i;
A task allocation unit, configured to allocate a processor core to task τ i according to the resource similarity between each processor core and task τ i and the total worst estimated utilization EU j of each processor core;
The specific allocation method is as follows: if, among the processor cores with the greatest resource similarity to task τ i, there is a processor core that satisfies the not-greater-than-1 condition, τ i is allocated to a processor core that has the greatest resource similarity to the task and satisfies that condition; otherwise, τ i is allocated to the processor core with the smallest EU j;
A total worst estimated utilization calculation unit, configured to update the total worst estimated utilization of each processor core.
8. The apparatus according to claim 7, wherein:
When the task allocation unit allocates a processor core to task τ i, if, among the processor cores with the greatest resource similarity to task τ i, there are processor cores that satisfy the not-greater-than-1 condition, the task allocation unit allocates task τ i to the processor core that is smallest under the selection measure among the processor cores that have the greatest resource similarity to the task and satisfy that condition.
9. The apparatus of claim 7, wherein the task allocation module further comprises:
A task worst utilization updating unit, configured to update, after a task is allocated to a processor core, the worst estimated utilization of the task on the allocated processor core, and to send the updated worst estimated utilization to the total worst estimated utilization calculation unit to calculate the total worst estimated utilization of the processor core;
Wherein SW i,χ is the sum of the longest shared-resource access times of all tasks on the processor cores that host the tasks entering the wait queue of shared resource R(z i,χ) after task τ i; the update also uses the number of critical sections of τ i, where χ is the critical section index of τ i; a critical section is a program segment in which a task accesses a shared resource; BW i j is the sum of the actual global latencies of task τ i for accessing its subset of shared resources.
10. The apparatus according to any one of claims 5-9, further comprising:
A final execution frequency setting module, configured to set the final execution frequency of each processor core, comprising:
A utilization calculation unit, configured to calculate the utilization U j of each processor core;
A processor core frequency setting unit, configured to set the final execution frequency of each processor core according to the utilization of each processor core;
If the utilization of the processor core is not greater than 1, the execution frequency of the processor core is reduced by one level, and the updated execution frequency is sent to the utilization calculation unit to recalculate the utilization of the processor core;
If the utilization of the processor core is greater than 1, the execution frequency of the processor core is raised by one level and taken as the final execution frequency of the processor core;
Where τ n is a task allocated to processor core π j; the calculation uses the actual execution time of τ n on π j; B i is the local latency of task τ i on its assigned processor core π j; the calculation also uses the sum of the actual global latencies of τ n for accessing its subset of shared resources; max{ } denotes the maximum of the utilization of the processor core over the tasks it executes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210029768.6A CN114356580B (en) | 2022-01-12 | 2022-01-12 | Heterogeneous multi-core system task allocation method and device based on shared resource access |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114356580A CN114356580A (en) | 2022-04-15 |
CN114356580B true CN114356580B (en) | 2024-05-28 |
Family
ID=81108401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210029768.6A Active CN114356580B (en) | 2022-01-12 | 2022-01-12 | Heterogeneous multi-core system task allocation method and device based on shared resource access |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114356580B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114816720B (en) * | 2022-06-24 | 2022-09-13 | 小米汽车科技有限公司 | Scheduling method and device of multi-task shared physical processor and terminal equipment |
CN117971770A (en) * | 2024-04-01 | 2024-05-03 | 北京麟卓信息科技有限公司 | SoC pre-silicon performance and power consumption estimation method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9465664B1 (en) * | 2015-09-09 | 2016-10-11 | Honeywell International Inc. | Systems and methods for allocation of environmentally regulated slack |
CN106445673A (en) * | 2016-10-14 | 2017-02-22 | 苏州光蓝信息技术有限公司 | Fault-tolerant task scheduling method oriented to mixed-criticality real-time system |
CN111190729A (en) * | 2019-12-25 | 2020-05-22 | 武汉科技大学 | Task allocation method based on heterogeneous multi-core |
CN111679897A (en) * | 2020-06-05 | 2020-09-18 | 重庆邮电大学 | Heterogeneous multi-core system-on-chip task allocation method and device |
CN112034941A (en) * | 2020-08-24 | 2020-12-04 | 朱洪滨 | Chip with novel framework |
EP3872638A1 (en) * | 2020-02-26 | 2021-09-01 | Samsung Electronics Co., Ltd. | Operation method of an accelerator and system including the same |
CN113535409A (en) * | 2021-08-10 | 2021-10-22 | 天津大学 | Server-free computing resource distribution system oriented to energy consumption optimization |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IN2013MU03699A (en) * | 2013-11-25 | 2015-07-31 | Tata Consultancy Services Ltd |
Non-Patent Citations (3)
Title |
---|
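"Energy-Efficient Scheduling of Real-Time Tasks in Reconfigurable Homogeneous Multicore Platforms"; Aymen Gammoudi; IEEE Transactions on Systems, Man, and Cybernetics: Systems; 2020-12-31; Vol. 50, No. 12; pp. 5092-5105 * |
"Research on Energy-Efficient Mixed Task Scheduling Algorithms under Heterogeneous Multi-Core Architectures"; Chen Lei; China Master's Theses Full-text Database, Information Science and Technology; 2023-06-15; No. 06, 2023; p. I137-10 * |
"Research on Energy-Saving Algorithms for Parallel Real-Time Tasks with Linear Speedup"; Lin Yuhan; China Master's Theses Full-text Database, Information Science and Technology; 2017-03-15; No. 03, 2017; p. I137-397 * |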
Also Published As
Publication number | Publication date |
---|---|
CN114356580A (en) | 2022-04-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||