CN114356580B - Heterogeneous multi-core system task allocation method and device based on shared resource access

Info

Publication number: CN114356580B
Application number: CN202210029768.6A
Authority: CN
Inventors: 夏军; 兰浩; 李铮; 陈磊
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Chongqing University of Posts and Telecommunications
Priority date: 2022-01-12
Filing date: 2022-01-12
Publication date: 2024-05-28
Anticipated expiration: 2042-01-12
Also published as: CN114356580A

Abstract

The invention discloses a task allocation method for a heterogeneous multi-core system based on shared resource access. The method comprises: calculating the worst-case execution time and the actual execution time of each task on each processor core; calculating the energy density of each task on each processor core and the energy density difference of each task; and repeatedly selecting the unallocated task with the largest energy density difference and allocating it to the processor core, among the selectable processor cores, that has the greatest resource similarity to the task. The invention also discloses a task allocation device for a heterogeneous multi-core system based on shared resource access. In the technical scheme of the invention, allocating tasks in descending order of their energy density difference can effectively reduce the energy consumption of the processor cores of the heterogeneous multi-core system.

Description

Heterogeneous multi-core system task allocation method and device based on shared resource access

Technical Field

The invention belongs to the field of computer architecture, and particularly relates to a heterogeneous multi-core system task allocation method and device based on shared resource access.

Background

With the rapid development of computer technology, embedded devices are used more and more widely, and consumer electronics in particular are growing rapidly. Heterogeneous multi-core processors are increasingly popular in the market because they meet the different processing demands that embedded devices place on different tasks. However, as computing power increases, so does the power consumption of the device: the operating time of the embedded device is shortened, excess heat is generated, and the user experience suffers. How to reduce the power consumption of heterogeneous multi-core embedded devices is therefore an urgent problem in heterogeneous multi-core system-on-chip technology.

In prior-art task allocation schemes for heterogeneous multi-core systems based on shared resource access, tasks are generally allocated with a heuristic algorithm, mainly the worst-fit decreasing method (WFD) and the synchronization-aware worst-fit decreasing method (SA-WFD).

The SA-WFD heuristic task allocation algorithm was first applied to the extended multi-core stack resource protocol (MSRP). To ensure that the real-time schedulability of tasks is still met in the worst case, the SA-WFD algorithm works with the worst estimated utilization of tasks. The worst estimated utilization of each task τ_i is calculated first, the tasks are arranged in descending order of worst estimated utilization, and the tasks are then allocated in that order as follows:

1. Select the task with the largest worst estimated utilization;

2. From the remaining processor cores, select the processor core with the greatest resource similarity to the task (resource similarity is the number of shared resources accessed by both the task to be allocated and the tasks already on the processor core); if several processor cores have the same resource similarity, select the one with the smallest total worst estimated utilization;

where j is the processor core number, EU^j is the current total worst estimated utilization of processor core π^j, ψ^j is the set of tasks allocated to processor core π^j, and EU^j is the sum, over the tasks τ_i in ψ^j, of the worst estimated utilization of task τ_i on processor core π^j;

3. Calculate whether the total worst estimated utilization of the processor core would be greater than 1 after the task is allocated to the selected processor core; if so, return to step 2, otherwise go to step 4;

4. Assign the task to the selected processor core;

5. Calculate and update the current EU^j of processor core π^j;

These steps are repeated until all tasks have been allocated.
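The prior-art procedure above can be sketched as follows. This is a minimal illustration, not the reference SA-WFD implementation: the names `Task`, `Core` and `sa_wfd` are hypothetical, and each task is assumed to have a single worst estimated utilization that does not depend on the core.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    util: float                      # worst estimated utilization (assumed core-independent)
    resources: frozenset = frozenset()


@dataclass
class Core:
    name: str
    tasks: list = field(default_factory=list)

    @property
    def eu(self):
        # Total worst estimated utilization EU^j of the tasks already on this core.
        return sum(t.util for t in self.tasks)

    def similarity(self, task):
        # Number of shared resources accessed by both the new task and tasks on this core.
        return sum(len(task.resources & t.resources) for t in self.tasks)


def sa_wfd(tasks, cores):
    # Step 1: handle tasks in descending order of worst estimated utilization.
    for task in sorted(tasks, key=lambda t: t.util, reverse=True):
        candidates = list(cores)
        while candidates:
            # Step 2: greatest similarity first, ties broken by smallest EU^j.
            best = max(candidates, key=lambda c: (c.similarity(task), -c.eu))
            # Step 3: accept only if the core would not be overloaded.
            if best.eu + task.util <= 1:
                best.tasks.append(task)   # Steps 4 and 5: assign; EU^j follows from tasks.
                break
            candidates.remove(best)
        else:
            raise ValueError(f"{task.name} cannot be allocated")
    return {c.name: [t.name for t in c.tasks] for c in cores}
```

For example, with tasks A (0.6, uses R1), B (0.5, uses R1) and C (0.3, uses R2) on two cores, B prefers the core holding A because of the shared resource R1, is rejected there because the total would exceed 1, and falls back to the other core.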

The drawback of this technical scheme is that a large amount of idle time remains after allocation is finished and the influence of energy consumption on the system is not considered during allocation, so the power consumption of the device is not reduced.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a heterogeneous multi-core system task allocation method and device based on shared resource access, so as to reduce the dynamic energy consumption of a processor.

The heterogeneous multi-core system task allocation method based on shared resource access comprises the following steps:

calculating the worst-case execution time and the actual execution time T_i^j of each task on each processor core;

where the calculation uses the number of critical sections of task τ_i, the critical section number χ, the worst-case execution time of task τ_i accessing the shared resource in its χ-th critical section, the worst-case execution time of task τ_i when not accessing shared resources, and the current execution frequency of processor core π^j; i is the task number and j is the processor core number;
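The quantities listed above can be combined as in the following sketch. The exact expressions are not reproduced in this text, so the sketch rests on two labeled assumptions: the worst-case execution time is the non-critical part plus the sum over the critical sections, and the worst-case times are given at the highest frequency `f_max` and scale inversely with the current frequency. The function and parameter names are hypothetical.

```python
def worst_case_execution_time(critical_wcets, non_critical_wcet):
    # Assumption: total WCET = non-critical part + sum over all critical sections.
    return non_critical_wcet + sum(critical_wcets)


def actual_execution_time(critical_wcets, non_critical_wcet, f_current, f_max):
    # Assumption: WCETs are specified at f_max and scale inversely with frequency.
    if not 0 < f_current <= f_max:
        raise ValueError("f_current must lie in (0, f_max]")
    wcet = worst_case_execution_time(critical_wcets, non_critical_wcet)
    return wcet * f_max / f_current
```

Under these assumptions, halving the execution frequency doubles the actual execution time, which is what makes the later frequency-lowering step a trade-off against utilization.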

calculating the energy density of each task on each processor core and the energy density difference DD_i of each task;

where DD_i is the difference between the highest and the lowest energy density of task τ_i over the processor cores; the calculation uses the energy consumption of task τ_i on processor core π^j, the architecture coefficient β^j of processor core π^j, and the execution period p_i of task τ_i;

selecting, in turn, the unallocated task τ_i with the largest energy density difference; if, among the processor cores with the greatest resource similarity to task τ_i, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1, allocating τ_i to a processor core that has the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1; otherwise, allocating τ_i to the processor core with the smallest EU^j;

updating EU^j of the processor core;

where EU^j is the current total worst estimated utilization of processor core π^j, which combines the worst estimated utilization of task τ_i on that processor core with the worst estimated utilizations of the tasks allocated to that processor core before τ_i.

Further, allocating τ_i to a processor core that has the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1 includes:

among the processor cores that have the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1, selecting the processor core with the smallest […] and allocating task τ_i to it.

Further, updating EU^j of the processor core includes:

updating the worst estimated utilization of τ_i on the processor core;

where SW_i,χ is the sum of the longest times of all tasks, on the processor core where the task resides, that enter the wait queue of shared resource R(Z_i,χ) after task τ_i accesses the shared resource; χ is the critical section number of τ_i, ranging over the number of critical sections of τ_i; BW_i^j is the sum of the actual global latencies of task τ_i accessing its shared resource subset;

calculating the total worst estimated utilization EU^j of the processor core.

Further, the method also includes setting a final execution frequency for each processor core:

calculating the utilization U^j of processor core π^j;

where τ_n is a task allocated to processor core π^j; the calculation uses the actual execution time of τ_n on π^j, the local latency B_i of task τ_i on its allocated processor core π^j, and the sum of the actual global latencies of τ_n accessing its shared resource subset; the utilization is the maximum value of the processor core utilization over the tasks it executes;

if the utilization of processor core π^j is not greater than 1, lowering the execution frequency of the processor core by one level and recalculating the utilization;

if the utilization of processor core π^j is greater than 1, raising the execution frequency of the processor core by one level and taking it as the final execution frequency of the processor core.

The heterogeneous multi-core system task allocation device based on shared resource access comprises:

a task execution time calculation module, used for calculating the actual execution time of each task on each processor core;

where T_i^j is the actual execution time of task τ_i on processor core π^j; the calculation uses the sum of the worst-case execution times of task τ_i accessing its shared resource subset, the number of critical sections of task τ_i, the worst-case execution time of task τ_i accessing the shared resource in its χ-th critical section, and the worst-case execution time of task τ_i when not accessing shared resources;

an energy density difference calculation module, used for calculating the energy density of each task on each processor core and the energy density difference DD_i of each task;

a task selection module, used for selecting the task τ_i with the largest energy density difference from the unallocated tasks and sending it to the task allocation module;

and a task allocation module, used for allocating task τ_i to the processor core, among the selectable processor cores, that has the greatest resource similarity to the task.

Further, the energy density difference calculation module includes:

an energy consumption calculation unit, used for calculating the energy consumption of each task τ_i on each processor core π^j;

an energy density calculation unit, used for calculating the energy density of τ_i on each processor core;

an energy density difference calculation unit, used for calculating the energy density difference DD_i of task τ_i;

where DD_i is the difference between the highest and the lowest energy density of task τ_i over the processor cores.

Further, the task allocation module includes:

a task worst estimated utilization calculation unit, used for calculating the worst estimated utilization of task τ_i on each processor core;

where BW_i^max is the sum of the worst-case longest latencies of task τ_i accessing its shared resource subset γ_i;

a task allocation unit, used for allocating a processor core to task τ_i according to the resource similarity between each processor core and task τ_i and the total worst estimated utilization EU^j of each processor core;

the specific allocation method being: if, among the processor cores with the greatest resource similarity to task τ_i, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1, τ_i is allocated to a processor core that has the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1; otherwise, τ_i is allocated to the processor core with the smallest EU^j;

a total worst estimated utilization calculation unit, used for updating the total worst estimated utilization of each processor core.

Further, if, among the processor cores with the greatest resource similarity to task τ_i, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1, the task allocation unit allocates task τ_i to the processor core with the smallest […] among the processor cores that have the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1.

Further, the task allocation module further includes:

a task worst utilization updating unit, used for updating, after a task is allocated to a processor core, the worst estimated utilization of the task on its allocated processor core, and sending the updated worst estimated utilization to the total worst estimated utilization calculation unit to calculate the total worst estimated utilization of the processor core;

where SW_i,χ is the sum of the longest times of all tasks, on the processor core where the task resides, that enter the wait queue of shared resource R(Z_i,χ) after task τ_i accesses the shared resource; χ is the critical section number of τ_i, ranging over the number of critical sections of τ_i; a critical section is a program segment in which a task accesses a shared resource; BW_i^j is the sum of the actual global latencies of task τ_i accessing its shared resource subset.

Further, the device further comprises:

a final execution frequency setting module, used for setting the final execution frequency of each processor core, comprising:

a utilization calculation unit, used for calculating the utilization U^j of each processor core;

a processor core frequency setting unit, used for setting the final execution frequency of each processor core according to the utilization of each processor core;

if the utilization of the processor core is not greater than 1, the execution frequency of the processor core is lowered by one level, and the updated execution frequency is sent to the utilization calculation unit to recalculate the utilization of the processor core;

if the utilization of the processor core is greater than 1, the execution frequency of the processor core is raised by one level and taken as the final execution frequency of the processor core.

Drawings

FIG. 1 is a flow chart of the task allocation method of a heterogeneous multi-core system based on shared resource access according to embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of the task allocation device of a heterogeneous multi-core system based on shared resource access according to embodiment 2 of the present invention;

FIG. 3 is a schematic diagram of the energy density difference calculation module according to embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of the task allocation module according to embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of the final execution frequency setting module according to embodiment 2 of the present invention;

FIG. 6 is a graph comparing the schedulable rate of the present invention with that of the prior-art solution at each task critical section ratio;

FIG. 7 is a graph of the energy consumption optimization of the present invention compared with prior-art solutions.

Detailed Description

In order to better explain the technical scheme of the invention, the following detailed description of the specific embodiments of the invention is given with reference to the accompanying drawings.

In the following specific embodiments of the present invention, the system has J processor cores π = {π¹, π², …, π^J}, where J is the total number of processor cores. The task set is τ = {τ₁, τ₂, …, τ_I}, where i ∈ [1, I] is the task number and I is the total number of tasks. Each task τ_i has an execution deadline D_i and an execution period p_i; the tasks are independent of each other, and all tasks are first released at time 0. Each processor core π^j, j ∈ [1, J], where j is the processor core number, has a set of selectable execution frequencies arranged in ascending order, in which f₁^j is the lowest selectable execution frequency of processor core π^j and the last element is its highest selectable execution frequency.

The model also uses the worst-case execution time of task τ_i on processor core π^j and the execution efficiency of task τ_i on processor core π^j; β^j is the architecture coefficient of processor core π^j, a constant related to the design and process of the processor; the total execution time of a processor core is the sum of the execution times, on that core, of the tasks allocated to it.

In the following specific embodiments of the present invention, tasks access shared resources under a suspension-based resource access protocol; for the specific rules, see Wu Xiaodong's 2012 doctoral dissertation at Huazhong University of Science and Technology, "Research on energy-saving scheduling strategies for synchronous tasks in multi-core real-time systems based on voltage islands" (published online by Wanfang Data on 2013-05-16).

The shared resource set of the system is γ = {R₁, R₂, …, R_r}, which contains r shared resources that all tasks may access. With the task set τ = {τ₁, τ₂, …, τ_I} and the processor core set π = {π¹, π², …, π^J}, Z_i,χ denotes the χ-th critical section of task τ_i (a critical section is a program segment in which a task accesses a shared resource), and R(Z_i,χ) denotes the shared resource corresponding to critical section Z_i,χ. Task scheduling is based on partitioned earliest deadline first (EDF). According to the protocol rules, task τ_i is blocked on processor core π^j in two cases:

First, when task τ_i executes critical section Z_i,χ to access resource R(Z_i,χ) while that resource is being accessed by a task on another core, task τ_i is added to the FIFO queue of R(Z_i,χ); this waiting time is called the global latency (BW_i,χ) of accessing resource R(Z_i,χ);

Second, when task τ_i on processor core π^j wants to access a resource while a task on the same core with a lower priority than τ_i is accessing another resource, or while that lower-priority task has requested a resource that is being accessed by a task on another core and has therefore been placed in that resource's FIFO queue, task τ_i is blocked by the lower-priority task; this waiting time is called the local latency (B_i).

At any moment, at most one task on a processor core is accessing a resource or waiting in a resource FIFO queue for the resource to be released; any task is blocked at most once by a lower-priority task on the same processor core; the upper limit of a task's local latency is the maximum time for which a lower-priority task on the same processor core accesses a resource (this time includes the global latency of the lower-priority task accessing the resource).

Example 1

The embodiment is a preferred implementation mode of the task allocation method of the heterogeneous multi-core system based on shared resource access.

As shown in FIG. 1, the method of this embodiment includes:

S101, powering up and starting the system, and assigning the periodic task set τ = {τ₁, τ₂, …, τ_I} to the system;

S102, setting the execution frequency of each processor core;

S103, calculating the actual execution time and the worst-case execution time of each task on each processor core;

repeating this step to obtain the actual execution time of every task on each processor core;

S104, calculating the energy density difference of each task;

This step further comprises:

S1041, calculating the energy consumption of task τ_i on processor core π^j;

S1042, repeating step S1041 to obtain the energy consumption of task τ_i on all processor cores;

S1043, calculating the energy density of τ_i on each processor core;

S1044, calculating the energy density difference DD_i of task τ_i;

where DD_i is the difference between the highest and the lowest energy density of task τ_i over the processor cores;

Steps S1041 to S1044 are executed for each task to obtain the energy density difference of every task;
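Steps S1041 to S1044, together with the descending ordering used next, can be sketched as follows. The energy values are taken as given inputs because the energy formula is not reproduced in this text, and the sketch assumes that energy density is energy divided by the task period; that definition and the names `energy_density_differences` and `allocation_order` are illustrative assumptions.

```python
def energy_density_differences(energy, period):
    """energy[task][core]: energy of a task on a core; period[task]: task period.

    Assumption: energy density = energy / period.
    """
    dd = {}
    for task, per_core in energy.items():
        densities = [e / period[task] for e in per_core.values()]
        # DD_i: highest minus lowest energy density over the cores.
        dd[task] = max(densities) - min(densities)
    return dd


def allocation_order(dd):
    # Tasks with a larger energy density difference are allocated first.
    return sorted(dd, key=dd.get, reverse=True)
```

A task whose energy density barely differs between cores loses little if it ends up on a less suitable core, so placing the tasks with the largest difference first gives them the widest choice of cores.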

S105, arranging the tasks in descending order of energy density difference to obtain a first task list;

S106, selecting the unallocated task τ_i with the largest energy density difference in the first task list;

S107, allocating τ_i to the processor core, among the selectable processor cores, that has the greatest resource similarity to the task;

This step may further comprise:

S1071, calculating the resource similarity between the task and each processor core;

where q is the number of a task already allocated to processor core π^j, and τ_q is that task; ψ^j is the set of tasks allocated to processor core π^j; Θ_i,q is the number of shared resources accessed by both task τ_i and task τ_q;
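Read from the definitions above, the resource similarity of task τ_i to a core is the sum of Θ_i,q over the tasks τ_q already on that core. A minimal sketch, with shared resources modeled as Python sets and a hypothetical function name:

```python
def resource_similarity(task_resources, core_task_resources):
    """task_resources: set of shared resources used by the task to allocate.
    core_task_resources: one set of shared resources per task already on the core.
    """
    # Theta_{i,q} is the size of the intersection; the similarity sums it over q.
    return sum(len(task_resources & other) for other in core_task_resources)
```

An empty core has similarity 0, so the first task placed always falls through to the utilization-based choice.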

S1072, calculating the worst estimated utilization of task τ_i on each processor core;

S1073, judging whether, among the processor cores with the greatest resource similarity, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1; if so, executing step S1074, otherwise executing step S1075;

where EU^j is the current total worst estimated utilization of processor core π^j, formed from the worst estimated utilizations of the tasks τ_q already allocated to processor core π^j;

S1074, allocating task τ_i to a processor core that has the greatest resource similarity and whose total worst estimated utilization with τ_i added is not greater than 1, then executing step S108;

As a preferred implementation of this embodiment, this step may further include:

among the processor cores that have the greatest resource similarity and whose EU^j is not greater than 1, selecting the processor core with the smallest […] and allocating task τ_i to it;

S1075, allocating task τ_i to the processor core with the smallest EU^j;

S108, updating the total worst estimated utilization of the processor core;

Steps S107 to S108 are repeated until all tasks have been allocated.
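The allocation loop of steps S105 to S108 can be sketched as below. The energy density differences and per-core worst estimated utilizations are taken as precomputed inputs. Two points are assumptions of this sketch rather than statements of the method: a core counts as feasible when EU^j plus the new task's utilization does not exceed 1, and ties among feasible cores are broken by the smallest EU^j (the tie-break quantity of the preferred implementation is not legible in this text). All names are hypothetical.

```python
def allocate(dd, eu_task, resources, cores):
    """dd[i]: energy density difference of task i.
    eu_task[i][j]: worst estimated utilization of task i on core j.
    resources[i]: set of shared resources accessed by task i.
    cores: list of core names.
    """
    assigned = {j: [] for j in cores}
    eu = {j: 0.0 for j in cores}
    # S105/S106: process tasks in descending order of energy density difference.
    for i in sorted(dd, key=dd.get, reverse=True):
        # S1071: resource similarity of task i to every core.
        sim = {j: sum(len(resources[i] & resources[q]) for q in assigned[j])
               for j in cores}
        top = max(sim.values())
        # S1073: most similar cores that stay at or below a utilization of 1.
        feasible = [j for j in cores
                    if sim[j] == top and eu[j] + eu_task[i][j] <= 1]
        if feasible:
            # S1074 (tie-break by smallest EU^j is an assumption of this sketch).
            target = min(feasible, key=lambda j: eu[j])
        else:
            # S1075: fall back to the core with the smallest EU^j.
            target = min(cores, key=lambda j: eu[j])
        assigned[target].append(i)
        eu[target] += eu_task[i][target]   # S108: update EU^j.
    return assigned, eu
```

Note that the fallback in S1075 places the task even when the chosen core would exceed a utilization of 1; the sketch follows the text here and leaves schedulability checking to the caller.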

As a preferred implementation of this embodiment, step S108 may include:

S1081, updating the worst estimated utilization of each task on its allocated processor core π^j;

where SW_i,χ is the sum of the longest times of all tasks, on the processor core where the task resides, that enter the wait queue of shared resource R(Z_i,χ) after task τ_i accesses the shared resource; χ is the critical section number of τ_i, ranging over the number of critical sections of τ_i; a critical section is a program segment in which a task accesses a shared resource; BW_i^j is the sum of the actual global latencies of task τ_i accessing its shared resource subset;

S1082, calculating the total worst estimated utilization of the processor core;

As a preferred implementation of this embodiment, this embodiment may further include step S109.

S109, setting the final execution frequency of each processor core;

S1091, calculating the local latency B_i of task τ_i on its allocated processor core π^j;

where the calculation uses the worst-case execution time of task τ_m (m ≠ i) in critical section Z_m,χ, with Z_m,χ denoting the χ-th critical section of task τ_m; the execution efficiency of task τ_m on π^j; and the worst-case global latency of task τ_m when accessing resource R(Z_m,χ); max{ } denotes taking the maximum value.
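Step S1091 can be illustrated with the following sketch. The exact expression is not reproduced in this text; the sketch assumes that the time a blocking task holds a resource is its critical-section worst-case execution time divided by its execution efficiency, plus its worst-case global latency, and that B_i is the maximum of that sum over the critical sections of the other tasks on the same core that can block τ_i. The function name and the tuple layout are hypothetical.

```python
def local_latency(blockers):
    """blockers: iterable of (critical_wcet, efficiency, global_wait) tuples,
    one per critical section of each task on the same core that can block
    the task under consideration.
    """
    # Assumption: blocking time = critical-section WCET / efficiency + global wait.
    return max((wcet / eff + wait for wcet, eff, wait in blockers), default=0.0)
```

Taking the maximum rather than a sum reflects the protocol property stated earlier that a task is blocked at most once by a lower-priority task on the same core.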

S1092, calculating the utilization U^j of processor core π^j;

S1093, judging whether U^j of processor core π^j is not greater than 1; if so, executing step S1094, otherwise executing step S1095;

S1094, lowering the execution frequency of the processor core by one level, then returning to step S1091;

S1095, raising the execution frequency of the processor core by one level and taking it as the final execution frequency of the processor core;

Step S109 is repeated until the frequency of every processor core has been set.
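The frequency search of step S109 can be sketched as follows. The utilization is supplied as a callable because the full expression for U^j (with local and global latencies) is not reproduced in this text; `simple_utilization` is a deliberately simplified stand-in that ignores those latencies. Starting from the highest level and stopping at the lowest level when every level is feasible are further assumptions of this sketch, and all names are hypothetical.

```python
def final_frequency(levels, utilization):
    """levels: selectable frequencies in ascending order.
    utilization: callable mapping a frequency to the core utilization U^j.
    """
    k = len(levels) - 1                      # assumption: start at the highest level
    if utilization(levels[k]) > 1:
        raise ValueError("not schedulable even at the highest frequency")
    # Lowering while the next level down keeps U^j <= 1 is equivalent to
    # "lower one level; if U^j > 1, raise one level back and stop".
    while k > 0 and utilization(levels[k - 1]) <= 1:
        k -= 1
    return levels[k]


def simple_utilization(tasks):
    """tasks: list of (work, period); execution time is assumed to be work / f."""
    return lambda f: sum(work / f / period for work, period in tasks)
```

With levels `[0.5, 1.0, 1.5, 2.0]` and tasks `[(1.0, 4.0), (3.0, 6.0)]`, the utilization at 1.0 is 0.75 and at 0.5 it is 1.5, so the search settles on 1.0.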

Example 2

The embodiment is a preferred implementation mode of the task allocation device of the heterogeneous multi-core system based on shared resource access.

As shown in FIG. 2, the device of this embodiment includes:

an energy density difference calculation module, used for calculating the energy density difference DD_i of each task;

As shown in FIG. 3, this module further includes:

where DD_i is the difference between the highest and the lowest energy density of task τ_i over the processor cores;

a task allocation module, used for allocating task τ_i to the processor core, among the selectable processor cores, that has the greatest resource similarity to the task;

As shown in FIG. 4, this module further includes:

a resource similarity calculation unit, used for calculating the resource similarity between task τ_i and each processor core;

a task allocation unit, used for allocating a processor core to task τ_i according to the resource similarity and the total worst estimated utilization EU^j of each processor core, the specific allocation method being as follows:

judging whether, among the processor cores with the greatest resource similarity, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1; if so, allocating task τ_i to a processor core that has the greatest resource similarity and whose EU^j is not greater than 1; otherwise, allocating task τ_i to the processor core with the smallest EU^j;

where EU^j is formed from the worst estimated utilizations of the tasks τ_q already allocated to processor core π^j;

As a preferred implementation of this embodiment, if, among the processor cores with the greatest resource similarity, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1, the task allocation unit allocates task τ_i to the processor core with the smallest […] among the processor cores that have the greatest resource similarity and whose EU^j is not greater than 1;

as a preferred implementation manner of this embodiment, the task allocation module may further include:

As a preferred implementation of this embodiment, the apparatus may further include:

a final execution frequency setting module, used for setting the final execution frequency of each processor core; as shown in FIG. 5, the final execution frequency setting module includes:

a task local latency calculation unit, used for calculating the local latency B_i of each task on its allocated processor core;

where the calculation uses the worst-case execution time of task τ_m (m ≠ i) in critical section Z_m,χ, with Z_m,χ denoting the χ-th critical section of task τ_m; the execution efficiency of task τ_m on π^j; and the worst-case global latency of task τ_m when accessing resource R(Z_m,χ); max{ } denotes taking the maximum value.

In the technical scheme of the embodiments of the invention, the difference in energy consumption of each task on the different cores of the heterogeneous processor is taken into account during task allocation: the order of task allocation is set according to the energy density difference of each task, and tasks with a large energy density difference are allocated first. Because system energy consumption is already considered in the preparation stage of task allocation, the energy consumption of the system after task allocation can be effectively reduced compared with the prior art. In a preferred implementation of the embodiments, after task allocation is completed, the worst estimated utilization of each task is updated according to the actual global latency of the task accessing shared resources, so that the worst estimated utilization and the processor core utilization used in the task allocation process are calculated more accurately and the scheduling success rate of a task set is higher than in the prior art; as shown in FIG. 6, the schedulable rate of the technical scheme of the invention is about 8% higher than that of the prior art at each critical section ratio. In another preferred implementation of the embodiments, after task allocation is completed, dynamic voltage and frequency scaling is used, subject to guaranteeing the real-time performance of task execution, to repeatedly try to lower the execution frequency of each processor core according to its utilization, so that the utilization of the system is as close to 1 as possible, the idle time of the processor cores is reduced, and the power consumption of the processor cores of the system is further reduced; as shown in FIG. 7, compared with the prior art, the technical scheme of the invention improves energy consumption by about 6.8% for short periods in the range [50,100], by about 7.8% for periods in the range [100,200], and by about 8.4% for long periods in the range [200,400].

Claims

1. The heterogeneous multi-core system task allocation method based on shared resource access is characterized by comprising the following steps of:

where the calculation uses the number of critical sections of task τ_i, the critical section number χ, the worst-case execution time of task τ_i accessing the shared resource in its χ-th critical section, the worst-case execution time of task τ_i when not accessing shared resources, and the current execution frequency of processor core π^j; i is the task number and j is the processor core number;

Updating EU ^j of the processor core;

2. The method of claim 1, wherein allocating τ_i to a processor core that has the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1 includes:

3. The method of claim 1, wherein updating EU^j of the processor core comprises:

updating the worst estimated utilization of τ_i on the processor core;

where SW_i,χ is the sum of the longest times of all tasks, on the processor core where the task resides, that enter the wait queue of shared resource R(Z_i,χ) after task τ_i accesses the shared resource; χ is the critical section number of τ_i, ranging over the number of critical sections of τ_i; BW_i^j is the sum of the actual global latencies of task τ_i accessing its shared resource subset.

4. A method according to any one of claims 1 to 3, further comprising setting a final execution frequency of each processor core:

calculating the utilization U^j of processor core π^j;

if the utilization of processor core π^j is not greater than 1, lowering the execution frequency of the processor core by one level and recalculating the utilization;

5. A heterogeneous multi-core system task allocation device based on shared resource access, comprising:

a task execution time calculation module, used for calculating the worst-case execution time and the actual execution time T_i^j of each task on each processor core;

Updating EU ^j of the processor core;

6. The apparatus of claim 5, wherein the energy density difference calculation module comprises:

where DD_i is the difference between the highest and the lowest energy density of task τ_i over the processor cores.

7. The apparatus of claim 5, wherein the task allocation module comprises:

where BW_i^max is the sum of the worst-case longest latencies of task τ_i accessing its shared resource subset γ_i;

a task allocation unit, used for allocating a processor core to task τ_i according to the resource similarity between each processor core and task τ_i and the total worst estimated utilization EU^j of each processor core;

the specific allocation method being: if, among the processor cores with the greatest resource similarity to task τ_i, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1, τ_i is allocated to a processor core that has the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1; otherwise, τ_i is allocated to the processor core with the smallest EU^j.

8. The apparatus according to claim 7, wherein:

if, among the processor cores with the greatest resource similarity to task τ_i, there is a processor core whose total worst estimated utilization with τ_i added is not greater than 1, the task allocation unit allocates task τ_i to the processor core with the smallest […] among the processor cores that have the greatest resource similarity to the task and whose total worst estimated utilization with τ_i added is not greater than 1.

9. The apparatus of claim 7, wherein the task allocation module further comprises:

10. The apparatus according to any one of claims 5-9, further comprising:

a final execution frequency setting module, used for setting the final execution frequency of each processor core, comprising: