CN113448736B

CN113448736B - Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform

Info

Publication number: CN113448736B
Application number: CN202110827931.9A
Authority: CN
Inventors: 莫磊; 李昕镁; 周琦; 曹向辉
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2024-03-19
Anticipated expiration: 2041-07-22
Also published as: CN113448736A

Abstract

The invention discloses a task mapping method that jointly optimizes energy and task quality of service (QoS) for approximate computation tasks on a multi-core heterogeneous platform, comprising the following steps: modeling correlated real-time tasks as an imprecise computation task model, thereby obtaining a task directed acyclic graph and a task correlation matrix; on a big.LITTLE multi-core heterogeneous processing platform, allowing the same task to be executed on processors of different clusters through task migration, which improves the flexibility of task allocation and of dynamic voltage/frequency scaling; constructing a task mapping problem based on joint QoS and energy optimization by introducing task allocation, frequency selection, real-time, task non-preemption, task correlation and energy consumption constraints; and handling the nonlinear terms in the problem by variable substitution, which linearizes the task mapping problem so that an optimal solution can be obtained. The problem solving time is significantly reduced, and the applicability of the task mapping method is improved.

Description

Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform

Technical Field

The invention belongs to the field of task scheduling of multi-core processors, and relates to a task mapping method based on energy and QoS joint optimization.

Background

Embedded real-time systems are widely used in network servers, information retrieval, industrial process control, flight control, multimedia systems and similar fields. A real-time system must produce its results within a specified time limit and guarantee the accuracy of the computed results. If the system fails to complete a task before its deadline, a system failure may result, which reduces the reliability of the system. Conventional scheduling algorithms generally plan for the worst-case execution of a task, which lowers processor execution efficiency and wastes system resources. Introducing approximate computation into task scheduling makes it possible to balance system energy consumption against the accuracy of the computed results, and thereby to improve system utilization and reliability. Therefore, under limited resources, research on the optimal scheduling of approximate computation tasks on multi-core heterogeneous processing platforms has important practical significance.

For real-time systems, researchers typically use dynamic voltage and frequency scaling and dynamic power management to optimize system power consumption. Task scheduling research for heterogeneous multi-core processors has produced many results, but the following problems remain: 1) in task scheduling methods based on energy optimization, the execution cycles of a task are fixed, so the resource utilization of the system during scheduling is low, and the QoS of the system is also fixed and cannot be improved by adjusting the tasks; 2) task scheduling research based on QoS optimization generally considers approximate computation task models and aims to maximize system QoS under an energy limit, but rarely considers task migration; 3) for heterogeneous multi-core processors, the task mapping problem based on joint QoS and energy optimization has high computational complexity.

Disclosure of Invention

The invention provides a task mapping method based on energy and QoS joint optimization for approximate calculation tasks on a multi-core heterogeneous processing platform, which introduces a task migration technology on the basis of meeting the real-time performance, energy efficiency and reliability of a system, and further improves the QoS of the system.

In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows: a task mapping method for approximate calculation task based on energy and QoS joint optimization on multi-core heterogeneous processing platform comprises the following steps:

(1) Modeling correlated real-time tasks as an imprecise computation task model, thereby obtaining a task directed acyclic graph and a task correlation matrix;

(2) On a big.LITTLE multi-core heterogeneous platform, allowing the same task to be executed on processors of different clusters through task migration, so as to improve the flexibility of task allocation and of dynamic voltage and frequency scaling;

(3) Constructing a task mapping problem based on joint QoS and energy optimization by introducing task allocation, frequency selection, real-time, task non-preemption, task correlation and energy consumption constraints;

(4) Handling the nonlinear terms in the problem by variable substitution, converting the task mapping problem set forth in (3) into a mixed integer linear programming problem, and solving it with an optimization method;

(5) For the problem in step (3), designing a heuristic algorithm with low computational complexity by problem decomposition, which significantly reduces the problem solving time and improves the applicability of the task mapping method.

Further, in step (1), the task model of the real-time system is described by $N$ correlated imprecise computation (IC) tasks $\{\tau_1, \tau_2, \ldots, \tau_N\}$, from which a directed acyclic graph of the tasks can be obtained. Each IC task $\tau_i$ is logically divided into a mandatory execution part and an optional execution part. $M_i$ denotes the mandatory execution cycles of task $\tau_i$, the variable $o_i$ denotes its optional execution cycles, and $D_i$ denotes the deadline of the task. The optional execution cycles $o_i$ must not exceed the upper limit $O_i$, i.e. $0 \le o_i \le O_i$. For the scheduling of approximate computation tasks, there is a strict execution order between the mandatory and optional parts of a task: the optional part can be executed only after the mandatory part has completed. The correlation between tasks can be described by a binary matrix $q = [q_{ij}]_{M \times M}$, where $q_{ij}$ represents the execution order between tasks: if task $\tau_i$ and task $\tau_j$ are correlated and $\tau_i$ executes before $\tau_j$, then $q_{ij} = 1$; otherwise $q_{ij} = 0$.

Further, in step (2), the big.LITTLE heterogeneous processing platform contains two different types of clusters, big and LITTLE, and the processors within the same cluster are homogeneous. The platform supports dynamic voltage and frequency scaling, and the processors in the big cluster and in the LITTLE cluster each have their own set of voltage and frequency levels. Because processors in different clusters are heterogeneous, $\gamma_{i,k} \in (0,1]$ is defined as the performance energy-efficiency factor of processor $\theta_k$ executing task $\tau_i$. big.LITTLE supports task migration, so that during execution the same task can be migrated from a processor of one cluster to a processor of the other cluster. Using task migration, the IC task $\tau_i$ of step (1) can be decomposed into two dependent subtasks $\tau'_{2i-1}$ and $\tau'_{2i}$, which yields a new task correlation matrix. Subtasks $\tau'_{2i-1}$ and $\tau'_{2i}$ may be executed on processors of different clusters; the specific implementation of task migration is detailed in the following steps. After normalization, $\mu_i \in [0,1]$ denotes the proportion of subtask $\tau'_i$ that is executed on one processor. For task $\tau_i$, the execution proportions of subtasks $\tau'_{2i-1}$ and $\tau'_{2i}$ sum to 1, i.e. $\mu_{2i-1} + \mu_{2i} = 1$.
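As an illustration only, the expansion of the task correlation matrix after each task is split into two subtasks might be sketched as follows. The function name `expand_correlation` and the rule that a successor task waits for the second subtask of its predecessor are assumptions made for this sketch; the text does not spell out how the new matrix is built.

```python
from typing import List


def expand_correlation(q: List[List[int]]) -> List[List[int]]:
    """Expand an N x N task matrix into a 2N x 2N subtask matrix.

    Assumed convention (0-based indices): task i becomes subtasks 2i and
    2i + 1, which play the roles of tau'_{2i-1} and tau'_{2i} in the
    1-based notation of the text.
    """
    n = len(q)
    q2 = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        # The second subtask of a task always follows its first subtask.
        q2[2 * i][2 * i + 1] = 1
        for j in range(n):
            if q[i][j] == 1:
                # Assumption: a successor task may start only after the
                # second subtask of its predecessor has finished.
                q2[2 * i + 1][2 * j] = 1
    return q2


# Hypothetical example: tau_1 precedes tau_2 and tau_3.
q = [[0, 1, 1],
     [0, 0, 0],
     [0, 0, 0]]
q_expanded = expand_correlation(q)
```

With three tasks the expanded matrix has six subtasks: three edges link the two halves of each task, and two edges carry over the original precedence relations.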

Further, in step (3), optimization variables for task allocation, frequency selection, real-time behaviour and task scheduling are introduced: 1) if subtask $\tau'_i$ is assigned to processor $\theta_k$, the binary variable $x_{i,k} = 1$, otherwise $x_{i,k} = 0$; 2) if subtask $\tau'_i$ executes at voltage and frequency level $l$, the binary variable $c_{i,l} = 1$, otherwise $c_{i,l} = 0$; 3) if any two uncorrelated subtasks $\tau'_i$ and $\tau'_j$ are assigned to the same processor and $\tau'_i$ executes before $\tau'_j$, the binary variable $p_{i,j} = 1$, otherwise $p_{i,j} = 0$; 4) the continuous variables $ts_i$ and $te_i$ denote the execution start time and end time of subtask $\tau'_i$. To describe the task mapping method based on joint energy and QoS optimization for approximate computation tasks on a multi-core heterogeneous processing platform, the following constraints are added:

1) Task allocation: with task migration, the same task may be executed on processors of different clusters. Because the processors within one cluster are homogeneous, the invention does not consider migration of tasks between homogeneous processors. The following constraints are therefore added for task allocation:

2) Frequency selection: the invention applies dynamic voltage and frequency scaling at the task level: a processor can adjust its voltage and frequency level after a subtask has finished executing, and each subtask can be assigned only one voltage and frequency level. The processors in the big and LITTLE clusters are heterogeneous and have different voltage and frequency levels, so the range of voltage and frequency levels that can be selected has to be determined from the task allocation result. $\lambda_i$ indicates whether subtask $\tau'_i$ executes on the big or the LITTLE cluster: if subtask $\tau'_i$ executes on the big cluster, the binary variable $\lambda_i = 1$, otherwise $\lambda_i = 0$. The following constraints need to be added for the frequency selection of tasks:

3) Real-time performance: under the real-time constraint, the mandatory part $M_i$ and the optional part $o_i$ of task $\tau_i$ must be completed within the deadline $D_i$, and subtask $\tau'_{2i}$ may start executing only after subtask $\tau'_{2i-1}$ has completed. The time processor $\theta_k$ takes to execute subtask $\tau'_{2i-1}$ at voltage and frequency level $(V_l, f_l)$ is $\mu_{2i-1}(M_i + o_i)/(\gamma_{2i-1,k} f_l)$. To avoid introducing the additional subscript $k$, the parameter $\gamma_{i,l}$ is used in place of $\gamma_{i,k}$; $\gamma_{i,l}$ denotes the energy-efficiency factor of executing subtask $\tau'_i$ at $(V_l, f_l)$. The following constraints need to be added for real-time performance:

4) Non-preemptive constraints: the invention considers a non-preemptive scheduling method, namely any two sub-tasks which are distributed to the same processor and have no correlation cannot be executed simultaneously, and the constraint conditions are as follows:

$$te_i \le ts_j + (2 - x_{i,k} - x_{j,k})H + (1 - p_{i,j})H \tag{8}$$

$$te_j \le ts_i + (2 - x_{i,k} - x_{j,k})H + p_{i,j}H \tag{9}$$
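The effect of constraints (8) and (9) can be checked on concrete numbers with the short sketch below. It assumes that $H$ is a constant larger than any schedule length (the usual big-M reading; the text does not define $H$), and the function name and sample values are hypothetical.

```python
def non_preemption_ok(ts_i, te_i, ts_j, te_j, x_ik, x_jk, p_ij, H):
    """Evaluate constraints (8) and (9) for one subtask pair on processor k."""
    # The slack term vanishes only when both subtasks are assigned to k.
    slack = (2 - x_ik - x_jk) * H
    c8 = te_i <= ts_j + slack + (1 - p_ij) * H
    c9 = te_j <= ts_i + slack + p_ij * H
    return c8 and c9


H = 1000.0  # assumed to exceed any schedule length

# Same processor, i runs on [0, 3] and j on [3, 5], i before j.
same_cpu_in_order = non_preemption_ok(0, 3, 3, 5, 1, 1, 1, H)
# Same processor, but j starts at 2 while i is still running.
same_cpu_overlap = non_preemption_ok(0, 3, 2, 5, 1, 1, 1, H)
# Overlapping intervals are allowed when j is not on processor k.
other_cpu_overlap = non_preemption_ok(0, 3, 2, 5, 1, 0, 1, H)
```

When both subtasks share processor $\theta_k$, exactly one of the two inequalities is active depending on $p_{i,j}$, which forces one subtask to end before the other starts; otherwise both inequalities are relaxed by $H$.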

5) Task dependency constraints: the invention considers a task set with correlation, the tasks are strictly executed according to the sequence in the directed acyclic graph, and the constraint conditions are as follows:

6) Energy constraint: the invention does not consider the energy and time consumed by task communication and considers only the dynamic and static power consumption of the processors, where $P_{on}$ denotes the inherent power consumed to keep a core switched on. During task mapping, the total energy consumption of the system must not exceed the energy budget $E_{budget}$, so the following constraint needs to be imposed on energy:

where $t_i$ denotes the time during which the processor is in the idle state. According to the system power consumption expression $P_{core,l} = P_{sta,l} + P_{dyn,l} + P_{on}$, constraint (11) can be transformed as follows:

The task mapping problem takes QoS optimization as its objective function, and QoS is related to the optional execution cycles $o_i$. The invention uses a linear QoS function $f_i(o_i) = k_i o_i + R_i$, where $R_i$ denotes the baseline QoS obtained after the mandatory part of the task has been executed. According to the problem model, the task mapping optimization problem based on joint QoS and energy optimization can be established:

Further, in step (4), the problem model established in step (3) is linearized. The problem model PP contains nonlinear terms formed by products of continuous variables and by products of integer variables, so optimization problem (13) is a mixed integer nonlinear programming problem. Step (4) equivalently converts problem (13) into a mixed integer linear programming problem through linearization techniques such as variable substitution, as follows:

(5.1) Equations (5) and (12) contain the nonlinear terms $(M_i + o_i)\mu_{2i-1}$ and $(M_i + o_i)\mu_{2i}$, which are products of continuous variables. According to their physical meaning, two auxiliary variables, one for subtask $\tau'_{2i-1}$ and one for subtask $\tau'_{2i}$, are introduced to replace the nonlinear terms, and the following relationship is obtained:

The big.LITTLE platform provides discrete voltage and frequency levels $(V_l, f_l)$. Once the voltage and frequency level $l$ is fixed, the corresponding parameters $P_{sta,l}$, $P_{dyn,l}$ and $1/f_l$ are also determined. Thus, in equations (5) and (12), $P'_l$, $f'_{2i-1}$ and $f'_{2i}$ are used in place of $P_{sta,l} + P_{dyn,l}$, $1/(\gamma_{2i-1,l} f_l)$ and $1/(\gamma_{2i,l} f_l)$, respectively.

(5.2) After the variable substitution in (5.1), nonlinear terms still appear in equations (5) and (12). To linearize them, the following lemma is first introduced:

lemma 1: let us assume constant s ₁ ,s ₂ >0, there are two constraint spaces P ₁ ＝{[t,b,x]|t＝bx,-s ₁ ≤x≤s ₂ B.epsilon.0, 1 and P ₂ ＝{[t,b,x]|-b·s ₁ ≤t≤b·s ₂ ,t+b·s ₁ -x-s ₁ ≤0,t-b·s ₂ -x+s ₂ Not less than 0, b.epsilon.0, 1, there is

And (3) proving:since t=bx and-s ₁ ≤x≤s ₂ We can obtain-b.s ₁ ≤t≤b·s ₂ . According to-s ₁ ≤x≤s ₂ And b.epsilon. {0,1}, we can obtain (b-1) (x-s ₂ ) More than or equal to 0 and (b-1) (x+s) ₁ ) And is less than or equal to 0. Thus, t+b.s ₁ -x-s ₁ Less than or equal to 0 and t-b.s ₂ -x+s ₂ And (5) the value is equal to or more than 0.

If b=0, we have t=0 and-s ₁ ≤x≤s ₂ The method comprises the steps of carrying out a first treatment on the surface of the If b=1 we can get-s ₁ ≤t＝x≤s ₂ . Thus (S)>This is true.
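Lemma 1 can also be checked numerically. The sketch below compares membership in $P_1$ and $P_2$ over a finite grid of rational points; the bounds $s_1 = 2$, $s_2 = 3$ and the grid are arbitrary choices for illustration, and exact fractions are used so that the equality $t = bx$ is tested without rounding error.

```python
from fractions import Fraction
from itertools import product


def in_p1(t, b, x, s1, s2):
    """Membership in P1: the bilinear form t = b * x with bounded x."""
    return t == b * x and -s1 <= x <= s2


def in_p2(t, b, x, s1, s2):
    """Membership in P2: the linear constraints of Lemma 1."""
    return (-b * s1 <= t <= b * s2
            and t + b * s1 - x - s1 <= 0
            and t - b * s2 - x + s2 >= 0)


s1, s2 = Fraction(2), Fraction(3)
grid = [Fraction(n, 2) for n in range(-10, 11)]  # -5, -4.5, ..., 5

# Points on which the two descriptions disagree (expected: none).
mismatches = [(t, b, x) for t, b, x in product(grid, (0, 1), grid)
              if in_p1(t, b, x, s1, s2) != in_p2(t, b, x, s1, s2)]
members = [(t, b, x) for t, b, x in product(grid, (0, 1), grid)
           if in_p1(t, b, x, s1, s2)]
```

A grid check is not a proof, but an empty `mismatches` list is a quick sanity test that the linear constraints were transcribed with the correct signs.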

According to Lemma 1, an intermediate variable $C_{i,l}$ is introduced to replace the nonlinear terms in equations (5) and (12). When $c_{i,l} = 1$, $C_{i,l}$ equals the continuous factor of the product it replaces and lies between that factor's lower and upper bounds; when $c_{i,l} = 0$, $C_{i,l} = 0$. For this variable replacement, the following constraints need to be added:

According to Lemma 1, formulas (5) and (12) can be linearized as:

(5.3) Equation (4) contains the nonlinear term $\lambda_i c_{i,l}$, which is a product of integer variables. $\lambda_i c_{i,l}$ can be expressed in terms of the products $x_{i,k} c_{i,l}$, and Lemma 2 is introduced to linearize equation (4).

Lemma 2: Let $x_1$ and $x_2$ be 0-1 variables. The nonlinear term $x_1 x_2$ can be replaced by a 0-1 variable $y$ subject to the constraints $y \le x_1$, $y \le x_2$ and $y \ge x_1 + x_2 - 1$.

Proof: When the 0-1 variables $x_1$ and $x_2$ are both equal to 1, the constraints reduce to $y = 1$, so $y = x_1 x_2 = 1$ holds. Similarly, if $x_1 = 0, x_2 = 0$, or $x_1 = 0, x_2 = 1$, or $x_1 = 1, x_2 = 0$, we obtain $y = x_1 x_2 = 0$. Therefore, Lemma 2 holds.

Based on Lemma 2, an intermediate variable $z_{i,k,l}$ is introduced and the following constraints are added to replace the nonlinear term $x_{i,k} c_{i,l}$:

$$z_{i,k,l} \le x_{i,k},\quad z_{i,k,l} \le c_{i,l},\quad z_{i,k,l} \ge x_{i,k} + c_{i,l} - 1,\quad z_{i,k,l} \in \{0,1\} \tag{19}$$

Equation (4) can be converted into:

Thus, problem (13) can be linearized as:

Furthermore, in step (5), following the structure of the optimization problem in step (3), a heuristic algorithm with low computational complexity is designed by problem decomposition, so as to improve the applicability of the mapping method. The original task mapping problem in step (3) can be decomposed into 3 sub-problems: 1) frequency selection; 2) task allocation and task scheduling; 3) adjustment of the optional execution cycles. Solving these 3 sub-problems in turn yields a task mapping scheme based on QoS optimization. The specific steps are as follows:

(6.1) Determine the frequency selection optimization variable $c_{i,l}$;

To simplify the task migration model, the original IC task is decomposed into two subtasks $\tau'_{2i-1}$ and $\tau'_{2i}$, where subtask $\tau'_{2i-1}$ represents the mandatory execution part of the original task and $\tau'_{2i}$ represents the optional execution part. To reduce the number of optimization variables, sub-problem (6.1) considers only the mandatory execution cycles of the tasks, i.e. the actual optional execution cycles are $o_i = 0$. In a real-time system, the energy $E_{budget}$ available to the system is limited; once the mandatory parts of the tasks are guaranteed to execute completely, as much energy as possible should remain for executing the optional parts so as to improve the QoS of the system. $E_{2i-1}$ denotes the energy consumed by executing the mandatory subtask $\tau'_{2i-1}$. Sub-problem (6.1) therefore takes minimizing the total energy of task execution as its objective function. In addition, the real-time constraint must be considered, which can be simplified to requiring that the subtasks on the critical path of the directed acyclic graph satisfy the real-time constraint. CPT denotes the set of indices of the mandatory subtasks on the critical path; sorting the subtasks in CPT by execution order gives the ordered index set $CPT' = \{2r_1 - 1, \ldots, 2r_n - 1, \ldots, 2r_R - 1\}$. Sub-problem (6.1) can thus be expressed as:

According to the structure of problem (22), a greedy algorithm is used to solve it. For each mandatory subtask, all voltage and frequency levels are traversed, and the level that gives the smallest increase in total system energy is selected as the frequency selection scheme for that subtask. At the same time, for a given frequency selection scheme, it is checked whether the subtasks on the critical path of the directed acyclic graph satisfy the real-time constraint; if they do not, that frequency selection scheme is excluded. The frequency selection algorithm is iterated over the task set and eventually yields a solution to problem (22).
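A minimal sketch of such a greedy frequency selection is given below. Several simplifications are assumptions of this sketch rather than statements of the method: each level is described by a hypothetical pair (time per cycle, energy per cycle), a single deadline applies to the whole critical path, the cluster heterogeneity is ignored, and critical-path subtasks that have not yet been assigned are counted at the fastest level when feasibility is checked.

```python
from typing import List, Optional, Sequence, Tuple

Level = Tuple[float, float]  # (time per cycle, energy per cycle), hypothetical


def greedy_frequency_selection(M: Sequence[float],
                               levels: Sequence[Level],
                               critical_path: Sequence[int],
                               deadline: float) -> Optional[List[int]]:
    """Pick, per mandatory subtask, the cheapest level that keeps the
    critical path within the deadline. Returns None if no level fits."""
    fastest = min(tc for tc, _ in levels)
    # Optimistic time of every critical-path subtask before its level is set.
    cp_time = {i: M[i] * fastest for i in critical_path}
    choice: List[int] = []
    for i, cycles in enumerate(M):
        best = None
        for l, (tc, ec) in enumerate(levels):
            if i in cp_time:
                path_time = sum(cp_time.values()) - cp_time[i] + cycles * tc
                if path_time > deadline:
                    continue  # this level violates the real-time check
            if best is None or cycles * ec < cycles * levels[best][1]:
                best = l
        if best is None:
            return None
        choice.append(best)
        if i in cp_time:
            cp_time[i] = cycles * levels[best][0]
    return choice


# Hypothetical data: level 0 is slow and cheap, level 1 fast and costly.
levels = [(1.0, 1.0), (0.5, 3.0)]
M = [4.0, 2.0, 3.0]
selection = greedy_frequency_selection(M, levels, [0, 2], deadline=6.0)
infeasible = greedy_frequency_selection(M, levels, [0, 2], deadline=1.0)
```

In this example the first two subtasks keep the cheap level, while the last critical-path subtask is pushed to the fast level because the cheap one would exceed the deadline.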

(6.2) Based on the result of (6.1), determine the task allocation optimization variable $x_{i,k}$ and the task execution start time $ts_i$;

According to the frequency selection scheme, the execution time of each mandatory subtask and the cluster on which it executes can be determined. TB and TL denote the sets of mandatory subtasks executed on the big and LITTLE clusters, respectively. To avoid concentrating the subtasks on a few processors, the time each processor spends executing tasks needs to be balanced. Sub-problem (6.2) therefore takes minimizing the total time the processors spend executing tasks as its objective function, while the constraints include the non-preemption and correlation constraints of the tasks. $tp_k$ denotes the total time processor $\theta_k$ spends executing all of its mandatory subtasks. Sub-problem (6.2) can be expressed as:

Problem (23) requires solving for the task allocation optimization variable $x_{i,k}$ and the task execution start time $ts_i$ simultaneously. According to the structure of the problem, a greedy algorithm is used, and the solution method is divided into 3 steps.

First, the traversal order of the subtasks in the greedy algorithm is determined, i.e. the correlation between tasks and the task execution times are converted into a hierarchical, tree-like relationship of the tasks in the directed acyclic graph, in which the tasks within each layer are mutually independent. The layering rule is as follows: if the longest logical path from the entry node to task $\tau_i$ consists of $n$ edges, then the level of $\tau_i$ is $n$; if task $\tau_i$ is the entry node, its level is 0. The lower the level of a task, the earlier it is executed.

The specific method comprises the following steps:

(1) In the directed acyclic graph, find the entry subtasks, which form the level-0 task set $RT_0$;

(2) Loop over $RT_0$, determine the level of the successor tasks of each subtask in turn by recursion, and update the already determined subtask levels accordingly;

(3) Sort the subtasks in ascending order of level, and sort subtasks of the same level in ascending order of execution time.
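The layering rule can be sketched as a longest-path computation on the directed acyclic graph. The version below is an illustration under assumptions: it recurses over predecessors with memoization instead of walking forward from $RT_0$, which yields the same levels, and the names `layer_subtasks`, `q` and `exec_time` as well as the sample graph are hypothetical.

```python
from functools import lru_cache
from typing import List, Sequence, Tuple


def layer_subtasks(q: Sequence[Sequence[int]],
                   exec_time: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return (levels, order): the level of each subtask and the traversal
    order sorted by level, then by execution time."""
    n = len(q)
    # preds[j] lists every i with an edge i -> j (q[i][j] == 1).
    preds = [[i for i in range(n) if q[i][j]] for j in range(n)]

    @lru_cache(maxsize=None)
    def level(j: int) -> int:
        # Entry nodes have level 0; otherwise one more than the deepest
        # predecessor, i.e. the number of edges on the longest path.
        if not preds[j]:
            return 0
        return 1 + max(level(i) for i in preds[j])

    levels = [level(j) for j in range(n)]
    order = sorted(range(n), key=lambda j: (levels[j], exec_time[j]))
    return levels, order


# Hypothetical diamond-shaped graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
q = [[0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
exec_time = [2.0, 3.0, 1.0, 2.0]
levels, order = layer_subtasks(q, exec_time)
```

Subtasks 1 and 2 share level 1, so the shorter one (subtask 2) is placed first in the traversal order.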

Second, the task allocation optimization variable $x_{i,k}$ is determined by a greedy algorithm. The subtasks are ordered by the layering method of the first step, which fixes the order in which the greedy algorithm traverses them. The range of candidate processors for each subtask follows from the subtask frequency level obtained in step (6.1). Following the layered order, the greedy algorithm tentatively assigns each subtask to each of its candidate processors in turn and selects the assignment that gives the smallest maximum processor execution time as the task allocation scheme, which determines the task allocation optimization variable $x_{i,k}$ and the subtask execution start time $ts_i$.
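One plausible reading of this greedy allocation step is an earliest-finish list scheduler, sketched below. The selection criterion (place each subtask on the candidate processor where it finishes first), the function name and the sample data are assumptions of this illustration; every subtask is assumed to have at least one candidate processor.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def greedy_allocate(order: Sequence[int],
                    exec_time: Sequence[float],
                    preds: Sequence[Sequence[int]],
                    candidates: Sequence[Sequence[Hashable]]
                    ) -> Tuple[Dict[int, Hashable], Dict[int, float]]:
    """Assign subtasks, in the given order, to the candidate processor on
    which they finish earliest, without preemption."""
    free_at: Dict[Hashable, float] = {}  # time at which a processor is free
    start: Dict[int, float] = {}
    end: Dict[int, float] = {}
    assigned: Dict[int, Hashable] = {}
    for j in order:
        # A subtask is ready once all of its predecessors have finished.
        ready = max((end[i] for i in preds[j]), default=0.0)
        best = None
        for k in candidates[j]:
            ts = max(ready, free_at.get(k, 0.0))
            te = ts + exec_time[j]
            if best is None or te < best[1]:
                best = (ts, te, k)
        ts, te, k = best
        start[j], end[j], assigned[j] = ts, te, k
        free_at[k] = te
    return assigned, start


# Hypothetical data: four subtasks in layered order on two big-cluster cores.
order = [0, 2, 1, 3]
exec_time = [2.0, 3.0, 1.0, 2.0]
preds = [[], [0], [0], [1, 2]]
candidates = [["b0", "b1"]] * 4
assigned, start = greedy_allocate(order, exec_time, preds, candidates)
```

Because each subtask starts no earlier than the finish times of its predecessors and of the previous subtask on the same processor, the resulting schedule respects both the correlation and the non-preemption constraints.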

Third, it is verified whether the obtained frequency selection and task allocation scheme satisfies the real-time constraint of the system. If the scheme violates the real-time constraint, the frequency selection has to be carried out again. The specific method is as follows:

(1) Verify whether the task mapping scheme obtained in (6.1) and (6.2) satisfies the real-time requirement; if it does not, record the index of the subtask $\tau'_m$ that violates the constraint;

(2) Provided that the energy constraint is satisfied, raise the voltage and frequency levels of $\tau'_m$ and of the predecessor subtasks of $\tau'_m$ by one level, lower the voltage and frequency levels of the successor subtasks of $\tau'_m$ by one level, and increase the count of voltage and frequency level adjustments by 1;

(3) Re-determine the task allocation scheme according to step (6.2) and verify again whether the scheme satisfies the real-time constraint; if it does not, repeat the method in (2). If the number of adjustments reaches the number of tasks and the scheme still does not satisfy the real-time constraint, the adjustment stops and the task mapping fails.

(6.3) Determine the optional execution cycles $o_i$;

From the results of (6.1) and (6.2), a task mapping scheme for the mandatory subtasks is obtained, from which the total energy $E_{optl}$ remaining in the system and the idle period $\Delta t$ on each processor can be derived. $TC_l$ denotes the time taken to execute one cycle at voltage and frequency $(V_l, f_l)$; $EC_l$ denotes the energy consumed to execute one cycle at $(V_l, f_l)$. Following the greedy idea, with limited time and energy, as much of the optional subtasks as possible should be executed, i.e. the maximum number of optional cycles that can be executed in the idle periods is determined so as to improve the QoS of the system. The specific method is as follows:

(1) Sort the processors in ascending order of the start time of their first idle period, and determine the candidate set Temp of subtasks that can be executed in that idle period according to the task levels;

(2) Traverse all voltage and frequency levels that can be assigned within the idle period to determine the maximum number of optional execution cycles for that period;

(3) Assign actual execution cycles to the optional subtasks in Temp according to their levels, and determine the execution start time of each optional subtask. Return to (1) and repeat the above steps until the total remaining energy $E_{optl}$ equals 0 (or all optional subtasks have been executed).

By solving the above sub-problems, the frequency selection optimization variable $c_{i,l}$, the task allocation optimization variable $x_{i,k}$, the task execution start time $ts_i$ and the optional execution cycles $o_i$ are obtained, which gives a task mapping scheme based on QoS optimization.
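The level traversal of sub-step (2) can be illustrated for a single idle period and a single optional subtask. The sketch assumes that the number of optional cycles at level $l$ is limited by the idle time ($\Delta t / TC_l$), by the remaining energy ($E_{optl} / EC_l$) and by the upper limit $O_i$; the function name and the sample numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def max_optional_cycles(delta_t: float,
                        energy_left: float,
                        O_i: float,
                        levels: Sequence[Tuple[float, float]]
                        ) -> Tuple[float, Optional[int]]:
    """Return (cycles, level index) maximizing the optional cycles that fit
    into one idle period. Each level is a pair (TC_l, EC_l)."""
    best_cycles, best_level = 0.0, None
    for l, (tc, ec) in enumerate(levels):
        # Cycles are capped by idle time, remaining energy and O_i.
        cycles = min(delta_t / tc, energy_left / ec, O_i)
        if cycles > best_cycles:
            best_cycles, best_level = cycles, l
    return best_cycles, best_level


# Hypothetical data: level 0 is slow and cheap, level 1 fast and costly.
levels = [(1.0, 1.0), (0.5, 3.0)]
cycles, level = max_optional_cycles(delta_t=4.0, energy_left=6.0,
                                    O_i=5.0, levels=levels)
energy_after = 6.0 - cycles * levels[level][1]
```

Here the slow level wins: the fast level could fit more cycles into the idle period but is limited by the remaining energy, so the time-limited slow level yields more optional cycles.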

Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial effects: 1) The invention provides a task mapping method based on joint QoS and energy optimization on a big.LITTLE multi-core heterogeneous platform, which can significantly improve the QoS of the system. Under limited resources, when the proposed task mapping method schedules the 14 randomly generated task sets of Example 1, the average QoS of the system improves by 31.2% (112.8% at most). 2) The original task mapping problem contains nonlinear terms, such as coupling terms between continuous and integer variables; the problem has a complex structure, and an optimal solution cannot be obtained in a short time. Exploiting the structure of the original task mapping problem, the invention provides a heuristic greedy algorithm with low computational complexity that trades some solution accuracy for a significantly shorter running time. Taking the randomly generated task sets of Example 2 as an example, whereas the optimization method needs about 38 s on average, the proposed heuristic greedy algorithm obtains a suboptimal solution of the problem in negligible time (about 0.04 s).

Drawings

FIG. 1 is a schematic diagram of a task mapping method based on energy and QoS joint optimization proposed by the present invention;

FIG. 2 is a directed acyclic graph of tasks and an expanded task graph after task migration is introduced, as used in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the task mapping result obtained with the joint QoS and energy optimization method in Example 1 of the invention, with 8 tasks on a big.LITTLE platform (the big and LITTLE clusters each contain 4 processors), where $\tau'_1$ to $\tau'_{16}$ are the subtasks of the expanded task graph;

FIG. 4 is a schematic diagram of the task mapping result obtained with the heuristic greedy method in Example 1 of the invention, with 8 tasks on a big.LITTLE platform (the big and LITTLE clusters each contain 4 processors), where $\tau'_1$ to $\tau'_{16}$ are the subtasks of the expanded task graph;

FIG. 5 is a comparison of the system QoS obtained in Example 2 of the invention, with the number of tasks ranging from 5 to 18 (14 randomly generated task sets), when the optimal solution is computed with and without task migration in the task mapping;

FIG. 6 shows, for Example 2 of the invention with the number of tasks ranging from 5 to 15 (11 randomly generated task sets), the task scheduling failure rate when the optimal solution is computed without task migration in the task mapping (ω = number of scheduling failures without task migration / total number of scheduling successes with task migration);

FIG. 7 is a comparison of the system QoS obtained in Example 2 of the invention, with the number of tasks ranging from 5 to 8 (4 randomly generated task sets), when the energy factor β (β ∈ [0, 0.5]) and the time adjustment factor δ (δ ∈ [0.4, 1]) are varied and the optimal solution is computed with task migration in the task mapping;

FIG. 8 is a comparison of the system QoS increment obtained in Example 2 of the invention, with the number of tasks ranging from 5 to 8 (4 randomly generated task sets), when the parallelism factor η of the task directed acyclic graph is varied and the optimal solution is computed with task migration in the task mapping;

FIG. 9 is a comparison of the system QoS increment obtained in Example 2 of the invention, with the number of tasks ranging from 5 to 8 (4 randomly generated task sets), when the processor heterogeneity ratio $\gamma_L/\gamma_b$ ($\gamma_L/\gamma_b \in [0.5, 1]$) is varied and the optimal solution is computed with task migration in the task mapping;

FIG. 10 is a comparison, in terms of QoS increment, between the optimal solution and the proposed heuristic algorithm in Example 2 of the invention, with the number of tasks ranging from 5 to 18 (14 randomly generated task sets), energy factor β = 0.4, time adjustment factor δ = 0.4, big cluster heterogeneity factor $\gamma_b = 1$ and LITTLE cluster heterogeneity factor $\gamma_L = 0.6$;

FIG. 11 is a comparison of the algorithm running time of the optimal solution and the proposed heuristic algorithm in Example 2 of the invention, with the number of tasks ranging from 5 to 15.

Detailed Description

Embodiments of the invention are described in detail below with reference to the drawings and examples, so that the way in which the invention applies technical means to solve the technical problem and achieves the technical effects can be fully understood and implemented. Provided that no conflict arises, the embodiments of the invention and the features of each embodiment may be combined with one another, and the resulting technical solutions all fall within the protection scope of the invention.

Example 1: a task mapping method for approximate calculation task based on energy and QoS joint optimization on a multi-core heterogeneous processing platform comprises the following steps:

(1) Modeling correlated real-time tasks as imprecise computation tasks, thereby obtaining a task directed acyclic graph and a task correlation matrix;

(3) Constructing a task mapping problem based on joint QoS and energy optimization by introducing task allocation, frequency selection, real-time, task non-preemption, task correlation and energy consumption constraints;

Fig. 1 is a schematic diagram of a task mapping method based on energy and QoS joint optimization according to the present invention, fig. 2 is a task directed acyclic graph used in an embodiment of the present invention and a task graph obtained by introducing post-task-migration expansion, and each step is described in detail below with reference to the task mapping method and the task directed acyclic graph examples in fig. 1 and fig. 2.

Step (1), N related imprecise computing (Imprecise Computation, IC) tasks { τ } ₁ ,τ ₂ ,…,τ _N Description of }A task model of the real-time system can obtain a directed acyclic graph of the task. For task τ _i The IC tasks are logically divided into an enforcement section and an optional execution section. M is M _i Representing task τ _i Is the forced execution period of (a), variable o _i Represents an optional execution period, D _i Representing the deadline of the task. Alternative execution cycle o _i Should not exceed the upper limit O _i I.e. 0.ltoreq.o _i ≤O _i . For the scheduling of approximate computing tasks, there is a strict execution order constraint between the mandatory and optional execution portions of the task: the optional partial tasks must be performed after the forced partial tasks are completed. The correlation of tasks can be performed using a binary matrix q= [ q ] _ij ] _M×M To describe. q _ij Representing the execution order between tasks; if task tau _i And task tau _j Related and task τ _i At task τ _j Previously execute, then q _ij =1, otherwise, q _ij ＝0。

Step (2): The big.LITTLE heterogeneous processing platform has two different types of clusters, big and LITTLE, and the processors within a cluster are homogeneous. The platform supports dynamic voltage and frequency adjustment, and the voltage and frequency levels of the processors in the big cluster and in the LITTLE cluster are each expressed as a separate set of levels. Because processors in different clusters are heterogeneous, γ_{i,k} ∈ (0,1] is defined as the performance energy-efficiency factor of processor θ_k executing task τ_i. big.LITTLE supports task migration, so a task can be migrated during execution from a processor of one cluster to a processor of the other cluster. Using task migration, the IC task τ_i of step (1) can be decomposed into two dependent subtasks τ'_{2i-1} and τ'_{2i}, which yields a new task correlation matrix. Subtasks τ'_{2i-1} and τ'_{2i} can be executed on processors of different clusters; the specific implementation of task migration is detailed in the following steps. After normalization, μ_i ∈ [0,1] denotes the proportion of subtask τ'_i executed on one processor. For task τ_i, the execution proportions of subtasks τ'_{2i-1} and τ'_{2i} sum to 1, i.e. μ_{2i-1} + μ_{2i} = 1.

Step (3): Optimization variables for task allocation, frequency selection, real-time behavior, and task scheduling are introduced: 1) if subtask τ'_i is assigned to processor θ_k, the binary variable x_{i,k} = 1, otherwise x_{i,k} = 0; 2) if subtask τ'_i is executed at voltage and frequency level l, the binary variable c_{i,l} = 1, otherwise c_{i,l} = 0; 3) if any two uncorrelated subtasks τ'_i and τ'_j are allocated to the same processor and τ'_i executes before τ'_j, the binary variable p_{i,j} = 1, otherwise p_{i,j} = 0; 4) the continuous variables ts_i and te_i denote the execution start time and end time of subtask τ'_i. To formulate the task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform, the following constraints are added:

3) Real-time performance: for the real-time constraint, the forced part M_i and the optional part o_i of task τ_i must be completed within the deadline D_i, and subtask τ'_{2i} may only start after subtask τ'_{2i-1} has completed. The time processor θ_k takes to execute subtask τ'_{2i-1} at voltage and frequency level (V_l, f_l) is μ_{2i-1}(M_i + o_i)/(γ_{2i-1,k} f_l). To avoid introducing the additional subscript k, the parameter γ_{i,k} is replaced by γ_{i,l}, which denotes the performance energy-efficiency factor of executing subtask τ'_i at (V_l, f_l). The following real-time constraints are added:

te_i ≤ ts_j + (2 - x_{i,k} - x_{j,k})H + (1 - p_{i,j})H    (8)

te_j ≤ ts_i + (2 - x_{i,k} - x_{j,k})H + p_{i,j}H    (9)
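The effect of constraints (8) and (9) can be checked with a short Python sketch. It assumes that H is a sufficiently large positive constant (its value is not given in this passage), so that both constraints are relaxed on every processor k that does not host both subtasks. The function name and the argument layout are hypothetical.

```python
def non_preemption_ok(ts_i, te_i, ts_j, te_j, x_i, x_j, p_ij, H):
    """Check constraints (8) and (9) for subtasks i and j on every processor k.

    x_i, x_j: lists of 0/1 flags, one per processor (x_{i,k}, x_{j,k}).
    p_ij:     1 if subtask i executes before subtask j, else 0.
    H:        large constant, assumed to exceed any time in the schedule.
    """
    for x_ik, x_jk in zip(x_i, x_j):
        # Zero only when both subtasks are placed on processor k.
        slack = (2 - x_ik - x_jk) * H
        c8 = te_i <= ts_j + slack + (1 - p_ij) * H  # constraint (8)
        c9 = te_j <= ts_i + slack + p_ij * H        # constraint (9)
        if not (c8 and c9):
            return False
    return True
```

With both subtasks on the same processor, exactly one of the two constraints stays active depending on p_{i,j}, which forces the two execution intervals not to overlap; on different processors both constraints are relaxed by H.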

6) Energy constraint: the invention does not consider the energy consumption or the time of task communication; only the dynamic and static power consumption of the processor are considered, where P_on denotes the inherent power consumed to keep the core on. During task mapping, the total energy consumption of the system must not exceed the energy budget E_buget, so the following energy constraint is imposed:

where t_i denotes the time the processor is in the idle state. According to the system power consumption expression P_{core,l} = P_{sta,l} + P_{dyn,l} + P_on, constraint (11) can be transformed as follows:

Step (4): Linearize the problem model established in step (3). The problem model PP contains nonlinear terms that are products of continuous variables and products of integer variables, so the optimization problem (13) is a mixed-integer nonlinear programming problem. Step (4) equivalently converts problem (13) into a mixed-integer linear programming problem through linearization techniques such as variable substitution, as follows:

(5.1) Equations (5) and (12) contain the nonlinear terms (M_i + o_i)μ_{2i-1} and (M_i + o_i)μ_{2i}, which are products of continuous variables. Based on their physical meaning, two auxiliary variables are introduced to replace these nonlinear terms; they represent the actual execution cycles of subtasks τ'_{2i-1} and τ'_{2i}, respectively. The following relationship is obtained:

According to Lemma 1, an intermediate variable C_{i,l} is introduced to replace the nonlinear terms in equations (5) and (12). When c_{i,l} = 1, C_{i,l} equals the term it replaces, which has an upper bound and a lower bound; when c_{i,l} = 0, C_{i,l} = 0. For this variable substitution, the following constraints need to be added:

According to Lemma 1, formulas (5) and (12) can be linearized as:

Equation (4) can be converted into:

Thus, problem (13) can be linearized as:

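The two linearization devices used in step (4) can be checked by brute force. The Python sketch below takes the lemmas as they are stated in the claims: Lemma 1 replaces t = b·x (b binary, -s_1 ≤ x ≤ s_2) by the linear constraints -b·s_1 ≤ t ≤ b·s_2, t + b·s_1 - x - s_1 ≤ 0, and t - b·s_2 - x + s_2 ≥ 0, and Lemma 2 replaces the product of two 0-1 variables by y ≤ x_1, y ≤ x_2, y ≥ x_1 + x_2 - 1. The function names and the integer test grid are illustrative assumptions, and a grid check is only a sanity test, not a proof.

```python
from itertools import product


def in_p1(t, b, x, s1, s2):
    """Nonlinear set P1: t = b*x with -s1 <= x <= s2 and b binary."""
    return b in (0, 1) and t == b * x and -s1 <= x <= s2


def in_p2(t, b, x, s1, s2):
    """Linear set P2 of Lemma 1."""
    return (b in (0, 1)
            and -b * s1 <= t <= b * s2
            and t + b * s1 - x - s1 <= 0
            and t - b * s2 - x + s2 >= 0)


def lemma1_holds_on_grid(s1, s2, lo=-6, hi=6):
    """Compare P1 and P2 on every integer point (t, b, x) of a small grid."""
    grid = range(lo, hi + 1)
    return all(in_p1(t, b, x, s1, s2) == in_p2(t, b, x, s1, s2)
               for t, b, x in product(grid, (0, 1), grid))


def lemma2_feasible_y(x1, x2):
    """All 0-1 values y allowed by y <= x1, y <= x2, y >= x1 + x2 - 1."""
    return [y for y in (0, 1) if y <= x1 and y <= x2 and y >= x1 + x2 - 1]
```

In a mixed-integer linear model the same constraints would be handed to the solver; here they are only evaluated point by point to show that the linear sets admit exactly the products they replace.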

Step (5): Based on the structure of the optimization problem in step (3), a heuristic algorithm with low computational complexity is designed using problem decomposition, so as to improve the applicability of the mapping method. The original task mapping problem of step (3) is decomposed into 3 sub-problems: 1) frequency selection; 2) task allocation and task scheduling; 3) optional execution cycle adjustment. Solving the 3 sub-problems in turn yields a task mapping scheme based on QoS optimization. The specific steps are as follows:

(6.1) Determine the frequency selection optimization variable c_{i,l};

To simplify the task migration model, the original IC task is decomposed into two subtasks τ'_{2i-1} and τ'_{2i}, where subtask τ'_{2i-1} represents the forced execution part of the original task and τ'_{2i} represents the optional execution part. To reduce the number of optimization variables, sub-problem (6.1) considers only the forced execution cycles of the tasks, i.e. the actual optional execution cycles o_i = 0. In a real-time system, the energy E_buget available to the system is limited; once the forced parts of the tasks are guaranteed to execute completely, as much energy as possible should remain for the optional parts in order to improve the QoS of the system. E_{2i-1} denotes the energy consumed by executing the forced subtask τ'_{2i-1}. Sub-problem (6.1) therefore takes minimizing the total energy of task execution as its objective function. The real-time constraint must also be considered; it can be simplified to requiring that the subtasks on the critical path of the directed acyclic graph meet the real-time constraint. CPT denotes the set of sequence numbers of the forced subtasks on the critical path; sorting the subtasks in CPT by execution order gives the ordered set CPT' = {2r_1 - 1, …, 2r_n - 1, …, 2r_R - 1}. Sub-problem (6.1) can thus be expressed as:

According to the frequency selection scheme, the execution time of each forced subtask and the cluster in which it executes can be determined. TB and TL denote the sets of forced subtasks executed in the big and LITTLE clusters, respectively. To avoid subtasks being concentrated on a few processors, the time each processor spends executing tasks needs to be balanced. Sub-problem (6.2) therefore takes minimizing the processors' total task execution time as its objective function, subject to the non-preemption and correlation constraints of the tasks. tp_k denotes the total time processor θ_k spends executing all of its forced subtasks. Sub-problem (6.2) can be expressed as:

The first step is to determine the order in which the greedy algorithm traverses the subtasks: the correlation of the tasks and the task execution times are converted into a hierarchical, tree-like relationship among the tasks in the directed acyclic graph, in which the tasks within each level are mutually independent. The layering rule is as follows: if the longest logical path from the entry node to task τ_i consists of n edges, the level of τ_i is n; if task τ_i is the entry node, its level is 0. The lower the level of a task, the earlier it executes.
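The layering rule can be sketched in Python as a longest-path computation over the correlation matrix q. The function names, the 0-based indices, and the list exec_time are assumptions made for this sketch; the tie-break by shorter execution time within a level follows the ordering rule stated in the claims, and q is assumed to be acyclic.

```python
def task_levels(q):
    """Level of each task: number of edges on the longest path from an entry node.

    q[i][j] = 1 means task i must execute before task j; q must be acyclic.
    """
    n = len(q)
    level = [None] * n

    def visit(j):
        if level[j] is None:
            preds = [i for i in range(n) if q[i][j]]
            # Entry nodes (no predecessors) get level 0.
            level[j] = 0 if not preds else 1 + max(visit(i) for i in preds)
        return level[j]

    for j in range(n):
        visit(j)
    return level


def traversal_order(levels, exec_time):
    """Sort task indices by ascending level, then by ascending execution time."""
    return sorted(range(len(levels)), key=lambda i: (levels[i], exec_time[i]))
```

For a diamond-shaped graph with edges 0→1, 0→2, 1→3, 2→3, the levels are 0, 1, 1, 2, and tasks 1 and 2 can be ordered freely because they are mutually independent.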

The specific method comprises the following steps:

The third step is to verify whether the obtained frequency selection and task allocation scheme satisfies the real-time constraint of the system. If the scheme violates the real-time constraint, frequency selection must be performed again. The specific method comprises the following steps:

(1) Verify whether the task mapping scheme obtained in (6.1) and (6.2) meets the real-time requirement; if not, record the sequence number of the subtask τ'_m that violates the constraint;

(6.3) Determine the optional execution cycles o_i;

Example 2:

Figs. 5 to 11 show the experimental results of the present invention.

Fig. 5 compares the system QoS of the optimal solutions obtained with and without task migration in the task mapping, where Example 2 is configured with 5 to 18 tasks, corresponding to 14 randomly generated task sets. The figure shows that introducing task migration into the task mapping significantly improves the QoS of the system.

Fig. 6 shows the task scheduling failure rate ω (ω = number of scheduling failures without task migration / total number of scheduling successes with task migration) of the optimal solution when task migration is not used in the task mapping, where Example 2 is configured with 5 to 15 tasks, corresponding to 11 randomly generated task sets. As the task set grows, the scheduling failure rate of the optimal solution without task migration increases significantly. Introducing task migration can therefore increase the proportion of successful schedules to some extent.

Fig. 7 compares the system QoS obtained from the optimal solution with task migration in the task mapping, where Example 2 is configured with 5 to 8 tasks, corresponding to 4 randomly generated task sets, and the energy factor β (range [0, 0.5]) and the time adjustment factor δ (range [0.4, 1]) are varied. The figure shows that the smaller β and δ are, i.e. the more limited the resources, the greater the QoS improvement achieved by the task mapping method with task migration.

Fig. 8 compares the system QoS increment obtained from the optimal solution with task migration in the task mapping, where Example 2 is configured with 5 to 8 tasks, corresponding to 4 randomly generated task sets, and the parallelism factor η of the task directed acyclic graph is varied. The figure shows that the lower the parallelism of the task DAG, the larger the system QoS increment obtained by the scheduling scheme with task migration.

Fig. 9 compares the system QoS increment obtained from the optimal solution with task migration in the task mapping, where Example 2 is configured with 5 to 8 tasks, corresponding to 4 randomly generated task sets, and the processor heterogeneity ratio γ_L/γ_b (range [0.5, 1]) is varied. The figure shows that when γ_L/γ_b is smaller, i.e. when the performance difference between the processors of the big and LITTLE clusters is more pronounced, the scheduling scheme with task migration improves the system QoS more.

Fig. 10 compares the QoS increment of the optimal solution and of the proposed heuristic algorithm, where Example 2 is configured with 5 to 18 tasks, corresponding to 14 randomly generated task sets, with energy factor β = 0.4, time adjustment factor δ = 0.4, big cluster heterogeneity factor γ_b = 1, and LITTLE cluster heterogeneity factor γ_L = 0.6. The figure shows that, once task migration is introduced into the task mapping, the mapping scheme obtained by solving for the optimal solution significantly improves the system QoS, while the heuristic algorithm obtains a suboptimal solution for the system QoS.

Fig. 11 compares the run time of the optimal-solution method and of the proposed heuristic algorithm, where Example 2 is configured with 5 to 15 tasks. As can be seen from Figs. 10 and 11, the heuristic algorithm significantly increases the speed of the task scheduling algorithm at the expense of some system QoS.

Although the embodiments of the present invention are described above, the embodiments are only used for facilitating understanding of the present invention, and are not intended to limit the present invention. Any person skilled in the art can make any modification and variation in form and detail without departing from the spirit and scope of the present disclosure, but the scope of the present disclosure is still subject to the scope of the appended claims.

Claims

1. The task mapping method based on energy and QoS joint optimization for the approximate calculation task on the multi-core heterogeneous processing platform is characterized by comprising the following steps:

(5) Aiming at the problem in the step (3), a heuristic algorithm with low computational complexity is designed by utilizing a problem decomposition method;

wherein in step (1), N correlated imprecise computation (IC) tasks {τ_1, τ_2, …, τ_N} describe the task model of the real-time system, from which the directed acyclic graph of the tasks is obtained; each IC task τ_i is logically divided into a forced execution part and an optional execution part, M_i denotes the forced execution cycles of task τ_i, the variable o_i denotes the optional execution cycles, D_i denotes the deadline of the task, and the optional execution cycles o_i do not exceed the upper limit O_i, i.e. 0 ≤ o_i ≤ O_i; for the scheduling of approximate computing tasks, there is a strict execution order constraint between the forced and optional parts of a task: the optional part must be executed after the forced part is completed; the correlation of the tasks is described by a binary matrix q = [q_ij]_{N×N}, where q_ij represents the execution order between tasks: if task τ_i and task τ_j are correlated and task τ_i executes before task τ_j, then q_ij = 1, otherwise q_ij = 0;

in step (2), the big.LITTLE heterogeneous processing platform has two different types of clusters, big and LITTLE, the processors within a cluster being homogeneous; the platform supports dynamic voltage and frequency adjustment, and the voltage and frequency levels of the processors in the big cluster and in the LITTLE cluster are each expressed as a separate set of levels; because processors in different clusters are heterogeneous, γ_{i,k} ∈ (0,1] is defined as the performance energy-efficiency factor of processor θ_k executing task τ_i; big.LITTLE supports task migration, so that a task is migrated during execution from a processor in one cluster to a processor in the other cluster; using task migration, the IC task τ_i of step (1) is decomposed into two dependent subtasks τ'_{2i-1} and τ'_{2i}, which yields a new task correlation matrix; subtasks τ'_{2i-1} and τ'_{2i} can be executed on processors of different clusters, the specific implementation of task migration being detailed in the following steps; after normalization, μ_i ∈ [0,1] denotes the proportion of subtask τ'_i executed on one processor, and for task τ_i the execution proportions of subtasks τ'_{2i-1} and τ'_{2i} sum to 1, i.e. μ_{2i-1} + μ_{2i} = 1;

in step (3), optimization variables for task allocation, frequency selection, real-time behavior, and task scheduling are introduced: 1) if subtask τ'_i is assigned to processor θ_k, the binary variable x_{i,k} = 1, otherwise x_{i,k} = 0; 2) if subtask τ'_i is executed at voltage and frequency level l, the binary variable c_{i,l} = 1, otherwise c_{i,l} = 0; 3) if any two uncorrelated subtasks τ'_i and τ'_j are assigned to the same processor and τ'_i executes before τ'_j, the binary variable p_{i,j} = 1, otherwise p_{i,j} = 0; 4) the continuous variables ts_i and te_i denote the execution start time and end time of subtask τ'_i; to formulate the task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform, the following constraints need to be added:

1) Task allocation: according to the task migration technology, the same task can be executed on processors of different clusters, and as the processors in the same cluster are isomorphic, the migration condition of the task among isomorphic processors is not considered, so the following constraint is added in the aspect of task allocation:

2) Frequency selection: dynamic voltage and frequency adjustment is considered at task granularity, i.e. the voltage and frequency level of a processor is adjusted only after the execution of a subtask is completed, and each subtask can be assigned only one voltage and frequency level; the processors in the big and LITTLE clusters are heterogeneous and their voltage and frequency levels differ, so the range of selectable voltage and frequency levels must be determined from the task allocation result; λ_i indicates the cluster on which subtask τ'_i executes: if subtask τ'_i executes on the big cluster, the binary variable λ_i = 1, otherwise λ_i = 0; the following constraints are therefore added for the frequency allocation of tasks:

3) Real-time performance: for the real-time constraint, the forced part M_i and the optional part o_i of task τ_i must be completed within the deadline D_i, and subtask τ'_{2i} may only start after subtask τ'_{2i-1} has completed; the time processor θ_k takes to execute subtask τ'_{2i-1} at voltage and frequency level (V_l, f_l) is μ_{2i-1}(M_i + o_i)/(γ_{2i-1,k} f_l); to avoid introducing the additional subscript k, the parameter γ_{i,l} is used in place of γ_{i,k}, where γ_{i,l} denotes the performance energy-efficiency factor of executing subtask τ'_i at (V_l, f_l); the following real-time constraints need to be added:

4) Non-preemptive constraints: consider a non-preemptive scheduling method, i.e., any two non-dependent sub-tasks allocated to the same processor cannot be executed simultaneously, subject to the constraint:

te_i ≤ ts_j + (2 - x_{i,k} - x_{j,k})H + (1 - p_{i,j})H    (8)

te_j ≤ ts_i + (2 - x_{i,k} - x_{j,k})H + p_{i,j}H    (9)

5) Task dependency constraints: consider a set of tasks with dependencies, which are executed strictly according to the order in the directed acyclic graph, with the constraint that:

6) Energy constraint: the energy consumption and time of task communication are not taken into account; only the dynamic and static power consumption of the processor are considered, where P_on denotes the inherent power consumed to keep the core on; during task mapping, the total energy consumption of the system cannot exceed the energy budget E_buget, so the following energy constraint is imposed:

where t_i denotes the time the processor is in the idle state; according to the system power consumption expression P_{core,l} = P_{sta,l} + P_{dyn,l} + P_on, constraint (11) is converted as follows:

the task mapping problem takes QoS optimization as its objective function; the QoS depends on the optional execution cycles o_i, and a linear QoS function f_i(o_i) = k_i o_i + R_i is used, where R_i denotes the baseline QoS after the forced part of the task has been executed; according to the problem model, the task mapping optimization problem based on joint QoS and energy optimization is established:

in step (4), the problem model established in step (3) is linearized; the problem model PP contains nonlinear terms that are products of continuous variables and products of integer variables, so the optimization problem (13) is a mixed-integer nonlinear programming problem; step (4) equivalently converts problem (13) into a mixed-integer linear programming problem through linearization techniques such as variable substitution, as follows:

(5.1) equations (5) and (12) contain the nonlinear terms (M_i + o_i)μ_{2i-1} and (M_i + o_i)μ_{2i}, which are products of continuous variables; based on their physical meaning, two auxiliary variables are introduced to replace these nonlinear terms, representing the actual execution cycles of subtasks τ'_{2i-1} and τ'_{2i}, respectively, and the following relationship is obtained:

The big.LITTLE platform provides discrete voltage and frequency levels (V_l, f_l); once the voltage and frequency level l is fixed, the corresponding parameters P_{sta,l}, P_{dyn,l}, and 1/f_l are also determined; therefore, in equations (5) and (12), P'_l, f'_{2i-1}, and f'_{2i} are used in place of P_{sta,l} + P_{dyn,l}, 1/(γ_{2i-1,l} f_l), and 1/(γ_{2i,l} f_l), respectively;

(5.2) after the variable substitution in (5.1), nonlinear terms still appear in equations (5) and (12); to linearize them, the following lemma is first introduced:

Lemma 1: given constants s_1, s_2 > 0 and the two constraint spaces P_1 = {[t, b, x] | t = bx, -s_1 ≤ x ≤ s_2, b ∈ {0,1}} and P_2 = {[t, b, x] | -b·s_1 ≤ t ≤ b·s_2, t + b·s_1 - x - s_1 ≤ 0, t - b·s_2 - x + s_2 ≥ 0, b ∈ {0,1}}, P_1 and P_2 are equivalent;

Proof: (P_1 implies P_2) since t = bx and -s_1 ≤ x ≤ s_2, we obtain -b·s_1 ≤ t ≤ b·s_2; from -s_1 ≤ x ≤ s_2 and b ∈ {0,1}, we obtain (b-1)(x-s_2) ≥ 0 and (b-1)(x+s_1) ≤ 0; thus t + b·s_1 - x - s_1 ≤ 0 and t - b·s_2 - x + s_2 ≥ 0 hold;

(P_2 implies P_1) if b = 0, then t = 0 and -s_1 ≤ x ≤ s_2; if b = 1, then -s_1 ≤ t = x ≤ s_2; thus the equivalence holds;

according to Lemma 1, an intermediate variable C_{i,l} is introduced to replace the nonlinear terms in equations (5) and (12); when c_{i,l} = 1, C_{i,l} equals the term it replaces, which has an upper bound and a lower bound; when c_{i,l} = 0, C_{i,l} = 0; for this variable substitution, the following constraints need to be added:

according to Lemma 1, formulas (5) and (12) are linearized as:

(5.3) equation (4) contains the nonlinear term λ_i c_{i,l}, which is a product of integer variables; λ_i c_{i,l} is expressed through the products x_{i,k} c_{i,l}, and Lemma 2 is introduced to linearize equation (4);

Lemma 2: let x_1 and x_2 be 0-1 variables; the nonlinear term x_1 x_2 can be replaced by a 0-1 variable y subject to the constraints y ≤ x_1, y ≤ x_2, and y ≥ x_1 + x_2 - 1;

Proof: when the 0-1 variables x_1 and x_2 both equal 1, the constraints reduce to y = 1, so y = x_1 x_2 = 1; likewise, if x_1 = 0, x_2 = 0, or x_1 = 0, x_2 = 1, or x_1 = 1, x_2 = 0, the constraints give y = x_1 x_2 = 0, so Lemma 2 holds;

based on Lemma 2, an intermediate variable z_{i,k,l} is introduced and the following constraints are added to replace the nonlinear term x_{i,k} c_{i,l}:

z_{i,k,l} ≤ x_{i,k}, z_{i,k,l} ≤ c_{i,l}, z_{i,k,l} ≥ x_{i,k} + c_{i,l} - 1, z_{i,k,l} ∈ {0,1}    (19)

Equation (4) can be converted into:

thus, problem (13) can be linearized as:

in step (5), based on the structure of the optimization problem in step (3), a heuristic algorithm with low computational complexity is designed using problem decomposition, so as to improve the applicability of the mapping method; the original task mapping problem of step (3) is decomposed into 3 sub-problems: 1) frequency selection; 2) task allocation and task scheduling; 3) optional execution cycle adjustment; the 3 sub-problems are solved in turn to obtain a task mapping scheme based on QoS optimization, with the following specific steps:

(6.1) determining the frequency allocation optimization variable c_{i,l};

to simplify the task migration model, the original IC task is decomposed into two subtasks τ'_{2i-1} and τ'_{2i}, where subtask τ'_{2i-1} represents the forced execution part of the original task and τ'_{2i} represents the optional execution part; to reduce the number of optimization variables, sub-problem (6.1) considers only the forced execution cycles of the tasks, i.e. the actual optional execution cycles o_i = 0; in a real-time system, the energy E_buget available to the system is limited, and once the forced parts of the tasks are guaranteed to execute completely, as much energy as possible should remain for the optional parts in order to improve the QoS of the system; E_{2i-1} denotes the energy consumed by executing the forced subtask τ'_{2i-1}; sub-problem (6.1) therefore takes minimizing the total energy of task execution as its objective function; the real-time constraint must also be considered, and it is simplified to requiring that the subtasks on the critical path of the directed acyclic graph meet the real-time constraint; CPT denotes the set of sequence numbers of the forced subtasks on the critical path, and sorting the subtasks in CPT by execution order gives the ordered set CPT' = {2r_1 - 1, …, 2r_n - 1, …, 2r_R - 1}; sub-problem (6.1) is therefore expressed as:

according to the structure of problem (22), a greedy algorithm is used to solve it: for each forced subtask, all voltage and frequency levels are traversed, and the level that gives the smallest increase in total system energy is selected as the frequency allocation of that subtask; under the given frequency allocation, it is checked whether the subtasks on the critical path of the directed acyclic graph meet the real-time constraint, and if not, that frequency allocation is excluded; the frequency allocation algorithm is iterated over the task set to finally obtain the solution of problem (22);

(6.2) based on the result of (6.1), determining the task allocation optimization variable x_{i,k} and the task execution start time ts_i;

according to the frequency allocation scheme, the execution time of each forced subtask and the cluster in which it executes are determined; TB and TL denote the sets of forced subtasks executed in the big and LITTLE clusters, respectively; to avoid subtasks being concentrated on a few processors, the time each processor spends executing tasks needs to be balanced; sub-problem (6.2) therefore takes minimizing the processors' total task execution time as its objective function, subject to the non-preemption and correlation constraints of the tasks; tp_k denotes the total time processor θ_k spends executing all of its forced subtasks; sub-problem (6.2) is expressed as:

problem (23) requires solving the task allocation optimization variable x_{i,k} and the task execution start time ts_i simultaneously; according to the structure of the problem, a greedy algorithm is used, which is divided into 3 steps,

the first step is to determine the order in which the greedy algorithm traverses the subtasks: the correlation of the tasks and the task execution times are converted into a hierarchical, tree-like relationship among the tasks in the directed acyclic graph, in which the tasks within each level are mutually independent; the layering rule is as follows: if the longest logical path from the entry node to task τ_i consists of n edges, the level of τ_i is n; if task τ_i is the entry node, its level is 0; the lower the level of a task, the earlier it executes,

the specific method comprises the following steps:

(3) The subtasks are ordered from small to large according to the hierarchical level, the subtasks of the same level are ordered from small to large according to the execution time,

the second step is to determine the task allocation optimization variable x_{i,k} with the greedy algorithm: the subtasks are ordered by the level layering method of the first step, which fixes the order in which the greedy algorithm traverses them; the range of candidate processors for each subtask is determined from the subtask frequency level obtained in (6.1); following the level order, the greedy algorithm tentatively assigns each subtask to each candidate processor in turn, selects the task allocation scheme according to the maximum execution time of the processors, and thereby determines the task allocation optimization variable x_{i,k} and the subtask execution start time ts_i,

the third step is to verify whether the obtained frequency allocation and task allocation scheme satisfies the real-time constraint of the system; if the scheme violates the real-time constraint, frequency allocation must be performed again, as follows:

(1) verifying whether the task mapping scheme obtained in (6.1) and (6.2) meets the real-time requirement, and if not, recording the sequence number of the subtask τ'_m that violates the constraint;

(2) provided the energy constraint is satisfied, raising the voltage and frequency levels of τ'_m and of the predecessor subtasks of τ'_m by one level, lowering the voltage and frequency levels of the successor subtasks of τ'_m by one level, and incrementing the number of voltage and frequency level adjustments by 1;

(3) re-determining the task allocation scheme according to (6.2) and re-verifying whether the scheme meets the real-time constraint; if not, repeating the method in (2); when the number of adjustments reaches the number of tasks, adjustment stops and the task mapping fails,

(6.3) determining an optional execution period o _i ；

from the results of (6.1) and (6.2), the task mapping scheme of the forced subtasks is obtained, from which the total energy E_optl remaining in the system and the idle period Δt on each processor are derived; TC_l denotes the time taken to execute one cycle at voltage and frequency (V_l, f_l); EC_l denotes the energy consumed by executing one cycle at voltage and frequency (V_l, f_l); following the greedy idea, as many optional subtasks as possible are executed under the limited time and energy, i.e. the maximum optional cycles executable in each idle period are determined so as to improve the QoS of the system; the specific method is as follows:

(2) traversing all voltage and frequency levels of the allocated idle period to determine the maximum optional execution cycles for that period;

(3) allocating actual execution cycles to the optional subtasks in Temp according to their levels and determining the start times of the optional subtasks; returning to (1) and repeating these steps until the total energy E_optl remaining in the system equals 0 or all optional subtasks have been executed;

based on the above sub-problems, the frequency allocation optimization variable c_{i,l}, the task allocation optimization variable x_{i,k}, the task execution start time ts_i, and the optional execution cycles o_i are solved, thereby obtaining a task mapping scheme based on QoS optimization.
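The level traversal of sub-problem (6.3) can be illustrated with a small Python sketch. The bound used here is an assumption of this sketch rather than a formula given in the text: for an idle period Δt, remaining energy E_optl, and a level with per-cycle time TC_l and per-cycle energy EC_l, the optional cycles are taken to be limited by Δt/TC_l, by E_optl/EC_l, and by the upper limit O_i. The function names and argument order are hypothetical.

```python
def max_optional_cycles(delta_t, e_left, tc, ec, o_max):
    """Optional cycles that fit in idle time delta_t and energy e_left.

    tc, ec: time and energy per cycle at one voltage/frequency level.
    o_max:  upper limit O_i on the optional cycles of the task.
    Assumed bound: min(delta_t / tc, e_left / ec, o_max), floored at 0.
    """
    return max(0.0, min(delta_t / tc, e_left / ec, o_max))


def best_level(delta_t, e_left, levels, o_max):
    """Traverse all levels [(TC_l, EC_l), ...]; return (index, cycles) of the best.

    On ties, the first level in the list is kept.
    """
    best = max(range(len(levels)),
               key=lambda l: max_optional_cycles(delta_t, e_left,
                                                 levels[l][0], levels[l][1],
                                                 o_max))
    tc, ec = levels[best]
    return best, max_optional_cycles(delta_t, e_left, tc, ec, o_max)
```

With Δt = 10, E_optl = 8, and two levels (TC, EC) = (1, 2) and (2, 1), the faster level is energy-limited to 4 cycles while the slower level is time-limited to 5 cycles, so the slower level is chosen in this example.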