CN113448736B - Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform


Info

Publication number
CN113448736B
CN113448736B CN202110827931.9A CN202110827931A CN113448736B CN 113448736 B CN113448736 B CN 113448736B CN 202110827931 A CN202110827931 A CN 202110827931A CN 113448736 B CN113448736 B CN 113448736B
Authority
CN
China
Prior art keywords
task
execution
time
subtask
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110827931.9A
Other languages
Chinese (zh)
Other versions
CN113448736A (en
Inventor
莫磊
李昕镁
周琦
曹向辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a task mapping method that jointly optimizes energy and task quality of service (QoS) for approximate computing tasks on a multi-core heterogeneous platform. The method comprises the following steps: modeling real-time tasks with correlations as an imprecise computation task model, thereby obtaining a task directed acyclic graph and a task correlation matrix; on a big.LITTLE multi-core heterogeneous processing platform, allowing the same task to execute on processors of different clusters through task migration, which improves the flexibility of task allocation and of dynamic voltage and frequency scaling; constructing a task mapping problem based on joint QoS and energy optimization by introducing task allocation, frequency selection, real-time, non-preemption, task correlation, and energy consumption constraints; and handling the nonlinear terms of the problem by variable substitution, which linearizes the task mapping problem so that an optimal solution can be obtained. The problem-solving time is significantly reduced, and the applicability of the task mapping method is improved.

Description

Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform
Technical Field
The invention belongs to the field of task scheduling of multi-core processors, and relates to a task mapping method based on energy and QoS joint optimization.
Background
Embedded real-time systems are widely used in network servers, information retrieval, industrial process control, flight control, multimedia systems, and other fields. A real-time system must produce results within a specified time limit while ensuring the accuracy of the computed results. If the system fails to complete a task before its deadline, the system may fail, which reduces its reliability. Conventional scheduling algorithms generally assume the worst-case execution of each task; such methods lower processor efficiency and waste system resources. Introducing approximate computing into task scheduling makes it possible to balance system energy consumption against the accuracy of the computed results, improving system utilization and reliability. Therefore, studying the optimized scheduling of approximate computing tasks on multi-core heterogeneous processing platforms under limited resources has important practical significance.
For real-time systems, researchers typically use dynamic voltage and frequency scaling and dynamic power management to optimize system power consumption. Research on task scheduling for heterogeneous multi-core processors has produced many results, but the following problems remain: 1) in energy-optimized task scheduling methods, the execution cycles of a task are fixed, so resource utilization during scheduling is low, and the system QoS is likewise fixed and cannot be improved by adjusting tasks; 2) QoS-optimized task scheduling research generally uses approximate computing task models and aims to maximize system QoS under an energy limit, but rarely considers task migration; 3) for heterogeneous multi-core processors, the task mapping problem with joint QoS and energy optimization has high computational complexity.
Disclosure of Invention
The invention provides a task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform. While meeting the real-time, energy-efficiency, and reliability requirements of the system, it introduces task migration to further improve the system QoS.
To achieve this purpose, the invention adopts the following technical scheme: a task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform, comprising the following steps:
(1) Model real-time tasks with correlations as an imprecise computation task model, thereby obtaining a task directed acyclic graph and a task correlation matrix;
(2) On a big.LITTLE multi-core heterogeneous platform, allow the same task to execute on processors of different clusters through task migration, which improves the flexibility of task allocation and of dynamic voltage and frequency scaling;
(3) Construct a task mapping problem based on joint QoS and energy optimization by introducing task allocation, frequency selection, real-time, non-preemption, task correlation, and energy consumption constraints;
(4) Handle the nonlinear terms of the problem by variable substitution, convert the task mapping problem of step (3) into a mixed-integer linear programming problem, and solve it with an optimization method;
(5) For the problem of step (3), design a heuristic algorithm with low computational complexity by problem decomposition, which significantly reduces the problem-solving time and improves the applicability of the task mapping method.
Further, in step (1), a set of $N$ correlated imprecise computation (IC) tasks $\{\tau_1, \tau_2, \ldots, \tau_N\}$ is used as the task model of the real-time system, from which a directed acyclic graph of the tasks can be obtained. Each IC task $\tau_i$ is logically divided into a mandatory part and an optional part. $M_i$ denotes the mandatory execution cycles of task $\tau_i$, the variable $o_i$ denotes its optional execution cycles, and $D_i$ denotes its deadline. The optional execution cycles $o_i$ must not exceed the upper limit $O_i$, i.e. $0 \le o_i \le O_i$. When scheduling approximate computing tasks, a strict execution order holds between the mandatory and optional parts of a task: the optional part can execute only after the mandatory part has completed. The correlation between tasks is described by a binary matrix $\mathbf{q} = [q_{ij}]_{N \times N}$, where $q_{ij}$ encodes the execution order between tasks: if task $\tau_i$ and task $\tau_j$ are correlated and $\tau_i$ executes before $\tau_j$, then $q_{ij} = 1$; otherwise $q_{ij} = 0$.
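The task model of step (1) can be illustrated with a minimal Python sketch. The class name `ICTask`, the helper names, and the 0-based indexing are assumptions made for illustration; they do not come from the patent text.

```python
from dataclasses import dataclass

@dataclass
class ICTask:
    """Hypothetical container for one imprecise computation (IC) task."""
    mandatory: float     # M_i, mandatory execution cycles
    optional_max: float  # O_i, upper limit on the optional cycles o_i
    deadline: float      # D_i

def precedence_matrix(n, edges):
    """Build q with q[i][j] = 1 when task i must execute before task j (0-based)."""
    q = [[0] * n for _ in range(n)]
    for i, j in edges:
        q[i][j] = 1
    return q

def valid_optional(task, o):
    """Check the bound 0 <= o_i <= O_i."""
    return 0 <= o <= task.optional_max
```

For example, `precedence_matrix(3, [(0, 1), (1, 2)])` describes a chain of three tasks in which each task precedes the next.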
Further, in step (2), the big.LITTLE heterogeneous processing platform contains two types of clusters, big and LITTLE, and the processors within one cluster are homogeneous. The platform supports dynamic voltage and frequency scaling; the voltage and frequency levels of the processors in the big and LITTLE clusters are represented as [formula omitted] and [formula omitted], respectively. Because processors in different clusters are heterogeneous, $\gamma_{i,k} \in (0,1]$ is defined as the performance and energy-efficiency factor of processor $\theta_k$ when executing task $\tau_i$. big.LITTLE supports task migration, so a task can be moved during execution from a processor in one cluster to a processor in the other cluster. Using task migration, the IC task $\tau_i$ of step (1) is decomposed into two dependent subtasks $\tau_{2i-1}$ and $\tau_{2i}$, which yields a new task correlation matrix. Subtasks $\tau_{2i-1}$ and $\tau_{2i}$ may execute on processors of different clusters; the concrete realization of task migration is detailed in the following steps. After normalization, $\mu_i \in [0,1]$ denotes the proportion that subtask $\tau_i$ executes on one processor. For task $\tau_i$, the execution proportions of its subtasks $\tau_{2i-1}$ and $\tau_{2i}$ sum to 1, i.e. $\mu_{2i-1} + \mu_{2i} = 1$.
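The text does not spell out how the new correlation matrix is built. One plausible construction, shown here purely as an assumption, links the two subtasks of each task in order and lets every original precedence run from the second subtask of the predecessor to the first subtask of the successor. The function name `expand_for_migration` is hypothetical, and indices are 0-based, so task `i` becomes subtasks `2*i` and `2*i + 1`.

```python
def expand_for_migration(q):
    """Expand an N x N precedence matrix to 2N x 2N for the split subtasks.

    Assumed rule (not stated in the source): the first subtask of a task
    precedes its second subtask, and an original edge i -> j becomes an edge
    from the second subtask of i to the first subtask of j.
    """
    n = len(q)
    e = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        e[2 * i][2 * i + 1] = 1          # subtask order inside task i
        for j in range(n):
            if q[i][j]:
                e[2 * i + 1][2 * j] = 1  # inherited precedence i -> j
    return e
```

With two tasks and one edge, the expanded graph is a chain of four subtasks.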
Further, in step (3), optimization variables for task allocation, frequency selection, and task scheduling are introduced: 1) if subtask $\tau_i$ is assigned to processor $\theta_k$, the binary variable $x_{i,k} = 1$; otherwise $x_{i,k} = 0$; 2) if subtask $\tau_i$ executes at voltage and frequency level $l$, the binary variable $c_{i,l} = 1$; otherwise $c_{i,l} = 0$; 3) if two uncorrelated subtasks $\tau_i$ and $\tau_j$ are assigned to the same processor and $\tau_i$ executes before $\tau_j$, the binary variable $p_{i,j} = 1$; otherwise $p_{i,j} = 0$; 4) the continuous variables $ts_i$ and $te_i$ denote the execution start time and end time of subtask $\tau_i$. To formulate the task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform, the following constraints are added:
1) Task allocation: with task migration, the same task may execute on processors of different clusters. Because the processors within one cluster are homogeneous, the invention does not consider migration between homogeneous processors. The following constraints are therefore added for task allocation:
2) Frequency selection: the invention applies dynamic voltage and frequency scaling at the task level. A processor can change its voltage and frequency level after a subtask finishes, and each subtask is assigned only one voltage and frequency level. Processors in the big and LITTLE clusters are heterogeneous and have different voltage and frequency levels, so the range of selectable levels must be determined from the task allocation result. $\lambda_i$ indicates whether subtask $\tau_i$ executes on the big or the LITTLE cluster: if $\tau_i$ executes on the big cluster, the binary variable $\lambda_i = 1$; otherwise $\lambda_i = 0$. The following constraints are added for frequency selection:
3) Real-time: the mandatory part $M_i$ and the optional part $o_i$ of task $\tau_i$ must complete before the deadline $D_i$, and subtask $\tau_{2i}$ may start only after subtask $\tau_{2i-1}$ has completed. The time processor $\theta_k$ takes to execute subtask $\tau_{2i-1}$ at voltage and frequency level $(V_l, f_l)$ is $\mu_{2i-1}(M_i + o_i)/(\gamma_{2i-1,k} f_l)$. To avoid carrying the additional subscript $k$, the parameter $\gamma_{i,l}$ replaces $\gamma_{i,k}$, where $\gamma_{i,l}$ denotes the energy-efficiency factor of executing subtask $\tau_i$ at $(V_l, f_l)$. The following constraints are added for real-time behavior:
4) Non-preemption constraints: the invention uses non-preemptive scheduling, i.e. any two uncorrelated subtasks assigned to the same processor cannot execute at the same time. The constraints are:
$te_i \le ts_j + (2 - x_{i,k} - x_{j,k})H + (1 - p_{i,j})H$ (8)
$te_j \le ts_i + (2 - x_{i,k} - x_{j,k})H + p_{i,j}H$ (9)
5) Task dependency constraints: the invention considers a task set with correlations; the tasks execute strictly in the order given by the directed acyclic graph. The constraints are:
6) Energy constraint: the invention does not consider the energy and time of task communication; only the dynamic and static power consumption of the processors are considered, where $P_{on}$ denotes the inherent power consumption of keeping the core on. During task mapping, the total energy consumption of the system must not exceed the energy budget $E_{budget}$, so the following constraint is imposed:
where $t_i$ denotes the time the processor is in the idle state. Using the system power expression $P_{core,l} = P_{sta,l} + P_{dyn,l} + P_{on}$, constraint (11) can be rewritten as:
The task mapping problem takes QoS optimization as its objective function, and QoS depends on the optional execution cycles $o_i$. The invention uses a linear QoS function $f_i(o_i) = k_i o_i + R_i$, where $R_i$ denotes the baseline QoS obtained after the mandatory part of the task has executed. Based on this problem model, the task mapping optimization problem with joint QoS and energy optimization can be established:
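The quantities in step (3) that survive in the text can be evaluated directly. The sketch below covers the execution-time expression from the real-time item, the linear QoS function, and the non-preemption constraints (8) and (9). The function names are hypothetical, and $H$ is assumed to be a sufficiently large constant (the text uses it without defining it).

```python
def exec_time(mu, M, o, gamma, f):
    """Time to run the fraction mu of (M + o) cycles at frequency f
    on a processor with efficiency factor gamma."""
    return mu * (M + o) / (gamma * f)

def qos(k, o, R):
    """Linear QoS function f_i(o_i) = k_i * o_i + R_i."""
    return k * o + R

def non_preemption_ok(ts, te, x, p, i, j, k, H):
    """Evaluate constraints (8) and (9) for subtasks i and j on processor k.

    H is assumed to be a large constant that deactivates the constraints
    unless both subtasks are assigned to processor k.
    """
    slack = (2 - x[i][k] - x[j][k]) * H  # zero only if both run on k
    c8 = te[i] <= ts[j] + slack + (1 - p[i][j]) * H
    c9 = te[j] <= ts[i] + slack + p[i][j] * H
    return c8 and c9
```

When both subtasks share a processor and $p_{i,j} = 1$, constraint (8) forces $\tau_i$ to finish before $\tau_j$ starts, while constraint (9) is relaxed by $H$.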
Further, in step (4), the problem model established in step (3) is linearized. The problem model PP contains nonlinear terms formed by products of continuous variables and by products of integer variables, so optimization problem (13) is a mixed-integer nonlinear programming problem. Step (4) equivalently converts problem (13) into a mixed-integer linear programming problem through variable substitution and similar linearization techniques, as follows:
(5.1) Formulas (5) and (12) contain the nonlinear terms $(M_i + o_i)\mu_{2i-1}$ and $(M_i + o_i)\mu_{2i}$, which are products of continuous variables. Following their physical meaning, two auxiliary variables [symbols omitted] are introduced to replace these terms; they correspond to subtasks $\tau_{2i-1}$ and $\tau_{2i}$, respectively. The following relationship can be obtained:
The big.LITTLE platform provides discrete voltage and frequency levels $(V_l, f_l)$. Once the level $l$ is fixed, the corresponding parameters $P_{sta,l}$, $P_{dyn,l}$, and $1/f_l$ are also determined. Therefore, in formulas (5) and (12), $P_l$, $f_{2i-1}$, and $f_{2i}$ are used in place of $P_{sta,l} + P_{dyn,l}$, $1/(\gamma_{2i-1,l} f_l)$, and $1/(\gamma_{2i,l} f_l)$, respectively.
(5.2) After the variable substitution in (5.1), formulas (5) and (12) still contain nonlinear terms [formulas omitted]. To linearize them, the following lemma is first introduced:
Lemma 1: Let $s_1, s_2 > 0$ be constants, and consider the two constraint spaces $P_1 = \{[t, b, x] \mid t = bx,\ -s_1 \le x \le s_2,\ b \in \{0,1\}\}$ and $P_2 = \{[t, b, x] \mid -b s_1 \le t \le b s_2,\ t + b s_1 - x - s_1 \le 0,\ t - b s_2 - x + s_2 \ge 0,\ b \in \{0,1\}\}$. Then $P_1 \Leftrightarrow P_2$.
Proof: ($P_1 \Rightarrow P_2$) Since $t = bx$ and $-s_1 \le x \le s_2$, we obtain $-b s_1 \le t \le b s_2$. From $-s_1 \le x \le s_2$ and $b \in \{0,1\}$, we obtain $(b-1)(x - s_2) \ge 0$ and $(b-1)(x + s_1) \le 0$. Expanding these products and substituting $t = bx$ gives $t + b s_1 - x - s_1 \le 0$ and $t - b s_2 - x + s_2 \ge 0$.
($P_2 \Rightarrow P_1$) If $b = 0$, we have $t = 0$ and $-s_1 \le x \le s_2$; if $b = 1$, we obtain $-s_1 \le t = x \le s_2$. Thus $P_1 \Leftrightarrow P_2$ holds.
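Lemma 1 can be sanity-checked numerically. The sketch below enumerates integer points (to avoid floating-point equality issues) and compares membership in the two constraint spaces; the function names and the grid size are illustrative choices, not part of the patent.

```python
def in_p1(t, b, x, s1, s2):
    """Membership in P1: the bilinear form t = b * x with bounded x."""
    return t == b * x and -s1 <= x <= s2

def in_p2(t, b, x, s1, s2):
    """Membership in P2: the equivalent set of linear inequalities."""
    return (-b * s1 <= t <= b * s2
            and t + b * s1 - x - s1 <= 0
            and t - b * s2 - x + s2 >= 0)

def lemma1_holds(s1, s2, span):
    """Compare P1 and P2 on every integer point with |t|, |x| <= span."""
    return all(in_p1(t, b, x, s1, s2) == in_p2(t, b, x, s1, s2)
               for b in (0, 1)
               for x in range(-span, span + 1)
               for t in range(-span, span + 1))
```

A brute-force check of this kind does not replace the proof, but it catches sign errors when the inequalities are transcribed into a solver model.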
According to Lemma 1, an intermediate variable $C_{i,l}$ is introduced to replace the nonlinear terms in formulas (5) and (12). When $c_{i,l} = 1$, $C_{i,l}$ takes the value of the replaced term and lies between its lower bound and its upper bound [formulas omitted]; when $c_{i,l} = 0$, $C_{i,l} = 0$. This variable replacement requires the following additional constraints:
According to Lemma 1, formulas (5) and (12) can be linearized as:
(5.3) Formula (4) contains the nonlinear term $\lambda_i c_{i,l}$, which is a product of integer variables. $\lambda_i c_{i,l}$ can be expressed as [formula omitted]. Lemma 2 is introduced to linearize formula (4).
Lemma 2: Let $x_1$ and $x_2$ be 0-1 variables. The nonlinear term $x_1 x_2$ can be replaced by a 0-1 variable $y$ subject to the constraints $y \le x_1$, $y \le x_2$, and $y \ge x_1 + x_2 - 1$.
Proof: When the 0-1 variables $x_1$ and $x_2$ both equal 1, the constraints reduce to $y = 1$, so $y = x_1 x_2 = 1$ holds. Similarly, if $x_1 = 0, x_2 = 0$, or $x_1 = 0, x_2 = 1$, or $x_1 = 1, x_2 = 0$, we obtain $y = x_1 x_2 = 0$. Therefore, Lemma 2 holds.
Based on Lemma 2, an intermediate variable $z_{i,k,l}$ is introduced to replace the nonlinear term $x_{i,k} c_{i,l}$, together with the following constraints:
$z_{i,k,l} \le x_{i,k},\quad z_{i,k,l} \le c_{i,l},\quad z_{i,k,l} \ge x_{i,k} + c_{i,l} - 1,\quad z_{i,k,l} \in \{0,1\}$ (19)
Formula (4) can then be converted into:
Thus, problem (13) can be linearized as:
Furthermore, in step (5), based on the structure of the optimization problem in step (3), a heuristic algorithm with low computational complexity is designed by problem decomposition to improve the applicability of the mapping method. The original task mapping problem of step (3) is decomposed into three subproblems: 1) frequency selection; 2) task allocation and task scheduling; 3) adjustment of the optional execution cycles. Solving the three subproblems in turn yields a task mapping scheme based on QoS optimization. The specific steps are as follows:
(6.1) Determine the frequency selection variable $c_{i,l}$.
To simplify the task migration model, the original IC task is decomposed into two subtasks $\tau_{2i-1}$ and $\tau_{2i}$, where $\tau_{2i-1}$ represents the mandatory part of the original task and $\tau_{2i}$ the optional part. To reduce the number of optimization variables, subproblem (6.1) considers only the mandatory execution cycles of the tasks, i.e. the actual optional execution cycles are $o_i = 0$. In a real-time system the usable energy $E_{budget}$ is limited: once the mandatory parts are guaranteed to execute completely, as much energy as possible should remain for the optional parts in order to improve the system QoS. $E_{2i-1}$ denotes the energy consumed by executing the mandatory subtask $\tau_{2i-1}$. Subproblem (6.1) therefore takes minimizing the total energy of task execution as its objective function. In addition, the real-time constraints must be considered; they can be reduced to requiring that the subtasks on the critical path of the directed acyclic graph meet the real-time constraints. CPT denotes the set of indices of the mandatory subtasks on the critical path; sorting the subtasks in CPT by execution order gives the ordered index set $\{2r_1 - 1, \ldots, 2r_n - 1, \ldots, 2r_R - 1\}$. Subproblem (6.1) can thus be expressed as:
According to the structure of problem (22), a greedy algorithm is used to solve it. For each mandatory subtask, all voltage and frequency levels are traversed, and the level that minimizes the increase in total system energy is selected as the frequency selection for that subtask. At the same time, for a given frequency selection, the algorithm checks whether the subtasks on the critical path of the directed acyclic graph meet the real-time constraints; a selection that violates them is excluded. Iterating this frequency selection over the task set finally yields a solution to problem (22).
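The greedy frequency selection can be sketched as follows. This is a simplified illustration under several assumptions that are not in the patent text: each level is a `(frequency, power)` pair, the efficiency factor and the cluster distinction are ignored, the critical-path subtasks share a single deadline, and subtasks not yet assigned are optimistically assumed to run at the fastest level when the deadline is checked.

```python
def greedy_frequency(cycles, levels, critical, deadline):
    """Pick one level per mandatory subtask, minimizing added energy.

    cycles:   mandatory cycles per subtask
    levels:   list of (frequency, power) pairs (assumed representation)
    critical: set of subtask indices on the critical path
    Returns a list of level indices, or None if no level is feasible.
    """
    fastest = max(f for f, _ in levels)
    # Optimistic time still needed by unassigned critical-path subtasks.
    remaining = sum(cycles[i] / fastest for i in critical)
    used = 0.0
    choice = []
    for i, m in enumerate(cycles):
        if i in critical:
            remaining -= m / fastest
        best = None
        for l, (f, p) in enumerate(levels):
            t = m / f
            if i in critical and used + t + remaining > deadline:
                continue  # excluded: the critical path would miss the deadline
            e = p * t     # energy increment of running subtask i at level l
            if best is None or e < best[0]:
                best = (e, l, t)
        if best is None:
            return None
        choice.append(best[1])
        if i in critical:
            used += best[2]
    return choice
```

With two 10-cycle subtasks on the critical path, levels `(1.0, 1.0)` and `(2.0, 4.0)`, and a deadline of 15, the sketch keeps the first subtask at the low level and is forced to raise the second.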
(6.2) Determine the task allocation variable $x_{i,k}$ and the task execution start time $ts_i$ from the result of (6.1).
The frequency selection determines the execution time of each mandatory subtask and the cluster on which it executes. TB and TL denote the sets of mandatory subtasks executed on the big and LITTLE clusters, respectively. To avoid concentrating the subtasks on a few processors, the time each processor spends executing tasks should be balanced. Subproblem (6.2) therefore takes minimizing the total time the processors spend executing tasks as its objective function, subject to the non-preemption and correlation constraints of the tasks. $tp_k$ denotes the total time processor $\theta_k$ spends executing all of its mandatory subtasks. Subproblem (6.2) can be expressed as:
Problem (23) requires solving for the task allocation variable $x_{i,k}$ and the task execution start time $ts_i$ simultaneously. Given the structure of the problem, a greedy algorithm is used, and the solution proceeds in three steps.
First, determine the order in which the greedy algorithm traverses the subtasks: the task correlations and task execution times are converted into a hierarchical, tree-like layering of the tasks in the directed acyclic graph, in which the tasks within one layer are mutually independent. The layering rule is as follows: if the longest logical path from the entry node to task $\tau_i$ consists of $n$ edges, the level of $\tau_i$ is $n$; if $\tau_i$ is an entry node, its level is 0. The lower the level of a task, the earlier it executes.
The specific method is as follows:
(1) In the directed acyclic graph, find the entry subtasks; they form the level-0 task set $RT_0$.
(2) Loop over $RT_0$, recursively determine the level of each subtask's successors, and update previously determined levels accordingly.
(3) Sort the subtasks by level in ascending order; subtasks on the same level are sorted by execution time in ascending order.
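The layering rule can be sketched in Python. Instead of the recursion described above, this illustration uses a topological traversal (Kahn's algorithm) to compute the longest path length from any entry node, which gives the same levels. The function names are hypothetical.

```python
def task_levels(q):
    """Level of each subtask = number of edges on the longest path from an entry node.

    q is the 0/1 precedence matrix of a directed acyclic graph,
    with q[i][j] = 1 when subtask i precedes subtask j.
    """
    n = len(q)
    level = [0] * n
    indeg = [sum(q[i][j] for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if indeg[j] == 0]  # entry subtasks, level 0
    while ready:
        i = ready.pop()
        for j in range(n):
            if q[i][j]:
                level[j] = max(level[j], level[i] + 1)
                indeg[j] -= 1
                if indeg[j] == 0:  # all predecessors processed, level is final
                    ready.append(j)
    return level

def traversal_order(q, exec_time):
    """Sort subtasks by level, then by execution time, both ascending."""
    level = task_levels(q)
    return sorted(range(len(q)), key=lambda i: (level[i], exec_time[i]))
```

For a diamond-shaped graph with an extra shortcut edge from the entry to the exit node, the exit node still receives level 2 because the longest path is used.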
Second, determine the task allocation variable $x_{i,k}$ with a greedy algorithm. The subtasks are sorted with the layering method of the first step, which fixes the order in which the greedy algorithm traverses them. The candidate processors for each subtask follow from the subtask frequency level obtained in (6.1). Following the layering order, the greedy algorithm tentatively assigns each subtask to each candidate processor in turn and selects, as the task allocation scheme, the assignment that gives the smallest maximum processor execution time. This determines the task allocation variable $x_{i,k}$ and the subtask execution start time $ts_i$.
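A minimal sketch of this allocation step is given below. It interprets the selection rule as choosing, for each subtask, the candidate processor on which the subtask finishes earliest; this interpretation, the function name `greedy_allocate`, and the data layout are assumptions. Every subtask is assumed to have at least one candidate processor, and `order` is assumed to list predecessors before successors.

```python
def greedy_allocate(order, q, exec_time, candidates):
    """Assign subtasks to processors in the given order.

    order:      subtask indices, predecessors before successors
    q:          0/1 precedence matrix, q[j][i] = 1 if j precedes i
    candidates: candidates[i] is the list of processors allowed for subtask i
    Returns (proc, ts, te): chosen processor, start time, end time per subtask.
    """
    n = len(q)
    free = {}  # processor -> time at which it becomes idle
    ts, te, proc = [0.0] * n, [0.0] * n, [None] * n
    for i in order:
        # A subtask may start only after all of its predecessors have ended.
        ready = max((te[j] for j in range(n) if q[j][i]), default=0.0)
        best = None
        for k in candidates[i]:
            start = max(ready, free.get(k, 0.0))
            finish = start + exec_time[i]
            if best is None or finish < best[0]:
                best = (finish, start, k)
        te[i], ts[i], proc[i] = best
        free[proc[i]] = te[i]  # non-preemptive: processor busy until te[i]
    return proc, ts, te
```

Because each processor runs its subtasks back to back, the resulting schedule satisfies the non-preemption requirement by construction.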
Third, verify whether the resulting frequency selection and task allocation satisfy the real-time constraints of the system. If the scheme violates a real-time constraint, frequency selection must be repeated. The specific method is as follows:
(1) Verify whether the task mapping scheme obtained in (6.1) and (6.2) meets the real-time requirements; if not, record the index of the subtask $\tau_m$ that violates the constraint.
(2) Provided the energy constraint remains satisfied, raise the voltage and frequency levels of $\tau_m$ and of the predecessor subtasks of $\tau_m$ by one level, lower the voltage and frequency levels of the successor subtasks of $\tau_m$ by one level, and increase the level-adjustment count by 1.
(3) Re-determine the task allocation scheme according to (6.2) and verify again whether it meets the real-time constraints, repeating (2) as needed. If the scheme still violates the real-time constraints when the adjustment count reaches the number of tasks, stop adjusting; the task mapping fails.
(6.3) Determine the optional execution cycles $o_i$.
The results of (6.1) and (6.2) give a task mapping scheme for the mandatory subtasks, from which the total remaining energy $E_{optl}$ of the system and the idle period $\Delta t$ on each processor are obtained. $TC_l$ denotes the time needed to execute one cycle at voltage and frequency level $(V_l, f_l)$, and $EC_l$ denotes the energy consumed by executing one cycle at $(V_l, f_l)$. Following the greedy idea, with limited time and energy as much of the optional subtasks as possible should be executed; that is, the maximum number of optional cycles that fits into each idle period is determined in order to improve the system QoS. The specific method is as follows:
(1) Sort the processors in ascending order of the start time of their first idle period, and determine from the task levels the candidate set Temp of subtasks that can execute in that idle period.
(2) Traverse all voltage and frequency levels that can be assigned in the idle period to determine the maximum optional execution cycles for that period.
(3) Assign actual execution cycles to the optional subtasks in Temp by level, and determine their execution start times. Return to (1) and repeat the steps until the total remaining energy $E_{optl}$ equals 0 (or all optional subtasks have been executed).
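Item (2) of the method above, choosing the level that fits the most optional cycles into one idle period, can be sketched as follows. The function name and the `(TC_l, EC_l)` tuple layout are assumptions; the bound combines the idle time, the remaining energy, and the upper limit $O_i$.

```python
def max_optional_cycles(idle, energy, levels, upper):
    """Largest number of optional cycles that fits one idle period.

    idle:   length of the idle period (Delta t)
    energy: remaining energy E_optl
    levels: list of (TC_l, EC_l) = (time per cycle, energy per cycle)
    upper:  upper limit O_i on the optional cycles
    Returns (cycles, level_index); level_index is None if nothing fits.
    """
    best = (0.0, None)
    for l, (tc, ec) in enumerate(levels):
        # Cycles are limited by time, by energy, and by the bound O_i.
        cycles = min(idle / tc, energy / ec, upper)
        if cycles > best[0]:
            best = (cycles, l)
    return best
```

In the example of a 10-unit idle period with 8 units of energy, a slow level with `(1.0, 1.0)` allows 8 cycles, while a faster level with `(0.5, 2.0)` is limited by energy to 4 cycles, so the slow level is preferred.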
By solving the above subproblems, the frequency selection variable $c_{i,l}$, the task allocation variable $x_{i,k}$, the task execution start time $ts_i$, and the optional execution cycles $o_i$ are obtained, which yields a task mapping scheme based on QoS optimization.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial effects: 1) The invention provides a task mapping method based on joint QoS and energy optimization on a big.LITTLE multi-core heterogeneous platform, which significantly improves the system QoS. Under limited resources, when the proposed task mapping method schedules the 14 randomly generated task sets of example 1, the average system QoS improves by 31.2% (by up to 112.8%). 2) The original task mapping problem contains nonlinear terms, such as coupling terms between continuous and integer variables; the problem structure is complex, and an optimal solution cannot be obtained in a short time. Exploiting the structure of the original task mapping problem, the invention provides a heuristic greedy algorithm with low computational complexity that trades solution accuracy for a significantly shorter running time. Taking the randomly generated task sets of example 2 as an example, the optimization method needs about 38 s on average, whereas the proposed heuristic greedy algorithm obtains a suboptimal solution in a negligible time (about 0.04 s).
Drawings
FIG. 1 is a schematic diagram of a task mapping method based on energy and QoS joint optimization proposed by the present invention;
FIG. 2 is a directed acyclic graph of tasks and an expanded task graph after task migration is introduced, as used in an embodiment of the present invention;
FIG. 3 is a schematic diagram of task mapping results obtained by using a QoS and energy based joint optimization method in a big. LITTLE platform (big and LITTLE clusters each include 4 processors) with a configuration task number of 8 according to example 1 of the present invention, wherein τ 1 16 The subtasks are expanded for the task graph;
FIG. 4 is a diagram of task mapping results obtained by using a heuristic greedy method on a big. LITTLE platform (big and LITTLE clusters each including 4 processors) with a task number of 8 in example 1 of the present invention, where τ 1 16 The subtasks are expanded for the task graph;
FIG. 5 is a system QoS comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 18 (14 randomly generated task sets), obtained by solving for the optimal solution with and without task migration in the task mapping;
FIG. 6 shows, for example 2 of the present invention with the number of tasks configured from 5 to 15 (11 randomly generated task sets), the task scheduling failure rate when the optimal solution is solved without task migration in the task mapping (ω = number of scheduling failures without task migration / total number of scheduling successes with task migration);
FIG. 7 is a system QoS comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 8 (4 randomly generated task sets), obtained by adjusting the energy factor β (β takes values in [0, 0.5]) and the time adjustment factor δ (δ takes values in [0.4, 1]) and solving for the optimal solution with task migration introduced in the task mapping;
FIG. 8 is a system QoS increment comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 8 (4 randomly generated task sets), obtained by adjusting the parallelism factor η of the task directed acyclic graph and solving for the optimal solution with task migration introduced in the task mapping;
FIG. 9 is a system QoS increment comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 8 (4 randomly generated task sets), obtained by adjusting the processor heterogeneity ratio $\gamma_L/\gamma_b$ ($\gamma_L/\gamma_b$ takes values in [0.5, 1]) and solving for the optimal solution with task migration introduced in the task mapping;
FIG. 10 compares, for example 2 of the present invention with the number of tasks configured from 5 to 18 (14 randomly generated task sets), the QoS increment of the optimal solution and of the proposed heuristic algorithm when the energy factor β = 0.4, the time adjustment factor δ = 0.4, the big cluster heterogeneity factor $\gamma_b = 1$ and the LITTLE cluster heterogeneity factor $\gamma_L = 0.6$;
FIG. 11 compares the run times of the optimal solution method and of the proposed heuristic algorithm for example 2 of the present invention with the number of tasks configured from 5 to 15.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problem and achieve the technical effects can be fully understood and implemented. Provided no conflict arises, the embodiments of the present invention and the features of each embodiment may be combined with one another, and the resulting technical solutions all fall within the protection scope of the present invention.
Example 1: a task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform, comprising the following steps:
(1) Model the correlated real-time tasks as imprecise computation tasks, thereby obtaining a task directed acyclic graph and a task correlation matrix;
(2) On the big.LITTLE multi-core heterogeneous platform, allow the same task to execute on processors of different clusters through task migration, which improves the flexibility of task allocation and of dynamic voltage and frequency scaling;
(3) Construct a task mapping problem based on joint QoS and energy optimization by introducing task allocation, frequency selection, real-time, task non-preemption, task correlation and energy consumption constraints;
(4) Handle the nonlinear terms in the problem by variable substitution, convert the task mapping problem of (3) into a mixed integer linear programming problem, and solve it with an optimization method;
(5) For the problem in step (3), design a heuristic algorithm with low computational complexity using problem decomposition, which significantly reduces the solving time and improves the applicability of the task mapping method.
FIG. 1 is a schematic diagram of the task mapping method based on joint energy and QoS optimization according to the present invention, and FIG. 2 shows the task directed acyclic graph used in an embodiment of the present invention together with the expanded task graph obtained after task migration is introduced. Each step is described in detail below with reference to FIG. 1 and FIG. 2.
Step (1): N correlated imprecise computation (IC) tasks $\{\tau_1, \tau_2, \ldots, \tau_N\}$ describe the task model of the real-time system, from which the directed acyclic graph of the tasks is obtained. Each IC task $\tau_i$ is logically divided into a mandatory execution part and an optional execution part. $M_i$ denotes the mandatory execution cycles of task $\tau_i$, the variable $o_i$ denotes its optional execution cycles, and $D_i$ denotes its deadline. The optional execution cycles $o_i$ must not exceed the upper limit $O_i$, i.e. $0 \le o_i \le O_i$. When approximate computing tasks are scheduled, there is a strict execution order between the mandatory and optional parts of a task: the optional part may execute only after the mandatory part has completed. The correlation between tasks is described by a binary matrix $q = [q_{ij}]_{N \times N}$, where $q_{ij}$ represents the execution order between tasks: if task $\tau_i$ and task $\tau_j$ are correlated and $\tau_i$ executes before $\tau_j$, then $q_{ij} = 1$; otherwise $q_{ij} = 0$.
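As an illustration of the task model in step (1), the following Python sketch stores the parameters $M_i$, $O_i$ and $D_i$ of each IC task and builds the binary correlation matrix from a list of precedence edges. The names `ICTask`, `build_correlation_matrix` and `clamp_optional`, the 0-based indexing, and the example values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ICTask:
    """One imprecise computation (IC) task; field names are illustrative."""
    mandatory_cycles: float      # M_i
    max_optional_cycles: float   # O_i, upper bound on o_i
    deadline: float              # D_i


def build_correlation_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Return q with q[i][j] = 1 if task i must execute before task j (0-based)."""
    q = [[0] * n for _ in range(n)]
    for i, j in edges:
        q[i][j] = 1
    return q


def clamp_optional(task: ICTask, o: float) -> float:
    """Enforce the bound 0 <= o_i <= O_i on the optional cycles."""
    return max(0.0, min(o, task.max_optional_cycles))


# Hypothetical 3-task example: task 0 precedes tasks 1 and 2.
tasks = [ICTask(4e6, 2e6, 0.05), ICTask(3e6, 1e6, 0.08), ICTask(5e6, 3e6, 0.10)]
q = build_correlation_matrix(len(tasks), [(0, 1), (0, 2)])
```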
Step (2): the big.LITTLE heterogeneous processing platform has two different types of clusters, big and LITTLE, and the processors within the same cluster are homogeneous. The platform supports dynamic voltage and frequency scaling, and the big cluster and the LITTLE cluster each have their own set of voltage and frequency levels $(V_l, f_l)$. Because processors in different clusters are heterogeneous, $\gamma_{i,k} \in (0,1]$ is defined as the performance energy-efficiency factor of processor $\theta_k$ executing task $\tau_i$. big.LITTLE supports task migration, so a task can be migrated during execution from a processor in one cluster to a processor in the other cluster. Using task migration, the IC task $\tau_i$ of step (1) can be decomposed into two dependent subtasks $\tau_{2i-1}$ and $\tau_{2i}$, which yields a new task correlation matrix. Subtasks $\tau_{2i-1}$ and $\tau_{2i}$ may execute on processors of different clusters; the specific implementation of task migration is detailed in the following steps. After normalization, $\mu_i \in [0,1]$ denotes the proportion of the task that subtask $\tau_i$ executes on one processor. For task $\tau_i$, the execution proportions of subtasks $\tau_{2i-1}$ and $\tau_{2i}$ sum to 1, i.e. $\mu_{2i-1} + \mu_{2i} = 1$.
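The expansion of the task graph under task migration can be sketched as follows. The source states only that each task is split into two dependent subtasks and that a new correlation matrix results; the specific edge rule used here (the first subtask precedes the second, and the second subtask of a predecessor task precedes the first subtask of its successor) is an assumption, as are the function names and the 0-based indexing.

```python
from typing import List, Tuple


def expand_task_graph(q: List[List[int]]) -> List[List[int]]:
    """Build a 2N x 2N correlation matrix from an N x N one (assumed edge rule).

    Task i (0-based) becomes subtasks 2*i and 2*i + 1, which correspond to
    tau_{2i-1} and tau_{2i} in the 1-based notation of the text.
    """
    n = len(q)
    q2 = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        first, second = 2 * i, 2 * i + 1
        q2[first][second] = 1            # first subtask precedes second subtask
        for j in range(n):
            if q[i][j] == 1:
                q2[second][2 * j] = 1    # task i finishes before task j starts
    return q2


def split_cycles(total_cycles: float, mu_first: float) -> Tuple[float, float]:
    """Split cycles between the two subtasks so that the proportions sum to 1."""
    if not 0.0 <= mu_first <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu_first * total_cycles, (1.0 - mu_first) * total_cycles
```

For a two-task chain, `expand_task_graph([[0, 1], [0, 0]])` yields a four-subtask chain with three edges.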
Step (3): optimization variables for task allocation, frequency selection, real-time behaviour and task scheduling are introduced: 1) if subtask $\tau_i$ is assigned to processor $\theta_k$ for execution, the binary variable $x_{i,k} = 1$, otherwise $x_{i,k} = 0$; 2) if subtask $\tau_i$ executes at voltage and frequency level $l$, the binary variable $c_{i,l} = 1$, otherwise $c_{i,l} = 0$; 3) if any two uncorrelated subtasks $\tau_i$ and $\tau_j$ are allocated to the same processor and $\tau_i$ executes before $\tau_j$, the binary variable $p_{i,j} = 1$, otherwise $p_{i,j} = 0$; 4) the continuous variables $ts_i$ and $te_i$ denote the execution start time and end time of subtask $\tau_i$. To describe the task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform, the following constraints are added:
1) Task allocation: with task migration, the same task may execute on processors of different clusters. Because the processors within a cluster are homogeneous, the invention does not consider migration of tasks between homogeneous processors. The following constraints are therefore added for task allocation:
2) Frequency selection: the invention applies dynamic voltage and frequency scaling to the tasks. A processor may adjust its voltage and frequency level after a subtask finishes executing, and each subtask can be assigned only one voltage and frequency level. Processors in the big and LITTLE clusters are heterogeneous and have different voltage and frequency levels, so the range of levels available to a processor must be determined from the task allocation result. $\lambda_i$ indicates whether subtask $\tau_i$ executes on the big or the LITTLE cluster: if subtask $\tau_i$ executes on the big cluster, the binary variable $\lambda_i = 1$; otherwise $\lambda_i = 0$. The following constraints are added for the frequency selection of tasks:
3) Real-time performance: under the real-time constraint, the mandatory part $M_i$ and the optional part $o_i$ of task $\tau_i$ must complete within the deadline $D_i$, and subtask $\tau_{2i}$ may start only after subtask $\tau_{2i-1}$ has completed. The time processor $\theta_k$ takes to execute subtask $\tau_{2i-1}$ at voltage and frequency level $(V_l, f_l)$ is $\mu_{2i-1}(M_i + o_i)/(\gamma_{2i-1,k} f_l)$. To avoid introducing the additional subscript $k$, the parameter $\gamma_{i,k}$ is replaced by $\gamma_{i,l}$, which denotes the energy efficiency factor for executing subtask $\tau_i$ at $(V_l, f_l)$. The following constraints are added for real-time performance:
4) Non-preemption constraints: the invention considers non-preemptive scheduling, i.e. any two uncorrelated subtasks allocated to the same processor cannot execute at the same time. The constraints are:
$$te_i \le ts_j + (2 - x_{i,k} - x_{j,k})H + (1 - p_{i,j})H \tag{8}$$
$$te_j \le ts_i + (2 - x_{i,k} - x_{j,k})H + p_{i,j}H \tag{9}$$
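To make the role of the terms in constraints (8) and (9) concrete, the following Python sketch evaluates both inequalities for one pair of subtasks and one processor $k$. It assumes that $H$ is a sufficiently large constant (for example, a bound on the scheduling horizon), which the text does not define explicitly; the function name and argument order are hypothetical.

```python
def non_preemption_ok(ts_i, te_i, ts_j, te_j, x_ik, x_jk, p_ij, H):
    """Evaluate constraints (8) and (9) for subtasks i, j and processor k.

    x_ik, x_jk, p_ij are 0/1 values; H is an assumed large constant.
    """
    # The slack is 0 only when both subtasks are assigned to processor k.
    slack = (2 - x_ik - x_jk) * H
    c8 = te_i <= ts_j + slack + (1 - p_ij) * H   # binding when i precedes j
    c9 = te_j <= ts_i + slack + p_ij * H         # binding when j precedes i
    return c8 and c9
```

When both subtasks share processor $k$, exactly one of the two inequalities is binding, depending on $p_{i,j}$, so overlapping intervals are rejected for either ordering. When at least one subtask is on another processor, both inequalities are relaxed by $H$.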
5) Task dependency constraints: the invention considers a task set with correlation, the tasks are strictly executed according to the sequence in the directed acyclic graph, and the constraint conditions are as follows:
6) Energy constraint: the invention does not consider the energy consumption and time of task communication, only the dynamic and static power consumption of the processor, where $P_{on}$ denotes the inherent power consumed to keep the core on. During task mapping, the total energy consumption of the system cannot exceed the energy budget $E_{budget}$, so the following constraint is imposed on energy:
where $t_i$ denotes the time the processor is in the idle state. According to the system power consumption expression $P_{core,l} = P_{sta,l} + P_{dyn,l} + P_{on}$, constraint (11) can be transformed as follows:
The task mapping problem takes QoS optimization as its objective, and QoS is related to the optional execution cycles $o_i$. The invention uses a linear QoS function $f_i(o_i) = k_i o_i + R_i$, where $R_i$ denotes the baseline QoS after the mandatory part of the task has been executed. Based on this problem model, a task mapping optimization problem based on joint QoS and energy optimization can be established:
Step (4): linearize the problem model established in step (3). The problem model PP contains nonlinear terms in which continuous variables are multiplied together and integer variables are multiplied together, so optimization problem (13) is a mixed integer nonlinear programming problem. Step (4) equivalently converts problem (13) into a mixed integer linear programming problem through linearization techniques such as variable substitution, as follows:
(5.1) Equations (5) and (12) contain the nonlinear terms $(M_i + o_i)\mu_{2i-1}$ and $(M_i + o_i)\mu_{2i}$, in which continuous variables are multiplied. According to their physical meaning, two auxiliary variables are introduced to replace these nonlinear terms; they represent the execution cycles of subtasks $\tau_{2i-1}$ and $\tau_{2i}$, respectively, and the following relationship can be obtained:
The big.LITTLE platform provides discrete voltage and frequency levels $(V_l, f_l)$. Once the level $l$ is fixed, the corresponding parameters $P_{sta,l}$, $P_{dyn,l}$ and $1/f_l$ are also determined. Therefore, in equations (5) and (12), $P_l$, $f_{2i-1}$ and $f_{2i}$ are used in place of $P_{sta,l} + P_{dyn,l}$, $1/(\gamma_{2i-1,l} f_l)$ and $1/(\gamma_{2i,l} f_l)$, respectively.
(5.2) After the variable substitution in (5.1), equations (5) and (12) still contain nonlinear terms in which the binary variable $c_{i,l}$ multiplies a continuous variable. To linearize them, the following lemma is first introduced:
Lemma 1: Let the constants $s_1, s_2 > 0$, and consider the two constraint spaces $P_1 = \{[t, b, x] \mid t = bx,\ -s_1 \le x \le s_2,\ b \in \{0,1\}\}$ and $P_2 = \{[t, b, x] \mid -b s_1 \le t \le b s_2,\ t + b s_1 - x - s_1 \le 0,\ t - b s_2 - x + s_2 \ge 0,\ b \in \{0,1\}\}$. Then $P_1 = P_2$.
Proof: ($P_1 \Rightarrow P_2$) Since $t = bx$ and $-s_1 \le x \le s_2$, we obtain $-b s_1 \le t \le b s_2$. From $-s_1 \le x \le s_2$ and $b \in \{0,1\}$, we obtain $(b-1)(x - s_2) \ge 0$ and $(b-1)(x + s_1) \le 0$. Thus $t + b s_1 - x - s_1 \le 0$ and $t - b s_2 - x + s_2 \ge 0$.
($P_2 \Rightarrow P_1$) If $b = 0$, we have $t = 0$ and $-s_1 \le x \le s_2$; if $b = 1$, we obtain $-s_1 \le t = x \le s_2$. Thus $t = bx$ in both cases, and the lemma holds.
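Lemma 1 can also be checked numerically. The following Python sketch tests membership in $P_1$ and $P_2$ for every point of a small grid and confirms that the two sets agree there. The grid, the tolerance and the function names are assumptions chosen for illustration; a grid check is a sanity test, not a proof.

```python
def in_p1(t, b, x, s1, s2, tol=1e-9):
    """Membership in P1: t = b*x, -s1 <= x <= s2, b binary."""
    return b in (0, 1) and abs(t - b * x) <= tol and -s1 - tol <= x <= s2 + tol


def in_p2(t, b, x, s1, s2, tol=1e-9):
    """Membership in P2: the linear constraints of Lemma 1."""
    return (
        b in (0, 1)
        and -b * s1 - tol <= t <= b * s2 + tol
        and t + b * s1 - x - s1 <= tol
        and t - b * s2 - x + s2 >= -tol
    )


def lemma1_holds_on_grid(s1=2.0, s2=3.0, step=0.25):
    """Compare P1 and P2 on a grid that extends one unit beyond [-s1, s2]."""
    lo, hi = -s1 - 1.0, s2 + 1.0
    n = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n + 1)]
    return all(
        in_p1(t, b, x, s1, s2) == in_p2(t, b, x, s1, s2)
        for b in (0, 1) for t in grid for x in grid
    )
```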
According to Lemma 1, an intermediate variable $C_{i,l}$ is introduced to replace the nonlinear terms in equations (5) and (12). When $c_{i,l} = 1$, $C_{i,l}$ equals the replaced continuous variable and lies between its lower and upper bounds; when $c_{i,l} = 0$, $C_{i,l} = 0$. For this variable replacement, the following constraints need to be added:
According to Lemma 1, equations (5) and (12) can be linearized as:
(5.3) Equation (4) contains the nonlinear term $\lambda_i c_{i,l}$, in which integer variables are multiplied; $\lambda_i c_{i,l}$ can be expressed in terms of the products $x_{i,k} c_{i,l}$. Lemma 2 is introduced to linearize equation (4).
Lemma 2: Let $x_1$ and $x_2$ be 0-1 variables. The nonlinear term $x_1 x_2$ can be replaced by a 0-1 variable $y$ subject to the constraints $y \le x_1$, $y \le x_2$ and $y \ge x_1 + x_2 - 1$.
Proof: When the 0-1 variables $x_1$ and $x_2$ both equal 1, the constraints reduce to $y = 1$, so $y = x_1 x_2 = 1$ holds. Similarly, if $x_1 = 0, x_2 = 0$, or $x_1 = 0, x_2 = 1$, or $x_1 = 1, x_2 = 0$, we obtain $y = x_1 x_2 = 0$. Therefore, Lemma 2 holds.
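Because Lemma 2 involves only binary variables, it can be verified by enumerating all four cases. The short Python sketch below lists, for each pair $(x_1, x_2)$, the values of $y$ that satisfy the three linear constraints; the function name `feasible_y` is hypothetical.

```python
from itertools import product


def feasible_y(x1: int, x2: int):
    """Return all y in {0, 1} with y <= x1, y <= x2 and y >= x1 + x2 - 1."""
    return [y for y in (0, 1) if y <= x1 and y <= x2 and y >= x1 + x2 - 1]


# For every binary pair, the only feasible y is the product x1 * x2.
table = {(x1, x2): feasible_y(x1, x2) for x1, x2 in product((0, 1), repeat=2)}
```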
Based on Lemma 2, an intermediate variable $z_{i,k,l}$ is introduced to replace the nonlinear term $x_{i,k} c_{i,l}$, and the following constraints are added:
$$z_{i,k,l} \le x_{i,k},\quad z_{i,k,l} \le c_{i,l},\quad z_{i,k,l} \ge x_{i,k} + c_{i,l} - 1,\quad z_{i,k,l} \in \{0,1\} \tag{19}$$
Equation (4) can then be converted into:
Thus, problem (13) can be linearized as:
Step (5): based on the structure of the optimization problem in step (3), a heuristic algorithm with low computational complexity is designed using problem decomposition, so as to improve the applicability of the mapping method. The original task mapping problem in step (3) can be decomposed into 3 subproblems: 1) frequency selection; 2) task allocation and task scheduling; 3) adjustment of the optional execution cycles. Solving these 3 subproblems in turn yields a task mapping scheme based on QoS optimization. The specific steps are as follows:
(6.1) Determine the frequency selection optimization variable $c_{i,l}$.
To simplify the task migration model, the original IC task is decomposed into two subtasks $\tau_{2i-1}$ and $\tau_{2i}$, where $\tau_{2i-1}$ represents the mandatory execution part of the original task and $\tau_{2i}$ represents the optional execution part. To reduce the number of optimization variables, subproblem (6.1) considers only the mandatory execution cycles of the tasks, i.e. the actual optional execution cycles are $o_i = 0$. In a real-time system the energy available to the system, $E_{budget}$, is limited; once the mandatory parts of the tasks are guaranteed to execute completely, as much energy as possible should remain for executing the optional parts so as to improve system QoS. $E_{2i-1}$ denotes the energy consumed by executing the mandatory subtask $\tau_{2i-1}$. Subproblem (6.1) therefore takes minimizing the total energy of task execution as its objective. The real-time constraint must also be considered; it can be simplified to requiring that the subtasks on the critical path of the directed acyclic graph meet the real-time constraint. CPT denotes the set of indices of the mandatory subtasks on the critical path; sorting the subtasks in CPT by execution order gives the ordered index set $\{2r_1 - 1, \ldots, 2r_n - 1, \ldots, 2r_R - 1\}$. Subproblem (6.1) can thus be expressed as:
According to the structure of problem (22), a greedy algorithm is used to solve it. For each mandatory subtask, all voltage and frequency levels are traversed, and the level that minimizes the increase in total system energy is chosen as the frequency selection for that subtask. At the same time, for a given frequency selection scheme, it is checked whether the subtasks on the critical path of the directed acyclic graph satisfy the real-time constraint; if they do not, that frequency selection scheme is excluded. The frequency selection algorithm iterates over the task set and finally yields a solution to problem (22).
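The greedy frequency selection described above can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the source: energy is modelled as power times execution time, the efficiency factor is taken per level rather than per subtask, the big and LITTLE level sets are merged into one list, undecided subtasks start at the fastest level, and the real-time check is a single deadline on the summed execution time of the critical-path subtasks. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Level:
    freq: float    # f_l, cycles per second
    power: float   # assumed P_sta,l + P_dyn,l, in watts
    gamma: float   # assumed per-level efficiency factor in (0, 1]


def exec_time(cycles: float, lv: Level) -> float:
    return cycles / (lv.gamma * lv.freq)


def energy(cycles: float, lv: Level) -> float:
    return lv.power * exec_time(cycles, lv)


def greedy_frequency_selection(cycles: Sequence[float], levels: Sequence[Level],
                               critical_path: Sequence[int], deadline: float) -> List[int]:
    """Pick, per mandatory subtask, the cheapest level that keeps the critical path on time."""
    fastest = min(range(len(levels)), key=lambda l: exec_time(1.0, levels[l]))
    choice = [fastest] * len(cycles)

    def critical_time() -> float:
        return sum(exec_time(cycles[i], levels[choice[i]]) for i in critical_path)

    if critical_time() > deadline:
        raise ValueError("infeasible even at the fastest level")
    for i in range(len(cycles)):
        by_energy = sorted(range(len(levels)), key=lambda l: energy(cycles[i], levels[l]))
        for l in by_energy:
            previous = choice[i]
            choice[i] = l
            if critical_time() <= deadline:
                break                 # cheapest feasible level found
            choice[i] = previous      # exclude this level and try the next one
    return choice
```

With two levels (2 GHz at 2 W and 1 GHz at 0.5 W), three subtasks of $10^9$ cycles each, critical path `[0, 1]` and a deadline of 1.5 s, the sketch slows subtask 0, must keep subtask 1 fast, and slows the off-path subtask 2.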
(6.2) Based on the result of (6.1), determine the task allocation optimization variable $x_{i,k}$ and the task execution start time $ts_i$.
According to the frequency selection scheme, the execution time of each mandatory subtask and the cluster on which it executes can be determined. TB and TL denote the sets of mandatory subtasks executed in the big and LITTLE clusters, respectively. To avoid subtasks being concentrated on a few processors, the time each processor spends executing tasks should be balanced. Subproblem (6.2) therefore takes minimizing the processors' total task execution time as its objective, and its constraints include the non-preemption and correlation constraints of the tasks. $tp_k$ denotes the total time processor $\theta_k$ spends executing all of its mandatory subtasks. Subproblem (6.2) can be expressed as:
Problem (23) requires solving for the task allocation optimization variable $x_{i,k}$ and the task execution start time $ts_i$ simultaneously. Based on the structure of the problem, a greedy algorithm is used, and the solving procedure is divided into 3 steps.
First, determine the traversal order of the subtasks in the greedy algorithm: the task correlations and task execution times are converted into a hierarchical, tree-like relationship among the tasks of the directed acyclic graph, in which the tasks within each level are mutually independent. The layering rule is as follows: if the longest logical path from the entry node to task $\tau_i$ consists of $n$ edges, the level of $\tau_i$ is $n$; if task $\tau_i$ is an entry node, its level is 0. The lower a task's level, the earlier it executes.
The specific method comprises the following steps:
(1) In the directed acyclic graph, find the entry subtasks, which form the level-0 task set $RT_0$;
(2) Loop over $RT_0$, determine the level of each subtask's successors in turn by recursion, and update the recorded subtask levels accordingly;
(3) Sort the subtasks by level in ascending order; subtasks of the same level are sorted by execution time in ascending order.
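The layering rule above can be sketched in Python as follows. The sketch computes each level as the number of edges on the longest path from an entry node, using memoized recursion over predecessors rather than the forward recursion from $RT_0$ described in the text; both give the same levels on an acyclic graph. The function names and the 0-based indexing are assumptions.

```python
from functools import lru_cache
from typing import List, Sequence


def task_levels(q: Sequence[Sequence[int]]) -> List[int]:
    """Level of node j = number of edges on the longest path from an entry node to j."""
    n = len(q)
    preds = [[i for i in range(n) if q[i][j] == 1] for j in range(n)]

    @lru_cache(maxsize=None)
    def level(j: int) -> int:
        # Entry nodes (no predecessors) have level 0.
        return 0 if not preds[j] else 1 + max(level(i) for i in preds[j])

    return [level(j) for j in range(n)]


def traversal_order(q: Sequence[Sequence[int]], exec_time: Sequence[float]) -> List[int]:
    """Sort by level ascending, then by execution time ascending within a level."""
    levels = task_levels(q)
    return sorted(range(len(q)), key=lambda j: (levels[j], exec_time[j]))
```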
Second, determine the task allocation optimization variable $x_{i,k}$ with a greedy algorithm. The subtasks are ordered with the layering method of the first step, which fixes the order in which the greedy algorithm traverses them. The range of processors available to each subtask is determined by the subtask frequency level obtained in (6.1). Following the greedy algorithm, the subtasks are traversed in layering order and tentatively assigned to each candidate processor in turn; the allocation is selected according to the maximum execution time of the processors, which determines the task allocation optimization variable $x_{i,k}$ and the subtask execution start time $ts_i$.
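A minimal Python sketch of such a greedy allocation is shown below. It assumes one particular selection rule, namely that each subtask goes to the candidate processor on which it can start earliest (ties broken by processor index); this is a common load-balancing choice and may differ from the exact criterion of the invention. The function name, the dictionary outputs and the 0-based indexing are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def greedy_allocate(order: Sequence[int], exec_time: Sequence[float],
                    preds: Sequence[Sequence[int]], candidates: Sequence[Sequence[int]],
                    num_procs: int) -> Tuple[Dict[int, int], Dict[int, float], Dict[int, float]]:
    """Assign subtasks in the given (level-sorted) order; return (assign, ts, te)."""
    avail = [0.0] * num_procs              # time at which each processor becomes free
    assign: Dict[int, int] = {}
    ts: Dict[int, float] = {}
    te: Dict[int, float] = {}
    for j in order:
        # A subtask may start only after all of its predecessors have finished.
        ready = max((te[i] for i in preds[j]), default=0.0)
        # Assumed rule: pick the candidate processor with the earliest start time.
        k = min(candidates[j], key=lambda p: (max(avail[p], ready), p))
        ts[j] = max(avail[k], ready)
        te[j] = ts[j] + exec_time[j]
        avail[k] = te[j]                   # non-preemptive: processor busy until te[j]
        assign[j] = k
    return assign, ts, te
```

Because each processor runs one subtask at a time and a subtask never starts before its predecessors end, the resulting schedule respects both the non-preemption and the correlation constraints by construction.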
Third, verify whether the obtained frequency selection and task allocation scheme meets the real-time constraints of the system. If the scheme violates them, frequency selection must be performed again. The specific method comprises the following steps:
(1) Verify whether the task mapping scheme obtained in (6.1) and (6.2) meets the real-time requirement; if not, record the index of the subtask $\tau_m$ that violates the constraint;
(2) Provided the energy constraint is satisfied, raise the voltage and frequency level of $\tau_m$ and of the predecessor subtasks of $\tau_m$ by one level, lower the voltage and frequency level of the successor subtasks of $\tau_m$ by one level, and increase the adjustment count by 1;
(3) Re-determine the task allocation scheme according to (6.2) and verify again whether it meets the real-time constraint, repeating step (2) as needed; if the scheme still violates the real-time constraint when the adjustment count reaches the number of tasks, stop adjusting and declare the task mapping failed.
(6.3) Determine the optional execution cycles $o_i$.
From the results of (6.1) and (6.2), the task mapping scheme of the mandatory subtasks is obtained, from which the total remaining energy of the system $E_{optl}$ and the idle period $\Delta t$ on each processor follow. $TC_l$ denotes the time needed to execute one cycle at voltage and frequency $(V_l, f_l)$; $EC_l$ denotes the energy consumed to execute one cycle at voltage and frequency $(V_l, f_l)$. Following the greedy idea, with limited time and energy, as much of the optional subtasks as possible should be executed, i.e. the maximum number of optional cycles that can be executed in each idle period is determined, so as to improve system QoS. The specific method comprises the following steps:
(1) Sort the processors in ascending order of the start time of their first idle period, and determine from the task layering levels the candidate set Temp of subtasks that can execute in that idle period;
(2) Traverse all voltage and frequency levels that can be assigned in the idle period to determine the maximum optional execution cycles for that period;
(3) Assign actual execution cycles to the optional subtasks in Temp according to their levels, and determine the start times of the optional subtasks. Return to (1) and repeat the above steps until the total remaining energy of the system $E_{optl}$ equals 0 (or all optional subtasks have been executed).
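Step (2) of this procedure can be illustrated with the Python sketch below. It assumes that the cycles executable at level $l$ in one idle period are limited by three quantities: the idle time divided by $TC_l$, the remaining energy divided by $EC_l$, and the upper limit $O_i$ of the subtask. The function name and the tuple representation of a level are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def max_optional_cycles(idle_time: float, energy_left: float, upper_bound: float,
                        levels: Sequence[Tuple[float, float]]) -> Tuple[Optional[int], float]:
    """Return (best level index, cycles) for one idle period.

    levels holds (TC_l, EC_l): time and energy needed per cycle at level l.
    """
    best_level, best_cycles = None, 0.0
    for l, (tc, ec) in enumerate(levels):
        # Cycles are capped by the idle time, the remaining energy and O_i.
        cycles = min(idle_time / tc, energy_left / ec, upper_bound)
        if cycles > best_cycles:
            best_level, best_cycles = l, cycles
    return best_level, best_cycles
```

After each assignment, the caller would subtract `cycles * EC_l` from the remaining energy and shorten the idle period by `cycles * TC_l` before moving to the next candidate, which mirrors the loop back to step (1).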
By solving the above subproblems, the frequency selection optimization variable $c_{i,l}$, the task allocation optimization variable $x_{i,k}$, the task execution start time $ts_i$ and the optional execution cycles $o_i$ are obtained, which gives a task mapping scheme based on QoS optimization.
Example 2:
FIGS. 5 to 11 show the experimental results of the present invention.
FIG. 5 is the system QoS comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 18 (14 randomly generated task sets), obtained by solving for the optimal solution with and without task migration in the task mapping. The figure shows that introducing task migration into task mapping significantly improves system QoS.
FIG. 6 shows, for example 2 of the present invention with the number of tasks configured from 5 to 15 (11 randomly generated task sets), the task scheduling failure rate when the optimal solution is solved without task migration in the task mapping (ω = number of scheduling failures without task migration / total number of scheduling successes with task migration). As the task set grows, the scheduling failure rate without task migration rises significantly. Introducing task migration therefore increases the proportion of successful schedules to some extent.
FIG. 7 is the system QoS comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 8 (4 randomly generated task sets), obtained by adjusting the energy factor β (β takes values in [0, 0.5]) and the time adjustment factor δ (δ takes values in [0.4, 1]) and solving for the optimal solution with task migration introduced in the task mapping. The figure shows that the smaller the energy factor β and the time adjustment factor δ (i.e. the more limited the resources), the greater the QoS improvement achieved by the task mapping method with task migration.
FIG. 8 is the system QoS increment comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 8 (4 randomly generated task sets), obtained by adjusting the parallelism factor η of the task directed acyclic graph and solving for the optimal solution with task migration introduced in the task mapping. The figure shows that the lower the parallelism of the task DAG, the larger the system QoS increment obtained by the scheduling scheme with task migration.
FIG. 9 is the system QoS increment comparison chart for example 2 of the present invention with the number of tasks configured from 5 to 8 (4 randomly generated task sets), obtained by adjusting the processor heterogeneity ratio $\gamma_L/\gamma_b$ ($\gamma_L/\gamma_b$ takes values in [0.5, 1]) and solving for the optimal solution with task migration introduced in the task mapping. The figure shows that when $\gamma_L/\gamma_b$ is smaller, i.e. when the performance difference between the processors in the big and LITTLE clusters is more pronounced, the scheduling scheme with task migration is more beneficial to system QoS.
FIG. 10 compares, for example 2 of the present invention with the number of tasks configured from 5 to 18 (14 randomly generated task sets), the QoS increment of the optimal solution and of the proposed heuristic algorithm when the energy factor β = 0.4, the time adjustment factor δ = 0.4, the big cluster heterogeneity factor $\gamma_b = 1$ and the LITTLE cluster heterogeneity factor $\gamma_L = 0.6$. The figure shows that, after task migration is introduced into task mapping, the mapping scheme obtained by solving for the optimal solution significantly improves system QoS, while the heuristic algorithm obtains a suboptimal solution for system QoS.
FIG. 11 compares the run times of the optimal solution method and of the proposed heuristic algorithm for example 2 of the present invention with the number of tasks configured from 5 to 15. As can be seen from FIG. 11, the heuristic algorithm significantly increases the speed of the task scheduling algorithm at the expense of system QoS.
Although the embodiments of the present invention are described above, the embodiments are only used for facilitating understanding of the present invention, and are not intended to limit the present invention. Any person skilled in the art can make any modification and variation in form and detail without departing from the spirit and scope of the present disclosure, but the scope of the present disclosure is still subject to the scope of the appended claims.

Claims (1)

1. A task mapping method based on joint energy and QoS optimization for approximate computing tasks on a multi-core heterogeneous processing platform, characterized by comprising the following steps:
(1) Modeling the correlated real-time tasks as imprecise computation tasks, thereby obtaining a task directed acyclic graph and a task correlation matrix;
(2) On the big.LITTLE multi-core heterogeneous platform, allowing the same task to execute on processors of different clusters through task migration, so as to improve the flexibility of task allocation and of dynamic voltage and frequency scaling;
(3) Constructing a task mapping problem based on joint QoS and energy optimization by introducing task allocation, frequency selection, real-time, task non-preemption, task correlation and energy consumption constraints;
(4) Handling the nonlinear terms in the problem by variable substitution, converting the task mapping problem of (3) into a mixed integer linear programming problem, and solving it with an optimization method;
(5) For the problem in step (3), designing a heuristic algorithm with low computational complexity using problem decomposition;
wherein in step (1), N related imprecise computing (Imprecise Computation, IC) tasks { τ } 1 ,τ 2 ,…,τ N Describing task model of real-time system, thereby obtaining directed acyclic graph of task, for task τ i The IC task is logically divided into a forced execution part and an optional execution part, M i Representing task τ i Is the forced execution period of (a), variable o i Represents an optional execution period, D i Representing the deadline of the task, optional execution period o i Not exceeding the upper limit O i I.e. 0.ltoreq.o i ≤O i For the scheduling of approximate computing tasks, there is a strict execution order constraint between the mandatory and optional execution portions of the task: the optional partial tasks must be executed after the forced partial tasks are completed, and the correlation of the tasks can be performed by using a binary matrix q= [ q ] ij ] N×N To describe, q ij Representing the execution order between tasks; if task tau i And task tau j Related and task τ i At task τ j Previously execute, then q ij =1, otherwise, q ij =0;
In the step (2), two different types of clusters of big and LITTLE exist in a big. LITTLE heterogeneous processing platform, wherein processors in the same cluster are isomorphic, the platform supports a dynamic voltage and frequency adjustment technology, and voltage and frequency levels of corresponding processors in the big and LITTLE clusters are considered to be respectively expressed asAnddue to the heterogeneity of inter-cluster processors, γ i,k ∈(0,1]Is defined asProcessor theta k Executing task τ i The big. LITTLE supports a task migration technique, so that the same task is migrated from a processor in one cluster to a processor in another cluster to be executed in the execution process, and the IC task tau in the step (1) can be migrated according to the task migration technique i Decomposition into two dependent sub-tasks τ 2i-1 ' and tau 2i ' thus obtaining a new task correlation matrix, subtask τ 2i-1 ' and tau 2i ' the specific implementation of task migration, which can be performed on processors of different clusters, is detailed in the following steps, μ by normalization processing i ∈[0,1]Representing subtask τ i ' ratio of execution on one processor, for task τ i Subtask τ 2i-1 ' and tau 2i The sum of the execution ratios is equal to 1, i.e. mu 2i-12i =1;
In the step (3), optimization variables such as task allocation, frequency selection, instantaneity, task scheduling and the like are introduced: 1) If the subtask τ' i Is assigned to processor theta k Executing on, binary variable x i,k =1, otherwise x i,k =0; 2) If the subtask τ' i Executing at voltage and frequency level l, then binary variable c i,l =1, otherwise c i,l =0; 3) If any two sub-tasks tau 'without correlation' i And τ' j Assigned to the same processor, τ' i At τ' j Before execution, binary variable p i,j =1, otherwise p i,j =0; 4) Continuous variable ts i Sum te i Representing subtask τ' i To describe a task mapping method for approximating a computing task based on energy and QoS joint optimization on a multi-core heterogeneous processing platform, the following constraints need to be added:
1) Task allocation: with task migration, the same task can execute on processors of different clusters. Because the processors within a cluster are homogeneous, migration between homogeneous processors is not considered. The following constraint is therefore added for task allocation:
2) Frequency selection: under DVFS, the voltage and frequency level of a processor is adjusted only after a subtask finishes, so each subtask is assigned exactly one voltage and frequency level. Because the processors in the big and LITTLE clusters are heterogeneous and have different voltage and frequency levels, the range of levels available to a subtask is determined by the task allocation result. $\lambda_i$ indicates whether subtask $\tau'_i$ executes on the big or the LITTLE cluster: if $\tau'_i$ executes on the big cluster, the binary variable $\lambda_i = 1$, otherwise $\lambda_i = 0$. The following constraint is therefore added for frequency selection:
3) Real-time constraint: the mandatory part $M_i$ and the optional part $o_i$ of task $\tau_i$ must complete within the deadline $D_i$, and subtask $\tau'_{2i}$ may start only after subtask $\tau'_{2i-1}$ completes. When processor $\theta_k$ runs at voltage and frequency level $(V_l, f_l)$, executing subtask $\tau'_{2i-1}$ takes $\mu_{2i-1}(M_i + o_i)/(\gamma_{2i-1,k} f_l)$. To avoid introducing the additional subscript $k$, the parameter $\gamma_{i,l}$ replaces $\gamma_{i,k}$, where $\gamma_{i,l}$ denotes the execution efficiency of subtask $\tau'_i$ at $(V_l, f_l)$. The following constraints are added for real-time behavior:
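The execution-time expression in the real-time constraint can be sketched in Python as follows. The function names `subtask_time` and `task_finish` are hypothetical, and the sketch assumes that the second subtask starts immediately after the first one finishes, which is only the tightest case permitted by the precedence requirement.

```python
def subtask_time(mu, M, o, gamma, f):
    """Time for a subtask that executes fraction mu of (M + o) cycles.

    gamma is the execution efficiency in (0, 1] and f the frequency in Hz.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if not 0.0 < gamma <= 1.0 or f <= 0:
        raise ValueError("gamma must lie in (0, 1] and f must be positive")
    return mu * (M + o) / (gamma * f)


def task_finish(ts, mu_first, M, o, level_first, level_second):
    """Finish time of a task whose two subtasks run back to back.

    Each level is a (gamma, f) pair; back-to-back execution is an assumption.
    """
    t1 = subtask_time(mu_first, M, o, *level_first)
    t2 = subtask_time(1.0 - mu_first, M, o, *level_second)
    return ts + t1 + t2
```

A task meets its deadline in this sketch when `task_finish(...) <= D`.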
4) Non-preemption constraint: scheduling is non-preemptive, i.e., any two independent subtasks assigned to the same processor cannot execute at the same time, which gives the constraints:
$te_i \le ts_j + (2 - x_{i,k} - x_{j,k})H + (1 - p_{i,j})H$ (8)
$te_j \le ts_i + (2 - x_{i,k} - x_{j,k})H + p_{i,j}H$ (9)
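Constraints (8) and (9) can be checked numerically with the short Python sketch below. The text does not define $H$; the sketch assumes it is a sufficiently large constant in the usual big-M sense, and the function name `non_preemption_ok` is illustrative.

```python
def non_preemption_ok(ts_i, te_i, ts_j, te_j, x_ik, x_jk, p_ij, H):
    """Evaluate constraints (8) and (9) for one processor k.

    x_ik, x_jk and p_ij are 0/1 values; H is assumed to be a large constant.
    When both subtasks are on processor k, the slack term vanishes and
    p_ij decides which of the two orderings is enforced.
    """
    slack = (2 - x_ik - x_jk) * H
    c8 = te_i <= ts_j + slack + (1 - p_ij) * H  # binding when i runs before j
    c9 = te_j <= ts_i + slack + p_ij * H        # binding when j runs before i
    return c8 and c9
```

Two overlapping subtasks on the same processor fail the check for both values of `p_ij`, while the same intervals pass when the subtasks are on different processors.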
5) Task dependency constraint: the tasks in the set have dependencies and must execute strictly in the order given by the directed acyclic graph, which gives the constraint:
6) Energy constraint: the energy and time of inter-task communication are ignored, and only the dynamic and static power consumption of the processors are considered, where $P_{on}$ denotes the inherent power needed to keep a core on. During task mapping, the total energy consumption of the system cannot exceed the energy budget $E_{buget}$, which gives the energy constraint:
where $t_i$ denotes the time during which the processor is idle. Using the system power expression $P_{core,l} = P_{sta,l} + P_{dyn,l} + P_{on}$, constraint (11) is rewritten as:
The task mapping problem takes QoS optimization as its objective. QoS depends on the optional execution cycles $o_i$, and a linear QoS function $f_i(o_i) = k_i o_i + R_i$ is used, where $R_i$ denotes the baseline QoS after the mandatory part has executed. Based on the problem model, the task mapping optimization problem for joint QoS and energy optimization is formulated as:
In step (4), the problem model built in step (3) is linearized. The problem model PP contains nonlinear terms formed by products of continuous variables and by products of integer variables, so optimization problem (13) is a mixed-integer nonlinear programming problem. In step (4), problem (13) is equivalently transformed into a mixed-integer linear programming problem through linearization techniques such as variable substitution, as follows:
(5.1) Equations (5) and (12) contain the nonlinear terms $(M_i + o_i)\mu_{2i-1}$ and $(M_i + o_i)\mu_{2i}$, which are products of continuous variables. In line with their physical meaning, two auxiliary variables are introduced to replace these nonlinear terms; they represent the actual execution cycles of subtasks $\tau'_{2i-1}$ and $\tau'_{2i}$ and satisfy the following relationship:
The big.LITTLE platform provides discrete voltage and frequency levels $(V_l, f_l)$. Once the level $l$ is fixed, the corresponding parameters $P_{sta,l}$, $P_{dyn,l}$, and $1/f_l$ are also fixed. Therefore, in equations (5) and (12), $P'_l$, $f'_{2i-1}$, and $f'_{2i}$ are used in place of $P_{sta,l} + P_{dyn,l}$, $1/(\gamma_{2i-1,l} f_l)$, and $1/(\gamma_{2i,l} f_l)$, respectively.
(5.2) After the variable substitution in (5.1), equations (5) and (12) still contain nonlinear terms in which a binary variable multiplies a continuous variable. To linearize them, the following lemma is first introduced:
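Lemma 1: Let the constants $s_1, s_2 > 0$, and consider the two constraint spaces $P_1 = \{[t, b, x] \mid t = bx,\ -s_1 \le x \le s_2,\ b \in \{0,1\}\}$ and $P_2 = \{[t, b, x] \mid -b s_1 \le t \le b s_2,\ t + b s_1 - x - s_1 \le 0,\ t - b s_2 - x + s_2 \ge 0,\ b \in \{0,1\}\}$. Then $P_1$ and $P_2$ are equivalent, i.e., $P_1 \Leftrightarrow P_2$.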
Proof: ($P_1 \Rightarrow P_2$) Since $t = bx$ and $-s_1 \le x \le s_2$, we obtain $-b s_1 \le t \le b s_2$. From $-s_1 \le x \le s_2$ and $b \in \{0,1\}$, we obtain $(b-1)(x - s_2) \ge 0$ and $(b-1)(x + s_1) \le 0$. Expanding these products with $t = bx$ shows that $t + b s_1 - x - s_1 \le 0$ and $t - b s_2 - x + s_2 \ge 0$ hold;
($P_2 \Rightarrow P_1$) If $b = 0$, then $t = 0$ and $-s_1 \le x \le s_2$. If $b = 1$, then $-s_1 \le t = x \le s_2$. In both cases $t = bx$, so $P_2 \Rightarrow P_1$ holds;
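As a sanity check on Lemma 1, the following Python sketch tests on a finite grid that the linear constraints of $P_2$ admit a value of $t$ exactly when $t = bx$. It is a numerical illustration under assumed tolerances, not a substitute for the proof; the names `in_p2` and `check_lemma1` are hypothetical.

```python
def in_p2(t, b, x, s1, s2, eps=1e-9):
    """Membership test for the linear constraint set P2 of Lemma 1."""
    return (-b * s1 - eps <= t <= b * s2 + eps
            and t + b * s1 - x - s1 <= eps
            and t - b * s2 - x + s2 >= -eps)


def check_lemma1(s1, s2, steps=20):
    """Grid check that P2 admits t exactly when t equals b * x."""
    grid = [-s1 + (s1 + s2) * n / steps for n in range(steps + 1)]
    for b in (0, 1):
        for x in grid:
            if not in_p2(b * x, b, x, s1, s2):
                return False      # a point of P1 is missing from P2
            for t in grid:
                if abs(t - b * x) > 1e-6 and in_p2(t, b, x, s1, s2):
                    return False  # P2 admits a point outside P1
    return True
```

The grid spacing must be larger than the tolerance `1e-6` for the second check to be meaningful, which holds for the default `steps` and bounds of ordinary size.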
According to Lemma 1, an intermediate variable $C_{i,l}$ is introduced to replace the nonlinear terms in equations (5) and (12). When $c_{i,l} = 1$, $C_{i,l}$ lies between a lower bound and an upper bound; when $c_{i,l} = 0$, $C_{i,l} = 0$. This variable substitution requires the following additional constraints:
According to Lemma 1, equations (5) and (12) are linearized as:
(5.3) Equation (4) contains the nonlinear term $\lambda_i c_{i,l}$, a product of integer variables, which can be written in terms of the products $x_{i,k} c_{i,l}$. Lemma 2 is introduced to linearize equation (4);
Lemma 2: Let $x_1$ and $x_2$ be 0-1 variables. The nonlinear term $x_1 x_2$ can be replaced by a 0-1 variable $y$ that satisfies the constraints $y \le x_1$, $y \le x_2$, and $y \ge x_1 + x_2 - 1$.
Proof 2: when the variable x is 0-1 1 And x 2 All equal to 1, the constraint is converted to y=1, so y=x 1 x 2 =1, likewise, if x 1 =0,x 2 =0, or x 1 =0,x 2 =1, or x 1 =1,x 2 =0, yielding y=x 1 x 2 =0, so that lemma 2 holds;
Based on Lemma 2, the intermediate variable $z_{i,k,l}$ is introduced for the nonlinear term, and the following constraints are added so that it replaces $x_{i,k} c_{i,l}$:
$z_{i,k,l} \le x_{i,k},\quad z_{i,k,l} \le c_{i,l},\quad z_{i,k,l} \ge x_{i,k} + c_{i,l} - 1,\quad z_{i,k,l} \in \{0,1\}$ (19)
Equation (4) can be converted into:
Thus, problem (13) can be linearized as:
In step (5), based on the structure of the optimization problem in step (3), a heuristic algorithm with low computational complexity is designed through problem decomposition to improve the applicability of the mapping method. The original task mapping problem of step (3) is decomposed into three subproblems: 1) frequency selection; 2) task allocation and task scheduling; 3) adjustment of the optional execution cycles. Solving these three subproblems in sequence yields a QoS-optimized task mapping scheme. The specific steps are as follows:
(6.1) Determine the frequency allocation optimization variable $c_{i,l}$
To simplify the task migration model, the original IC task is decomposed into two subtasks $\tau'_{2i-1}$ and $\tau'_{2i}$, where $\tau'_{2i-1}$ represents the mandatory part of the original task and $\tau'_{2i}$ represents the optional part. To reduce the number of optimization variables, subproblem (6.1) considers only the mandatory execution cycles, i.e., the actual optional execution cycles are $o_i = 0$. In a real-time system the available energy $E_{buget}$ is limited, so once the mandatory parts are guaranteed to execute in full, as much energy as possible should remain for the optional parts in order to improve the QoS of the system. $E_{2i-1}$ denotes the energy consumed by executing the mandatory subtask $\tau'_{2i-1}$. Subproblem (6.1) therefore takes minimizing the total energy of task execution as its objective. The real-time constraint must also be considered; it is simplified to require that the subtasks on the critical path of the directed acyclic graph meet the real-time constraint. CPT denotes the set of indices of the mandatory subtasks on the critical path. Sorting the subtasks in CPT by execution order gives the ordered index set CPT', i.e., $CPT' = \{2r_1 - 1, \ldots, 2r_n - 1, \ldots, 2r_R - 1\}$. Subproblem (6.1) is therefore expressed as:
Given the structure of problem (22), it is solved with a greedy algorithm. For each mandatory subtask, all voltage and frequency levels are traversed, and the level that gives the smallest increase in total system energy is selected as the frequency allocation for that subtask. Under the resulting frequency allocation, the subtasks on the critical path of the directed acyclic graph are checked against the real-time constraint; if the constraint is violated, that frequency allocation is excluded. Iterating this procedure over the task set yields the solution of problem (22);
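The greedy frequency selection can be sketched in Python as below. This is a simplified illustration under several assumptions that the text does not state: each level is a hypothetical `(power, frequency)` pair, the energy of a subtask is power times execution time, the efficiency factor $\gamma$ is ignored, subtasks that have not been visited yet are assumed to run at the fastest level, and a single deadline applies to the whole critical path.

```python
def greedy_levels(cycles, levels, critical_path, deadline):
    """Pick one level per mandatory subtask, lowest energy first.

    cycles[i]     : mandatory cycles of subtask i
    levels[l]     : (power_watts, freq_hz) pair, a hypothetical encoding
    critical_path : indices of the subtasks on the critical path
    Returns a list of level indices, or None if even the fastest level
    cannot meet the deadline.
    """
    fastest = max(range(len(levels)), key=lambda l: levels[l][1])
    choice = [fastest] * len(cycles)  # optimistic start (assumption)

    def exec_time(i, l):
        return cycles[i] / levels[l][1]

    def energy(i, l):
        return levels[l][0] * exec_time(i, l)

    def path_time():
        return sum(exec_time(i, choice[i]) for i in critical_path)

    if path_time() > deadline:
        return None
    for i in range(len(cycles)):
        # Try levels in order of increasing energy for subtask i.
        for l in sorted(range(len(levels)), key=lambda l: energy(i, l)):
            previous, choice[i] = choice[i], l
            if path_time() <= deadline:
                break                 # cheapest feasible level found
            choice[i] = previous      # exclude this level and try the next
    return choice
```

With two subtasks of 2e9 cycles, a slow low-power level and a fast high-power level, and a 3-second deadline, the sketch slows down the first subtask and keeps the second at the fast level.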
(6.2) Based on the result of (6.1), determine the task allocation optimization variable $x_{i,k}$ and the task execution start time $ts_i$
The frequency allocation determines the execution time of each mandatory subtask and the cluster on which it executes. TB and TL denote the sets of mandatory subtasks executed on the big and LITTLE clusters, respectively. To avoid concentrating the subtasks on a few processors, the time each processor spends executing tasks must be balanced. Subproblem (6.2) therefore takes minimizing the total task execution time of the processors as its objective, subject to the non-preemption and dependency constraints of the tasks. $tp_k$ denotes the total time spent executing all mandatory subtasks on processor $\theta_k$. Subproblem (6.2) is expressed as:
Problem (23) requires solving for the task allocation optimization variable $x_{i,k}$ and the task execution start time $ts_i$ simultaneously. Given the structure of the problem, a greedy algorithm is used, which consists of three steps.
First step: determine the order in which the greedy algorithm traverses the subtasks. The task dependencies and task execution times are converted into a hierarchical tree relationship among the tasks of the directed acyclic graph, in which the tasks within each level are mutually independent. The levelling rule is: if the longest logical path from an entry node to task $\tau_i$ consists of $n$ edges, the level of $\tau_i$ is $n$; if task $\tau_i$ is an entry node, its level is 0. The lower the level of a task, the earlier it executes.
The specific method is as follows:
(1) In the directed acyclic graph, find the entry subtasks; they form the level-0 task set $RT_0$
(2) Loop over $RT_0$, determine the level of each subtask's successors in turn by recursion, and update the recorded subtask levels accordingly;
(3) Sort the subtasks in ascending order of level; subtasks of the same level are sorted in ascending order of execution time.
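The levelling rule above amounts to a longest-path computation on the directed acyclic graph. The Python sketch below is one way to compute it; it uses an iterative topological pass (Kahn's algorithm) rather than the recursion mentioned in the text, and the names `task_levels` and `traversal_order` as well as the dictionary-based graph encoding are assumptions made for illustration.

```python
from collections import deque


def task_levels(succ):
    """Level of each node = number of edges on the longest path from an entry.

    succ maps every node to the list of its successors; every node must
    appear as a key, including nodes without successors.
    """
    indegree = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indegree[w] += 1
    level = {v: 0 for v in succ}                      # entry nodes stay at 0
    queue = deque(v for v in succ if indegree[v] == 0)
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            level[w] = max(level[w], level[v] + 1)    # keep the longest path
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return level


def traversal_order(succ, exec_time):
    """Sort by ascending level, then by ascending execution time."""
    level = task_levels(succ)
    return sorted(succ, key=lambda v: (level[v], exec_time[v]))
```

A node reachable by both a short and a long path receives the level of the long path, which guarantees that all of its predecessors have a strictly lower level.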
Second step: determine the task allocation optimization variable $x_{i,k}$ with the greedy algorithm. The subtasks are ordered using the levelling method of the first step, which fixes the order in which the greedy algorithm traverses them. The range of candidate processors for each subtask is determined from the subtask frequency level obtained in (6.1). Following the level order, the greedy algorithm tentatively assigns each subtask to each candidate processor in turn, selects the assignment that gives the smallest resulting maximum processor execution time as the task allocation, and thereby determines the task allocation optimization variable $x_{i,k}$ and the subtask execution start time $ts_i$.
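A minimal Python sketch of such a greedy allocation is given below. It is an assumption-laden illustration: each subtask is placed on the candidate processor where it can start earliest, which for a fixed execution time is also where it finishes earliest; this is used here as a stand-in for the selection rule of the text. The function name `greedy_allocate` and all argument names are hypothetical.

```python
def greedy_allocate(order, exec_time, preds, candidates):
    """Assign subtasks to processors in the given traversal order.

    order      : subtasks sorted by level (predecessors come first)
    exec_time  : subtask -> execution time at its selected level
    preds      : subtask -> list of predecessor subtasks (may omit keys)
    candidates : subtask -> list of processors in the permitted cluster
    Returns (assigned, start, finish) dictionaries.
    """
    free_at = {}                      # processor -> time it becomes free
    assigned, start, finish = {}, {}, {}
    for t in order:
        # A subtask is ready once all of its predecessors have finished.
        ready = max((finish[p] for p in preds.get(t, ())), default=0.0)
        # Pick the candidate on which the subtask can start earliest.
        best = min(candidates[t],
                   key=lambda k: max(free_at.get(k, 0.0), ready))
        start[t] = max(free_at.get(best, 0.0), ready)
        finish[t] = start[t] + exec_time[t]
        free_at[best] = finish[t]     # non-preemptive: processor is busy
        assigned[t] = best
    return assigned, start, finish
```

Because each processor's free time is advanced to the finish time of the subtask just placed, two subtasks on the same processor never overlap in this sketch.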
Third step: verify whether the resulting frequency allocation and task allocation satisfy the real-time constraint of the system. If the scheme violates the real-time constraint, frequency allocation must be redone, as follows:
(1) Verify whether the task mapping scheme obtained in (6.1) and (6.2) meets the real-time requirement; if not, record the index of the subtask $\tau'_m$ that violates the constraint;
(2) Subject to the energy constraint, raise the voltage and frequency levels of $\tau'_m$ and of the predecessor subtasks of $\tau'_m$ by one level, lower the voltage and frequency levels of the successor subtasks of $\tau'_m$ by one level, and increase the count of level adjustments by 1;
(3) Redetermine the task allocation according to (6.2) and verify again whether the scheme satisfies the real-time constraint. If it does not, repeat (2); once the number of adjustments reaches the number of tasks, stop adjusting and report that task mapping has failed.
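The level adjustment in (2) can be sketched in Python as follows. The sketch assumes that levels are integer indices ordered from slowest (0) to fastest (`max_level`) and that out-of-range values are clamped; it does not check the energy constraint, which the caller would have to do separately. The name `adjust_levels` is hypothetical.

```python
def adjust_levels(level, m, preds, succs, max_level):
    """Return a new level assignment after one adjustment around subtask m.

    level : subtask -> current level index (0 = slowest, an assumption)
    preds : subtask -> list of predecessor subtasks
    succs : subtask -> list of successor subtasks
    """
    new_level = dict(level)
    for t in [m, *preds.get(m, ())]:
        new_level[t] = min(new_level[t] + 1, max_level)  # speed up m and predecessors
    for t in succs.get(m, ()):
        new_level[t] = max(new_level[t] - 1, 0)          # slow down successors
    return new_level
```

A driver loop would call this function at most as many times as there are tasks, re-running the allocation of (6.2) after each call.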
(6.3) Determine the optional execution cycles $o_i$
From the results of (6.1) and (6.2), a task mapping scheme for the mandatory subtasks is obtained, from which the total remaining energy of the system $E_{optl}$ and the idle periods $\Delta t$ on each processor follow. $TC_l$ denotes the time needed to execute one cycle at voltage and frequency level $(V_l, f_l)$, and $EC_l$ denotes the energy consumed by executing one cycle at $(V_l, f_l)$. Following the greedy idea, as many optional cycles as possible should execute within the limited time and energy, i.e., the maximum number of optional cycles that can execute in each idle period is determined so as to improve the QoS of the system. The specific method is as follows:
(1) Sort the processors in ascending order of the start time of their first idle period, and determine from the task levels the candidate set Temp of subtasks that can execute in that idle period;
(2) Traverse all voltage and frequency levels of the processor to which the idle period belongs, and determine the maximum number of optional execution cycles for that period;
(3) Allocate actual execution cycles to the optional subtasks in Temp in level order, determine the start time of each optional subtask, and return to (1). Repeat these steps until the total remaining energy $E_{optl}$ equals 0 or all optional subtasks have executed;
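Step (2) of this procedure can be sketched in Python as below. The sketch assumes that the number of optional cycles in one idle period is limited by three quantities: the idle time divided by $TC_l$, the remaining energy divided by $EC_l$, and an upper bound `o_max` on the optional cycles of the subtask. The bound `o_max` and the function name `max_optional_cycles` are illustrative assumptions.

```python
def max_optional_cycles(idle_time, energy_left, levels, o_max):
    """Find the level that allows the most optional cycles in one idle period.

    levels[l] is a (TC_l, EC_l) pair: time and energy per cycle at level l.
    Returns (cycles, level_index); level_index is None if no cycle fits.
    """
    best = (0.0, None)
    for l, (tc, ec) in enumerate(levels):
        # Cycles are capped by time, by energy, and by the subtask's own limit.
        cycles = min(idle_time / tc, energy_left / ec, o_max)
        if cycles > best[0]:
            best = (cycles, l)
    return best
```

A faster level fits more cycles into the idle time but usually costs more energy per cycle, so the best level depends on which of the two resources is scarcer.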
Solving the subproblems above yields the frequency allocation optimization variable $c_{i,l}$, the task allocation optimization variable $x_{i,k}$, the task execution start time $ts_i$, and the optional execution cycles $o_i$, and thus the QoS-optimized task mapping scheme.
CN202110827931.9A 2021-07-22 2021-07-22 Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform Active CN113448736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110827931.9A CN113448736B (en) 2021-07-22 2021-07-22 Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform

Publications (2)

Publication Number Publication Date
CN113448736A CN113448736A (en) 2021-09-28
CN113448736B true CN113448736B (en) 2024-03-19

Family

ID=77816967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110827931.9A Active CN113448736B (en) 2021-07-22 2021-07-22 Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform

Country Status (1)

Country Link
CN (1) CN113448736B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115904705B (en) * 2022-11-09 2023-10-24 成都理工大学 Optimal scheduling method for multiprocessor restricted preemption

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102866912A (en) * 2012-10-16 2013-01-09 首都师范大学 Single-instruction-set heterogeneous multi-core system static task scheduling method
CN105260237A (en) * 2015-09-29 2016-01-20 中南大学 Task scheduling system of heterogeneous multi-core platform and scheduling method for task scheduling system
KR101879419B1 (en) * 2017-03-15 2018-08-17 주식회사 클래스액트 A task distribution method using parallel processing algorithm
CN112328380A (en) * 2020-11-10 2021-02-05 武汉理工大学 Task scheduling method and device based on heterogeneous computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant