CN112463346B - Heuristic processor partitioning method, system and storage medium for DAG task based on partition scheduling - Google Patents


Info

Publication number: CN112463346B (application CN202011631493.0A; earlier publication CN112463346A)
Authority: CN (China)
Prior art keywords: task, processor, time, subtasks, ready
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventors: 张伟哲, 吴毓龙, 何慧, 方滨兴
Original and current assignee: Shenzhen Graduate School Harbin Institute of Technology
Application filed by Shenzhen Graduate School Harbin Institute of Technology; priority to CN202011631493.0A; published as CN112463346A, application granted and published as CN112463346B.

Classifications

    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues (under G Physics; G06F Electric digital data processing; G06F9/48 Program initiating, program switching)
    • G06F9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals (under G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU])

Abstract

The invention provides a heuristic processor partitioning method, system and storage medium for DAG tasks based on partitioned scheduling. It first derives a response-time analysis for DAG tasks under a partitioned fixed-priority scheduling algorithm; based on the intuition of this analysis, the invention proposes a Greedy Parallel Execution Cluster (GPEC) processor allocation strategy, which takes into account the topology of DAG tasks and the self-interference among subtasks within a task. The invention has the following beneficial effects: the GPEC strategy considers the influence of the internal topology and self-interference of DAG tasks. In addition, the invention ports a real-time system to an embedded board and evaluates the performance of the GPEC strategy on a real platform. Compared with two state-of-the-art processor allocation strategies in the experiments, the GPEC strategy reduces average WCRT by up to 35.59% and improves the schedulable ratio of DAG task sets by up to 76%.

Description

Heuristic processor partitioning method, system and storage medium for DAG task based on partition scheduling
Technical Field
The invention relates to the technical field of data processing, and in particular to a heuristic processor partitioning method, system and storage medium for DAG tasks based on partitioned scheduling.
Background
With the increasing number of processors and the strict requirement of completing a large amount of computation before a deadline, more and more applications are migrated to embedded multiprocessor platforms [1], [2] on mobile terminals and edge clouds of different types to be executed in parallel. These parallel programs can typically be represented with a Directed Acyclic Graph (DAG) task model, where a DAG task is composed of subtasks and edges connecting the subtasks [3]. Subtasks represent sequential computations, and edges represent dependencies between the connected subtasks.
Fig. 1 shows a real-time obstacle avoidance application for an autonomous vehicle. In this scenario, there is an obstacle in front of vehicle A, while vehicle B is traveling in the left lane of vehicle A. To avoid the obstacle, the vehicle must safely plan a route based on information received from its body sensors and a roadside server. This application can thus be mapped to a DAG task with the following 7 subtasks. V_{i,0} represents a target recognition operation, in which car A recognizes the obstacle using information given by its front sensor. V_{i,1} is a switch to a change-of-travel-route mode of operation, which triggers the next three operations: slowing the car (V_{i,2}), obtaining information from the roadside server to determine that the road ahead is safe (V_{i,3}), and checking whether the lanes of both vehicles are safe using information from the side sensors (V_{i,4}). Finally, V_{i,5} and V_{i,6} are the operations that perform the steering controller's lane change and resume the normal driving mode, respectively. Due to the dependencies between these operations, V_{i,5} cannot be executed until V_{i,3} and V_{i,4} are complete. Otherwise, vehicle A might choose to turn left to avoid the obstacle, but such a choice could result in vehicle A colliding with vehicle B.
For parallel tasks on a multiprocessor platform, real-time scheduling algorithms fall into three types: global scheduling, partitioned scheduling, and federated scheduling. Under global scheduling [3]-[5], subtasks can be executed on any processor and can be migrated during execution. Federated scheduling and its variants [6]-[9] assign each task to a group of processors, and the subtasks of that task can be executed on any of the assigned processors. In contrast, under partitioned scheduling [10]-[14], each subtask is assigned to one processor and will always execute on that processor (it cannot be migrated to other processors). Compared with global and federated scheduling, the advantage of partitioned scheduling is that it has no subtask migration cost and better isolation between processors, and it is widely used in industry.
The problem of real-time scheduling of parallel Directed Acyclic Graph (DAG) tasks has been a subject of extensive research in recent years. However, it remains unclear how to allocate the subtasks of a DAG task so as to reduce the worst-case response time and improve schedulability under partitioned fixed-priority scheduling.
Disclosure of Invention
The invention provides a heuristic processor partitioning method for DAG tasks based on partitioned scheduling, which comprises a heuristic processor allocation procedure for PEC structures, consisting of the following steps:

Step 1: initialize the remaining utilization of each processor, U_a(i) ← 1, i = 1, ..., m; initialize the Ready queue Ready to be empty; initialize the processor allocation policy to null, θ ← ∅;

Step 2: calculate the tolerable latest start time of each subtask according to formula (4) and save it into a table LST;

Step 3: from the first task to the last task, check whether a PEC structure P_i^k can be derived from task τ_i; if some P_i^k is obtained, execute step 4, otherwise end the process. A set of subtasks that have one and only one identical predecessor subtask is called a PEC structure, i.e., a parallel execution cluster structure; P_i^k denotes the kth PEC structure of τ_i, where k ∈ [0, π_i] and π_i represents the number of PEC structures in task τ_i;

Step 4: add all the subtasks in P_i^k into the Ready queue Ready;

Step 5: sort the subtasks in the Ready queue Ready in non-descending order of LST;

Step 6: allocate processors from the first subtask to the last subtask in Ready.

LST(V_{i,j}) represents the tolerable latest start time of subtask V_{i,j}: if V_{i,j} starts later than LST(V_{i,j}), task τ_i must be unschedulable. LST(V_{i,j}) can be calculated by formula (4), where C_i represents the total worst-case execution time of all subtasks, D_i denotes the deadline of the task, C_{i,j} represents the WCET of V_{i,j} (WCET is the worst-case execution time), and F(τ_i) denotes the set of terminating subtasks:

LST(V_{i,j}) = D_i − max_{V_{i,f} ∈ F(τ_i)} len(λ(V_{i,j} → V_{i,f}))    (4)

where len(λ(V_{i,j} → V_{i,f})) is the sum of the WCETs of the subtasks on a path from V_{i,j} to a terminating subtask V_{i,f}, including C_{i,j} itself.
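The LST table built in step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes formula (4) takes the reconstructed form above (deadline minus the longest remaining WCET path), and the example DAG, its edges, and its WCET values are hypothetical, chosen only so that the WCETs sum to C_i = 22 as in the later Fig. 2 example.

```python
from functools import lru_cache

def compute_lst_table(succ, wcet, deadline):
    """Tolerable latest start time per subtask (sketch of formula (4)).

    succ: dict subtask -> list of successor subtasks
    wcet: dict subtask -> WCET C_ij
    deadline: task deadline D_i
    Assumes LST(v) = D_i minus the longest WCET path starting at v
    (our reading of formula (4); the original image is unreadable).
    """
    @lru_cache(maxsize=None)
    def longest_from(v):
        # WCET of v plus the longest path through any successor
        return wcet[v] + max((longest_from(s) for s in succ[v]), default=0)

    return {v: deadline - longest_from(v) for v in succ}

# Hypothetical DAG consistent with the Fig. 2 facts used later
# (source V1; sinks V2, V3, V7; a path V1 -> V4 -> V6 -> V7).
succ = {1: [2, 3, 4, 5], 2: [], 3: [], 4: [6], 5: [6], 6: [7], 7: []}
wcet = {1: 2, 2: 3, 3: 4, 4: 5, 5: 3, 6: 2, 7: 3}   # sums to 22
lst = compute_lst_table(succ, wcet, deadline=80)
```

A subtask deeper on a path always has a later (or equal) LST than its predecessors, which is why step 5 sorts the Ready queue in non-descending LST order: subtasks whose start can least afford to slip come first.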
As a further improvement of the present invention, step 6 comprises:

Step 61: allocate the subtask currently to be assigned to a processor p* according to the Worst-Fit algorithm;

Step 62: update the processor's remaining utilization U_a(p*);

Step 63: update the processor allocation policy θ_{p*}.
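Steps 61 and 62 can be sketched as a plain Worst-Fit pass; the function name and the utilization figures below are hypothetical, and the policy-table update of step 63 is reduced to returning the chosen processor.

```python
def worst_fit(remaining_util, demand):
    """Worst-Fit assignment (sketch of steps 61 and 62).

    remaining_util: dict processor -> remaining utilization U_a(p)
    demand: utilization demanded by the subtask to be placed
    Returns the chosen processor p*, or None if no processor fits.
    """
    # Candidates that can still accommodate the subtask
    fitting = {p: u for p, u in remaining_util.items() if u >= demand}
    if not fitting:
        return None
    # Worst-Fit prefers the emptiest processor, spreading the load
    p_star = max(fitting, key=fitting.get)
    remaining_util[p_star] -= demand  # step 62: update U_a(p*)
    return p_star

util = {1: 0.6, 2: 0.9, 3: 0.3}
chosen = worst_fit(util, 0.4)  # processor 2 has the most headroom
```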
As a further improvement of the invention, the heuristic processor partitioning method comprises the following steps:

Step S1: initialize the processor allocation policy to null, θ ← ∅; initialize a PST; initialize the Ready queue Ready. The PST is a table of potential self-interference, which stores the potential self-interference subtasks of each subtask of τ_i;

Step S2: initially allocate processors for the PEC structures according to the heuristic processor allocation procedure for PEC structures;

Step S3: update the latest end times and the PST of the subtasks already allocated to processors according to the allocation result of step S2;

Step S4: add to Ready every subtask all of whose predecessor subtasks have been allocated to processors;

Step S5: judge whether Ready is empty; if not, allocate processors for the subtasks in Ready; if Ready is empty, the allocation is complete and the routine exits.
As a further improvement of the present invention, in step S5, if Ready is not empty, allocating processors for the subtasks in Ready comprises:

Step S51: sort the subtasks in Ready in non-descending order of LST;

Step S52: allocate a processor for the first subtask V_{i,j} of the sorted Ready;

Step S53: update the PST;

Step S54: add to Ready any new subtasks that satisfy the condition of step S4, and repeat step S5.
As a further improvement of the present invention, step S52 comprises:

Step S521: for each of the m processors, compute the latest end time LET_k(V_{i,j}) that results from allocating the subtask to processor k, where k ∈ [1, m] and LET_k(V_{i,j}) represents the latest end time of subtask V_{i,j} on processor k;

Step S522: compute k* = argmin_{k ∈ [1, m]} LET_k(V_{i,j});

Step S523: if 2 or more processors attain the minimum, allocate the subtask, according to the Worst-Fit algorithm, to the one among them with the maximum remaining utilization.
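Steps S521 to S523 amount to a min-latest-end-time selection with a Worst-Fit tie-break. A minimal sketch, assuming LET_k names the per-processor latest end time as in the reconstruction above; the numeric values are hypothetical.

```python
def pick_processor(latest_end, remaining_util):
    """Choose a processor for one subtask (sketch of steps S521-S523).

    latest_end: dict processor k -> LET_k(V_ij) if the subtask ran there
    remaining_util: dict processor k -> remaining utilization U_a(k)
    """
    best = min(latest_end.values())                    # step S522
    tied = [k for k, t in latest_end.items() if t == best]
    # step S523: among ties, Worst-Fit picks the emptiest processor
    return max(tied, key=lambda k: remaining_util[k])

let = {1: 40, 2: 35, 3: 35}      # processors 2 and 3 tie on LET
util = {1: 0.2, 2: 0.1, 3: 0.5}  # processor 3 has more headroom
k_star = pick_processor(let, util)
```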
The invention also provides a real-time system, wherein the real-time system runs on an embedded development board, and the steps of the heuristic processor partitioning method run on the real-time system.
As a further improvement of the invention, the embedded development board is a Raspberry Pi 3 Model B+, and the real-time system is RT-Thread v4.0.2, where RT-Thread is an open-source real-time operating system.
As a further improvement of the present invention, the file paths and source code of the real-time system are modified as follows:

set the value of the macro RT_TICK_PER_SECOND at line 18 of the rtconfig.h file under the path "/bsp/raspberry-pi/raspi3-32/rtconfig.h" to 100;

change the variable "cntfrq" at line 57 of the board.c file under the path "/bsp/raspberry-pi/raspi3-32/driver/board.c" from 35000 to 10000;

the value of the variable cntfrq, which represents a counter that obtains its clock from an external crystal, is set to 1000;

the CPU is required to perform 1,500,000 auto-increment operations; the execution time of these 1,500,000 increment operations, measured against the system tick as the unit of time, is taken as the worst-case execution time of a task in the experiments.
As a further improvement of the invention, the scheduler of the real-time system performs the following steps:

Step one: initialize the release queue Ω to be empty, initialize the current system time t_current ← 0, and initialize the task-set schedulable flag Flag ← TRUE;

Step two: initialize the vector t_next ← (0, ..., 0), where for i = 1, ..., n each element t_next[i] stores the time of the next release of the corresponding task;

Step three: at system start, add all tasks into the release queue Ω in non-descending order of their next release times t_next[i];

Step four: if Flag is TRUE, start scheduling the task set and execute step five; otherwise the task set is not schedulable, and the program exits;

Step five: if Ω ≠ ∅, execute step six; otherwise, suspend and wait to be awakened when a task is added to Ω;

Step six: obtain the current system time, t_current ← GetSystemTick();

Step seven: obtain the task to be released next, τ_x ← GetFirstElement(Ω);

Step eight: if t_current < t_next[x], the time at which τ_x needs to be released has not yet arrived, so sleep for t_next[x] − t_current time; otherwise, execute step nine;

Step nine: release task τ_x, judge whether the task set is schedulable according to whether the last instance of τ_x has finished, and modify the value of Flag accordingly;

Step ten: if Flag is TRUE, set t_next[x] ← t_next[x] + T_x, add τ_x into Ω again according to the rule of step three, and repeatedly execute step four.
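The release loop of steps one to ten can be approximated by an event-driven simulation in which the release queue Ω is a min-heap ordered by next release time; GetSystemTick, the actual sleeping, and the schedulability check are abstracted away, and the task names and periods are hypothetical.

```python
import heapq

def simulate_releases(periods, horizon):
    """Event-driven task release loop (sketch of scheduler steps 1-10).

    Keeps the release queue ordered by each task's next release time
    t_next, pops the earliest entry, 'releases' it, and re-inserts it
    one period later, mirroring steps three, seven, nine and ten.
    periods: dict task -> period T_x; horizon: simulated time bound.
    Returns the list of (time, task) release events.
    """
    omega = [(0, x) for x in sorted(periods)]  # step three: all due at t=0
    heapq.heapify(omega)
    events = []
    while omega and omega[0][0] < horizon:
        t_next, x = heapq.heappop(omega)               # step seven
        events.append((t_next, x))                     # step nine
        heapq.heappush(omega, (t_next + periods[x], x))  # step ten
    return events

ev = simulate_releases({'a': 10, 'b': 15}, horizon=30)
```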
The present invention also provides a computer-readable storage medium having stored thereon a computer program configured to implement, when invoked by a processor, the steps of the heuristic processor partitioning method of the present invention.
The invention has the following beneficial effects: the invention provides a heuristic processor allocation strategy for DAG tasks under partitioned fixed-priority scheduling, the GPEC strategy, which takes into account the internal topology of DAG tasks and the influence of self-interference. In addition, the invention ports a real-time system to an embedded board and evaluates the performance of the GPEC strategy on a real platform. Compared with two state-of-the-art processor allocation strategies in the experiments, the GPEC strategy reduces average WCRT by up to 35.59% and improves the schedulable ratio of DAG task sets by up to 76%.
Drawings
Fig. 1 is a schematic diagram illustrating obstacle avoidance operation;
FIG. 2 is a schematic diagram of a DAG task;
FIG. 3 is a graph of worst case response time versus utilization;
FIG. 4 is a diagram of schedulable rate versus utilization for a DAG task set;
FIG. 5 is a graph of worst case percent response time reduction as a function of utilization;
FIG. 6 is a graph of percentage increase in schedulable rate of a set of tasks versus utilization.
Detailed Description
The invention discloses a heuristic processor partitioning method for DAG tasks based on partitioned scheduling, which studies the processor allocation problem of partitioned fixed-priority scheduling of parallel DAG tasks on a multiprocessor. Since this problem has been shown to be NP-hard [15], an optimal processor allocation strategy cannot be expected to be found in polynomial time. Thus, different heuristic processor allocation algorithms have been proposed to reduce the task Worst-Case Response Time (WCRT) [16]. However, existing work does not take into account the topology of the DAG task and the effects of self-interference, resulting in pessimistic analysis or long response times for the task. For example, if V_{i,2} and V_{i,3} in FIG. 1 are both allocated to processor 2, they interfere with each other; if they are distributed over different processors, they can run in parallel.
To address this problem, the invention proposes a novel processor allocation strategy that exploits the topology of the DAG task, considers the influence of self-interference among subtasks of the same task, and constructs an embedded real-time platform to verify the performance of the strategy. Specifically, the present invention makes the following main contributions:

1. The worst-case response time analysis of parallel DAG tasks under fixed-priority partitioned scheduling is derived, analyzing the interference from high-priority tasks and the self-interference among subtasks of the same task.

2. Using the topology of DAG tasks, the invention first defines a Parallel Execution Cluster (PEC) structure and designs a processor allocation strategy that attempts to allocate subtasks belonging to the same PEC structure to different processors.

3. Based on the intuition of the WCRT analysis, the invention improves the algorithm to further reduce self-interference and improve task schedulability, yielding the Greedy Parallel Execution Cluster (GPEC) processor allocation strategy.

4. The invention also ports an open-source real-time operating system to an embedded development board and rewrites its task-release scheduler so as to execute real-time DAG tasks in an event-driven manner.

Finally, a large number of empirical experiments on synthetic tasks are carried out on the platform; the evaluation results show that, compared with the two baseline algorithms, the proposed GPEC algorithm reduces WCRT and significantly improves the schedulability of the task set.
The present invention is described in detail below:
the related work is as follows:
the partition scheduling and parallel tasks most relevant to the present invention will be described below. Parallel task scheduling on multiple processors has been extensively studied over the last few years. Scholars propose different parallel task models. A task in the synchronous task model is composed of a series of computing segments, wherein each segment has any number of parallel subtasks. A subtask in a segment can only be executed after all subtasks of the previous segment have been completed. In contrast, Directed Acyclic Graph (DAG) tasks allow for a more general parallel structure, where subtasks may have arbitrary dependencies, as long as there are no dependency loops. In the orthogonal dimension, different types of scheduling algorithms are proposed for parallel real-time tasks, which we will briefly introduce in the following.
Global scheduling of parallel tasks: Saifullah et al demonstrated that decomposed DAG tasks can be scheduled with global earliest-deadline-first scheduling with a speed-up ratio of 4 [20]. Bonifaci et al demonstrated speed-up ratios of global earliest-deadline-first scheduling and global deadline-monotonic scheduling for DAG tasks of 2−1/m and 3−1/m, respectively, where m is the number of processors in the system [3].
Partitioned scheduling of parallel synchronous tasks: Lakshmanan et al proposed a task-decomposition partitioned fixed-priority scheduling algorithm for constrained synchronous tasks whose subtasks have the same length within the same segment [14], and demonstrated a speed-up ratio of 3.42. Based on a similar decomposition idea, Saifullah et al developed a partitioned scheduling algorithm for unrestricted synchronous tasks and demonstrated a speed-up ratio of 5 [5]. They also generalized the results to DAG tasks with unit-size subtasks. However, directly applying this result by converting subtasks of non-unit size into nodes of unit size may cause a subtask to migrate from one processor to another, since unit-size nodes belonging to the same subtask may be allocated to different processors.
Partitioned scheduling of parallel DAG tasks: unlike synchronous tasks, which synchronize subtasks after each segment, DAG tasks have a more complex topology and are more difficult to analyze. Fonseca et al proposed a response time analysis method for DAG tasks under partitioned scheduling that converts DAG tasks into self-suspending tasks [10]. Due to the complexity of the analysis, most existing work focuses on different heuristics for partitioning the subtasks of a DAG task onto processors. To our knowledge, existing processor allocation strategies for DAG tasks under partitioned scheduling include the dagP algorithm proposed by Herrmann et al [12] and the MACRO algorithm proposed by Özkaya et al [13]. Specifically, the dagP algorithm allocates subtasks to processors in three phases. First, the topology of the DAG task is roughly divided into convex sets [21]. The subtasks are then initially allocated to processors by calculating the cost of switching a subtask from one processor to another. Finally, the partition result computed in the second phase is refined to obtain the final allocation. In contrast, the MACRO algorithm uses the BL-EST algorithm to compute a weight for each subtask [22], thereby assigning the subtask to a processor. Similar to dagP, after the initial allocation, MACRO attempts to move subtasks from one processor to another according to a predetermined priority and calculates the cost of each move to optimize the allocation.
Federated scheduling of parallel tasks: Li et al proposed a federated scheduling strategy that allocates high-utilization tasks to sets of dedicated cores, with the remaining low-utilization tasks sharing the remaining cores [6]. In addition, they demonstrated that G-EDF and G-RM have speed-up ratios of (3 + √5)/2 and 2 + √3, respectively. [9] integrates instruction cache sharing into federated scheduling and improves schedulability by reducing the number of processors used by high-utilization tasks.
Secondly, a system model:
the system of the invention is composed of a task set composed of n preemptible real-time tasks, wherein gamma is { tau ═ tau1,...,τnEach of which is a Directed Acyclic Graph (DAG) task. These tasks execute P ═ P on a multi-core platform with m identical processors1,...,pm}. Per DAG task τi=(Vi,Ei,Ci,Ti,Di,fi) There are 6 parameters. Wherein ViRepresenting a set of subtasks (nodes), EiRepresenting a set of edges (inter-subtask dependencies), CiRepresenting the total worst case response time (WCET), T, of all subtasksiIndicating the period of the task, DiIndicates the deadline of the task (D)i≤Ti),fiIndicating the priority of the task.
Each DAG task τ_i consists of β_i subtasks, which are partitioned onto different processors and executed based on the partitioned schedule. If the response time (the time interval from task release to task completion) is greater than its deadline, the task is said to be unschedulable. Further, a task set is said to be unschedulable if it contains an unschedulable task; otherwise it is schedulable. Subtask V_{i,j} denotes the jth subtask of task τ_i; V_{i,j} has 2 parameters <C_{i,j}, P_{i,j}>, where C_{i,j} represents the WCET of V_{i,j} and P_{i,j} represents the processor to which V_{i,j} is allocated.
We use pr(τ_i) to denote the set of processors used by DAG task τ_i, where |pr(τ_i)| ≤ m. The deadline of each subtask is inherited from its task. We use e(V_{i,j}, V_{i,k}) ∈ E_i to denote an edge pointing from V_{i,j} to V_{i,k}, which means that V_{i,k} can begin execution only when V_{i,j} has completed.
When a DAG task is released, all its subtasks are released at the same time, but not all of them are ready, because of the dependencies described above. For convenience of analysis, we use U_i = C_i/(T_i · m) to denote the utilization of task τ_i. The utilization cannot be greater than 1 in any case; otherwise the task set is not schedulable.
There are n distinct priorities in the system, corresponding one-to-one with the n DAG tasks. We use f_i = j to denote that the priority of task τ_i is j. We specify that the smaller the priority value of a task, the higher its priority. In other words, τ_i has higher priority than τ_j if and only if x < y, where f_i = x and f_j = y. In the Deadline Monotonic (DM) priority assignment algorithm used in the present invention, priorities are assigned according to task deadlines, i.e., the shorter the deadline, the higher the priority. The processor always selects the highest-priority ready task in the current system to execute.
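The DM priority assignment described above can be sketched in a few lines; the task names and deadlines below are hypothetical.

```python
def deadline_monotonic(deadlines):
    """Deadline Monotonic priority assignment (sketch): the shorter the
    deadline, the higher the priority. Priority 1 is the highest,
    matching the convention that a smaller f_i means higher priority.

    deadlines: dict task -> relative deadline D_i
    Returns dict task -> priority value f_i.
    """
    order = sorted(deadlines, key=lambda t: deadlines[t])
    return {task: rank + 1 for rank, task in enumerate(order)}

f = deadline_monotonic({'t1': 80, 't2': 40, 't3': 100})
```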
Definition 1: if there is an edge e(V_{i,j}, V_{i,k}) ∈ E_i, we call V_{i,j} a predecessor subtask of V_{i,k}, and V_{i,k} a successor subtask of V_{i,j}.
Definition 2: if a subtask has no predecessor subtask, it is called a source subtask, denoted S_i. Similarly, a subtask is called a terminating subtask if it has no successor subtask; the set of terminating subtasks is denoted F(τ_i).
For a task τ_i with only one source subtask, S_i = V_{i,1}. A task with multiple source subtasks can easily be converted into a task with only one source subtask (for convenience of the following analysis) by adding a predecessor subtask V_{i,0} with WCET 0 (i.e., C_{i,0} = 0) before these source subtasks, so that S_i = V_{i,0}.
Definition 3: we use parent(V_{i,j}) to denote the set of all predecessor subtasks of V_{i,j}. Similarly, the set of all successor subtasks of V_{i,j} is denoted child(V_{i,j}).

From Definition 3 and the description of edges above, it is readily seen that V_{i,j} cannot begin execution until all subtasks in parent(V_{i,j}) have completed.
Definition 4: we use λ_{i,k} to denote the kth path of task τ_i. A path λ_{i,k} = {V_{i,s} → ... → V_{i,f}} is a consecutive set of subtasks ending at a terminating subtask V_{i,f} ∈ F(τ_i). We use λ_i = {λ_{i,1}, ..., λ_{i,γ_i}} to denote the set of all paths of task τ_i, where γ_i is the number of all paths.
Definition 5: V_{i,j} is called an indirect predecessor subtask of V_{i,k} (and V_{i,k} an indirect successor subtask of V_{i,j}) if the following two conditions are satisfied simultaneously:

1) there is no edge e(V_{i,j}, V_{i,k}) ∈ E_i pointing from V_{i,j} to V_{i,k};

2) there is a path that first passes through V_{i,j} and then passes through V_{i,k}.
For example, as shown in FIG. 2, a DAG task τ_i is composed of 7 subtasks. τ_i is distributed over two processors, i.e., P = {P_1, P_2}, m = 2, with C_i = 22 and T_i = D_i = 80. The utilization is U_i = 22/(80 · 2) = 0.1375. V_{i,1} is the source subtask; V_{i,2}, V_{i,3} and V_{i,7} are the terminating subtasks. V_{i,1} is an indirect predecessor subtask of V_{i,6}, because the two are not directly connected and there is a path {V_{i,1} → V_{i,4} → V_{i,6} → V_{i,7}} that first passes through V_{i,1} and then through V_{i,6}. τ_i has 4 paths in total, i.e., γ_i = 4.
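Definition 4 and the path count γ_i can be checked mechanically. The edge set below is our assumption, since Fig. 2 itself is not reproduced in the text; it is chosen only to be consistent with the stated facts (source V_{i,1}; terminating subtasks V_{i,2}, V_{i,3}, V_{i,7}; the path V_{i,1} → V_{i,4} → V_{i,6} → V_{i,7}; γ_i = 4).

```python
def all_paths(succ, source):
    """Enumerate all source-to-sink paths lambda_ik (Definition 4 sketch).

    succ: dict subtask -> successor list. A path ends at a subtask
    with no successors, i.e. a terminating subtask in F(tau_i).
    """
    if not succ[source]:
        return [[source]]
    paths = []
    for nxt in succ[source]:
        for tail in all_paths(succ, nxt):
            paths.append([source] + tail)
    return paths

# Hypothetical Fig. 2 edges consistent with the facts stated above.
succ = {1: [2, 3, 4, 5], 2: [], 3: [], 4: [6], 5: [6], 6: [7], 7: []}
paths = all_paths(succ, 1)
```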
Response time analysis:
response time analysis in real-time systems is one way to determine whether a set of tasks is schedulable. The intuition derived from the analysis may help us to develop a good processor allocation strategy that enables the DAG task set to be scheduled. Fonseca et al propose an analysis method for computing WCRT by DAG task under partition scheduling [10]. They demonstrated tauiIs equal to the largest of the WCRTs of all paths, can be calculated by equation (1), where (R (λ)i,k) Can be calculated by the formula (2).
Figure GDA0003144714910000096
Figure GDA0003144714910000091
R(λi,k) The calculation of (c) is divided into 3 parts. Wherein len (lambda)i,k) Represents a path λi,kCan use
Figure GDA0003144714910000092
Figure GDA0003144714910000093
And (4) calculating.
Figure GDA0003144714910000094
And
Figure GDA0003144714910000095
respectively represent lambdai,kSelf-interference (self-interference) and interference from high priority task nodes (high-interference).
The impact of high-priority task interference is the workload of the high-priority DAG tasks. Note that once the priorities of the DAG tasks are assigned, the high-priority interference of each path is determined. The invention therefore mainly studies the influence of self-interference on DAG tasks.
Each DAG task has a unique priority, without loss of generality. Since all subtasks from the same DAG task share the same priority, they cannot preempt each other, which creates self-interference. Furthermore, subtasks on the same path cannot interfere with each other because of the dependencies between them. Therefore, a subtask that generates self-interference on λ_{i,k} cannot itself belong to λ_{i,k}. We use self(V_{i,j}) to denote the set of subtasks that generate self-interference on V_{i,j}.
Theorem 1: subtask V_{i,k} generates self-interference on subtask V_{i,j} if and only if the following two conditions are satisfied simultaneously:

1) the two subtasks are allocated to the same processor, i.e., P_{i,j} = P_{i,k};

2) V_{i,j} is neither a direct nor an indirect predecessor subtask of V_{i,k}, and vice versa.
And (3) proving that: in a partitioned real-time system, a subtask can only run on one processor as long as it is assigned to that processor. Furthermore, executing one sub-task on one processor does not interfere with the execution of some sub-tasks on other processors, and vice versa. Considering the first condition if Pi,j≠Pi,kThe two subtasks never interfere with each other's execution, i.e. Vi,kCan not be aligned with Vi,jThe performing of (2) generates self-interference. Without loss of generality, we assume Vi,jIs Vi,kIs (indirectly) a preceding sub-task. Then Vi,kAt Vi,jDo not start until execution is complete, then Vi,kAlso will not be aligned with Vi,jSelf-interference is generated. In summary, if Vi,kWill pair subtask Vi,jSelf-interference is generated, the above two conditions must be satisfied at the same time.
Corollary 1: let self(λ_{i,k}) be the set of subtasks that generate self-interference on path λ_{i,k}, where

self(λ_{i,k}) = (∪_{V_{i,j} ∈ λ_{i,k}} self(V_{i,j})) \ λ_{i,k}

Then I^{self}(λ_{i,k}) can be calculated by formula (3):

I^{self}(λ_{i,k}) = Σ_{V_{i,j} ∈ self(λ_{i,k})} C_{i,j}    (3)
And (3) proving that: consider 3 subtasks Vi,a,Vi,bAnd Vi,cIn which V isi,aAnd Vi,bBelonging to path λi,kAnd Vi,cWhile belonging to self (V)i,a) And self (V)i,b). Task τ is known from the description in chapter threeiIs released within its period and only one sub-task instance, i.e. V, is releasedi,cMaximum pairs of Vi,aAnd Vi,bOne of the two sub-tasks generates self-interference. So that λ is the worst casei,kThe received self-interference is equal to the sum of the WCET of all the subtasks that would generate self-interference for that path.
Processor partitioning:

In this section, we propose a processor allocation strategy under partitioned scheduling that takes into account both the topology and the effects of self-interference. By Theorem 1 and formula (3), it is easy to see that once all subtasks have been assigned to processors, the worst-case self-interference of each path is determined. Our intuition is to minimize the self-interference of each subtask so as to improve the performance of the processor allocation strategy (reduce the WCRT of DAG tasks and increase the probability that the DAG task set is schedulable).
4.1 heuristic distribution method based on DAG task topological structure
According to Theorem 1, if we want to improve the performance of a processor allocation strategy by reducing self-interference among subtasks, we can only do so by allocating subtasks that have a potential self-interference relationship with each other to different processors. In other words, we ensure that the first condition of Theorem 1 is not satisfied. We cannot break the second condition of Theorem 1 by changing the dependencies between subtasks, because the topology of a DAG task is an inherent property that we cannot change.
Definition 6: A set of subtasks that share one and only one identical predecessor subtask is referred to as a Parallel Execution Cluster (PEC) structure. In addition, every PEC structure contains at least two subtasks. We use π_i to represent the number of PEC structures in task τ_i, and use
Figure GDA0003144714910000111
to denote the k-th PEC structure of τ_i, where k ∈ [0, π_i].
For example, only one PEC structure is present in FIG. 2
Figure GDA0003144714910000112
V_{i,3} and V_{i,6} do not form a PEC structure, because V_{i,6} has two predecessor subtasks, V_{i,4} and V_{i,5}.
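Definition 6 can be sketched mechanically: group the subtasks that have exactly one predecessor by that predecessor, and keep only groups of at least two. This is an illustrative Python sketch with our own assumed predecessor-map representation, not the patent's data structure.

```python
from collections import defaultdict

def pec_structures(preds):
    """Extract PEC structures (Definition 6, sketched): group subtasks
    that have exactly one predecessor by that predecessor; keep only
    groups with at least two members."""
    groups = defaultdict(list)
    for v, ps in preds.items():
        if len(ps) == 1:              # one and only one identical predecessor
            groups[ps[0]].append(v)
    return [sorted(g) for g in groups.values() if len(g) >= 2]
```

A subtask with two predecessors (like V_{i,6} in the example) never enters any group, mirroring the remark above.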
Theorem 2:
Figure GDA0003144714910000113
if the subtasks in it are allocated to the same processor, they will inevitably interfere with each other, i.e., they will generate self-interference on the paths they lie on.
Proof: Consider a PEC structure
Figure GDA0003144714910000114
where V_{i,a} is their common predecessor subtask. Because
Figure GDA0003144714910000115
all subtasks in it inherit from the same predecessor subtask, there is no dependency among them. In other words, no path passes simultaneously through
Figure GDA0003144714910000116
any two of its subtasks. As soon as V_{i,a} has completed execution,
Figure GDA0003144714910000117
all subtasks in it may start executing. According to Theorem 1, if these subtasks are allocated to the same processor, they will generate self-interference on one another.
Let LST(V_{i,j}) denote the latest start time that subtask V_{i,j} can tolerate. If the start time of V_{i,j} is later than LST(V_{i,j}), then task τ_i is certainly unschedulable. LST(V_{i,j}) can be calculated by formula (4).
Figure GDA0003144714910000118
Obviously, the smaller the value of LST(V_{i,j}), the earlier the subtask should be executed, so we use this order as our processor-assignment order. Furthermore, we use LST(τ_i) to represent a vector of β_i elements, each element corresponding one-to-one to the tolerable latest start time of a subtask of task τ_i, i.e.,
Figure GDA0003144714910000119
According to Theorem 2, we should assign the subtasks in
Figure GDA00031447149100001110
to different processors so as to reduce self-interference within the task. The heuristic processor assignment steps for the PEC structure (Algorithm 1) are described as follows:
step 1: initializing the remaining utilization of each processor, U_a(i) ← 1, i = 1, ..., m; initializing the Ready queue Ready to be empty; initializing the processor allocation policy to null
Figure GDA00031447149100001111
Step 2: calculating the tolerable latest starting time of each subtask according to the formula (4) and saving the latest starting time into a table LST;
step 3: from the first task to the last task, checking whether the PEC structure of task τ_i,
Figure GDA0003144714910000121
can be derived; if
Figure GDA0003144714910000122
is obtained, executing step 4; otherwise, ending the process;
step 4: adding all the subtasks in
Figure GDA0003144714910000123
into the Ready queue Ready;
step 5: sorting the subtasks in the Ready queue Ready in non-descending order of LST;
step 6: allocating processors from the first subtask to the last subtask in Ready.
The step 6 comprises the following steps:
step 61: assigning the subtask currently to be allocated to processor p* according to the Worst-Fit algorithm;
step 62: updating the processor remaining utilization U_a(p*);
step 63: updating the processor allocation policy θ_{p*}.
The data that the above steps need to store long-term are U_a(i), Ready, LST and θ_k, which grow with the number of subtasks and the number of processors. Therefore, the space complexity of the above algorithm is O(max{β_i, m}). Each subtask belongs to at most one PEC structure; otherwise the subtask would have at least two predecessor subtasks, contradicting Definition 6. In the worst case, Algorithm 1 performs β_i loop iterations, so its time complexity is O(β_i).
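The Worst-Fit placement of steps 4 through 6 above can be sketched as follows. This is an illustrative Python sketch under our own assumptions about data shapes (LST and utilization dictionaries, a remaining-utilization list), not the patent's implementation.

```python
def worst_fit(remaining):
    """Worst-Fit: pick the processor with the largest remaining utilization."""
    return max(range(len(remaining)), key=lambda p: remaining[p])

def assign_pec(pec, lst, util, remaining, assignment):
    """Steps 4-6 of Algorithm 1, sketched: sort the PEC's subtasks in
    non-descending LST order, then place each on the Worst-Fit
    processor, which tends to spread one PEC's members apart."""
    for v in sorted(pec, key=lambda s: lst[s]):   # step 5
        p = worst_fit(remaining)                  # step 61
        assignment[v] = p                         # step 63
        remaining[p] -= util[v]                   # step 62
    return assignment
```

With two PEC members and two fresh processors, Worst-Fit sends the second member to the other processor, which is exactly the separation Theorem 2 asks for.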
4.2 Heuristic allocation algorithm based on the self-interference cost function (GPEC strategy)
In this section, we propose a heuristic processor allocation algorithm that takes into account the effects of self-interference. The heuristic algorithm aims to reduce the WCRT of the DAG task by reducing the self-interference of each subtask as much as possible, so that the schedulable probability of the DAG task is improved.
As can be seen from Theorem 1, once the topology of a DAG task is determined, the potential self-interference tasks of all its subtasks are determined. For subtask V_{i,j}, its potential self-interference tasks can generate self-interference on V_{i,j} if they are assigned to the same processor as V_{i,j}. We use a potential self-interference table (PST) to store the potential self-interference subtasks of each subtask of τ_i. The PST contains β_i elements in total; each element is a set of subtasks, namely the potential self-interference subtasks of the corresponding subtask of τ_i. We use
Figure GDA0003144714910000124
to represent the worst-case self-interference of subtask V_{i,j}, which can be calculated by equation (5).
Figure GDA0003144714910000125
Each subtask V_{i,j} of τ_i has an earliest start time
Figure GDA0003144714910000126
And a latest end time
Figure GDA0003144714910000131
Since there are dependencies within a task, subtask V_{i,j} cannot start earlier than
Figure GDA0003144714910000132
.
Figure GDA0003144714910000133
depends on the maximum of the latest end times of its predecessor subtasks. Moreover, the source node, having no predecessor subtask to constrain it, can start executing immediately, i.e.,
Figure GDA0003144714910000134
The earliest start time and the latest end time of the other subtasks can be calculated by equation (6) and equation (7), respectively.
Figure GDA0003144714910000135
Figure GDA0003144714910000136
We use
Figure GDA0003144714910000137
as the cost function guiding the allocation of subtasks to processors. Our goal is to reduce the latest end time of each subtask as much as possible. Therefore, we use
Figure GDA0003144714910000138
to assign each subtask to a processor. For each subtask, we calculate m latest-end-time values,
Figure GDA0003144714910000139
namely the latest end times that would result from assigning the subtask to each of the m processors. We select the processor where the minimum value is attained as the subtask's processor. If the minimum is attained at two or more values (i.e., two or more of the latest end times are all equal to the minimum), the processor is chosen among them according to the Worst-Fit algorithm. We combine the heuristic processor allocation steps based on the PEC structure (Algorithm 1) with the above method to obtain Algorithm 2, described in detail as follows:
step S1: initializing the processor allocation policy to null
Figure GDA00031447149100001310
initialize the PST, and initialize the Ready queue Ready;
step S2: initially allocating processors for the PECs according to the heuristic processor allocation steps for the PEC structure (Algorithm 1);
step S3: updating the latest end times and the PST of the subtasks that have been allocated processors, according to the allocation result of step S2;
step S4: adding to Ready the subtasks all of whose predecessor subtasks have been allocated processors;
step S5: judging whether Ready is empty; if not, allocating a processor for the subtasks in Ready; if Ready is empty, the allocation is complete and the routine exits.
In step S5, if Ready is not empty, allocating a processor to the subtasks in Ready comprises:
step S51: sorting the subtasks in Ready in non-descending order of LST;
step S52: allocating a processor to the first subtask V_{i,j} of the sorted Ready;
step S53: updating the PST;
step S54: adding to Ready any new subtask satisfying the condition of step S4, and repeating step S5.
The step S52 includes:
step S521: calculating, for assigning the subtask to each of the m different processors,
Figure GDA0003144714910000141
where k ∈ [1, m],
Figure GDA0003144714910000142
representing the latest end time of subtask V_{i,j};
step S522: calculating k, where
Figure GDA0003144714910000143
Step S523: and if there are 2 or more than 2 k, distributing the subtask to the k with the maximum residual utilization rate according to the Worst-Fit algorithm.
Because Algorithm 2 stores only four parameters, namely the PST, θ_i, LST and U_a, the space complexity of Algorithm 2 is
Figure GDA0003144714910000144
From step S5 and step S52, the time complexity of Algorithm 2 is O(m·β_i).
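The per-subtask processor choice of steps S521 through S523 can be sketched as follows. Because equations (5) through (7) sit in figures, the latest-end-time evaluation is abstracted here as a caller-supplied function; everything else (names, shapes) is our own illustrative assumption in Python.

```python
def gpec_choose(v, m, end_time_if, remaining):
    """Steps S521-S523, sketched: `end_time_if(v, k)` stands in for the
    latest end time of v if assigned to processor k (eqs. (5)-(7));
    take the processor with the minimum value and break ties among the
    minima by Worst-Fit (largest remaining utilization)."""
    costs = [end_time_if(v, k) for k in range(m)]
    best = [k for k in range(m) if costs[k] == min(costs)]
    # a unique minimum wins outright; otherwise Worst-Fit among the ties
    return best[0] if len(best) == 1 else max(best, key=lambda k: remaining[k])
```

If processors 1 and 2 tie on the latest end time, the one with more remaining utilization is chosen, exactly as step S523 prescribes.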
V. Experiments:
in this section, the validity of the processor allocation strategy proposed by the present invention is verified by experiments performed on a real embedded device. We first present a method for generating DAG task sets based on the UUnifast algorithm [23]. The multiprocessor platform used by the present invention and the real-time system we adapted are then presented. Next, we rewrite the task release scheduler of the real-time system (hereinafter simply the scheduler) to enable the system to support event-driven computing tasks [24]. On the same DAG task sets, we compare the task-set execution in the real-time system under our processor allocation policy with that under the current state-of-the-art MACRO and dagP policies. Finally, the effectiveness of the three processor allocation strategies is analyzed and evaluated according to the execution results. In addition, the priority assignment method of the present invention is the DM method, i.e., the smaller the deadline, the higher the priority of the DAG task.
5.1 Generation of DAG task sets
We generated the set of tasks for the experiment according to the following parameters.
U: representing the utilization of a set of tasks
N: representing the number of DAG tasks in a task set
·βi: representing DAG tasks τiNumber of neutron tasks
M: representing the number of processors
P: probability factor representing per-DAG task topology generation variations
[C_min, C_max]: respectively representing the lower and upper bounds of the worst-case execution time of each subtask.
The total utilization for each processor allocation policy is generated from 0.4 to 0.9 in steps of 0.1. For each exact utilization, we generate 100 DAG task sets and use their average to characterize the DAG tasks at that utilization. We specify that the number of tasks in each task set and the number of subtasks in each task are both 10, i.e., n = β_i = 10. The worst-case execution time of each subtask is randomly generated from 1 to 5, i.e., C_min = 1, C_max = 5. After the utilization of each task is generated, the period of the task can be obtained according to the following formula:
Figure GDA0003144714910000151
The topology of each task is generated by randomly adding edges between the subtasks according to the probability p. Herein we fix p = 0.15, and use a matrix A with β_i rows and β_i columns to store the topology, i.e., A(x, y) = 1 means that there is an edge from V_{i,x} to V_{i,y}.
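The task-set generation just described can be sketched as follows: UUnifast [23] draws per-task utilizations summing to the target, and the topology is built by adding forward edges with probability p. This is an illustrative Python sketch; the function names and the forward-edge convention (which guarantees acyclicity) are our own assumptions.

```python
import random

def uunifast(n, total_u, rng=None):
    """UUnifast [23]: draw n per-task utilizations that sum to total_u."""
    rng = rng or random.Random(0)
    utils, remaining = [], total_u
    for i in range(1, n):
        nxt = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - nxt)
        remaining = nxt
    utils.append(remaining)
    return utils

def random_topology(beta, p, rng=None):
    """Topology as in the text: A(x, y) = 1 adds an edge V_x -> V_y with
    probability p; edges only point forward (x < y), so the graph is
    acyclic by construction."""
    rng = rng or random.Random(0)
    a = [[0] * beta for _ in range(beta)]
    for x in range(beta):
        for y in range(x + 1, beta):
            if rng.random() < p:
                a[x][y] = 1
    return a
```

In the experiments above, beta = 10 and p = 0.15; the generated utilizations are then converted to periods by the formula given in the figure.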
5.2 Experimental platform, real-time System
The embedded development board used in the invention is the Raspberry Pi 3 Model B+ [25]. The development board has a quad-core 1.4 GHz 64-bit processor based on the Cortex-A53 architecture, i.e., m = 4. In addition, it has dual-band wireless LAN, Bluetooth 4.2/BLE, faster Ethernet, and Power-over-Ethernet support (with a separate PoE HAT), which give it excellent scalability.
The real-time system chosen in the present invention is RT-Thread [26]. RT-Thread is an open-source real-time operating system that has been licensed under Apache License Version 2.0 starting from v3.1.1. In addition, RT-Thread supports preemptive scheduling. We used version v4.0.2 of RT-Thread as the experimental system; it supports Symmetric Multiprocessing (SMP) scheduling and the hardware drivers of the platform used by the present invention.
Since the official port of RT-Thread to the Raspberry Pi platform is rather crude, we found some errors when reading the source code, and did considerable work to correct the known ones. To make it easier for a technician to reproduce our experiments, we list the modified file paths and source-code changes as follows.
We set the value of the macro RT_TICK_PER_SECOND at line 18 of the rtconfig.h file under the path "./bsp/raspberry-pi/raspi3-32/rtconfig.h" to 100.
We changed the variable "cntfrq" at line 57 of the board.c file under the path "./bsp/raspberry-pi/raspi3-32/driver/board.c" from 35000 to 10000.
The macro definition RT_TICK_PER_SECOND represents how many system ticks are executed in one second. The system tick is the minimum unit of time for all programs executing on the system, that is, it is the atomic time of the system. We set RT_TICK_PER_SECOND to 100, which means 100 ticks per second, with 1 tick equal to 10 milliseconds. Therefore, if the WCET of a subtask is equal to 4, the subtask executes for 4 ticks (40 milliseconds). The variable cntfrq represents a counter that obtains its clock from an external crystal; its value is directly related to the accuracy of the system clock. Based on our extensive experimentation and observation, setting it to 1000 provides an accurate system clock.
The features of a preemptive real-time system would be violated if sleep mode or suspend mode were used to simulate the execution of a subtask, since both modes cause the executing subtask to relinquish the processor. To avoid this problem, we require the CPU to perform a certain number of auto-increment operations. According to our test results, performing 1,500,000 auto-increment operations takes exactly 1 system tick. In the following experiments, the time the CPU takes to execute 1,500,000 auto-increment operations serves as the unit of task execution time (1 tick). That is, a task whose worst-case execution time is 5 in our experiments actually executes the 1,500,000 auto-increment operations 5 times, rather than occupying the CPU for 5 arbitrary time units.
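The tick arithmetic above (100 ticks per second, so 10 ms per tick, with 1,500,000 auto-increments calibrated to one tick) can be captured in a small sketch. The constant and function names are our own; only the numbers come from the text.

```python
RT_TICK_PER_SECOND = 100           # as set in the modified rtconfig.h
INCREMENTS_PER_TICK = 1_500_000    # calibrated auto-increment count per tick

def subtask_budget(wcet_ticks):
    """Translate a subtask WCET given in system ticks into the busy-wait
    iteration count and the wall-clock milliseconds it represents."""
    ms_per_tick = 1000 // RT_TICK_PER_SECOND
    return wcet_ticks * INCREMENTS_PER_TICK, wcet_ticks * ms_per_tick
```

So a subtask with WCET 4 busy-waits through 6,000,000 increments, i.e., 40 ms, and a WCET-5 task through 7,500,000 increments, i.e., 50 ms.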
The scheduler is the thread with the highest priority in the system (the priority of the scheduler thread is zero), and it is initialized and started at system start-up. For an event-driven real-time system, the response time of a DAG task is the interval from its release to its completion. To satisfy this condition, we rewrote the scheduler of the real-time system. The detailed steps of the new scheduler (Algorithm 3) are described as follows.
Step 1: Initialize the release queue Ω to be empty, initialize the current system time t_current ← 0, and initialize the task-set schedulable flag Flag ← TRUE;
Step 2: Initialize the vector
Figure GDA0003144714910000161
where each element stores the time of the next release of the corresponding task;
Step 3: At system start, all tasks are, in non-descending order of
Figure GDA0003144714910000162
added into the release queue Ω;
Step 4: If Flag is TRUE, start scheduling the task set and execute Step 5; otherwise the task set cannot be scheduled, and the program exits;
Step 5: If not
Figure GDA0003144714910000163
execute Step 6; otherwise suspend, to be awakened when a task is added into Ω;
Step 6: Obtain the current system time t_current ← GetSystemTick();
Step seven: acquiring task tau to be released last timex←GetFirstElement(Ω);
Step eight: if it is not
Figure GDA0003144714910000164
it means that the time at which τ_x must be released has not yet arrived; sleep for
Figure GDA0003144714910000165
time; otherwise, execute Step 9;
Step 9: Release task τ_x, judge whether the task set is schedulable according to whether the previous instance of τ_x has finished, and modify the value of Flag accordingly;
Step 10: If Flag is TRUE,
Figure GDA0003144714910000166
and add τ_x into Ω again according to the rule of Step 3, and repeat from Step 4.
The dependencies between subtasks are enforced by the event-set structure provided by the system; the event set is an inter-thread communication mechanism provided by the RT-Thread real-time system.
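The release loop of Algorithm 3 above can be sketched as a simulation. The real scheduler runs as the highest-priority RT-Thread thread with sleeping and deadline checks; this Python sketch, under our own naming assumptions, reproduces only the release-queue logic with a min-heap standing in for Ω.

```python
import heapq

def release_trace(periods, horizon):
    """Algorithm 3, sketched as a simulation: the release queue Omega
    becomes a min-heap keyed by next release time; each pop releases one
    task instance and re-inserts the task one period later (Step 10),
    until the time horizon is reached. Deadline checking is omitted."""
    omega = [(0, i) for i in range(len(periods))]   # Step 3: initial releases
    heapq.heapify(omega)
    trace = []
    while omega and omega[0][0] < horizon:
        t, i = heapq.heappop(omega)                 # Steps 7-8: earliest release
        trace.append((t, i))                        # Step 9: release tau_i at t
        heapq.heappush(omega, (t + periods[i], i))  # Step 10: next instance
    return trace
```

For two tasks with periods 2 and 3 simulated up to time 6, the trace interleaves their releases in time order, mirroring the non-descending ordering of Ω.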
5.3 Experimental results and analysis
Because DAG tasks cannot be executed on a physical machine for an infinitely long time, as in theoretical analysis, for each DAG task set at each utilization we release all DAG tasks simultaneously and execute them for 30,000 system ticks (5 minutes in our system). We then observe the state of each DAG task set and save its response-time results.
After starting the real-time system, we first run 1,000 ticks idle to avoid interference from the system, and then run another 1,000 ticks as system warm-up. Furthermore, before experimenting on any DAG task set, we set the system idle for 500 ticks to eliminate any possible interference of the previous DAG task set on the next one. The number 500 is chosen as the idle time because it is exactly equal to the time needed to execute all subtasks of the previous DAG task set once, in order, under worst-case conditions.
FIG. 3 shows the average WCRT of the DAG task sets at different utilizations, where MACRO, dagP, and GPEC denote the MACRO, dagP, and GPEC processor allocation policies, respectively. The average WCRT under all three processor allocation strategies increases as utilization increases. In addition, at the same utilization, the task sets processed by the three processor allocation strategies are identical. At a utilization of 0.9, the average WCRT of the DAG task sets allocated by the MACRO policy has no corresponding value, because no DAG task set is schedulable under that processor allocation policy. Overall, the average WCRT of the DAG task sets under the GPEC strategy is smaller than under the other two strategies.
We define the schedulable rate of DAG task sets at a given utilization as the proportion of schedulable DAG task sets among the 100 DAG task sets we create. FIG. 4 illustrates the schedulable rates of the DAG task sets at different utilizations. In general, the schedulable rates of all three processor allocation strategies decrease as utilization increases. For each particular utilization, the schedulable rate of the DAG task sets based on the GPEC policy is greater than that of the other two policies. Moreover, the schedulable rate of the DAG task sets based on the MACRO strategy is greater than that based on the dagP strategy.
FIG. 5 shows the percentage reduction in average WCRT at different utilizations, where the legends GPEC-MACRO and GPEC-dagP represent the WCRT reduction of GPEC versus MACRO and of GPEC versus dagP, respectively. The average WCRT of the DAG task sets allocated by the GPEC strategy is smaller than the values of the other two strategies. The maximum percentage reduction in average WCRT of GPEC compared with dagP is 35.59%, at a utilization of 0.7. Similarly, at a utilization of 0.6, GPEC reduces the average WCRT by up to 28.59% compared with MACRO. Furthermore, the outlier (negative value) at a utilization of 0.9 is due to the large response times of some task sets that are schedulable under the GPEC policy but not under the dagP policy. For example, Table 1 shows the WCRT obtained by the two strategies for 3 task sets with a utilization of 0.9. Although the GPEC policy schedules more task sets than the dagP policy, it is inferior to the dagP policy in average WCRT because the number of task sets schedulable by dagP is too small.
TABLE 1 partial task set response time Table
Figure GDA0003144714910000181
FIG. 6 shows the percentage increase in schedulable rate using the GPEC policy compared with the MACRO policy and the dagP policy at different utilizations. As the utilization increases, both comparison results first increase and then decrease. Compared with the MACRO and dagP strategies, the GPEC strategy increases the schedulable rate the most at a utilization of 0.8, by 76% and 72%, respectively.
In the invention, we first derive a response-time analysis of DAG tasks under a partitioned fixed-priority scheduling algorithm. Based on the intuition from this analysis, we propose a Greedy Parallel Execution Cluster (GPEC) processor allocation strategy that takes into account the topology of DAG tasks and the self-interference among subtasks within a task. In addition, an open-source real-time operating system is ported to an embedded development board, and experiments are carried out on it to evaluate the performance of the proposed GPEC strategy. Experimental results show that, compared with existing processor allocation strategies, GPEC can reduce the average worst-case response time of tasks by up to 35.59% and improve the schedulable rate of task sets by up to 76%.
VI. References:
[1]N.Abbas,Y.Zhang,A.Taherkordi,and T.Skeie,“Mobile edge computing:A survey,”IEEE Internet of Things Journal,vol.5,no.1,pp.450–465,Feb2018.
[2]S.Abedi,N.Gandhi,H.M.Demoulin,Y.Li,Y.Wu,and L.T.X.Phan,“Rtnf:Predictable latency for network function virtualization,”in 2019IEEE Real-Time and Embedded Technology and Applications Symposium(RTAS).IEEE,2019,pp.368–379.
[3]V.Bonifaci,A.Marchetti-Spaccamela,S.Stiller,and A.Wiese,“Feasibility analysis in the sporadic dag task model,”in 2013 25th Euromicro Conference on Real-Time Systems,2013,pp.225–233.
[4]H.S.Chwa,J.Lee,J.Lee,K.Phan,A.Easwaran,and I.Shin,“Global edf schedulability analysis for parallel tasks on multi-core platforms,”IEEE Transactions on Parallel and Distributed Systems,vol.28,no.5,pp.1331–1345,2017.
[5]A.Saifullah,J.Li,K.Agrawal,C.Lu,and C.Gill,“Multi-core real-time scheduling for generalized parallel task models,”Real-Time Systems,vol.49,no.4,pp.404–435,2013.
[6]J.Li,J.J.Chen,K.Agrawal,C.Lu,C.Gill,and A.Saifullah,“Analysis of federated and global scheduling for parallel real-time tasks,”in 2014 26th Euromicro Conference on Real-Time Systems.IEEE,2014,pp.85–96.
[7]X.Jiang,N.Guan,X.Long,and W.Yi,“Semi-federated scheduling of parallel real-time tasks on multiprocessors,”in 2017 IEEE Real-Time Systems Symposium(RTSS).IEEE,2017,pp.80–91.
[8]S.Baruah,“Federated scheduling of sporadic dag task systems,”in 2015 IEEE International Parallel and Distributed Processing Symposium.IEEE,2015,pp.179–186.
[9]C.Tessler,V.P.Modekurthy,N.Fisher,and A.Saifullah,“Bringing inter-thread cache benefits to federated scheduling,”in 2020 IEEE Real-Time and Embedded Technology and Applications Symposium(RTAS).IEEE,2020,pp.281–295.
[10]J.Fonseca,G.Nelissen,V.Nelis,and L.M.Pinho,“Response time analysis of sporadic dag tasks under partitioned scheduling,”in 2016 11th IEEE Symposium on Industrial Embedded Systems(SIES).IEEE,2016,pp.1–10.
[11]D.Casini,A.Biondi,G.Nelissen,and G.Buttazzo,“Partitioned fixedpriority scheduling of parallel tasks without preemptions,”in 2018 IEEE Real-Time Systems Symposium(RTSS).IEEE,2018,pp.421–433.
[12] J. Herrmann, J. Kho, B. Uçar, K. Kaya, and Ü. V. Çatalyürek, “Acyclic partitioning of large directed acyclic graphs,” in 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). IEEE, 2017, pp. 371–380.
[13] M. Y. Özkaya, A. Benoit, B. Uçar, J. Herrmann, and Ü. V. Çatalyürek, “A scalable clustering-based task scheduler for homogeneous processors using dag partitioning,” in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2019, pp. 155–165.
[14]K.Lakshmanan,S.Kato,and R.Rajkumar,“Scheduling parallel realtime tasks on multi-core processors,”in 2010 31st IEEE Real-Time Systems Symposium,2010,pp.259–268.
[15]M.Xu,L.T.X.Phan,H.-Y.Choi,Y.Lin,H.Li,C.Lu,and I.Lee,“Holistic resource allocation for multicore real-time systems,”in 2019 IEEE Real-Time and Embedded Technology and Applications Symposium(RTAS).IEEE,2019,pp.345–356.
[16]M.Fan and G.Quan,“Harmonic-fit partitioned scheduling for fixedpriority real-time tasks on the multiprocessor platform,”in 2011 IFIP 9th International Conference on Embedded and Ubiquitous Computing.IEEE,2011,pp.27–32.
[17]N.Fisher,S.Baruah,and T.P.Baker,“The partitioned scheduling of sporadic tasks according to static-priorities,”in 18th Euromicro Conference on Real-Time Systems(ECRTS’06).IEEE,2006,pp.10–pp.
[18]R.M.Pathan and J.Jonsson,“Load regulating algorithm for staticpriority task scheduling on multiprocessors,”in 2010 IEEE International Symposium on Parallel&Distributed Processing(IPDPS).IEEE,2010,pp.1–12.
[19]F.Fauberteau,S.Midonnet,and L.George,“Allowance-fit:a partitioning algorithm for temporal robustness of hard real-time systems upon multiprocessors,”in 2009 IEEE Conference on Emerging Technologies&Factory Automation.IEEE,2009,pp.1–4.
[20]A.Saifullah,D.Ferry,J.Li,K.Agrawal,C.Lu,and C.D.Gill,“Parallel real-time scheduling of dags,”IEEE Transactions on Parallel and Distributed Systems,vol.25,no.12,pp.3242–3252,2014.
[21]N.Fauzia,V.Elango,M.Ravishankar,J.Ramanujam,F.Rastello,A.Rountev,L.-N.Pouchet,and P.Sadayappan,“Beyond reuse distance analysis:Dynamic analysis for characterization of data locality potential,”ACM Transactions on Architecture and Code Optimization(TACO),vol.10,no.4,pp.1–29,2013.
[22]H.Wang and O.Sinnen,“List-scheduling versus cluster-scheduling,”IEEE Transactions on Parallel and Distributed Systems,vol.29,no.8,pp.1736–1749,2018.
[23]E.Bini and G.C.Buttazzo,“Measuring the performance of schedulability tests,”Real-Time Systems,vol.30,no.1-2,pp.129–154,2005.
[24] S. Chakraborty, T. Erlebach, S. Künzli, and L. Thiele, “Schedulability of event-driven code blocks in real-time embedded systems,” in Proceedings of the 39th Annual Design Automation Conference, 2002, pp. 616–621.
[25] “Raspberry Pi 3 Model B+,” https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/.
[26]“Rt-thread system,”https://github.com/RT-Thread/rt-thread.
the foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A heuristic processor partitioning method for DAG tasks based on partitioned scheduling, comprising a heuristic processor allocation step for a PEC structure, the heuristic processor allocation step of the PEC structure comprising:
step 1: initializing the remaining utilization of each processor, U_a(i) ← 1, i = 1, ..., m; initializing the Ready queue Ready to be empty; initializing the processor allocation policy to null
Figure FDA0003144714900000011
Step 2: calculating the tolerable latest starting time of each subtask according to the formula (4) and saving the latest starting time into a table LST;
step 3: from the first task to the last task, checking whether the PEC structure of task τ_i,
Figure FDA0003144714900000012
can be derived; if
Figure FDA0003144714900000013
is obtained, executing step 4; otherwise, ending the process; a set of subtasks having one and only one identical predecessor subtask is called a PEC structure, i.e., a parallel execution cluster structure; using
Figure FDA0003144714900000014
to denote the k-th PEC structure of τ_i, where k ∈ [0, π_i], and π_i represents the number of PEC structures in task τ_i;
step 4: adding all the subtasks in
Figure FDA0003144714900000015
into the Ready queue Ready;
step 5: sorting the subtasks in the Ready queue Ready in non-descending order of LST;
step 6: allocating processors from the first subtask to the last subtask in Ready;
Figure FDA0003144714900000016
LST(V_{i,j}) represents the tolerable latest start time of subtask V_{i,j}; if the start time of V_{i,j} is later than LST(V_{i,j}), task τ_i is certainly unschedulable; LST(V_{i,j}) can be calculated by formula (4), where C_i represents the total worst-case execution time of all subtasks, D_i represents the deadline of the task, C_{i,j} represents the WCET of V_{i,j}, WCET being the worst-case execution time, and F(τ_i) denotes the set of terminating subtasks.
2. The heuristic processor partitioning method of claim 1, wherein the step 6 comprises:
step 61: assigning the subtask currently to be allocated to processor p* according to the Worst-Fit algorithm;
step 62: updating the processor remaining utilization U_a(p*);
step 63: updating the processor allocation policy θ_{p*}.
3. The heuristic processor partitioning method of any of claims 1-2, wherein the heuristic processor partitioning method comprises the steps of:
step S1: initializing the processor allocation policy to null
Figure FDA0003144714900000021
initializing the PST, and initializing the Ready queue Ready, the PST being a potential self-interference table storing the potential self-interference subtasks of each subtask of τ_i;
step S2: initially allocating processors for the PECs according to the heuristic processor allocation step of the PEC structure;
step S3: updating the latest end times and the PST of the subtasks that have been allocated processors, according to the allocation result of step S2;
step S4: adding to Ready the subtasks all of whose predecessor subtasks have been allocated processors;
step S5: judging whether Ready is empty; if not, allocating a processor for the subtasks in Ready; if Ready is empty, the allocation is complete and the routine exits.
4. The heuristic processor partitioning method of claim 3, wherein, in step S5, if Ready is not empty, allocating processors for the subtasks in Ready comprises:
step S51: sorting the subtasks in Ready in non-descending order of LST;
step S52: allocating a processor to the first subtask V_{i,j} of the sorted Ready;
step S53: updating the PST;
step S54: adding to Ready any new subtask satisfying the condition of step S4, and repeating step S5.
5. The heuristic processor partitioning method of claim 4, wherein the step S52 comprises:
step S521: calculating, for assigning the subtask to each of the m different processors,
Figure FDA0003144714900000022
where k ∈ [1, m],
Figure FDA0003144714900000023
representing the latest end time of subtask V_{i,j};
step S522: calculating k, where
Figure FDA0003144714900000024
Step S523: and if there are 2 or more than 2 k, distributing the subtask to the k with the maximum residual utilization rate according to the Worst-Fit algorithm.
6. A real-time system, wherein the real-time system runs on an embedded development board, and the steps in the heuristic processor partitioning method of any of claims 3 to 5 run on the real-time system.
7. The real-time system of claim 6, wherein the embedded development board is a Raspberry Pi 3 Model B+, the real-time system is version v4.0.2 of RT-Thread, and RT-Thread is an open-source real-time operating system.
8. The real-time system of claim 7, wherein the file paths and source code of the real-time system are modified as follows:
setting the value of the macro RT_TICK_PER_SECOND in the rtconfig.h file under the path "/bsp/raspberry-pi/raspi3-32/rtconfig.h" to 100;
changing the variable "cntfrq" in the board.c file under the path "/bsp/raspberry-pi/raspi3-32/driver/board.c" from 35000 to 10000;
the variable cntfrq, which represents a counter that obtains its clock from an external crystal oscillator, is set to 1000;
the CPU is required to perform 1,500,000 increment operations; with 1 tick of system time as the unit, the execution time of 1,500,000 increment operations is taken as the worst-case execution time of the tasks executed in the experiment.
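The 1,500,000-increment workload unit above can be mimicked with a simple busy loop. The sketch below is Python for illustration (the on-board code would be C on RT-Thread), and `INCREMENTS_PER_TICK` and `burn_ticks` are hypothetical names introduced here:

```python
# One synthetic "tick" of execution time is defined as the time the CPU
# needs to perform 1,500,000 increment operations, per the experimental
# setup described in claim 8.
INCREMENTS_PER_TICK = 1_500_000

def burn_ticks(n_ticks):
    """Busy-loop for roughly n_ticks worth of simulated execution time
    by counting increment operations; returns the final counter value."""
    counter = 0
    for _ in range(n_ticks * INCREMENTS_PER_TICK):
        counter += 1
    return counter
```

A task with a worst-case execution time of c ticks would then call `burn_ticks(c)` as its body when measuring the schedule on the board.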
9. A real-time system according to any one of claims 6-8, characterized in that the scheduler of the real-time system performs the steps of:
the method comprises the following steps:
step one: initializing the release queue Ω to empty, initializing the current system time t_current ← 0, and initializing the task-set schedulable flag Flag ← TRUE;
step two: initializing a vector t^next = (t_1^next, t_2^next, ..., t_n^next), wherein each element t_i^next, i ∈ [1, n], stores the next release time of the corresponding task τ_i;
step three: at system start, adding all tasks to the release queue Ω in non-descending order of t_i^next;
step four: if Flag is TRUE, starting to schedule the task set and executing step five; otherwise, the task set is unschedulable and the program exits;
step five: if Ω is not empty, executing step six; otherwise, suspending itself and waiting to be woken when a task is added to Ω;
step six: obtaining the current system time t_current ← GetSystemTick();
step seven: acquiring the task to be released next: τ_x ← GetFirstElement(Ω);
step eight: if t_current < t_x^next, the time at which τ_x needs to be released has not yet arrived, so sleeping for (t_x^next − t_current); otherwise, executing step nine;
step nine: releasing the task τ_x, judging whether the task set is schedulable according to whether the previous instance of τ_x has finished, and modifying the value of Flag accordingly;
step ten: if Flag is TRUE, updating t_x^next to the next release time of τ_x, re-adding τ_x to Ω according to the rule of step three, and repeatedly executing step four.
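The steps above amount to a release loop over a queue ordered by next release time. A minimal Python sketch follows, assuming periodic tasks, simulating GetSystemTick with the release times themselves, and omitting the step-nine schedulability check (Flag simply stays TRUE); `run_scheduler` and its parameters are illustrative names, not from the patent:

```python
import heapq

def run_scheduler(tasks, horizon):
    """Release periodic tasks up to time `horizon`.

    tasks: list of (name, period) pairs, all first released at time 0.
    Returns the chronological list of (release_time, name) events.
    """
    # steps one-three: queue ordered by next release time (t_i^next = 0)
    omega = [(0, name, period) for name, period in tasks]
    heapq.heapify(omega)
    releases = []
    flag = True                    # step-nine check omitted: stays TRUE
    while flag and omega:          # step four
        t_next, name, period = heapq.heappop(omega)   # step seven
        if t_next > horizon:       # stand-in for sleeping past the horizon
            break
        releases.append((t_next, name))               # step nine: release
        # step ten: re-insert with the updated next release time
        heapq.heappush(omega, (t_next + period, name, period))
    return releases
```

On the real system the loop would instead sleep until t_x^next (step eight) and update Flag from the previous instance's completion status.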
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores a computer program which, when invoked by a processor, implements the steps of the heuristic processor partitioning method of any one of claims 1-5.
CN202011631493.0A 2020-12-31 2020-12-31 Heuristic processor partitioning method, system and storage medium for DAG task based on partition scheduling Active CN112463346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011631493.0A CN112463346B (en) 2020-12-31 2020-12-31 Heuristic processor partitioning method, system and storage medium for DAG task based on partition scheduling

Publications (2)

Publication Number Publication Date
CN112463346A (en) 2021-03-09
CN112463346B (en) 2021-10-15

Family

ID=74802788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011631493.0A Active CN112463346B (en) 2020-12-31 2020-12-31 Heuristic processor partitioning method, system and storage medium for DAG task based on partition scheduling

Country Status (1)

Country Link
CN (1) CN112463346B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880083A (en) * 2022-03-24 2022-08-09 哈尔滨工业大学(深圳) Optimization method of logic complexity of DAG task execution and storage medium
CN115544321B (en) * 2022-11-28 2023-03-21 厦门渊亭信息科技有限公司 Method and device for realizing graph database storage and storage medium
CN116739319B (en) * 2023-08-15 2023-10-13 中国兵器装备集团兵器装备研究所 Method and system for improving task execution time safety of intelligent terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035819A (en) * 2014-06-27 2014-09-10 清华大学深圳研究生院 Scientific workflow scheduling method and device
CN106991006A (en) * 2017-03-30 2017-07-28 浙江天正信息科技有限公司 Support the cloud workflow task clustering method relied on and the time balances
US10002029B1 (en) * 2016-02-05 2018-06-19 Sas Institute Inc. Automated transfer of neural network definitions among federated areas
CN110362394A (en) * 2019-07-22 2019-10-22 北京明略软件系统有限公司 Task processing method and device, storage medium, electronic device
CN111061569A (en) * 2019-12-18 2020-04-24 北京工业大学 Heterogeneous multi-core processor task allocation and scheduling strategy based on genetic algorithm
CN111176817A (en) * 2019-12-30 2020-05-19 哈尔滨工业大学 Method for analyzing interference between DAG (directed acyclic graph) real-time tasks on multi-core processor based on partition scheduling

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445926A (en) * 2018-11-09 2019-03-08 杭州玳数科技有限公司 Data task dispatching method and data task dispatch system


Similar Documents

Publication Publication Date Title
CN112463346B (en) Heuristic processor partitioning method, system and storage medium for DAG task based on partition scheduling
Saifullah et al. Multi-core real-time scheduling for generalized parallel task models
Kato et al. Semi-partitioned fixed-priority scheduling on multiprocessors
Ueter et al. Reservation-based federated scheduling for parallel real-time tasks
Chen et al. Adaptive multiple-workflow scheduling with task rearrangement
Lee et al. Orchestrating multiple data-parallel kernels on multiple devices
Guan et al. Exact schedulability analysis for static-priority global multiprocessor scheduling using model-checking
Kim et al. Segment-fixed priority scheduling for self-suspending real-time tasks
CN111176817A (en) Method for analyzing interference between DAG (directed acyclic graph) real-time tasks on multi-core processor based on partition scheduling
Suzuki et al. Real-time ros extension on transparent cpu/gpu coordination mechanism
Guan et al. DAG-fluid: A real-time scheduling algorithm for DAGs
Roy et al. SLAQA: Quality-level aware scheduling of task graphs on heterogeneous distributed systems
Zahaf et al. A c-dag task model for scheduling complex real-time tasks on heterogeneous platforms: preemption matters
Akram et al. Efficient task allocation for real-time partitioned scheduling on multi-core systems
Socci et al. Time-triggered mixed-critical scheduler on single and multi-processor platforms
Jiang et al. Suspension-based locking protocols for parallel real-time tasks
Voronov et al. AI meets real-time: Addressing real-world complexities in graph response-time analysis
Cho et al. Conditionally optimal parallelization of real-time DAG tasks for global EDF
Saranya et al. Dynamic partitioning based scheduling of real-time tasks in multicore processors
Shi et al. Multiprocessor synchronization of periodic real-time tasks using dependency graphs
Maia et al. Scheduling parallel real-time tasks using a fixed-priority work-stealing algorithm on multiprocessors
Tran et al. Efficient contention-aware scheduling of SDF graphs on shared multi-bank memory
Wu et al. TDTA: Topology-based Real-Time DAG Task Allocation on Identical Multiprocessor Platforms
Ruaro et al. Dynamic real-time scheduler for large-scale MPSoCs
Nemati et al. Efficiently migrating real-time systems to multi-cores

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant