CN103473120A - Acceleration-factor-based multi-core real-time system task partitioning method

Info

Publication number: CN103473120A
Application number: CN2012105729998A
Authority: CN
Inventors: 张炯; 龙其民; 牛天放; 李莹
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2012-12-25
Filing date: 2012-12-25
Publication date: 2013-12-25

Abstract

The invention discloses a multi-core real-time task partitioning method based on the RM (rate-monotonic) real-time scheduling algorithm that takes the task acceleration effect into account. Building on conventional task partitioning strategies, the method considers how the mutual acceleration of tasks during execution affects CPU (central processing unit) utilization, and thereby obtains a usable elasticity coefficient for the spring algorithm, compressing task execution time to improve the RM schedulability of a task set. The acceleration effect of tasks is represented by an acceleration factor (speedup factor). The method comprises the following steps: improving the BF (Best-Fit) algorithm with the acceleration factor to obtain the BF-λ algorithm; and partitioning the task set across a plurality of processor cores using the BF-λ algorithm.

Description

A speedup-factor-based task partitioning method for multi-core real-time systems

Technical field

The present invention relates to a task partitioning method for real-time systems running on multi-core platforms, and belongs to the field of task scheduling for real-time multi-core/multiprocessor systems.

Background technology

As avionics, rocket-borne, and on-board systems evolve toward integrated computing and demand ever more processing capability, the embedded real-time computing systems associated with them have grown increasingly complex, and the number of tasks that must be handled simultaneously has increased greatly. In practical system design, one approach proposed to meet this demand is to replace existing single-core/uniprocessor hardware with multi-core processors in order to improve the performance of these hard real-time systems. This raises a number of basic and general problems that urgently need solving. One very important problem is how to map the uniprocessor-oriented real-time scheduling of existing systems onto a multi-processing-unit environment while guaranteeing the correctness and real-time behaviour of the computation as performance improves.

From the perspective of real-time system software development, the relatively mature real-time scheduling models, such as the classical RMS, DMS, EDF, and LST, mostly target the single-core environment. In a multi-processing-unit environment, task scheduling becomes more complex, and factors such as efficiency, fairness, and load balancing must be considered. Real-time scheduling for multi-core/multiprocessor systems has received wide attention in recent years and has become one of the hot topics of current academic research. Multiprocessor task scheduling currently follows two main approaches: the partitioned scheme and the global scheme. In the global scheme, each instance of a real-time task may execute on a different processor, and a single scheduling algorithm runs across all processors; a task can be preempted and can migrate between processors before it completes. In the partitioned scheme, all instances of a task execute on the same processor, and all tasks are assigned to processors in advance by a task allocation algorithm; each processor can run a different or the same uniprocessor scheduling algorithm. For the partitioned scheme, however, the scheduler is no longer responsible only for scheduling task execution as before. Besides switching the processing unit between tasks according to a particular scheduling algorithm, the scheduler must also partition the task set across the processing units before scheduling begins.

Spring algorithm: Buttazzo et al. apply the principle of an extensible spring, treating the tasks in a task set as springs connected in series. Each task controls the change of its period through a parameter called the elasticity coefficient, thereby adjusting its utilization.

Summary of the invention

The present invention provides a speedup-factor-based task partitioning method for multi-core real-time systems.

Because system resources are usually limited, related tasks running on the same computing system often compete for and share resources, so task execution is not completely independent. For example, when any two or more tasks execute on the same processor, or on a group of processors sharing a cache, data correlation causes them to influence one another to varying degrees; this mutual influence can be called the "cohort effect". Its effect on execution speed can be either promoting or inhibiting, depending on the characteristics of the tasks themselves and on the scheduling algorithm. In some cases the data correlation between tasks is good: the task executed first brings the data required by the following task into the cache, which reduces the number of cache misses and replacements and speeds up execution. Inhibition may arise when the data correlation between successively executed tasks is poor, so that cache lines must be written back and replaced frequently during sequential execution, which slows execution down. Compared with the traditional single-core/uniprocessor environment, a multi-processing-unit platform is more flexible: the task set can be partitioned according to the cohort effect of the tasks, and each partition is then scheduled independently on its own processor core.

The relevant definitions and models are introduced below.

The MAMORTS task model: on a multiprocessor multi-core platform of a real-time system, define P = {p_1, p_2, p_3, ..., p_m} (m > 1) as a set of m structurally identical processor chips forming an SMP structure. The CPUs are interconnected through high-speed communication interfaces, and the communication overhead between processors is ignored. A single processor chip in the set, p_i = {q_i1, q_i2, ..., q_in} (n > 1, 1 ≤ i ≤ m), is a set of homogeneous processor cores, where q_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) is a multithreaded/multi-core processor core based on the SMT or CMP architecture.

Define the set of real-time tasks in the system as Ψ = {Ψ_1, Ψ_2, ..., Ψ_k} (k ≤ m*n), where Ψ_1 to Ψ_k are the subtask sets of Ψ, each assigned to a different processor core q_ij for execution. The subtask set Ψ_t = {τ_1, τ_2, τ_3, ..., τ_l} (l > 0, 1 ≤ t ≤ k) running on any processor core q_ij satisfies the following constraints:

(C1) all task requests are periodic and have hard deadlines; each must complete within its deadline;

(C2) the deadline of a task requires only that each request complete before the next request of the same task arrives;

(C3) tasks need not be independent: a request of one task may depend on the start of requests of other tasks, and task sets may have overlapping memory access spaces;

(C4) the time spent on scheduling and task switching is ignored;

(C5) tasks are preemptible;

(C6) the running time of each task when run in isolation is fixed; here the running time of a task means the time C_i the processor needs to process the task without interruption. Tasks in the subtask set Ψ_t = {τ_1, τ_2, τ_3, ..., τ_l} may be related pairwise by speedup factors. Rather than considering one by one the accelerating effect of every other task on a particular task, the model considers the accelerating effect of the subtask set as a whole on all of its tasks. That is, each subtask set Ψ_t has its own speedup factor λ_t, and when the speedup factor takes effect, the running time of the subtask set is determined by

Σ_{i = 1}^{k} c_{i}^{'} = Σ_{i = 1}^{k} c_{i} / λ_{t}
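Constraint (C6) can be illustrated with a minimal sketch. The `Task` record and the `accelerated_runtime` helper are hypothetical names introduced only for illustration, and the numbers are made up; the sketch assumes a single speedup factor λ_t applies to the whole subtask set, as in the relation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A periodic hard real-time task (field names are illustrative)."""
    name: str
    c: float       # isolated, uninterrupted run time C_i
    period: float  # request period; per (C2) the deadline is the next request


def accelerated_runtime(tasks, lam_t):
    """Total run time of a subtask set once its speedup factor applies:
    sum(c_i') = sum(c_i) / lam_t."""
    if lam_t <= 0:
        raise ValueError("speedup factor must be positive")
    return sum(t.c for t in tasks) / lam_t


subset = [Task("t1", 2.0, 10.0), Task("t2", 3.0, 15.0)]
total = accelerated_runtime(subset, 1.25)  # (2 + 3) / 1.25
```

With λ_t = 1 the helper returns the plain sum of the isolated run times, which is the case where no acceleration is taken into account.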

In the spring-model scheduling algorithm, the elasticity coefficient is the key to determining task utilization; in effect this parameter determines how far a task can be adjusted. However, there is no unified quantitative method for determining it, and there is not even a generally usable elasticity coefficient. The present invention uses the speedup factor as the elasticity coefficient and proposes a new task partitioning method that compresses the CPU utilization of some tasks through the acceleration effect, improving overall schedulability.

Speedup factor λ: under the multi-core computation model, when the mutual influence of tasks is considered, the actual execution time C_i of task T_i is no longer constant but is affected by a correlated task T_j or by a correlated set of tasks S_i. Therefore, when partitioning task T_i, if the task set already scheduled on some core contains a task T_j correlated with T_i, the execution of T_i can be accelerated: there exists a so-called speedup factor λ that shortens the execution time C_i to

C_{i}^{'} = C_{i} / λ

thereby reducing the utilization U_i of the processor core. The speedup factor λ thus provides a concrete definition and computation method for the elasticity coefficient of the spring algorithm. When assigning task T_i to a processor core, this acceleration effect can therefore be taken into account, on the basis of the spring algorithm, to improve schedulability. In practice, the pairwise speedup factor between two tasks can be obtained accurately through simulation and testing. After partitioning, however, there is no guarantee that each processor core holds at most two tasks. If a processor core currently holds more than two tasks, the speedup factor that a new task T_i obtains from the combined influence of all other tasks in the set cannot be determined, and the method cannot complete the partitioning. The present invention therefore further describes the overall acceleration effect of a task set.
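The effect of λ on a single task's utilization can be sketched as follows. This is an illustration only: it assumes utilization is execution time divided by period and that λ shortens C to C / λ; the function name and numbers are hypothetical.

```python
def utilization(c, period, lam=1.0):
    """Utilization of one task whose run time C is shortened to C / lam."""
    if lam <= 0:
        raise ValueError("speedup factor must be positive")
    return (c / lam) / period


u_plain = utilization(2.0, 10.0)       # no acceleration: 2 / 10
u_fast = utilization(2.0, 10.0, 1.25)  # accelerated: (2 / 1.25) / 10
```

Under this reading, a value of λ above 1 lowers the utilization, while a value below 1 would model the inhibiting case of the cohort effect.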

Speedup factor of a task set: as noted above, if partitioning were based entirely on pairwise speedup factors, then when adding a new task τ_t to processor core q_ij, assessing the effect of τ_t on the core's existing subtask set Ψ_k would require considering the speedup factor of τ_t with respect to each task in Ψ_k one by one. Moreover, when other tasks in the set interfere, there is no guarantee that a pairwise speedup factor observed in the task-set environment remains consistent with the value λ_ti measured in the test environment. The present invention therefore actually adopts a speedup factor based on the task set. According to the constraints of the MAMORTS model, the speedup factor of any given subtask set remains unchanged. A task-set-based speedup factor makes schedulability tests on a task set convenient, but it is harder to measure than a pairwise speedup factor, because every subset of the given task set must be covered.

For a task set of 4 tasks, all of the subtask sets listed below must be tested, each corresponding to one speedup factor:

Subtask set	Speedup factor
{τ_1}
{τ_2}
{τ_3}
{τ_4}
{τ_1, τ_2}
{τ_1, τ_3}
{τ_1, τ_4}
{τ_2, τ_3}
{τ_2, τ_4}
{τ_3, τ_4}
{τ_1, τ_2, τ_3}
{τ_1, τ_2, τ_4}
{τ_1, τ_3, τ_4}
{τ_2, τ_3, τ_4}
{τ_1, τ_2, τ_3, τ_4}

Table 1
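The rows of Table 1 are simply the non-empty subsets of a four-task set. A short sketch of how they could be enumerated is given here; the function name and the task labels are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(tasks):
    """Yield every non-empty subset of `tasks`, smallest subsets first."""
    for size in range(1, len(tasks) + 1):
        for combo in combinations(tasks, size):
            yield frozenset(combo)


subsets = list(nonempty_subsets(["t1", "t2", "t3", "t4"]))
# 4 singletons + 6 pairs + 4 triples + 1 full set, one entry per table row
```

Using `frozenset` makes each subset hashable, so it can later serve as a key when speedup factors are stored against subsets.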

In the MAMORTS task model, the overall speedup factor of a task set Ψ_t = {τ_1, τ_2, τ_3, ..., τ_n} is the ratio of T, the total execution time of its tasks run in isolation, to their total execution time when they run together, in line with the relation Σ c_i' = Σ c_i / λ_t above. Once the task set is fixed, T is fixed too, because the execution time of each task is fixed. For a fixed task set under a specific scheduling algorithm, the relative order of task execution is also fixed. However, neither the actual execution time nor the theoretical execution time of the set equals a simple sum of the per-task values, because tasks have different periods and therefore execute a different number of times within a fixed interval. The speedup factor is therefore revised using the theoretical execution time T'' and the actual execution time T''' of the task set as a whole:

λ_{t} = T'' / T'''

In this method, T'' is obtained by accumulating the theoretical times according to the actual execution order under the RM algorithm, and T''' is obtained by running the actual task set under RM scheduling and measuring it. Unless otherwise stated, λ_t below refers to the revised speedup factor. Determining the speedup factors of all possible subtask sets of a task set with n tasks requires

2^{n} - 1

separate measurements, one for each distinct subtask set. Considering the feasibility of the method in engineering practice, the present invention therefore adds a constraint to the partitioning method:
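One way to picture the correction with T'' and T''' is sketched below. It rests on assumptions that the text does not state: the fixed interval is taken to be one hyperperiod (the least common multiple of integer periods), T'' is the sum of isolated run times of all jobs released in that interval, and T''' is supplied as a measured number. All names and values are hypothetical.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of the integer task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def theoretical_time(tasks):
    """T'': isolated run times accumulated over one hyperperiod.
    `tasks` is a list of (c, period) pairs with integer periods."""
    hyper = hyperperiod([period for _, period in tasks])
    return sum(c * (hyper // period) for c, period in tasks)


def revised_speedup(tasks, measured_time):
    """Revised factor: theoretical time T'' divided by measured time T'''."""
    return theoretical_time(tasks) / measured_time


tasks = [(1, 4), (2, 6)]           # hyperperiod 12: 3 jobs of 1 + 2 jobs of 2
lam = revised_speedup(tasks, 5.6)  # 7 / 5.6, with 5.6 a made-up measurement
```

Counting jobs per interval is what distinguishes this from a plain sum of C_i values: a task with a shorter period contributes more often.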

The number of tasks that can be assigned to any processor core must not exceed a limit L; when the number of processor cores is k, the total number of tasks n must satisfy: n ≤ k*L (1)

This finally yields the task-set speedup factor with influence distance L.

Task-set speedup factor with influence distance L: as the description of the task-set-based speedup factor above shows, obtaining the task-set speedup factor table requires a large number of task-set tests before the tasks are loaded. The number of tests is O(2^n), and as the task set grows, the cost of a single test also grows (testing the same subtask set may require considering different task orderings). To reduce the difficulty of measuring task-set speedup factors, an influence distance L can be set according to the principle of program locality; L is expressed as a number of tasks.

For a subtask set containing M tasks:

(1) if M > L, constraint (1) is violated, and the task set is deemed unschedulable on any processor core;

(2) if M ≤ L, the overall speedup factor of the subtask set can be considered in full, which is exactly what is desired.

In the special case L = N, the task-set speedup factor with influence distance L degenerates to the task-set speedup factor without an influence distance.

For N = 5 and L = 3, the speedup factor table takes the following form:

Table 2

Building the speedup factor table: the speedup factor table is one-dimensional; each subtask set Ψ_t corresponds to one speedup factor λ_i. Viewed as a function, the table can be written δ: Ψ_t → λ_i, where Ψ_t ranges over the domain of task sets and λ_i over the range of speedup factors; δ maps a given subtask set to its speedup factor. Ideally, the domain of δ should include every possible subset Ψ_i (1 ≤ i ≤ 2^n) of the task set ψ = {τ_1, τ_2, τ_3, ..., τ_n}. When n is large, testing every subset one by one is too costly and the table occupies a large amount of storage. The speedup factor table with influence distance L is therefore proposed: L bounds the size of the subtask sets that must be tested and stored, and subsets with more than L tasks are not tested.
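The table and the mapping δ can be sketched as a dictionary keyed by subsets. Everything here is illustrative: `build_speedup_table`, `lookup`, and especially `fake_measure` are hypothetical, and the measurement function merely stands in for the offline tests that would supply real speedup factors.

```python
from itertools import combinations


def build_speedup_table(tasks, limit, measure):
    """Map every subset with at most `limit` tasks to its speedup factor.
    `measure` stands in for the offline test that yields lambda for a subset."""
    table = {}
    for size in range(1, min(limit, len(tasks)) + 1):
        for combo in combinations(tasks, size):
            subset = frozenset(combo)
            table[subset] = measure(subset)
    return table


def lookup(table, subset):
    """The mapping delta: return lambda for `subset`, or None if untested."""
    return table.get(frozenset(subset))


def fake_measure(subset):
    """Placeholder data only: pretend each extra co-located task adds 5%."""
    return 1.0 + 0.05 * (len(subset) - 1)


table = build_speedup_table(["t1", "t2", "t3", "t4", "t5"], 3, fake_measure)
```

A lookup for a subset larger than L returns `None`, which mirrors the rule that such subsets are neither tested nor stored.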

BF-λ algorithm: this algorithm is the Best-Fit algorithm extended with the speedup factor λ. Its purpose is to select, among k processor cores, the core j (1 ≤ j ≤ k) best suited to the current task. When assigning a processor core to task τ_t, preference goes to a core that both satisfies the schedulability condition

U < n(2^{1/n} - 1) = L(n)    ①

(where U is the processor utilization of task set ψ and n is the number of tasks in ψ) and shows the most pronounced λ acceleration effect, that is, the smallest processor core utilization increment ΔU caused by the new task. If more than one processor core satisfies this condition, the Best-Fit algorithm then selects the core with the highest utilization among them as the recipient of the task.
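The schedulability condition used here is the classical RM utilization bound, which can be evaluated directly. The function names below are hypothetical; the strict inequality follows the text. The bound is a sufficient test, so a task set that fails it is not necessarily unschedulable under RM.

```python
def rm_bound(n):
    """Utilization bound L(n) = n * (2**(1/n) - 1) for n tasks under RM."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(utilization, n):
    """The condition as written in the text: U < L(n)."""
    return utilization < rm_bound(n)


bound_two = rm_bound(2)  # 2 * (sqrt(2) - 1), roughly 0.828
```

The bound equals 1 for a single task and decreases toward ln 2 (about 0.693) as n grows, so each additional task on a core tightens the condition.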

The steps of this search algorithm are:

Step 1: receive a task τ_l to be assigned; initialize the target processor core number j = 0 and the starting core number of the sequential search i = 1 (k is the number of processor cores);

Step 2: obtain the task set Ψ_i currently assigned to processor core i, let m be the number of tasks in Ψ_i, and compute the utilization U_i of Ψ_i on processor core i.

Step 2.1: if i > k, all processor cores have been traversed; end the search and return the result j;

Step 2.2: if m > L, then by constraint (1) task τ_l cannot be assigned to the current processor core; go to Step 3. Otherwise, look up the speedup factor table to obtain the speedup factor λ_i of the new task set (Ψ_i + {τ_l}), compute the utilization after adding task τ_l, and compute the utilization increment ΔU_i;

Step 2.3: using decision condition ① of the RM scheduling algorithm, determine whether the task set (Ψ_i + {τ_l}) is schedulable on processor core i: if it is not schedulable, go to Step 3; otherwise, end the search and return the number j of the best processor core;

Step 2.4: arbitrate between core i and core j on utilization: if the utilization increment ΔU_i < ΔU_j, set processor core i as the current best recipient of the task by letting j = i, then go to Step 3; if ΔU_i = ΔU_j, go to Step 2.5;

Step 2.5: apply Best-Fit: if U_i > U_j, let j = i. This step is greedy: among schedulable choices, it makes the utilization of the processor core as high as possible;

Step 3: let i = i + 1 and go to Step 2.1 to check the next processor core.
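The search can be sketched in Python under several stated assumptions, none of which is confirmed by the text: the utilization of a core is taken as the sum of its tasks' isolated utilizations divided by the speedup factor of the set, a missing table entry is treated as "cannot place", and a schedulable core proceeds to the arbitration of Steps 2.4 and 2.5 rather than ending the search at Step 2.3 (otherwise those steps would never run). Ties are broken in favour of the core with the higher utilization, following the Best-Fit rule. All names and numbers are hypothetical.

```python
def rm_bound(n):
    """RM utilization bound n * (2**(1/n) - 1)."""
    return n * (2 ** (1.0 / n) - 1)


def core_util(assigned, table):
    """Assumed form of U_i: summed isolated utilizations over the set's lambda."""
    if not assigned:
        return 0.0
    return sum(assigned.values()) / table[frozenset(assigned)]


def bf_lambda(task, cores, table, limit):
    """Return the 1-based index of the best core for `task`, or 0 if none fits.

    task:  (name, isolated utilization) of the task being placed
    cores: list of dicts {task name: isolated utilization}, one per core
    table: dict mapping frozenset of task names -> speedup factor
    limit: influence distance L
    """
    name, u_task = task
    best, best_delta, best_util = 0, None, None
    for i, assigned in enumerate(cores, start=1):
        members = frozenset(assigned) | {name}
        lam = table.get(members)
        if len(assigned) > limit or lam is None:   # Step 2.2
            continue
        u_old = core_util(assigned, table)
        u_new = (sum(assigned.values()) + u_task) / lam
        delta = u_new - u_old                      # utilization increment
        if not u_new < rm_bound(len(members)):     # Step 2.3
            continue
        # Step 2.4: smaller increment wins; Step 2.5: on a tie, the fuller core.
        if (best == 0 or delta < best_delta
                or (delta == best_delta and u_old > best_util)):
            best, best_delta, best_util = i, delta, u_old
    return best


table = {
    frozenset({"a"}): 1.0, frozenset({"b"}): 1.0, frozenset({"c"}): 1.0,
    frozenset({"a", "c"}): 1.25,  # made-up value: a and c accelerate each other
    frozenset({"b", "c"}): 1.0,
}
choice = bf_lambda(("c", 0.2), [{"a": 0.3}, {"b": 0.3}], table, limit=3)
```

In this example both cores could accept task c, but the first core is chosen because the made-up speedup factor of {a, c} yields the smaller utilization increment. The exact floating-point tie test is kept for brevity; a tolerance would be more robust in practice.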

BF-λ partitioning method: this task partitioning method is obtained by adding the speedup factor λ to the Best-Fit bin-packing algorithm; its core is the BF-λ algorithm. In brief:

In a computing environment with k processor cores, for a total task set S = {τ_1, τ_2, ..., τ_n} with n tasks, the method tries to find a suitable scheme that partitions S into k subsets: each core i (1 ≤ i ≤ k) is assigned a task subset Ψ_i, so that S = {Ψ_1, Ψ_2, ..., Ψ_i, ..., Ψ_k}. If no such partition exists, task set S is judged not to satisfy the schedulability condition of the method.

The method is carried out as follows:

Step 1: initialization. Empty the task set of every processor core by setting Ψ_i = Φ (1 ≤ i ≤ k). Sort the task set S in decreasing order of utilization (DU), so that any two tasks τ_p and τ_q in S (1 ≤ p ≤ q ≤ n) satisfy U_p ≥ U_q;

Step 2: take a task τ_l from S in non-increasing order of utilization, then run the BF-λ algorithm to find the best processor core for τ_l; let its number be j (1 ≤ j ≤ k), so that j = BF-λ(τ_l);

Step 3: if Step 2 yields j = 0, the BF-λ search has failed, meaning that no processor core can currently accept task τ_l; the partitioning fails and the method ends. Otherwise, assign τ_l to processor core j, update the task subset of j to Ψ_j = Ψ_j + {τ_l}, and update the total task set to S = S - {τ_l};

Step 4: if S ≠ Φ, tasks remain to be assigned in S; go to Step 2. Otherwise, output the partitioning scheme Ψ_i (1 ≤ i ≤ k); the partitioning succeeds and the method ends.
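The four steps amount to a decreasing-utilization loop around a core selector. The sketch below keeps the selector pluggable; `first_under_one` is a deliberately simple stand-in and is not the BF-λ algorithm. All names and numbers are hypothetical.

```python
def partition(tasks, k, select_core):
    """Partition `tasks` ({name: utilization}) over k cores.

    select_core(task, cores) must return a 1-based core index, or 0 when no
    core can accept the task (the role played by BF-lambda in the text).
    Returns the list of per-core assignments, or None if partitioning fails.
    """
    cores = [dict() for _ in range(k)]               # Step 1: empty every core
    pending = sorted(tasks.items(), key=lambda kv: kv[1], reverse=True)  # DU
    for task in pending:                             # Steps 2 and 4: next task
        j = select_core(task, cores)
        if j == 0:                                   # Step 3: search failed
            return None
        name, util = task
        cores[j - 1][name] = util                    # Step 3: assign to core j
    return cores


def first_under_one(task, cores):
    """Stand-in selector (NOT BF-lambda): first core staying below 100% load."""
    for i, assigned in enumerate(cores, start=1):
        if sum(assigned.values()) + task[1] < 1.0:
            return i
    return 0


result = partition({"a": 0.6, "b": 0.5, "c": 0.3}, 2, first_under_one)
```

Keeping the selector as a parameter separates the traversal of S from the choice of core, which matches the way the method delegates core selection to a separate algorithm.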

Building on the RM algorithm, the spring algorithm, and the Best-Fit bin-packing algorithm, the present invention proposes a speedup-factor-based multi-core real-time task partitioning method. Compared with existing task partitioning methods, it has the following advantages and effects:

1) It considers the mutual influence of tasks during execution and so describes the task execution environment more realistically

Existing partitioning strategies do not consider the mutual influence between tasks; traditional bin-packing algorithms such as Next-Fit, First-Fit, and Best-Fit consider only the absolute attributes of tasks. In reality, the mutual influence between tasks caused by resource competition and sharing can also serve as a parameter that describes the task execution environment more faithfully.

2) It can improve the RM schedulability of some task sets

Under the RMS model, the schedulability test condition for a task set is U < n(2^{1/n} - 1) = L(n), where U is the processor utilization of task set ψ and n is the number of tasks in ψ. The conventional calculation of U ignores the spring-compression effect, that is, the change in execution time caused by the influence between tasks. When this acceleration effect is considered, the computed utilization U' satisfies U' < U for some task sets, which improves their schedulability.
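A small worked example makes the second advantage concrete. The numbers are hypothetical, and the example assumes that one speedup factor λ applies to the whole task set with periods unchanged, so that U' = U / λ:

```latex
L(2) = 2\,(2^{1/2} - 1) \approx 0.828, \qquad
U = 0.9 > L(2), \qquad
U' = \frac{U}{\lambda} = \frac{0.9}{1.2} = 0.75 < L(2)
```

A two-task set with U = 0.9 fails the test when acceleration is ignored, but passes it once a speedup factor of 1.2 is taken into account.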

Description of the drawings

Figure 1: functional module diagram of the speedup-factor-based multi-core real-time task partitioning of the present invention

Figure 2: flow chart of the speedup-factor-based improved Best-Fit algorithm

Figure 3: flow chart of the speedup-factor-based multi-core real-time task partitioning method of the present invention

Embodiment

In the following specific embodiment, the present invention is described in further detail with reference to the accompanying drawings.

As shown in Figure 1, the speedup-factor-based multi-core real-time task partitioning method of the present invention is divided functionally into three modules: partitioning control module 1, task selection module 2, and module 3, which selects a processor core according to the BF-λ algorithm. Module 3 in turn comprises the following functional modules: search control module 31, speedup factor λ computing module 32, and processor core utilization computing module 33. Search control module 31 controls the traversal of the processor cores after a new task τ_l is received; speedup factor λ computing module 32 looks up the speedup factor table to obtain the speedup factor of a task set; processor core utilization computing module 33 computes the utilization of a processing unit from the utilization of the task set and the corresponding speedup factor.

Partitioning control module 1 is mainly responsible for initializing the task set of each processing unit and for starting the traversal of task set S. Each pass removes one element from S, until S becomes empty or until, before S becomes empty, some task proves unschedulable and the partitioning cannot continue and fails. If the partitioning succeeds, the task subsets assigned to the cores form the set Ψ = {Ψ_1, Ψ_2, ..., Ψ_k} = S.

Task selection module 2 works with module 1 and is responsible for selecting the next task to assign. Tasks can be taken in random order or in a sorted order (for example decreasing utilization, DU, or decreasing execution time, DC). This method uses decreasing utilization (DU): each selection picks the task with the highest utilization in S.

Processor core selection module 3 works with module 1 to select a processor core for the specified task, using the BF-λ algorithm. If a suitable processor core exists, the module returns its number; the number 0 means that no processor core can accept the current task, in which case module 1 declares the partitioning of S a failure.

To partition a task set S (containing n elements) over k processor cores, the speedup-factor-based multi-core real-time task partitioning method proposed by the present invention comprises the following steps:

Step 1: initialization. Empty the task set of every processor core i (1 ≤ i ≤ k) by setting Ψ_i = Φ. Sort the task set S in decreasing order of utilization (DU), so that any two tasks τ_p and τ_q in S (1 ≤ p ≤ q ≤ n) satisfy U_p ≥ U_q;

Step 2: take a task τ_l from S in non-increasing order of utilization, then carry out the following steps of the BF-λ algorithm to find the best processor core j for τ_l;

Step 2.1: initialize the target processor core number j = 0 and the starting core number of the sequential search i = 1;

Step 2.2: obtain the task set Ψ_i currently assigned to processor core i, let m be the number of tasks in Ψ_i, and compute the utilization U_i of Ψ_i on processor core i.

Step 2.2.1: if i > k, all processor cores have been traversed; end the search and return the result j. Otherwise, go to Step 2.2.2;

Step 2.2.2: if m > L, then by constraint (1) task τ_l cannot be assigned to the current processor core; go to Step 2.3. Otherwise, look up the speedup factor table to obtain the speedup factor λ_i of the new task set (Ψ_i + {τ_l}), compute the utilization after adding task τ_l, compute the utilization increment ΔU_i, and go to the next step;

Step 2.2.3: using decision condition ① of the RM scheduling algorithm, determine whether the task set (Ψ_i + {τ_l}) is schedulable on core i: if it is not schedulable, go to Step 2.3; otherwise, end the search and return the number j of the best processor core;

Step 2.2.4: arbitrate between core i and core j on utilization: if the utilization increment ΔU_i < ΔU_j, set processor core i as the current best recipient of the task by letting j = i, then go to Step 2.3; if ΔU_i = ΔU_j, go to Step 2.2.5;

Step 2.2.5: apply the Best-Fit selection: if U_i > U_j, let j = i. Among schedulable choices, this step makes the utilization of the processor core as high as possible;

Step 2.3: let i = i + 1 and continue with the next processor core; go to Step 2.2.1;

Step 3: if j = 0, the BF-λ search has failed, meaning that no processor core can currently accept task τ_l; the partitioning fails and the method ends. Otherwise, assign τ_l to processor core j, update the task subset of j to Ψ_j = Ψ_j + {τ_l}, and update the total task set to S = S - {τ_l};

Step 4: if S ≠ Φ, go to Step 2. Otherwise, output the partitioning scheme Ψ_i (1 ≤ i ≤ k); the partitioning succeeds and the method ends.

Claims

1. Use of a speedup factor table describing the interaction between tasks, characterized in that: for a set S of n tasks, a correspondence from subset to speedup factor is established for any task subset, where the speedup factor characterizes the degree of interaction, or correlation, between tasks caused by resource competition and sharing. The speedup factor table serves the speedup-factor-based task partitioning of a multi-core real-time system: during partitioning, the speedup factors of the tasks decide how tasks are grouped.

2. An improved Best-Fit algorithm based on the speedup factor, namely the BF-λ algorithm, characterized in that it is carried out in the following steps:

Step 2.2: look up the speedup factor table to obtain the speedup factor λ_i of the new task set (Ψ_i + {τ_l}), compute the utilization after adding task τ_l, and compute the utilization increment;

Step 2.3: using the decision condition of the RM scheduling algorithm, determine whether the task set (Ψ_i + {τ_l}) is schedulable on processor core i: if it is not schedulable, go to Step 3; otherwise, end the search and return the number j of the best processor core;

Step 3: let i = i + 1 and go to Step 2.1 to check the next processor core.

3. A speedup-factor-based task partitioning method for multi-core real-time systems, characterized in that it is carried out in the following steps:

4. The speedup-factor-based task partitioning method for multi-core real-time systems according to claim 3, characterized in that, in said Step 2, the selection of a processor core for a task τ_l is specifically as follows: when assigning a processor core to task τ_l, preference goes to a processor core that both satisfies the schedulability condition and shows the most pronounced λ acceleration effect (that is, the smallest processor core utilization increment ΔU caused by the new task) as the recipient of the current task; if more than one processor core satisfies this condition, the Best-Fit algorithm then selects the processor core with the highest utilization among them as the recipient of the task.