Summary of the invention
The invention provides a storage-aware multilevel partition scheduling method. Its purpose is to manage, allocate and schedule resources by repeatedly and reasonably partitioning the iteration space, thereby overcoming the long completion time and high energy consumption caused by the unreasonable resource allocation of prior-art scheduling strategies, and at the same time overcoming the data loss easily caused by the limited capacity of local storage.
A storage-aware multilevel partition scheduling method comprises the following steps:
Step 1: one execution of all tasks is taken as one iteration; the iteration space, formed by repeatedly executing the group of tasks that have an execution order, is taken as the partitioning object. Determine the directions of the partition vectors (P_i, P_j) of the iteration space, where the size of a partition along the P_j direction is f and its size along the P_i direction is h. Find the two outermost dependences CW and CCW in the dependence set D between tasks, and set P_i = CCW and P_j = CW;
Step 2: determine the relational expressions between the partition vector sizes f and h and the amount of data the current iteration must load and store, and the schedule length Ls of the current iteration;
Step 3: determine the sizes f and h of the partition vectors according to strategy one and strategy two;
1) Set f to 1 and compute h according to strategy one and strategy two, obtaining h1 and h2 respectively;
Strategy one: 2*NUM_other + NUM_top + NUM_next ≤ M_s
Strategy two: (NUM_top + NUM_other)*Tw + NUM_other*Tr ≤ Ls*f*h;
2) If h1 > h2, then the value of h is h2; compute f according to strategy one and strategy two, obtaining f1 and f2 respectively, and go to 3); otherwise the value of the partition size h is h1 and the value of f is 1;
3) If f1 > 1, the partition size is determined as f1*h2; otherwise the partition size is f*h1 with f = 1;
Step 4: apply the iteration retiming technique, dispersing the delays between tasks so as to change the inner-loop dependences between tasks, and reconstruct the partition space;
Step 5: divide the partition space according to the first-level partition size f*h; each sub-partition produced by the first-level partitioning is treated as one node, that is, as one cluster task, forming a new iteration space; partition each sub-partition in turn according to step 1 to obtain the direction vectors (P2_i, P2_j) of the second-level partitioning;
Step 6: determine the size of the second-level partition vectors;
The size of the second-level partition vector along the P2_i direction is N_core and its size along the P2_j direction is 1, where N_core is the number of processor cores;
Step 7: obtain the execution order graph of the partitioned iteration space according to the two partition vectors obtained, and schedule the tasks according to the execution order graph.
The concrete process of determining the directions of the partition vectors in step 1 is as follows:
A dependence between tasks refers to the execution order between the tasks and is denoted d_k = (d_ki, d_kj), where d_ki represents the execution dependence of the two tasks in the inner loop and d_kj represents their execution dependence in the outer loop. Find the two outermost dependences CW and CCW in the dependence set D between tasks, and set P_i = CCW and P_j = CW;
CCW, the counterclockwise boundary vector, refers to the vector with the largest angle to the j vector, and CW, the clockwise boundary vector, refers to the vector with the smallest angle to the j vector.
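As an illustration, the angular-extremum rule above can be sketched as follows. This is a minimal sketch under assumptions: delays are 2-tuples (d_i, d_j), the angle is measured from the j axis, and the function name is hypothetical; the exact angle convention used in the embodiment may differ.

```python
import math

def boundary_vectors(delays):
    """Pick the CW/CCW boundary vectors from a set of delay vectors
    (d_i, d_j): CCW is the vector with the largest angle to the j axis
    (the vector (0, 1)), CW the one with the smallest angle."""
    def angle_to_j_axis(d):
        di, dj = d
        # atan2(di, dj) is the angle between (di, dj) and (0, 1).
        return math.atan2(di, dj)
    nonzero = [d for d in delays if d != (0, 0)]
    cw = min(nonzero, key=angle_to_j_axis)
    ccw = max(nonzero, key=angle_to_j_axis)
    return cw, ccw
```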
Here, a dependence between tasks refers to the execution order between the tasks, that is, the relation that one task must wait until another task finishes before it can be executed. In a computer the task execution order is usually represented by a graph: each node in the graph represents a task, and an edge between two nodes represents the dependence between their tasks, namely the specific constraint imposed on the task execution order. One iteration means that every task is executed once, and all iterations constitute the iteration space. An iteration is a loop: i indicates that a task is executed for the i-th time in the inner loop, and j indicates the j-th loop of the outer loop, i.e. all tasks being executed for the j-th time.
The relational expressions in step 2 between the partition vector sizes f and h and the amount of data the current iteration must load and store are as follows:
(1) The amount of data the current iteration must load and store comprises two parts: the first part, the amount of data produced by the current iteration, is NUM_next + NUM_top + NUM_other; the second part, the amount of data prefetched for the current iteration and the next iteration, is NUM_other.
Here NUM_next denotes the amount of data produced by this partition and urgently needed by the next partition; NUM_top denotes the amount of data produced by this partition and needed by the partition that is perpendicular to the P_i direction and adjacent to the current partition; NUM_other denotes the amount of data produced by this partition and needed by all other partitions except the next partition and the top partition;
The schedule length Ls in step 2 refers to the time taken to execute one iteration.
In strategy one and strategy two of step 3, NUM_next denotes the data produced by this partition and urgently needed by the next partition; NUM_top denotes the data produced by this partition and needed by the partition perpendicular to the P_i direction and adjacent to the current partition; NUM_other denotes the data produced by this partition and needed by all other partitions except the next partition and the top partition; Ls denotes the schedule length of each iteration; Tr denotes the time needed to read one datum from main memory; Tw denotes the time needed to write one datum to main memory; M_s denotes the capacity of the SPM (scratch-pad memory).
Retiming is a technique that optimizes the loop period by redistributing delays, and rotation scheduling is a resource-constrained scheduling optimization strategy based on retiming that obtains a more compact schedule by redistributing delays. The partition scheduling technique combines iteration retiming with prefetching: each iteration is regarded as one point, the iteration space is then partitioned (note that one iteration means executing all tasks once, and the iteration space comprises all iterations), and the partitions are executed one by one. Because of the dependences between tasks, partitioning must focus on how to make the partitions legal; that is, there must be no cycle between partitions, so that the partitions can be scheduled and executed one after another.
Beneficial effect
The invention provides a storage-aware multilevel partition scheduling method. The first step derives the correct shape of the first-level partition from the dependences between tasks, that is, determines the directions of the two partition vectors, denoted (P_i, P_j). The second step obtains the relational expressions between the partition vector sizes f and h and the amount of data each loop must load and store, and the schedule length Ls of each iteration. The third step decides the sub-partition size of the first-level partitioning according to the local storage capacity and the schedule length. The fourth step uses the iteration retiming technique to change the dependences between tasks and reconstruct the partition positions. The fifth step, on the basis of the first-level partitioning, treats each sub-partition of the first-level partitioning as one node, rebuilds a partition space, and performs the second-level partitioning. Finally, the execution order graph of the partitioned iteration space is obtained from the two partition vectors, and the tasks are scheduled according to it. The storage-aware multilevel partition scheduling method takes both memory capacity and memory latency into account; by partitioning and by adjusting the dependences between tasks it improves task parallelism, reduces the number of write operations, and reduces the average scheduling time.
The selection of partition shape and direction is very strict: because of the dependences between tasks, an unreasonable partitioning scheme will make the tasks unexecutable, while a reasonable partitioning reduces the task scheduling time and the number of write operations generated. The method of the invention takes both memory capacity and memory latency into account and effectively improves the overall performance of the system.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
To manage resource blocks more conveniently, we first model the task execution order and represent it with two-dimensional coordinates: i represents the direction of the inner loop and j represents the direction of the outer loop. Each loop can be denoted iteration(i, j), and one iteration means that all tasks are executed once.
In this example, the configuration of the computer executing the tasks is examined: the processor has 3 cores, the SPM embedded in each core has a capacity of 128 KB, each core has 3 computing units, the time needed to read one datum from main memory is Tr = 2 clock cycles, and the time needed to write one datum to main memory is Tw = 4 clock cycles.
Fig. 1 is a flowchart of the storage-aware multilevel partition scheduling method of the present invention; its concrete operation steps are as follows:
Step 1: one execution of all tasks is taken as one iteration; the iteration space, formed by repeatedly executing the group of tasks that have an execution order, is taken as the partitioning object. Determine the directions of the partition vectors (P_i, P_j) of the iteration space, where the size of a partition along the P_j direction is f and its size along the P_i direction is h. Find the two outermost dependences CW and CCW in the dependence set D between tasks, and set P_i = CCW and P_j = CW;
Here, a dependence between tasks refers to the execution order between the tasks, that is, the relation that one task must wait until another task finishes before it can be executed. In a computer the task execution order is usually represented by a graph: each node represents a task, and an edge between two nodes represents the dependence between their tasks, namely the specific constraint imposed on the task execution order. One iteration means that every task is executed once, and all iterations constitute the iteration space. An iteration is a loop, and every task is executed many times; the execution instance of a task A is denoted A[i, j], meaning that task A is executed once in the i-th loop of the inner loop and the j-th loop of the outer loop.
Fig. 3 shows a concrete two-dimensional application task model and the corresponding task graph MDFG. The task graph MDFG = <V, E, d, t> is a two-dimensional graph with node weights and edge weights, where V is the set of nodes, each node representing one task of the application; E represents the dependences between tasks, and (u, v) ∈ E means that a dependence exists between node u and node v; d(e) = (d_i, d_j) represents a delay and describes the concrete dependence between two tasks.
If the dependence between task a and task b is expressed as d_k = (x, y), it means that task b of iteration(i, j) depends on task a of iteration(i-x, j-y); d_k = (0, 0) represents a dependence between tasks of the same iteration. The two-dimensional application in Fig. 2 comprises 4 tasks, A, B, C and D; for example, the dependence between task A and task B is d_k = (0, 0), and the dependence between task C and task D is d_k = (0, 1). The relation of CCW and CW to the dependence set D = {d1, d2, d3, d4, d5} between tasks is shown in Fig. 4.
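The delay semantics just described can be illustrated with a short sketch; the helper name producer_iteration is hypothetical.

```python
def producer_iteration(consumer, delay):
    """Given that a task of iteration (i, j) depends on another task
    with delay d_k = (x, y), return the iteration (i - x, j - y) in
    which the producing task instance runs."""
    i, j = consumer
    x, y = delay
    return (i - x, j - y)

# Task D of iteration (3, 5), with dependence d_k = (0, 1) on task C,
# uses the C produced in iteration (3, 4); d_k = (0, 0) stays within
# the same iteration.
```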
In this example there are 3 nonzero delay vectors in the dependence set D: (0, 1), (1, 0) and (1, 1). Hence (1, 0) is the CW vector and (1, 1) is the CCW vector, and the first-level partition vectors are P_i = CCW and P_j = CW;
Cyclic applications have the basic property that every iteration (loop) executes the same tasks in the same order, so the dependences possessed by each iteration are identical. To find CW (the clockwise boundary vector) and CCW (the counterclockwise boundary vector) more easily, we represent all dependences as vectors; CCW then refers to the vector with the largest angle to the j vector, and CW refers to the vector with the smallest angle to the j vector.
Find the two outermost dependences CW and CCW in D, with P_i = CCW and P_j = CW, i.e. d_4 = CCW and d_3 = CW;
Step 2: compute the relational expressions between f and h and the amount of data the current iteration must load and store, expressed as equations containing h or f respectively (that is, assuming the other of f and h is fixed), and the schedule length Ls of the current iteration;
(1) The amount of data the current iteration must load and store comprises two parts: the first part, the amount of data produced by the current iteration, is NUM_next + NUM_top + NUM_other; the second part, the amount of data prefetched for the current iteration and the next iteration, is NUM_other. (2) The schedule length refers to the time taken to execute one iteration. In this example all task execution times are set equal; assuming one iteration contains n tasks executed on m cores, the schedule length is Ls = (n/m) × the execution time of one task, and the execution time of one task is one unit of time, i.e. 1 clock cycle.
Here NUM_next denotes the amount of data produced by this partition and urgently needed by the next partition; NUM_top denotes the amount of data produced by this partition and needed by the partition perpendicular to the P_i direction and adjacent to the current partition; NUM_other denotes the amount of data produced by this partition and needed by all other partitions except the next partition and the top partition;
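The schedule-length formula of (2) can be illustrated as follows; the text leaves the rounding of n/m implicit, so rounding up is an assumption here.

```python
import math

def schedule_length(n_tasks, n_cores, task_time=1):
    """Ls = (n/m) * per-task time, for n equal-time tasks spread over
    m cores. We round up, since a partially filled round of cores
    still occupies a whole time step."""
    return math.ceil(n_tasks / n_cores) * task_time

# 4 tasks on 3 cores with unit task time need two time steps.
```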
Step 3: determine the partition size f*h;
1) With f = 1, obtain h1 and h2 from strategy one and strategy two respectively;
Strategy one: 2*NUM_other + NUM_top + NUM_next ≤ M_s
Strategy two: (NUM_top + NUM_other)*Tw + NUM_other*Tr ≤ Ls*f*h;
where NUM_next denotes the amount of data produced by this partition and urgently needed by the next partition; NUM_top denotes the amount of data produced by this partition and needed by the partition perpendicular to the P_i direction and adjacent to the current partition; NUM_other denotes the amount of data produced by this partition and needed by all other partitions except the next partition and the top partition; Ls denotes the schedule length of each iteration; Tr denotes the time needed to read one datum from main memory; Tw denotes the time needed to write one datum to main memory; M_s denotes the capacity of the SPM (scratch-pad memory);
2) Compare h1 and h2: if h1 is greater than h2, set h = h2 and compute f1 using strategy one and strategy two; otherwise the partition size is f*h1 with f = 1;
3) If f1 is greater than 1, the partition size is determined as f1*h2; otherwise the partition size is f*h1 with f = 1;
In this example, the first-level partition size is found to be f = 1, h = 4.
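A sketch of the step-3 procedure is given below. The patent does not specify NUM_next, NUM_top and NUM_other as functions of f and h (they depend on the application's delay set), so the models passed in are assumptions; so is the reading of strategy one as a capacity upper bound on h and strategy two as a latency-hiding lower bound, which the text leaves implicit.

```python
def first_level_partition_size(num_next, num_top, num_other,
                               Ms, Tr, Tw, Ls, max_size=64):
    """Sketch of step 3. num_next/num_top/num_other are assumed models
    (f, h) -> data volume; Ms is the SPM capacity, Tr/Tw the per-datum
    read/write times, Ls the schedule length of one iteration."""

    def fits_spm(f, h):       # strategy one: working set fits the SPM
        return 2 * num_other(f, h) + num_top(f, h) + num_next(f, h) <= Ms

    def hides_traffic(f, h):  # strategy two: traffic hidden by compute
        traffic = (num_top(f, h) + num_other(f, h)) * Tw \
                  + num_other(f, h) * Tr
        return traffic <= Ls * f * h

    # 1) f = 1: h1 = largest h inside the SPM, h2 = smallest h whose
    #    computation time covers the memory traffic.
    h1 = max((h for h in range(1, max_size + 1) if fits_spm(1, h)),
             default=0)
    h2 = next((h for h in range(1, max_size + 1) if hides_traffic(1, h)),
              None)

    # 2)-3) If capacity still has slack at h2, fix h = h2 and grow f
    #       under both strategies; otherwise keep f = 1, h = h1.
    if h2 is not None and h1 > h2:
        f1 = max((f for f in range(1, max_size + 1)
                  if fits_spm(f, h2) and hides_traffic(f, h2)),
                 default=1)
        if f1 > 1:
            return f1, h2
    return 1, h1
```

With toy linear models (NUM_next growing with h, NUM_top with f, NUM_other constant) the procedure returns a concrete (f, h) pair; the real models must come from the application's dependence set.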
After the first-level partitioning of the iteration space, the partition diagrams are shown in Fig. 5: diagram (a) is an unreasonable partitioning, and diagram (b) is the execution order graph obtained from the partitioning of diagram (a); it can be seen that this execution order contains a cycle and cannot be executed. Diagram (c) is a reasonable partitioning obtained with the method of the invention, and diagram (d) is the execution order graph obtained from diagram (c). As diagram (d) shows, after the first-level partitioning only two kinds of dependences remain between partitions: (1, 0) and (1, 1).
Step 4: apply the iteration retiming technique, dispersing the delays between tasks so as to change the inner-loop dependences between tasks, and reconstruct the partition space;
Retiming is a technique that optimizes the loop period by redistributing delays, and rotation scheduling is a resource-constrained scheduling optimization strategy based on retiming that obtains a more compact schedule by redistributing delays. The partition scheduling technique combines iteration retiming with prefetching: each iteration is regarded as one point, the iteration space is then partitioned, and the partitions are executed one by one. Because of the dependences between tasks, partitioning must focus on how to make the partitions legal; that is, there must be no cycle between partitions, so that the partitions can be scheduled and executed one after another.
To obtain a more compact schedule we apply iteration retiming to change the dependences between tasks; the iteration retiming technique reconstructs the dependences between tasks by redistributing the delays between them, thereby shortening the task execution period. To preserve the row-wise execution order, the retiming must not change the dependences between partitions, so we change only the innermost-loop delays of the dependences between tasks. For example, if the delay d_3 = (1, 1) exists between task A and task B, then after the delays are redistributed the delay between A and B becomes d_3 = (0, 1); that is, before the iteration retiming the execution of task A in iteration(i, j) had to depend on task B in iteration(i+1, j-1), whereas after the retiming changes the delay, the execution of task A in iteration(i, j) depends on the B executed in iteration(i, j-1), as shown in Fig. 6.
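The delay redistribution can be sketched with the classical retiming relation d'(e) = d(e) + r(v) - r(u) for an edge u -> v; the per-node retiming values in the example call are illustrative assumptions.

```python
def retime_edge(delay, r_u, r_v):
    """Classical retiming: for an edge u -> v with delay d(e), the
    retimed delay is d'(e) = d(e) + r(v) - r(u), applied
    component-wise. Legality requires every retimed delay to remain
    non-negative."""
    return tuple(d + rv - ru for d, rv, ru in zip(delay, r_v, r_u))

# The d_3 = (1, 1) dependence becomes (0, 1) when the retiming values
# of the two endpoints differ by -1 along the inner dimension.
```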
Step 5: divide the partition space according to the first-level partition size f*h; each sub-partition produced by the first-level partitioning is treated as one node, that is, as one cluster task, forming a new iteration space; obtain the directions P2_i and P2_j of the second-level partitioning according to step 1.
The first-level partitioning partitions the iteration space, while the second-level partitioning partitions the sub-partitions of the first-level partitioning; each first-level sub-partition is then denoted partition(i, j).
The framework of the task scheduling is shown in Fig. 7 and Fig. 8: Fig. 7 illustrates the execution order of the task scheduling, and Fig. 8 illustrates the scheduling of one task. In the method, for convenience, we classify the first-level partitions (first_level_partition) into three classes relative to the currently executing first_level_partition: the next first_level_partition, the top first_level_partition and the other first_level_partitions. Each first_level_partition is further divided into regions according to where its data are used, as shown in Fig. 8(a); there are four regions in total: the first region holds the produced data that tasks of this partition will use, the second region holds the produced data that tasks in the next first_level_partition need, the third region holds the produced data that tasks in the top first_level_partition need, and the fourth region holds the produced data that the other first_level_partitions need. In this way we can quickly compute, for each delay d(e): d_k = (d_ki, d_kj) in a first_level_partition, the amount of produced data supplied to the other first_level_partitions:
A_goto_top(d_k) = area(PQVU) = d_ki*(f - d_kj)
A_goto_next(d_k) = area(VSWX) = d_kj*(h - d_ki)
A_goto_other(d_k) = area(UVRS) = d_ki*d_kj
Further, once the partition size of a first_level_partition is determined, we can quickly compute how much of the data produced inside a first_level_partition must be stored.
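The area formulas above translate directly into a small helper; this sketch assumes d_ki ≤ h and d_kj ≤ f so that the areas are non-negative.

```python
def partition_data_areas(d_ki, d_kj, f, h):
    """For a delay d_k = (d_ki, d_kj) inside a first-level partition of
    size f x h, return the amounts of produced data consumed by the
    top, next and other partitions, per the area formulas above."""
    a_top = d_ki * (f - d_kj)    # A_goto_top  = area(PQVU)
    a_next = d_kj * (h - d_ki)   # A_goto_next = area(VSWX)
    a_other = d_ki * d_kj        # A_goto_other = area(UVRS)
    return a_top, a_next, a_other
```

For instance, with the first-level partition size f = 1, h = 4 of this example, a delay (1, 0) sends one datum to the top partition and nothing elsewhere.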
Step 6: determine the size of the second-level partition; the number of processor cores, N_core = 3, is obtained from the hardware configuration information. The size along the P2_i direction is then N_core = 3, and the size along the P2_j direction is 1.
Step 7: obtain the execution order graph of the partitioned iteration space according to the two partition vectors obtained, and schedule the tasks according to the execution order graph.
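As an illustration of step 7, the sketch below orders a grid of partitions from given inter-partition dependence vectors (for example, the (1, 0) and (1, 1) that remain after the first-level partitioning in Fig. 5). The wavefront/topological order it produces is one legal schedule; the function is an assumption, not the patent's exact procedure.

```python
def execution_order(n_i, n_j, deps):
    """Topologically order an n_i x n_j grid of partitions whose
    inter-partition dependences are the given delay vectors:
    partition (i, j) depends on partition (i - x, j - y) for each
    (x, y) in deps. Raises if the execution order graph is cyclic."""
    order, done = [], set()
    pending = {(i, j) for i in range(n_i) for j in range(n_j)}
    while pending:
        ready = [p for p in pending
                 if all((p[0] - x, p[1] - y) in done
                        or not (0 <= p[0] - x < n_i
                                and 0 <= p[1] - y < n_j)
                        for x, y in deps)]
        if not ready:  # no partition is executable: illegal partitioning
            raise ValueError("cyclic execution order graph")
        for p in sorted(ready):
            order.append(p)
            done.add(p)
            pending.discard(p)
    return order
```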
Fig. 9 compares the method of the invention, TLP, with two other algorithms, List and IRP, in average task scheduling time on multiple benchmarks. As the figure shows, TLP (the storage-aware multilevel partition task scheduling strategy) clearly outperforms the other two scheduling algorithms in average task scheduling time, with a performance improvement of about 30%. This is because the storage-aware partition scheduling strategy considers not only task parallelism when partitioning and scheduling but also memory latency, ensuring that the memory schedule is no longer than the processor schedule; some memory delays are thereby avoided and waiting time is saved, which improves system performance and reduces scheduling time.
Fig. 10 compares the method of the invention, TLP, with the two other algorithms, List and IRP, in the number of write operations on multiple benchmarks. As the figure shows, TLP (the storage-aware multilevel partition task scheduling strategy) reduces write operations by about 45% on average compared with the other two algorithms. This is because the partitioning strategy of the invention fully considers the capacity of local storage: through two levels of partitioning it ensures, as far as possible, that the data needed by the partition handled by each core reside in local storage, which saves a large number of write operations and correspondingly reduces scheduling time and energy consumption, improving system performance. However, when the capacity of local storage is fixed, the data to be stored grow as the task scale expands, the number of write operations also increases, the average task scheduling time rises, and system performance falls.
IIR, 2D, WDF(1), WDF(2), DPCM(1), DPCM(2), DPCM(3), FLOYD(1), FLOYD(2) and FLOYD(3) in Fig. 9 and Fig. 10 are data-processing benchmarks.
The tasks in the present invention are multidimensional DSP applications, but the proposed storage-aware multilevel partitioning strategy can be extended to n-dimensional DSP and other applications with cyclic characteristics.