CN103246563B - A kind of multilamellar piecemeal dispatching method with storage perception - Google Patents

A kind of multilamellar piecemeal dispatching method with storage perception Download PDF

Info

Publication number
CN103246563B
CN103246563B CN201310145363.XA CN201310145363A CN103246563B CN 103246563 B CN103246563 B CN 103246563B CN 201310145363 A CN201310145363 A CN 201310145363A CN 103246563 B CN103246563 B CN 103246563B
Authority
CN
China
Prior art keywords
piecemeal
task
num
iteration
dependence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310145363.XA
Other languages
Chinese (zh)
Other versions
CN103246563A (en
Inventor
李肯立
王艳
杜家宜
唐卓
肖正
朱宁波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN201310145363.XA priority Critical patent/CN103246563B/en
Publication of CN103246563A publication Critical patent/CN103246563A/en
Application granted granted Critical
Publication of CN103246563B publication Critical patent/CN103246563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a kind of multilamellar piecemeal dispatching method with storage perception, the first step, draw the direction of piecemeal vector (Pi, Pj) according to the dependence of task; Second step, it is thus achieved that every time circulate the relational expression of the required size of data loaded and preserve and piecemeal vector magnitude f and h and the scheduling length Ls of each iteration; 3rd step, determines the sub-piecemeal size of piecemeal according to local storage size and scheduling length; 4th step, utilizes the dependence between iteration weight clocking technique change task, reconstructs piecemeal position; 5th step, is used as every sub-piecemeal of first time piecemeal as a node and rebuilds a segmented spaces, carry out second time piecemeal; Obtain execution sequence figure after iteration space being carried out piecemeal according to two piecemeal vectors, according to execution sequence figure, task is scheduling; The method, in conjunction with memory span and storage delay, by piecemeal with to the adjustment of dependence between task, improves the degree of parallelism of task, reduces the quantity of write operation and reduce average scheduled time.

Description

A kind of multilamellar piecemeal dispatching method with storage perception
Technical field
The present invention relates to a kind of multilamellar piecemeal dispatching method with storage perception.
Background technology
The application that processes of most of science and data signal is all iterative recursive circulation. This generic task encounters two challenges when performing on embedded multiprocessor: first, it is calculate responsive type and the application of data responsive type that most of data signals process task, for this kind of application, the bad scheduling strategy of efficiency will produce substantial amounts of write operation, therefore can consume substantial amounts of time and energy consumption; Second, receiving speed relative to memorizer, the development of CPU speed is excessively quick, and the storage speed of memorizer slowly seriously hinders the raising of systematic function. Although embedded multiprocessor has a set of one's own instruction set, software programming can be passed through and realize different calculating tasks flexibly, but it is affected by coding and the execution sequence restriction of instruction, the restriction of memory access bottleneck and fixing control architecture, tends not to arrive maximum speed and optimum efficiency.
Prefetch the technology that (prefetching) strategy is a kind of performance that can be effectively improved system proposed for storage delay, namely before data have demand, just these data are stored in cache memory (cache), so can tolerate storing time delay for a long time.The strategy that prefetches of the prior art can be divided three classes: hardware based prefetches strategy, prefetching strategy and prefetching strategy based on hardware and software based on software. But hardware based some supporter of policy mandates that prefetch are linked to cache memory cache, and depend on dynamically available information in the process performed. And based on software prefetch strategy depend on compiler technologies go analyze one section of static routine, and in program code add prefetched instruction. But, too many pre-extract operation will cause that a unbalanced scheduling and storage time delay can be very long.
For this, a lot of embedded multiprocessors have used SPM to be a kind of minimum storage being embedded on chip to replace Cache, SPM, are a kind of compiler support and the memorizer can being managed by software. SPM memorizer can essentially regard the local storage of each core core as, can optimize the performance of system further, and can effectively reduce the consumption of energy. But for large-scale data signal processing tasks, resource management's scheduling strategy performs process and will produce substantial amounts of write operation improperly.
In order to increase the locally stored of data, a lot of research situations about being devoted to according to task are implement resource integration management. Traditional multidimensional task resource management method, execution task is that the dependence according to task performs with row-column (row-column) or column-row order. Due to the restriction of local storage, this execution method can produce substantial amounts of data to be needed, in write main memory, even to cause very multidata loss.
Summary of the invention
The invention provides a kind of multilamellar piecemeal dispatching method with storage perception, its object is to, by iteration space being carried out reasonably repeatedly piecemeal, resource is managed distribution and scheduling, the resource unreasonable distribution existed in prior art is overcome to cause when scheduling strategy performs, deadline length and the many problem of energy expenditure, overcome the restriction due to local storage, it is easy to the problem causing loss of data simultaneously.
A kind of multilamellar piecemeal dispatching method with storage perception, comprises the following steps:
Step 1: all of task is executed once as an iteration, performs between the iterative space that a group task with execution sequence repeatedly builds as piecemeal object needing, it is determined that the piecemeal vector (P in iteration spacei, Pj) direction, piecemeal is sized to f on Pj direction, and piecemeal is sized to h on Pi direction, and two that find out ragged edge from the dependence set D between task rely on CW and CCW, Pi=CCW and Pj=CW;
Step 2: determine the relational expression of the required size of data loaded and preserve of current iteration and piecemeal vector magnitude f and h and the scheduling length Ls of current iteration;
Step 3: strategically one and tactful two determine size f and the h of piecemeal vector;
1) set f as 1, calculate h according to strategy one and strategy two, respectively obtain h1 and h2;
Strategy one: 2NUMother+NUMtop+NUMnext��Ms;
Strategy two: (NUMtop+NUMother)Tw+NUMother�� Tr��Ls �� f �� h;
2) if h1 > h2, then the value of h is h2, adopts strategy one and strategy two to calculate f, respectively obtains f1 and f2, enter 3);
Otherwise, the value of piecemeal size h is the value of h1, f is 1;
3) if f1 > 1, piecemeal is sized to f1*h2; Otherwise piecemeal is sized to f*h1 and f=1;
Step 4: adopting iteration weight clocking technique, between the time delay change task between decentralized task, the dependence of nexine circulation, reconstructs segmented spaces;
Step 5: divide first time segmented spaces according to piecemeal size f*h, each sub-piecemeal produced by first time piecemeal is used as a node namely as a bunch of task, constitute new iteration space, successively every sub-piecemeal is carried out piecemeal according to step 1, it is thus achieved that the direction vector (P2 of second time piecemeali, P2j);
Step 6: determine the size of second time piecemeal vector;
Second time piecemeal vector is at P2iDirection is sized to Ncore, at P2jDirection is sized to 1, NcoreQuantity for processor cores;
Step 7: obtain execution sequence figure after iteration space being carried out piecemeal according to two the piecemeal vectors obtained, according to execution sequence figure, task is scheduling.
In described step 1 piecemeal vector direction specifically determine that process is as follows:
Dependence between task refers to the execution sequence between task, uses dk=(dki, dkj) represent, wherein dkiRepresent the execution dependence that two tasks circulate, d at nexinekjRepresenting two tasks execution dependence at outer loop, two that find out ragged edge from the dependence set D between task rely on CW and CCW, Pi=CCW and Pj=CW;
CCW is counterclockwise, and interval vector refers to the vector maximum with j vector angle, and CW interval vector clockwise refers to the vector minimum with j vector angle.
Wherein, dependence between task refers to the execution sequence between task, namely task has to wait for the relation that just can be performed after another task completes, the execution sequence of task is represented in a computer typically by figure, one task of each node on behalf in figure, the limit between node and node represents the dependence between task i.e. the specific restriction suffered by tasks carrying order; One iteration represents all of task and is all executed once, and all of iteration constitutes iteration space; One i.e. circulation of iteration, i represents that certain task i-th in a circulation (nexine circulation) is performed, and j represents that jth circulation (outer loop) i.e. all task jth time are performed.
In described step 2, the required size of data loaded and preserve of current iteration is as follows with the relational expression of piecemeal vector magnitude f and h:
(1) the required size of data loaded and preserve of current iteration includes two parts: Part I, size of data produced by current iteration is NUMnext+NUMtop+NUMother; Part II, loads current iteration in advance and following iteration needs the size of data used to be NUMother;
NUM other = Σ d k A goto _ others ( d k ) = Σ d k ( d ki ) ( d kj )
NUM top = Σ d k A goto - top ( d k ) = Σ d k d ki ( f - d kj )
NUM next = Σ d k A goto - next ( d k ) = Σ d k d kj ( h - d ki )
Wherein, NUMnextThis piecemeal is represented to produce and next piecemeal is badly in need of the size of data used, NUMtopRepresent that this piecemeal produces and is positioned at vertical with Pi direction, the size of data used required for simultaneously adjacent with current piecemeal piecemeal;
NUMotherRepresenting that this piecemeal produces and except next piecemeal (nextpartition) and top piecemeal (toppartition), other all piecemeals need the size of data used;
In described step 2, scheduling length Ls refers to the time that an iteration performs.
NUM in strategy one and tactful two in described step 3nextThis piecemeal is represented to produce and next piecemeal is badly in need of the data used, NUMtopRepresent that this piecemeal produces and is positioned at vertical with Pi direction, the data used required for simultaneously adjacent with current piecemeal piecemeal; NUMotherRepresenting that this piecemeal produces and except next piecemeal and top piecemeal, other all piecemeals need the data used; Ls represents the scheduling length of each iteration, and Tr represents and reads the time required for data from main memory, and Tw represents and writes data to the time required for main memory; Ms refers to SPM(scratch-pad storage) amount of capacity.
During restatement, (retiming) is a kind of technology being optimized cycle period by assignment latency, and rotating scheduling is a kind of resource limit Optimized Operation strategy based on weight clocking technique, and it obtains a greater compactness of scheduling by redistributing delay. Piecemeal dispatching technique is in conjunction with iteration weight clocking technique and prefetching technique, each iteration is regarded as a point, then iteration space is divided and (it should be noted that, an iteration refers to all of tasks carrying once, and iteration space comprises all of iteration), the execution of right one piecemeal of later piecemeal. Due to the dependence between task, so when piecemeal, piecemeal will be considered how emphatically so that piecemeal is reasonable. That is do not have endless loop between block and block, it is possible to one piecemeal of a piecemeal be scheduling perform.
Beneficial effect
The invention provides a kind of multilamellar piecemeal dispatching method with storage perception, the first step, show that namely the correct shape of first time piecemeal determines the direction of two piecemeal vectors according to the dependence of task, be denoted as (Pi, Pj);Second step, it is thus achieved that every time circulate the relational expression of the required size of data loaded and preserve and piecemeal vector magnitude f and h and the scheduling length Ls of each iteration; 3rd step, determines the sub-piecemeal size of first time piecemeal according to local storage size and scheduling length; 4th step, utilizes the dependence between iteration weight clocking technique change task, reconstructs piecemeal position; 5th step, on the basis of first time piecemeal, is used as every sub-piecemeal of first time piecemeal as a node and rebuilds a segmented spaces, carry out second time piecemeal; Obtain execution sequence figure after iteration space being carried out piecemeal according to two the piecemeal vectors obtained, according to execution sequence figure, task is scheduling; The multilamellar piecemeal dispatching method with storage perception has considered memory span and storage delay, is performed by piecemeal and to the adjustment of dependence between task, improves the degree of parallelism of task, decreases the quantity of write operation and reduces average scheduled time.
Selection for Partitional form and direction is very strict, and due to the dependence between task, irrational segment partition scheme will cause that task cannot perform, and reasonably piecemeal will reduce the time of task scheduling and the generation of write operation; The inventive method considers memory span and storage delay, the effective overall performance improving system.
Accompanying drawing explanation
Fig. 1 is the flow chart of the present invention;
Fig. 2 is iteration space schematic diagram;
Fig. 3 is the task image MDFG of two dimensional application task model and correspondence, and wherein, figure (a) is a two dimensional application task model, and figure (b) is according to figure (a) MDFG obtained figure;
Fig. 4 is dependence between task at the location drawing of interval CCW counterclockwise and interval CW clockwise, and wherein figure (a) is CCW and CW area schematic, and figure (b) is the location drawing in CCW and CW region of the dependence between task;
The reasonable piecemeal of Fig. 5 and unreasonable piecemeal schematic diagram, wherein, figure (a) is unreasonable piecemeal, and figure (b) is the execution sequence figure obtained according to figure (a) piecemeal, and figure (c) is reasonable piecemeal, and figure (d) is according to figure (c) the execution sequence figure obtained;
Fig. 6 is the nexine dependence schematic diagram between application weight clocking technique change task;
Fig. 7 piecemeal dispatching sequence schemes, and figure (a) represents that the face over one's competence being made up of second time piecemeal, figure (b) represent the relation of first time piecemeal and second time piecemeal, the execution sequence that figure (c) is piecemeal;
The scheduling relation of Fig. 8 processor part and memory portion;
Fig. 9 is the write operation number contrast schematic diagram of multiple dispatching method;
The task average scheduled time contrast schematic diagram of the multiple dispatching method of Figure 10.
Detailed description of the invention
Below in conjunction with accompanying drawing, the present invention is described in further detail.
For more convenient resource block management, first tasks carrying order is carried out modelling process by us, use two-dimensional coordinate represents, i represents the direction that nexine circulates, j represents the direction of outer loop, each circulation can use iteration iteration(i, j) represents, an iteration represents all of task and is all executed once.
In this example, check the configuration of the computer of execution task, the kernel number obtaining computer processor is 3, the amount of capacity of the SPM inlayed on each kernel is the computing unit number of 128KB and each kernel is 3, clock cycle time Tr=2clockcycle(required for data is read from main memory), write data to clock cycle time Tw=4clockcycle(required for main memory).
As it is shown in figure 1, be a kind of flow chart with the multilamellar piecemeal dispatching method storing perception of the present invention, its concrete operation step is as follows:
Step 1: all of task is executed once as an iteration, using between the iterative space that the group task with execution sequence that need to perform repeatedly builds as piecemeal object, it is determined that the piecemeal vector (P in iteration spacei, Pj) direction, piecemeal is sized to f on Pj direction, and piecemeal is sized to h on Pi direction, and two that find out ragged edge from the dependence set D between task rely on CW and CCW, Pi=CCW and Pj=CW;
Wherein, dependence between task refers to the execution sequence between task, namely task has to wait for the relation that just can be performed after another task completes, the execution sequence of task is represented in a computer typically by figure, one task of each node on behalf in figure, the limit between node and node represents the dependence between task i.e. the specific restriction suffered by tasks carrying order; One iteration represents all of task and is all executed once, and all of iteration constitutes iteration space; One i.e. circulation of iteration, each task will be performed a number of times, and which time of a task A performs us and use A [i, j] represent, A [i, j] represents that task A i-th will circulate in nexine circulates, and in outer loop, jth circulation is executed once.
As shown in Figure 3, it is the task image MDFG of a concrete two dimensional application task model and correspondence, task image MDFG=<V, E, d, t>it is the X-Y scheme with node weights and limit weights, wherein a V representation node, namely represent a task in this application, E represents the dependence between task, (u, v) �� E means to also exist between node u and node v dependence, and d (e)=(di,dj) represent a delay, describe the concrete dependence between two tasks.
If the dependence between task a to task b is expressed as dk=(x y), then means that (i, task b j) depends on the task a of iteration iteration (i-x, j-y) to iteration iteration. dk=(0,0) represents the dependence between the task of same iteration. Two dimensional application in Fig. 2 comprises 4 tasks, respectively A, B, C and D, if the dependence between task A to task B is dk=(0,0), the dependence between task C to task D is dk=(0,1). The relation of dependence set D={d1, d2, d3, d4, d5} between CCW and CW and task, as shown in Figure 4.
In this example, there is 3 nonzero-lag vectors (0,1) (1,0) and (-1,1) in dependence D set. So (1,0) is CW vector, (-1,1) is CCW, and the first time vector of piecemeal is Pi=CCW and Pj=CW;
Because cycle applications has basic characteristics: the task that iteration (circulation) performs is all identical, and the task order performed is all identical every time every time, so the possessed dependence of iteration is all consistent every time. In order to find out CW(interval vector clockwise more easily) and CCW(interval vector counterclockwise), we use vector representation all of dependence, then CCW refers to the vector maximum with j vector angle, and CW refers to the vector minimum with j vector angle.
Two that find out ragged edge from D rely on CW and CCW, Pi=CCW and Pj=CW, i.e. d4=CCW, d3=CW;
Step 2: calculate the relational expression between required size of data and f and h loaded and preserve of current iteration, represent with the equation containing h or f respectively, and the scheduling length Ls of current iteration, namely suppose f or h it has been determined that;
(1) the required size of data loaded and preserve of current iteration includes two parts: Part I, size of data produced by current iteration is NUMnext+NUMtop+NUMother; Second: load current iteration in advance and following iteration needs the size of data used to be NUMother; (2) scheduling length refers to the time that an iteration performs, in this example, set all of task execution time all consistent, assume in an iteration containing n task, and have m core to perform, so scheduling length Ls=(n/m) execution time of �� each task, the execution time of each task is unit interval i.e. 1 clock cycle.
NUM other = &Sigma; d k A goto _ others ( d k ) = &Sigma; d k ( d ki ) ( d kj )
NUM top = &Sigma; d k A goto - top ( d k ) = &Sigma; d k d ki ( f - d kj )
NUM next = &Sigma; d k A goto - next ( d k ) = &Sigma; d k d kj ( h - d ki )
Wherein, NUMnextThis piecemeal is represented to produce and next piecemeal is badly in need of the size of data used, NUMtopRepresent that this piecemeal produces and is positioned at the size of data used required for vertical with Pi direction and adjacent with current piecemeal piecemeal; NUMotherRepresent the size of data that this piecemeal produces and other all piecemeal needs are used except next piecemeal (nextpartition) and top piecemeal (toppartition);
Step 3: determine the size f*h of piecemeal;
1) as f=1, h1 and h2 is drawn respectively according to strategy one and strategy two;
Strategy one: 2NUMother+NUMtop+NUMnext��Ms;
Strategy two: (NUMtop+NUMother)Tw+NUMother�� Tr��Ls �� f �� h;
Wherein: NUMnextThis piecemeal is represented to produce and next piecemeal is badly in need of the size of data used, NUMtopRepresent that this piecemeal produces and is positioned at the size of data used required for vertical with Pi direction and adjacent with current piecemeal piecemeal; NUMotherRepresent the size of data that this piecemeal produces and other all piecemeal needs are used except next piecemeal institute top piecemeal; Ls represents the scheduling length of each iteration, and Tr represents and reads the time required for data from main memory, and Tw represents and writes data to the time required for main memory; Ms refers to SPM(scratch-pad storage) amount of capacity;
2) judge the size of h1 and h2, if h1 is more than h2, then make h=h2, Utilization strategies one and strategy two calculate f1; Otherwise piecemeal is sized to f*h1 and f=1;
3) if f1 is more than 1, then piecemeal is sized to f1*h2; Otherwise piecemeal is sized to f*h1 and f=1;
In this example, try to achieve first time piecemeal size f=1, h=4.
After iteration space carries out first time piecemeal, as it is shown in figure 5, wherein, figure (a) is unreasonable piecemeal to its piecemeal schematic diagram, and figure (b) is the execution sequence figure obtained according to figure (a) piecemeal, therefrom finds out that this execution sequence exists endless loop, it is impossible to perform; Figure (c) is the reasonable piecemeal using the inventive method to obtain, and figure (d) is according to figure (c) the execution sequence figure obtained; From figure (d) it can be seen that after first time piecemeal, the dependence between block and block is only remaining (1,0), (-1,1) two kinds.
Step 4: adopting iteration weight clocking technique, between the time delay change task between decentralized task, the dependence of nexine circulation, reconstructs segmented spaces;
During restatement, (retiming) is a kind of technology being optimized cycle period by assignment latency, and rotating scheduling is a kind of resource limit Optimized Operation strategy based on weight clocking technique, and it obtains a greater compactness of scheduling by redistributing delay. Piecemeal dispatching technique, in conjunction with iteration weight clocking technique and prefetching technique, is regarded a point as each iteration, then iteration space is divided, the execution of right one piecemeal of later piecemeal. Due to the dependence between task, so when piecemeal, piecemeal will be considered how emphatically so that piecemeal is reasonable. That is not endless loop between block and block, it is possible to one piecemeal of a piecemeal be scheduling perform.
In order to obtain a greater compactness of scheduling, we carry out the dependence between change task by carrying out an iteration weight clocking technique, iteration weight clocking technique is to utilize the dependence between the delay reconstruction task between decentralized task, shortens the execution cycle of task with this.In order to keep the execution sequence of row-wise, during iteration restatement, need the dependence ensureing not change between piecemeal and piecemeal, so the Circular dependency relation of innermost layer between our a change task, such as exist between task A and task B and postpone as d3=(-1,1), after passing through to disperse to postpone, the delay between A and B becomes d3=(0,1), say, that before not utilizing iteration weight clocking technique, iteration(i, j) in perform task A be necessarily dependent upon iteration iteration(i+1, j-1) in task B, utilize iteration weight clocking technique change postpone after, iteration(i, performing in j) of task A depends on the B performed in iteration (i, j-1), as shown in Figure 6.
Step 5: divide first time segmented spaces according to piecemeal size f*h, is used as each sub-piecemeal produced by first time piecemeal as a node namely as a bunch of task, constitutes new iteration space, obtain the direction P2 of piecemeal for the second time according to step 1iAnd P2j��
First time piecemeal is that iteration space carries out piecemeal, and second time piecemeal is that the sub-piecemeal to first time piecemeal carries out piecemeal, then the sub-block of each first time piecemeal be defined as partition (i, j).
As shown in Figure 7 and Figure 8, Fig. 7 describes the execution sequence of task scheduling to the framework of task scheduling, and Fig. 8 describes the scheduling of a task. in the method, for convenience's sake, first time piecemeal (first_level_partition) is divided three classes by we according to the situation of the first_level_partition being currently executing: nextfirst_level_partition, topfirst_level_partition and otherfirst_level_partition. and to it, subregion is carried out according to the position that utilizes of data for each first_level_partition piecemeal, as shown in Figure 8 (a), one is divided into four regions, first region, the task of representing produced this piecemeal of data will be used, task in data nextfirst_level_partition produced by second region representation needs to use, task in data topfirst_level_partition produced by 3rd region representation needs to use, 4th region refers to that the data otherfirst_level_partition of generation needs to use. so we can quickly calculate each delay (d(e) in a first_level_partition): dk=(dki,dkj), the data for other first_level_partition of generation:
Agoto_top(dk)=area(PQVU)=dki(f-dkj)
Agoto_next(dk)=area(VSWX)=dkj(h-dki)
Agoto_other(dk)=area(UVRS)=dkidkj
Further, when we determined that the piecemeal size of first_level_partition, we quickly can calculate the data being produced and storing inside a first_level_partition.
NUM other = &Sigma; d k A goto _ others ( d k ) = &Sigma; d k ( d ki ) ( d kj )
NUM top = &Sigma; d k A goto - top ( d k ) = &Sigma; d k d ki ( f - d kj )
NUM next = &Sigma; d k A goto - next ( d k ) = &Sigma; d k d kj ( h - di )
Step 6: determine the size of second time piecemeal; The quantity N of processor cores is obtained from hardware configuration informationcore=3. Then P2iDirection be sized to Ncore=3, P2jDirection be sized to 1.
Step 7: obtain execution sequence figure after iteration space being carried out piecemeal according to two the piecemeal vectors obtained, according to execution sequence figure, task is scheduling.
As it is shown in figure 9, apply multiple bencmark and benchmark tests the inventive method TLP and other two kinds of algorithm List and IRP performances on task average scheduled time; As seen from the figure, the inventive method TLP(has the multilamellar piecemeal task scheduling strategy of storage perception) performance in task average scheduled time is substantially better than other two kinds of dispatching algorithms. Performance increase rate reaches about 30%.This is because there is the piecemeal scheduling strategy storing perception when carrying out piecemeal scheduling, consider not only the degree of parallelism of task, also fully take into account storage delay, ensure the scheduling time no longer than the processor scheduling time of memorizer, this avoid some storage time delays, saving the waiting time, thus improve the performance of system, decreasing scheduling time.
Performance in write operation as shown in Figure 10, is applied multiple bencmark and benchmark is tested the inventive method TLP and other two kinds of algorithm List and IRP performances in write operation; As seen from the figure, the inventive method TLP(has the multilamellar piecemeal task scheduling strategy of storage perception) than other two kinds of algorithms, write operation on average decreases about 45%. This is because the partition strategy of the present invention has fully taken into account the capacity of local storage, through twice piecemeal, there is local storage as far as possible in the data required for ensureing each kernel treatable piecemeal of core, which save substantial amounts of write operation, consequently reduce the consumption of scheduling time and energy, thus improve the performance of system. But when the capacity of local storage is certain, along with the expansion of task scale, the data of required storage increase, and the number of write operation also will increase, thus task average scheduled time can increase, the performance of system can reduce.
IIR, 2D, WDF(1 in Fig. 9 and Figure 10), WDF(2), DPCM(1), DPCM(2), DPCM(3), FLOYD(1), FLOYD(2) and FLOYD(3) be data handling utility bencmark and benchmark.
In the present invention, task is multidimensional DSP application, but the multilamellar partition strategy with storage consciousness proposed can expand to the n DSP tieed up and other have in the application of cycle specificity.

Claims (1)

1. A memory-aware multi-layer partition scheduling method, characterized in that it comprises the following steps:
Step 1: One execution of all tasks is called an iteration; the iteration space, built from a group of tasks with an execution order that must be executed repeatedly, is the partitioning object. Determine the directions of the partition vectors (Pi, Pj) of the iteration space, where the partition size is f in the Pj direction and h in the Pi direction: find the two outermost dependences CW and CCW in the dependence set D between tasks, and set Pi = CCW and Pj = CW; a dependence denotes the execution order between tasks;
Step 2: Determine the relation between the data size to be loaded and stored for the current iteration and the partition-vector sizes f and h, and the schedule length Ls of the current iteration;
Step 3: Determine the partition-vector sizes f and h according to strategy one and strategy two:
1) Set f to 1 and compute h from strategy one and from strategy two, obtaining h1 and h2 respectively;
Strategy one: 2NUMother + NUMtop + NUMnext ≤ Ms;
Strategy two: (NUMtop + NUMother)Tw + NUMother·Tr ≤ Ls·f·h;
2) If h1 > h2, then h takes the value h2; compute f using strategy one and strategy two, obtaining f1 and f2, and go to 3); otherwise h takes the value h1, f is 1, and go to step 4;
3) If f1 > 1, then f takes the value f1 and the partition size is f1*h2; otherwise the partition size is f*h1 with f = 1;
Step 4: Apply the iterational retiming technique to distribute delays among tasks and change the inner-loop dependences between tasks, reconstructing the partition space;
Step 5: Divide the first-level partition space by the partition size f*h; treat each sub-partition produced by the first-level partitioning as one node, i.e. as one task cluster, to form a new iteration space; then partition each sub-partition according to step 1 to obtain the direction vectors (P2i, P2j) of the second-level partitioning;
Step 6: Determine the size of the second-level partition vector;
The second-level partition size is Ncore in the P2i direction and 1 in the P2j direction, where Ncore is the number of processor cores;
Step 7: Obtain the execution-order graph after partitioning the iteration space according to the two partition vectors, and schedule the tasks according to the execution-order graph;
The directions of the partition vectors in step 1 are determined as follows:
A dependence between tasks denotes their execution order and is written dk = (dki, dkj), where dki denotes the execution dependence of two tasks in the inner loop and dkj their execution dependence in the outer loop; the two outermost dependences CW and CCW are found in the dependence set D, with Pi = CCW and Pj = CW;
CCW, the counter-clockwise extreme vector, is the dependence with the largest angle to the j vector; CW, the clockwise extreme vector, is the dependence with the smallest angle to the j vector;
In step 2, the relation between the data size to be loaded and stored for the current iteration and the partition-vector sizes f and h is as follows:
(1) The data to be loaded and stored for the current iteration comprises two parts: first, the data produced by the current iteration, of size NUMnext + NUMtop + NUMother; second, the data loaded in advance for the current and following iterations, of size NUMother;
NUMother = Σdk Agoto_others(dk) = Σdk (dki)(dkj)
NUMtop = Σdk Agoto-top(dk) = Σdk dki(f − dkj)
NUMnext = Σdk Agoto-next(dk) = Σdk dkj(h − dki)
where NUMnext denotes the data produced by this partition and urgently needed by the next partition; NUMtop denotes the data produced by this partition and needed by the partition that is perpendicular to the Pi direction and adjacent to the current partition; NUMother denotes the data produced by this partition and needed by all other partitions except the next and the top partitions;
Agoto_others(dk) is an intermediate variable denoting the data generated for otherfirst_level_partition (other first-level partitions) by the dependence dk when the partition executes its tasks;
Agoto-top(dk) is an intermediate variable denoting the data generated for topfirst_level_partition by the dependence dk when the partition executes its tasks;
Agoto-next(dk) is an intermediate variable denoting the data generated for nextfirst_level_partition by the dependence dk when the partition executes its tasks;
otherfirst_level_partition, topfirst_level_partition and nextfirst_level_partition are described relative to the current first-level partition, currentfirstlevelpartition: otherfirst_level_partition denotes any other first-level partition, topfirst_level_partition the top first-level partition, and nextfirst_level_partition the next first-level partition;
In step 2, the schedule length Ls is the time one iteration takes to execute;
In strategy one and strategy two of step 3, NUMnext denotes the data produced by this partition and urgently needed by the next partition; NUMtop denotes the data produced by this partition and needed by the partition perpendicular to the Pi direction and adjacent to the current partition; NUMother denotes the data produced by this partition and needed by all other partitions except the next and the top partitions; Ls is the schedule length of each iteration; Tr is the time required to read data from main memory; Tw is the time required to write data to main memory; Ms is the capacity of the scratch-pad memory SPM.
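Strategy one (the scratch-pad capacity bound) and strategy two (hiding memory-access time within the schedule length Ls·f·h) of step 3 can be illustrated with a brute-force sketch. This is a simplified reading under stated assumptions: the claim's separate f1/f2 computation is collapsed into a single feasibility search, and `partition_size`, `h_max` and the example numbers are hypothetical:

```python
def partition_size(D, Ms, Ls, Tr, Tw, h_max=64):
    """Sketch of step 3: choose a first-level partition size (f, h).

    NUMother = sum dki*dkj        (data for other partitions)
    NUMtop   = sum dki*(f - dkj)  (data for the top partition)
    NUMnext  = sum dkj*(h - dki)  (data for the next partition)
    Strategy one: 2*NUMother + NUMtop + NUMnext <= Ms
    Strategy two: (NUMtop + NUMother)*Tw + NUMother*Tr <= Ls*f*h
    """
    def nums(f, h):
        other = sum(di * dj for di, dj in D)
        top = sum(di * (f - dj) for di, dj in D)
        nxt = sum(dj * (h - di) for di, dj in D)
        return other, top, nxt

    def s1(f, h):                      # data fits in the SPM capacity
        o, t, n = nums(f, h)
        return 2 * o + t + n <= Ms

    def s2(f, h):                      # memory time hidden in Ls*f*h
        o, t, _ = nums(f, h)
        return (t + o) * Tw + o * Tr <= Ls * f * h

    f = 1
    h1 = max((h for h in range(1, h_max) if s1(f, h)), default=1)
    h2 = min((h for h in range(1, h_max) if s2(f, h)), default=h_max)
    if h1 > h2:
        # capacity allows more than latency hiding requires: fix h = h2
        # and grow f as far as both strategies stay satisfied
        f = max((g for g in range(1, h_max) if s1(g, h2) and s2(g, h2)),
                default=1)
        return f, h2
    return 1, h1

print(partition_size([(1, 0), (1, 1), (0, 1)], Ms=20, Ls=1, Tr=2, Tw=1))
# -> (6, 4)
```

With a smaller scratch-pad (e.g. Ms = 8 in the same example) the capacity bound binds first, and the sketch returns f = 1 with the largest h that still fits, matching branch 2) of step 3.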
CN201310145363.XA 2013-04-24 2013-04-24 A memory-aware multi-layer partition scheduling method Active CN103246563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310145363.XA CN103246563B (en) 2013-04-24 2013-04-24 A memory-aware multi-layer partition scheduling method

Publications (2)

Publication Number Publication Date
CN103246563A CN103246563A (en) 2013-08-14
CN103246563B true CN103246563B (en) 2016-06-08

Family

ID=48926094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310145363.XA Active CN103246563B (en) 2013-04-24 2013-04-24 A memory-aware multi-layer partition scheduling method

Country Status (1)

Country Link
CN (1) CN103246563B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101639769A (en) * 2008-07-30 2010-02-03 国际商业机器公司 Method and device for splitting and sequencing dataset in multiprocessor system
CN101980168A (en) * 2010-11-05 2011-02-23 北京云快线软件服务有限公司 Dynamic partitioning transmission method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Iterational Retiming with Partitioning: Loop Scheduling with Complete Memory Latency Hiding; Chun Jason Xue et al.; ACM Transactions on Embedded Computing Systems; 2010-02-28; Vol. 9, No. 3; entire document *
Optimal Loop Scheduling for Hiding Memory Latency Based on Two Level Partitioning and Prefetching; Zhong Wang et al.; IEEE Transactions on Signal Processing; 2001-11-30; Vol. 49, No. 11; entire document *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Li Kenli

Inventor after: Wang Yan

Inventor after: Du Jiayi

Inventor after: Tang Zhuo

Inventor after: Xiao Zheng

Inventor after: Zhu Ningbo

Inventor before: Wang Yan

Inventor before: Li Kenli

Inventor before: Du Jiayi

Inventor before: Tang Zhuo

Inventor before: Xiao Zheng

Inventor before: Zhu Ningbo

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: WANG YAN LI KENLI DU JIAYI TANG ZHUO XIAO ZHENG ZHU NINGBO TO: LI KENLI WANG YAN DU JIAYI TANG ZHUO XIAO ZHENG ZHU NINGBO

C14 Grant of patent or utility model
GR01 Patent grant