CN107402805A - Buffering method and system for multi-stage pipeline parallel computation - Google Patents

Buffering method and system for multi-stage pipeline parallel computation

Info

Publication number
CN107402805A
CN107402805A (application CN201610331646.7A / CN201610331646A)
Authority
CN
China
Prior art keywords
task
subtask
pipeline
data space
independent data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610331646.7A
Other languages
Chinese (zh)
Inventor
吴玉平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microelectronics of CAS
Original Assignee
Institute of Microelectronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Microelectronics of CAS
Priority to CN201610331646.7A
Publication of CN107402805A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/48: Indexing scheme relating to G06F9/48
    • G06F2209/483: Multiproc
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a buffering method and system for multi-stage pipeline parallel computation. The method comprises the steps of: dividing each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module; setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and sharing the task's independent data space among all subtask stages of the same task of the pipeline, so that the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data. Because data are passed by updating a pointer address, copying is avoided, together with the time and power it wastes.

Description

Buffering method and system for multi-stage pipeline parallel computation
Technical field
The present invention relates to the field of data processing, and in particular to a buffering method and system for multi-stage pipeline parallel computation.
Background art
In traditional serial computation, serial tasks are processed one by one in sequence. The processing of each task consists of several subtasks, and each subtask has its own processing module. The processing of a task Tn passes through each subtask processing module in turn, and only after all subtask stages of Tn have been processed does the processing of task Tn+1 begin. If the end time of task Tn's processing is earlier than the arrival time of task Tn+1, this serial computation method is feasible.
But if the end time of task Tn's processing is later than the arrival time of task Tn+1, this serial computation method is infeasible; parallel computation is then needed so that task Tn+1 can be processed as soon as it arrives, even though the processing of task Tn has not yet finished. After finishing the subtask of its current task, each subtask processing module passes its data to the next-stage subtask processing module on the pipeline, and then starts processing the subtask of the next task.
In a traditional pipeline, each subtask processing module of the pipeline has its own independent data space. When a subtask processing module finishes executing its subtask, it passes the data to the next-stage subtask processing module of the pipeline by copying.
This causes the following shortcomings: data are passed between subtask processing modules by copying, which takes unnecessary data-copy time; when the amount of data passed between subtasks is large, the time overhead is obvious and the performance of the program suffers. Meanwhile, performing these data copies causes unnecessary power consumption.
Summary of the invention
The present invention provides a buffering method and system for multi-stage pipeline parallel computation, which solve the prior-art problem that passing data between subtask processing modules by copying wastes time and power.
The invention provides a buffering method for multi-stage pipeline parallel computation, comprising the steps of:
dividing each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module;
setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
sharing the task's independent data space among all subtask stages of the same task of the pipeline, the subtask processing module handling a later-stage subtask inheriting, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data.
Preferably, the method further comprises:
after each task of the pipeline has been divided into multiple stages of subtasks in advance, optimizing, for each task of the pipeline, the execution durations of the subtask stages so that the absolute values of the differences between the execution durations of the subtask stages of the task are as small as possible.
Preferably, the execution duration of the first-stage subtask of each task of the pipeline is less than the minimum generation period of the pipeline tasks.
Preferably, the minimum number of subtask stages of a pipeline task is not less than the smallest positive integer not below the ratio of the maximum execution duration among the current pipeline tasks to the average execution duration of the subtask stages of each task.
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space.
Preferably, any layer of task of the pipeline is divided into multiple subtasks of the next layer as required.
The invention also provides a buffer system for multi-stage pipeline parallel computation, comprising:
a task dividing module, configured to divide each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module;
a memory space setting module, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data passing module, configured such that all subtask stages of the same task of the pipeline share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data.
Preferably, the system further comprises:
an execution duration optimization module, configured to optimize, after the task dividing module has divided each task of the pipeline into multiple stages of subtasks in advance and for each task of the pipeline, the execution durations of the subtask stages so that the absolute values of the differences between the execution durations of the subtask stages of the task are as small as possible.
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space.
Preferably, the task dividing module is specifically configured to divide any layer of task of the pipeline into multiple subtasks of the next layer as required, subtasks at the same stage of tasks of the same layer of the pipeline being handled by the same subtask processing module.
In the buffering method and system for multi-stage pipeline parallel computation provided by the invention, each task of the pipeline is divided into multiple stages of subtasks in advance and a fixed independent data space is set for each task of the pipeline. In actual use, all subtask stages of the same task of the pipeline share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data. Because data are passed by updating the pointer address of a subtask processing module, copying is avoided, together with the time and power it wastes.
Further, when the execution time lengths of the subtasks differ, the first-stage subtask of task TN may catch up with the last-stage subtask of task TN-x, so that the first-stage subtask of pipeline task TN has to wait for the last-stage subtask of pipeline task TN-x. The invention optimizes the execution durations of the subtask stages so that the absolute values of the differences between the execution durations of the subtask stages of a task are as small as possible, which effectively solves this problem.
Further, by requiring that the execution duration of the first-stage subtask of each pipeline task be less than the minimum generation period of the pipeline tasks, the invention ensures that the pipeline can respond in real time to the processing of the next task (its first subtask), better improving the parallel speed.
Further, the invention gives an optimized number of stages, which ensures that each task is processed in parallel in real time and minimizes the computational congestion and waiting caused by a subtask taking too long to compute.
Further, the invention defines the memory space as a ring-shaped memory space, so that a simple pointer address update, such as address +1 or -1, yields the address of the memory space holding the data needed by the next pending subtask, which is simple, efficient and not error-prone.
Brief description of the drawings
In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the accompanying drawings needed in the embodiments are briefly described below. Obviously, the drawings described below are only some of the embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them.
Fig. 1(a) is a logical schematic of multi-stage pipeline serial computation in the prior art;
Fig. 1(b) is a logical schematic of multi-stage pipeline parallel computation in the prior art;
Fig. 2(a) to Fig. 2(f) are logical schematics of the subtasks handled by each subtask processing module in the prior art;
Fig. 3 is a flow chart of a buffering method for multi-stage pipeline parallel computation provided by the invention;
Fig. 4 is a structural schematic of a ring-shaped independent data space with 12 memory spaces provided by the invention;
Fig. 5 is a logical schematic of prior-art subtask processing modules passing data to the next-stage subtask by copying;
Fig. 6(a) to Fig. 6(f) are schematics of the independent data spaces, provided by the invention, corresponding to Fig. 2(a) to Fig. 2(f);
Fig. 7 is another flow chart of the buffering method for multi-stage pipeline parallel computation provided by an embodiment of the invention;
Fig. 8(a) is a schematic of the execution durations of the subtask stages of parallel computation in the prior art;
Fig. 8(b) is a schematic of the execution durations of the subtask stages of parallel computation provided by an embodiment of the invention;
Fig. 9 shows the occurrence times of the tasks of each pipeline during parallel computation provided by an embodiment of the invention;
Fig. 10 is a structural schematic of the buffer system for multi-stage pipeline parallel computation provided by an embodiment of the invention;
Fig. 11 is another structural schematic of the buffer system for multi-stage pipeline parallel computation provided by an embodiment of the invention.
Detailed description of the embodiments
The embodiments of the invention are described in detail below; examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only intended to explain the invention, and shall not be construed as limiting it.
In traditional serial computation, serial tasks are processed one by one in sequence; the processing of each task consists of several subtasks, and each subtask has its own processing module. The following takes the case where each task of the pipeline is divided into 6 stages of subtasks as an example. As shown in Fig. 1(a), the processing of a task Tn passes in turn through subtask processing module S1, subtask processing module S2, subtask processing module S3, subtask processing module S4, subtask processing module S5 and subtask processing module S6, and only after Tn has been processed by S1, S2, S3, S4, S5 and S6 does the processing of task Tn+1 begin. If the end time of task Tn's processing is earlier than the arrival time of task Tn+1, this serial computation method is feasible.
If the end time of task Tn's processing is later than the arrival time of task Tn+1, this serial computation method is infeasible; parallel computation is then needed so that Tn+1 can be processed as soon as it arrives, even though the processing of Tn has not yet finished. As shown in Fig. 1(b), subtask processing modules S1, S2, S3, S4, S5 and S6 work concurrently: after finishing the subtask of its current task, each subtask processing module passes its data to the next-stage subtask processing module on the pipeline and then starts processing the subtask of the next task. The subtask currently being processed is TN-(i-1),i, where N-(i-1) is the current task number and i is the current subtask number, i = 1, 2, 3, ..., N = 1, 2, 3, ....
During the pipeline parallel computation shown in Fig. 2(a), subtask processing module S1 handles subtask Tn,1 of task Tn, S2 handles subtask Tn-1,2 of task Tn-1, S3 handles subtask Tn-2,3 of task Tn-2, S4 handles subtask Tn-3,4 of task Tn-3, S5 handles subtask Tn-4,5 of task Tn-4, and S6 handles subtask Tn-5,6 of task Tn-5.
In the pipeline parallel computation shown in Fig. 2(b), S1 handles subtask Tn+1,1 of task Tn+1, S2 handles Tn,2 of Tn, S3 handles Tn-1,3 of Tn-1, S4 handles Tn-2,4 of Tn-2, S5 handles Tn-3,5 of Tn-3, and S6 handles Tn-4,6 of Tn-4.
In the pipeline parallel computation shown in Fig. 2(c), S1 handles Tn+2,1 of Tn+2, S2 handles Tn+1,2 of Tn+1, S3 handles Tn,3 of Tn, S4 handles Tn-1,4 of Tn-1, S5 handles Tn-2,5 of Tn-2, and S6 handles Tn-3,6 of Tn-3.
In the pipeline parallel computation shown in Fig. 2(d), S1 handles Tn+3,1 of Tn+3, S2 handles Tn+2,2 of Tn+2, S3 handles Tn+1,3 of Tn+1, S4 handles Tn,4 of Tn, S5 handles Tn-1,5 of Tn-1, and S6 handles Tn-2,6 of Tn-2.
In the pipeline parallel computation shown in Fig. 2(e), S1 handles Tn+4,1 of Tn+4, S2 handles Tn+3,2 of Tn+3, S3 handles Tn+2,3 of Tn+2, S4 handles Tn+1,4 of Tn+1, S5 handles Tn,5 of Tn, and S6 handles Tn-1,6 of Tn-1.
In the pipeline parallel computation shown in Fig. 2(f), S1 handles Tn+5,1 of Tn+5, S2 handles Tn+4,2 of Tn+4, S3 handles Tn+3,3 of Tn+3, S4 handles Tn+2,4 of Tn+2, S5 handles Tn+1,5 of Tn+1, and S6 handles Tn,6 of Tn.
In a traditional pipeline, each subtask processing module of the pipeline has its own independent data space, and after a subtask processing module finishes executing its subtask it passes the data to the next-stage subtask processing module of the pipeline by copying. As shown in Fig. 5, each subtask of pipeline parallel computation has a fixed independent data space, and data are passed between subtasks by copying. The fixed independent data space of subtask Ti,1 of task Ti is MS1, that of subtask Ti,2 is MS2, that of subtask Ti,3 is MS3, that of subtask Ti,4 is MS4, that of subtask Ti,5 is MS5, and that of subtask Ti,6 is MS6, where i = 0, 1, 2, .... The shortcoming is that passing data between subtasks by copying takes unnecessary data-copy time; when the amount of data passed is large the time overhead is obvious and hurts the performance of the program, and performing these data copies also causes unnecessary power consumption.
In the buffering method and system for multi-stage pipeline parallel computation provided by the invention, each task of the pipeline is divided into multiple stages of subtasks in advance, a fixed independent data space is set for each task of the pipeline, all subtask stages of the same task share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data. Because subtasks at the same stage of different tasks of the pipeline are handled by the same subtask processing module, each task of the pipeline has a fixed independent data space, and each independent data space has a fixed address, each subtask processing module can simply update the address of the pointer to the memory space it refers to, letting the pending later-stage subtask inherit the data in the independent data space of the previous-stage subtask of the same task. Data are thus passed without any copying, which effectively improves buffering efficiency and reduces power consumption.
For a better understanding of the technical solutions and technical effects of the invention, a detailed description is given below with reference to the flow chart and specific embodiments; the flow chart is shown in Fig. 3.
Step S01: divide each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module.
In this embodiment, how many stages of subtasks each pipeline task is divided into depends on actual needs. Let ttaskmax be the maximum duration of processing one pipeline task and Ttaskperiod_min be the minimum time interval between the arrivals of two adjacent tasks; the computation time of each subtask should then be less than Ttaskperiod_min, which ensures that the tasks are processed in parallel in real time and minimizes the computational congestion and waiting caused by a subtask taking too long to compute. Each pipeline task can therefore be divided into multiple stages of subtasks according to the computation time each subtask needs. Subtasks at the same stage of different pipeline tasks are handled by the same subtask processing module.
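As an illustrative sketch of the stage-count constraint above (the numeric values are assumptions for illustration, not values from the patent): if every subtask must compute in less than Ttaskperiod_min, the task must be split into at least ceil(ttaskmax / Ttaskperiod_min) stages.

```python
import math

def min_stage_count(ttaskmax: float, ttaskperiod_min: float) -> int:
    """Smallest stage count such that an evenly split task's per-stage
    compute time stays below the minimum task arrival period."""
    return math.ceil(ttaskmax / ttaskperiod_min)

# e.g. a 55 ms task with a new task arriving every 10 ms needs >= 6 stages
print(min_stage_count(55.0, 10.0))  # -> 6
```

This matches the preferred stage count stated in the claims: the smallest positive integer not below the ratio of the longest task duration to the per-stage time budget.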
Further, any layer of task of the pipeline can be divided into multiple subtasks of the next layer as required. That is, the computation inside a subtask can itself use the parallel pipelining method, so as to reduce the response time of the subtask computation. Correspondingly, subtasks at the same stage of tasks of the same layer of the pipeline are handled by the same subtask processing module.
Step S02: set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address.
In the present embodiment, the number of independent data spaces can be equal to, or greater than, the number of pipeline tasks. Specifically, the memory space corresponding to the current subtask Ti,j is MSk, where k = i % N when i % N is not 0 and k = N when i % N is 0, N being the number of independent data spaces and % denoting the remainder operation. The independent data spaces should be of equal size, i.e. the memory spaces should have equal byte counts. Specifically, when there are 6 pipeline tasks the number of independent data spaces can be no less than 6; the following takes 12 independent data spaces as an example: MS01, MS02, MS03, MS04, MS05, MS06, MS07, MS08, MS09, MS10, MS11 and MS12.
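The mapping from task number to memory space stated above (k = i % N, with k = N when the remainder is 0) can be sketched as follows; the function name is illustrative:

```python
def space_index(i: int, n_spaces: int = 12) -> int:
    """k for memory space MSk of task Ti: k = i % N, except k = N
    when i % N == 0 (% is the remainder operation)."""
    r = i % n_spaces
    return n_spaces if r == 0 else r

# With N = 12 spaces, tasks 1..12 occupy MS01..MS12; task 13 reuses MS01
assert space_index(1) == 1
assert space_index(12) == 12
assert space_index(13) == 1
```

The special case for a zero remainder simply keeps the index 1-based, matching the MS01..MS12 naming.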
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space. This allows a simple pointer address update, such as address +1 or -1, to yield the address of the memory space holding the data needed by the next pending subtask, which is simple, efficient and not error-prone.
In a specific embodiment, each independent data space serves one task, the independent data spaces of adjacent tasks are logically adjacent, and MS01, MS02, MS03, MS04, MS05, MS06, MS07, MS08, MS09, MS10, MS11 and MS12 form a ring-shaped independent data space, as shown in Fig. 4.
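A minimal sketch of the ring addressing described above, where an address update of +1 or -1 wraps around the 12 spaces (function names are illustrative, not from the patent):

```python
N_SPACES = 12  # MS01..MS12 arranged in a ring (Fig. 4)

def next_space(k: int, step: int = 1) -> int:
    """Advance a 1-based space index around the ring; MS12 + 1 wraps
    to MS01 and MS01 - 1 wraps to MS12."""
    return (k - 1 + step) % N_SPACES + 1

assert next_space(12) == 1      # address + 1 past the last space wraps
assert next_space(1, -1) == 12  # address - 1 before the first space wraps
```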
Step S03: all subtask stages of the same task of the pipeline share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data.
In the present embodiment, the subtask processing module handling a later-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby passing the data. Fig. 5 shows the logical schematic of prior-art subtask processing modules that pass data to the next-stage subtask by copying. The benefit of the invention is that the copying between subtasks is avoided: simply by updating the pointer address of the memory space of each subtask processing module, the data can be passed, which eliminates the unnecessary data-copy time, improves the performance of the program, and at the same time removes unnecessary power consumption.
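As a minimal, non-authoritative sketch of the pointer-update handoff of step S03 (the class and buffer names are assumptions for illustration; the patent does not specify an implementation language): each stage module holds only a reference to the current task's fixed data space and "inherits" it by rebinding that reference, never copying bytes.

```python
class StageModule:
    """One subtask processing module; holds only a reference (pointer)
    to the current task's independent data space."""
    def __init__(self, name: str):
        self.name = name
        self.buf = None

    def take_over(self, prev: "StageModule") -> None:
        # Inherit the previous stage's data space by pointer update, no copy.
        self.buf = prev.buf

buffers = [bytearray(4) for _ in range(12)]            # ring MS01..MS12
stages = [StageModule(f"S{j}") for j in range(1, 7)]   # S1..S6

stages[0].buf = buffers[0]   # S1 starts task Tn in MS01
stages[0].buf[0] = 42        # S1 leaves its result in place
for j in range(1, 6):
    stages[j].take_over(stages[j - 1])

# All six stages now reference the same space; no bytes were copied.
assert all(s.buf is buffers[0] for s in stages)
```

Contrast with the prior art of Fig. 5, where each `take_over` would have to copy the contents of one per-stage buffer into the next.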
In a specific embodiment, the independent data spaces corresponding to Fig. 2(a) are shown in Fig. 6(a). The independent data space of task Tn is MS01, so the data handled by subtask processing module S1 for subtask Tn,1 of pipeline task Tn are stored in MS01; the independent data space of task Tn-1 is MS12, so the data handled by S2 for subtask Tn-1,2 are stored in MS12; the independent data space of task Tn-2 is MS11, so the data handled by S3 for subtask Tn-2,3 are stored in MS11; the independent data space of task Tn-3 is MS10, so the data handled by S4 for subtask Tn-3,4 are stored in MS10; the independent data space of task Tn-4 is MS09, so the data handled by S5 for subtask Tn-4,5 are stored in MS09; and the independent data space of task Tn-5 is MS08, so the data handled by S6 for subtask Tn-5,6 are stored in MS08.
The independent data spaces corresponding to Fig. 2(b) are shown in Fig. 6(b). The independent data space of task Tn+1 is MS02, so the data handled by S1 for subtask Tn+1,1 are stored in MS02; that of task Tn is MS01, so the data handled by S2 for subtask Tn,2 are stored in MS01; that of Tn-1 is MS12, so the data handled by S3 for Tn-1,3 are stored in MS12; that of Tn-2 is MS11, so the data handled by S4 for Tn-2,4 are stored in MS11; that of Tn-3 is MS10, so the data handled by S5 for Tn-3,5 are stored in MS10; and that of Tn-4 is MS09, so the data handled by S6 for Tn-4,6 are stored in MS09.
The independent data spaces corresponding to Fig. 2(c) are shown in Fig. 6(c). The independent data space of task Tn+2 is MS03, so the data handled by S1 for subtask Tn+2,1 are stored in MS03; that of Tn+1 is MS02, so the data handled by S2 for Tn+1,2 are stored in MS02; that of Tn is MS01, so the data handled by S3 for Tn,3 are stored in MS01; that of Tn-1 is MS12, so the data handled by S4 for Tn-1,4 are stored in MS12; that of Tn-2 is MS11, so the data handled by S5 for Tn-2,5 are stored in MS11; and that of Tn-3 is MS10, so the data handled by S6 for Tn-3,6 are stored in MS10.
The independent data spaces corresponding to Fig. 2(d) are shown in Fig. 6(d). The independent data space of task Tn+3 is MS04, so the data handled by S1 for subtask Tn+3,1 are stored in MS04; that of Tn+2 is MS03, so the data handled by S2 for Tn+2,2 are stored in MS03; that of Tn+1 is MS02, so the data handled by S3 for Tn+1,3 are stored in MS02; that of Tn is MS01, so the data handled by S4 for Tn,4 are stored in MS01; that of Tn-1 is MS12, so the data handled by S5 for Tn-1,5 are stored in MS12; and that of Tn-2 is MS11, so the data handled by S6 for Tn-2,6 are stored in MS11.
The independent data spaces corresponding to Fig. 2(e) are shown in Fig. 6(e). The independent data space of task Tn+4 is MS05, so the data handled by S1 for subtask Tn+4,1 are stored in MS05; that of Tn+3 is MS04, so the data handled by S2 for Tn+3,2 are stored in MS04; that of Tn+2 is MS03, so the data handled by S3 for Tn+2,3 are stored in MS03; that of Tn+1 is MS02, so the data handled by S4 for Tn+1,4 are stored in MS02; that of Tn is MS01, so the data handled by S5 for Tn,5 are stored in MS01; and that of Tn-1 is MS12, so the data handled by S6 for Tn-1,6 are stored in MS12.
The independent data spaces corresponding to Fig. 2(f) are shown in Fig. 6(f). The independent data space of task Tn+5 is MS06, so the data handled by S1 for subtask Tn+5,1 are stored in MS06; that of Tn+4 is MS05, so the data handled by S2 for Tn+4,2 are stored in MS05; that of Tn+3 is MS04, so the data handled by S3 for Tn+3,3 are stored in MS04; that of Tn+2 is MS03, so the data handled by S4 for Tn+2,4 are stored in MS03; that of Tn+1 is MS02, so the data handled by S5 for Tn+1,5 are stored in MS02; and that of Tn is MS01, so the data handled by S6 for Tn,6 are stored in MS01.
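The walkthrough above follows a simple rule: at cycle c (with c = 0 for Fig. 2(a)/Fig. 6(a)), stage Sj handles subtask T(n+c-(j-1)),j, and each task keeps its single fixed space throughout. A small sketch (with an assumed concrete task number, since n is symbolic in the patent) can reproduce the assignments:

```python
def space_index(i: int, n_spaces: int = 12) -> int:
    r = i % n_spaces
    return n_spaces if r == 0 else r

def schedule(cycle: int, n: int):
    """(stage j, task number, space k) for S1..S6 at a given cycle,
    where stage Sj handles subtask T(n+cycle-(j-1)),j."""
    return [(j, n + cycle - (j - 1), space_index(n + cycle - (j - 1)))
            for j in range(1, 7)]

# Take n = 13 so that Tn sits in MS01, as in the walkthrough above.
rows = schedule(1, 13)         # the Fig. 2(b) / Fig. 6(b) cycle
assert rows[0] == (1, 14, 2)   # S1: Tn+1 in MS02
assert rows[1] == (2, 13, 1)   # S2: Tn still in MS01
assert rows[5] == (6, 9, 9)    # S6: Tn-4 in MS09
```

Note how task Tn stays in MS01 for every one of its six subtask stages; only the stage module referencing MS01 changes from cycle to cycle.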
In the buffering method for multi-stage pipeline parallel computation provided by the present invention, each task of the pipeline is divided into multiple stages of subtasks in advance, and a fixed independent data space is set for each task of the pipeline. In actual use, the subtasks at all stages of the same pipeline task share that task's independent data space: the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer. Because the data transfer is achieved merely by updating the address of a subtask processing module's pointer, the time and power consumption of copying are avoided.
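The pointer-based handoff described above can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; all names (`DataSpace`, `stage`, `payload`) are hypothetical, and the six stages mirror the S1..S6 example in the figures.

```python
# Minimal sketch: each task owns a fixed, independent data space; successive
# subtask stages hand over a *reference* to that space instead of copying it.

class DataSpace:
    """A task's fixed, independent data space (e.g. MS01..MS06)."""
    def __init__(self, name):
        self.name = name
        self.payload = {}

def stage(space, level):
    """Subtask processing module S<level>: works in place on the task's space."""
    space.payload[f"stage{level}"] = f"result-of-S{level}"
    return space              # hand the same space on; no copy is made

ms01 = DataSpace("MS01")
p = ms01                      # 'p' plays the role of a stage's pointer
for level in range(1, 7):     # six subtask stages S1..S6
    p = stage(p, level)       # only the pointer is passed between stages

assert p is ms01              # every stage worked on the very same space
```

The `assert` at the end is the point of the sketch: no stage ever duplicated the data space, it only received a reference to it.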
Fig. 7 shows another flow chart of the buffering method for multi-stage pipeline parallel computation provided by an embodiment of the present invention. The method includes:
Step S71: each task of the pipeline is divided into multiple stages of subtasks in advance, and the same-stage subtasks of different pipeline tasks are handled by the same subtask processing module;
Step S72: for each task of the pipeline, the execution durations of the subtasks at all stages are optimized so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible.
In this embodiment, the computation subtasks at all stages of the pipeline are optimized so that the absolute values of the differences between the stage computation times are as small as possible.
As shown in Fig. 8(a), the run time of the first-stage subtask (subtask 1; the others follow by analogy) is tsub1, that of subtask 2 is tsub2, that of subtask 3 is tsub3, that of subtask 4 is tsub4, that of subtask 5 is tsub5, and that of subtask 6 is tsub6. Because the subtasks' execution times differ, subtask 1 of task TN may catch up with subtask 6 of task TN-x, so that subtask 1 of task TN in the first pipeline stage has to wait for subtask 6 of task TN-x in the sixth pipeline stage. This limits the performance of the pipeline and also complicates the design of the pipeline's per-stage data spaces.
Therefore, the subtasks at all stages of the pipeline are adjusted so that the absolute values of the differences between the stage computation times are as small as possible, for example by keeping the differences between the subtasks' execution durations within a given range. The execution durations of the subtasks may of course also be made equal, e.g. tsub1opt = tsub2opt = tsub3opt = tsub4opt = tsub5opt = tsub6opt = tsubmean = (tsub1 + tsub2 + tsub3 + tsub4 + tsub5 + tsub6)/6, as shown in Fig. 8(b). It should be noted that the total execution durations of the subtasks are not exactly equal, and a lock/mutex mechanism is used in the concurrent program to avoid the conflicts caused by the inconsistent durations.
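The balance criterion above (each subtask duration kept close to tsubmean) can be sketched as follows; this is an illustrative Python sketch, and the tolerance value and example durations are assumptions, not values from the patent.

```python
# Sketch of the balancing criterion: every subtask duration tsub_i should
# stay within a tolerance of the mean tsubmean = sum(tsub_i) / n.

def is_balanced(durations, tolerance):
    """True if every |tsub_i - tsubmean| is within the given tolerance."""
    tsubmean = sum(durations) / len(durations)
    return all(abs(t - tsubmean) <= tolerance for t in durations)

unbalanced = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]    # stage 6 lags far behind
balanced   = [4.0, 4.1, 3.9, 4.0, 4.05, 3.95]  # after optimization

assert not is_balanced(unbalanced, 0.5)  # mean 4.0, stage 1 deviates by 3.0
assert is_balanced(balanced, 0.5)        # mean 4.0, max deviation 0.1
```

An unbalanced pipeline stalls at its slowest stage, which is exactly the catch-up/waiting problem described above.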
Further, the computation time of the first-stage subtask of the pipeline is optimized so that it is smaller than the minimum task generation period. In this way the pipeline can take on the processing of the next task (its first subtask) in real time, which further improves the parallel speed. As shown in Fig. 9, task TN-1 appears at time tN-1, task TN appears at time tN, and task TN+1 appears at time tN+1. The interval between tasks TN-1 and TN is tN − tN-1, and the interval between tasks TN and TN+1 is tN+1 − tN, so the minimum task generation period is Ttaskperiod = min(tN − tN-1, tN+1 − tN). The per-stage subtask execution time satisfies tsubmean < Ttaskperiod. The pipeline execution times of the individual tasks are ttask1, ttask2, ttask3, …, whose maximum is ttaskmax = max(ttask1, ttask2, ttask3, …). Preferably, the minimum number of pipeline stages Npipelines is the smallest positive integer not less than ttaskmax/tsubmean.
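The two sizing rules just stated, tsubmean < Ttaskperiod and Npipelines = smallest integer not less than ttaskmax/tsubmean, can be sketched as follows; the example times are assumptions chosen only for illustration.

```python
import math

def min_task_period(emit_times):
    """Ttaskperiod = minimum interval between adjacent task emission times."""
    return min(b - a for a, b in zip(emit_times, emit_times[1:]))

def min_stage_count(ttask_list, tsubmean):
    """Npipelines: smallest integer not less than ttaskmax / tsubmean."""
    return math.ceil(max(ttask_list) / tsubmean)

emit_times = [0.0, 3.0, 5.0, 9.0]   # t_{N-1}, t_N, t_{N+1}, ... (assumed)
tsubmean = 1.5                      # balanced per-stage time (assumed)

# Rule 1: the per-stage time must beat the minimum task generation period.
assert tsubmean < min_task_period(emit_times)        # 1.5 < 2.0

# Rule 2: stage count covers the slowest task end to end.
assert min_stage_count([8.0, 9.0, 7.5], tsubmean) == 6   # ceil(9.0 / 1.5)
```

With fewer than Npipelines stages the slowest task could not drain through the pipeline before the next tasks pile up behind it.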
Step S73: a fixed independent data space is set for each task of the pipeline, and each independent data space has a fixed address.
Step S74: the subtasks at all stages of the same pipeline task share that task's independent data space; the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
When the subtasks' execution times differ, the first-stage subtask of task TN may catch up with the last-stage subtask of task TN-x, so that the pipeline has to wait for the last-stage subtask of task TN-x while processing the first-stage subtask of task TN. By optimizing the execution durations of the subtasks at all stages so that the absolute values of the differences between the execution durations of a task's subtasks are as small as possible, the present invention effectively solves this problem.
Correspondingly, the present invention also provides a buffer system for multi-stage pipeline parallel computation. As shown in Fig. 10, it includes:
a task division module 101, configured to divide each task of the pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
a storage space setting module 102, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data transfer module 103, configured so that the subtasks at all stages of the same pipeline task share that task's independent data space, and the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
Fig. 11 shows another schematic structure of the buffer system for multi-stage pipeline parallel computation provided by the present invention. The system further includes an execution duration optimization module 114, configured to optimize, after the task division module 101 has divided each task of the pipeline into multiple stages of subtasks in advance, the execution durations of the subtasks at all stages for each task of the pipeline, so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible. This ensures that the tasks are processed in parallel in real time and minimizes the computation congestion and waiting caused by any one subtask taking too long.
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces. The address of the storage space holding the data needed by the next pending subtask can then be obtained through a simple pointer-address update, such as address + 1 or address − 1, which is simple, efficient, and not error-prone.
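The ring of data spaces and the address + 1 update can be sketched as follows; this is an illustrative Python sketch in which the ring size of six is an assumption matching the six-stage example above.

```python
# Sketch: the tasks' independent data spaces form a logical ring, so the
# "next" space is reached by a simple address increment modulo the ring size.

N_SPACES = 6
ring = [f"MS{i:02d}" for i in range(1, N_SPACES + 1)]  # MS01..MS06

def next_space(index):
    """Address + 1, wrapping around the ring."""
    return (index + 1) % N_SPACES

idx = ring.index("MS06")
idx = next_space(idx)      # wraps from the last space back to the first
assert ring[idx] == "MS01"
```

The wrap-around is what makes the fixed set of data spaces reusable indefinitely: each new task simply takes over the space whose previous owner has drained out of the pipeline.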
In addition, when a task of the pipeline needs to be divided into multiple layers of subtasks, the task division module 101 is specifically configured to divide any layer of pipeline tasks into multiple next-layer subtasks as required, the same-stage subtasks of same-layer pipeline tasks being handled by the same subtask processing module.
Of course, the system may further include a storage module (not shown). The storage module may store the number of stages, the task execution durations, and so on, and may of course also store task data, such as the output data of the first-stage subtasks. This facilitates the automatic computer processing of pending tasks and allows information such as the output results of each pipeline task to be stored.
In the buffering method and system for multi-stage pipeline parallel computation provided by the embodiments of the present invention, the task division module 101 divides each task of the pipeline into multiple stages of subtasks in advance, and the storage space setting module 102 then sets a fixed independent data space for each task of the pipeline. In actual use, the subtasks at all stages of the same pipeline task share that task's independent data space, and the data transfer module 103 causes the subtask processing module that handles the next-stage subtask to inherit, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby accomplishing the data transfer. Because the data transfer is achieved by updating the address of a subtask processing module's pointer, the time and power consumption of copying are avoided.
The embodiments in this specification are described in a progressive manner, and for identical or similar parts the embodiments may refer to one another. Since the system is formed according to the method provided by the present invention, its description is relatively simple, and the relevant parts may refer to the description of the method. The embodiments described above are merely illustrative, and those of ordinary skill in the art can understand and implement them without creative effort.
Although the present invention is disclosed above with preferred embodiments, they are not intended to limit the present invention. Any person skilled in the art can, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution of the present invention, or revise it into equivalent embodiments of equivalent variations. Therefore, any simple modification, equivalent variation, or refinement made to the above embodiments according to the technical spirit of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A buffering method for multi-stage pipeline parallel computation, characterized by comprising the steps of:
dividing each task of a pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
sharing, among the subtasks at all stages of the same pipeline task, that task's independent data space, wherein the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
2. The method according to claim 1, characterized in that the method further comprises:
after each task of the pipeline has been divided into multiple stages of subtasks in advance, optimizing, for each task of the pipeline, the execution durations of the subtasks at all stages so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible.
3. The method according to claim 2, characterized in that the execution duration of the first-stage subtask of each pipeline task is smaller than the minimum generation period of the tasks of all pipelines.
4. The method according to claim 3, characterized in that the minimum number of stages of the pipeline tasks' subtasks is the smallest positive integer not less than the ratio of the maximum execution duration among the current pipeline's tasks to the average execution duration of the subtasks at all stages.
5. The method according to any one of claims 1 to 4, characterized in that the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces.
6. The method according to any one of claims 1 to 4, characterized in that any layer of pipeline tasks is divided into multiple next-layer subtasks as required.
7. A buffer system for multi-stage pipeline parallel computation, characterized by comprising:
a task division module, configured to divide each task of a pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
a storage space setting module, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data transfer module, configured so that the subtasks at all stages of the same pipeline task share that task's independent data space, and the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
8. The system according to claim 7, characterized in that the system further comprises:
an execution duration optimization module, configured to optimize, after the task division module has divided each task of the pipeline into multiple stages of subtasks in advance, the execution durations of the subtasks at all stages for each task of the pipeline, so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible.
9. The system according to claim 7 or 8, characterized in that the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces.
10. The system according to claim 7 or 8, characterized in that the task division module is specifically configured to divide any layer of pipeline tasks into multiple next-layer subtasks as required, the same-stage subtasks of same-layer pipeline tasks being handled by the same subtask processing module.
CN201610331646.7A 2016-05-18 2016-05-18 A buffering method and system for multi-stage pipeline parallel computation Pending CN107402805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610331646.7A CN107402805A (en) A buffering method and system for multi-stage pipeline parallel computation


Publications (1)

Publication Number Publication Date
CN107402805A true CN107402805A (en) 2017-11-28

Family

ID=60394358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610331646.7A Pending CN107402805A (en) 2016-05-18 2016-05-18 A kind of way to play for time and system of multi-stage pipeline parallel computation

Country Status (1)

Country Link
CN (1) CN107402805A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271245A (en) * 2018-09-13 2019-01-25 腾讯科技(深圳)有限公司 A kind of control method and device of block processes task

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078692A1 (en) * 2009-09-25 2011-03-31 Nickolls John R Coalescing memory barrier operations across multiple parallel threads
CN102402493A (en) * 2010-09-07 2012-04-04 国际商业机器公司 System and method for a hierarchical buffer system for a shared data bus
US20130054938A1 (en) * 2007-04-20 2013-02-28 The Regents Of The University Of Colorado Efficient pipeline parallelism using frame shared memory
CN104753533A (en) * 2013-12-26 2015-07-01 中国科学院电子学研究所 Staged shared double-channel assembly line type analog to digital converter


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QJZLDO10: "Chapter 3: Pipeline Technology", Baidu Wenku *
LIANG, Qiang et al.: "Research on Implementation of Data Interaction Between Multiple Processes of Simulation Software", Journal of System Simulation *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271245A (en) * 2018-09-13 2019-01-25 腾讯科技(深圳)有限公司 A kind of control method and device of block processes task
CN110457123A (en) * 2018-09-13 2019-11-15 腾讯科技(深圳)有限公司 A kind of control method and device of block processes task
CN109271245B (en) * 2018-09-13 2021-04-27 腾讯科技(深圳)有限公司 Control method and device for block processing task
CN110457123B (en) * 2018-09-13 2021-06-15 腾讯科技(深圳)有限公司 Control method and device for block processing task


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171128