CN107402805A - Buffering method and system for multi-stage pipeline parallel computation - Google Patents
Buffering method and system for multi-stage pipeline parallel computation
- Publication number
- CN107402805A CN107402805A CN201610331646.7A CN201610331646A CN107402805A CN 107402805 A CN107402805 A CN 107402805A CN 201610331646 A CN201610331646 A CN 201610331646A CN 107402805 A CN107402805 A CN 107402805A
- Authority
- CN
- China
- Prior art keywords
- task
- subtask
- streamline
- data space
- independent data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/483—Multiproc
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a buffering method and system for multi-stage pipeline parallel computation. The method comprises the steps of: dividing each task of the pipeline into multiple stages of subtasks in advance, with the same-stage subtasks of different pipeline tasks handled by the same subtask processing module; setting a fixed independent data space for each pipeline task, each independent data space having a fixed address; and sharing a task's independent data space among all subtask stages of that task, so that the subtask processing module of a later-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task simply by updating a pointer address, thereby realizing data transfer. Because data transfer is realized by updating a pointer address, copying is avoided, saving both time and power.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a buffering method and system for multi-stage pipeline parallel computation.
Background art
In traditional serial computing, serial tasks are processed sequentially, one by one. The processing of each task comprises several subtasks, and each subtask has its own processing module. The processing of a task Tn passes through each subtask processing module in turn, and processing of task Tn+1 begins only after all subtask stages of Tn have finished. If the end time of task Tn's processing is earlier than the arrival time of task Tn+1, this serial computing method is feasible.

However, if the end time of task Tn's processing is later than the arrival time of task Tn+1, this serial computing method is infeasible. Parallel computation is then needed, so that processing of Tn+1 can begin when it arrives even though the processing of Tn has not yet finished: after finishing the subtask of the current task, each subtask processing module transfers its data to the next-stage subtask processing module on the pipeline and then begins processing the subtask of the next task.
In a traditional pipeline, the subtask processing modules of each stage have their own separate independent data spaces. After a subtask processing module finishes executing its subtask, it transfers the data to the next-stage subtask processing module of the pipeline by copying.

This causes the following shortcomings: because data transfer between the subtask processing modules is realized by copying, unnecessary data-copy time is consumed; when the amount of data transferred between subtasks is large, the time overhead is significant and the performance of the program suffers. Meanwhile, performing these data copies incurs unnecessary power-consumption overhead.
Summary of the invention
The present invention provides a buffering method and system for multi-stage pipeline parallel computation, solving the prior-art problem that realizing data transfer between subtask processing modules by copying wastes time and power.
The invention provides a buffering method for multi-stage pipeline parallel computation, comprising the steps of:

dividing each task of the pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;

setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address;

sharing a task's independent data space among all subtask stages of that same task, the subtask processing module of a later-stage subtask inheriting the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby realizing data transfer.
Preferably, the method further comprises: after each task of the pipeline is divided into multiple stages of subtasks in advance, optimizing, for each task of the pipeline, the execution durations of the subtask stages so that the absolute differences between the execution durations of a task's subtask stages are as small as possible.
Preferably, the execution duration of the first-stage subtask of each pipeline task is less than the minimum generation period of the pipeline's tasks.
Preferably, the minimum number of subtask stages of a pipeline task is no less than the smallest integer that is not below the ratio of the longest execution duration among the current pipeline's tasks to the average execution duration of a task's subtask stages.
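Under one reading of this constraint, the minimum stage count is the ceiling of the longest task duration divided by the average subtask duration. The helper below is an illustrative sketch of that reading; the function name and arguments are assumptions, not part of the patent.

```python
import math

def min_stage_count(task_durations, avg_subtask_duration):
    """Smallest integer no less than (longest task duration) /
    (average subtask duration) -- one reading of the bound above."""
    return math.ceil(max(task_durations) / avg_subtask_duration)

# Longest task: 60 time units; subtasks average 10 units -> at least 6 stages.
```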
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces.
Preferably, any layer of pipeline task can be divided into multiple next-layer subtasks as required.
A buffering system for multi-stage pipeline parallel computation, comprising:

a task division module, for dividing each task of the pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;

a memory-space setting module, for setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address;

a data transfer module, by which all subtask stages of the same pipeline task share that task's independent data space, the subtask processing module of a later-stage subtask inheriting the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby realizing data transfer.
Preferably, the system further comprises: an execution-duration optimization module, for optimizing, after the task division module has divided each task of the pipeline into multiple stages of subtasks, the execution durations of the subtask stages for each pipeline task, so that the absolute differences between the execution durations of a task's subtask stages are as small as possible.
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces.
Preferably, the task division module is specifically used to divide any layer of pipeline task into multiple next-layer subtasks as required, the same-stage subtasks of same-layer pipeline tasks being handled by the same subtask processing module.
With the buffering method and system for multi-stage pipeline parallel computation provided by the invention, each task of the pipeline is divided in advance into multiple stages of subtasks and a fixed independent data space is set for each pipeline task. In actual use, all subtask stages of the same task share that task's independent data space, and the subtask processing module of a later-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby realizing data transfer. Because data transfer is realized by updating the pointer address of the subtask processing module, copying is avoided, saving time and power.
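The copy-free handoff described above can be sketched as follows. This is an illustrative model only, not the patented implementation; all names (`TaskBuffer`, `run_stage`, the toy stages) are assumptions.

```python
class TaskBuffer:
    """One fixed independent data space per pipeline task (illustrative)."""
    def __init__(self, task_id, size):
        self.task_id = task_id
        self.data = bytearray(size)  # stays at one place for the task's lifetime

def run_stage(buf, stage_fn):
    # The stage works in place on the task's own space; "transfer" to the
    # next stage is just handing the next stage the same buffer reference,
    # i.e. a pointer update rather than a copy.
    stage_fn(buf.data)
    return buf

# Two toy stages of the same task operating on the shared space.
def stage1(data):
    data[0] = 1            # first-stage result written into the shared space
def stage2(data):
    data[1] = data[0] + 1  # later stage inherits the earlier result in place

buf = TaskBuffer(task_id=0, size=4)
run_stage(buf, stage1)
run_stage(buf, stage2)
```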
Further, when the execution time lengths of the subtasks differ, the first-stage subtask of task TN may catch up with the last-stage subtask of task TN-x, so that the first-stage subtask of pipeline task TN has to wait for the last-stage subtask of pipeline task TN-x. The present invention optimizes the execution durations of the subtask stages so that the absolute differences between the execution durations of a task's subtask stages are as small as possible, which effectively solves this problem.
Further, by requiring that the execution duration of the first-stage subtask of each task be less than the minimum generation period of the pipeline's tasks, the invention ensures that the pipeline can respond in real time to the processing of the next task (its first subtask), better improving parallel speed.
Further, the invention gives an optimized number of stages, ensuring that each task is processed in parallel in real time and minimizing the computation congestion and waiting caused by any one subtask taking too long to compute.
Further, the invention defines the memory space as a ring, so that a simple pointer-address update, such as address + 1 or address - 1, yields the address of the memory space holding the data needed by the next pending subtask, which is simple, efficient and not error-prone.
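The ring update just described can be modeled with a single modular increment; the sketch below assumes the 12-space ring (MS01 to MS12) used later in the description, with a 1-based index standing in for the address.

```python
RING_SIZE = 12  # e.g. the spaces MS01 .. MS12 used later in the description

def next_space(k):
    """Advance a 1-based space index MSk one step around the ring --
    the 'address + 1' update described above (names are assumptions)."""
    return k % RING_SIZE + 1
```

For example, advancing from MS12 wraps back to MS01, so the pointer never walks off the end of the buffer region.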
Brief description of the drawings
To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can also obtain other drawings from them.
Fig. 1(a) is a logical schematic of multi-stage pipeline serial computing in the prior art;

Fig. 1(b) is a logical schematic of multi-stage pipeline parallel computing in the prior art;

Fig. 2(a) to Fig. 2(f) are logical schematics of the subtasks handled by each subtask processing module in the prior art;

Fig. 3 is a flow chart of a buffering method for multi-stage pipeline parallel computation provided by the invention;

Fig. 4 is a structural schematic of a ring of 12 independent data spaces provided by the invention;

Fig. 5 is a logical schematic of prior-art subtask processing modules that transfer data to the next-stage subtask by copying;

Fig. 6(a) to Fig. 6(f) are schematics, provided by the invention, of the independent data spaces used at each step;

Fig. 7 is another flow chart of the buffering method for multi-stage pipeline parallel computation provided by an embodiment of the invention;

Fig. 8(a) is a schematic of the subtask-stage execution durations of parallel computing in the prior art;

Fig. 8(b) is a schematic of the subtask-stage execution durations of parallel computing provided by an embodiment of the invention;

Fig. 9 shows the occurrence times of the tasks of each pipeline during parallel computing provided by an embodiment of the invention;

Fig. 10 is a structural schematic of the buffering system for multi-stage pipeline parallel computation provided by an embodiment of the invention;

Fig. 11 is another structural schematic of the buffering system for multi-stage pipeline parallel computation provided by an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below; examples of the embodiments are shown in the drawings, in which the same or similar labels throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary and only serve to explain the invention; they are not to be construed as limiting the invention.
In traditional serial computing, serial tasks are processed sequentially, one by one; the processing of each task comprises several subtasks, and each subtask has its own processing module. The following description takes as an example a pipeline in which each task is divided into 6 stages of subtasks. As shown in Fig. 1(a), the processing of a task Tn passes in turn through subtask processing modules S1, S2, S3, S4, S5 and S6, and only after Tn has been processed by S1 through S6 is task Tn+1 processed. If the end time of task Tn's processing is earlier than the arrival time of task Tn+1, this serial computing method is feasible.

If the end time of task Tn's processing is later than the arrival time of task Tn+1, this serial computing method is infeasible, and parallel computation is needed so that Tn+1 can be processed when it arrives even though the processing of Tn has not yet finished. As shown in Fig. 1(b), the subtask processing modules S1 through S6 work concurrently: after finishing the subtask of the current task, each module transfers its data to the next-stage subtask processing module on the pipeline and then begins processing the subtask of the next task. The subtask currently being processed by module Si is T(N-(i-1)),i, where N-(i-1) is the current task number and i is the current subtask number, i = 1, 2, 3, ..., N = 1, 2, 3, ....
Fig. 2(a) shows the subtask processed by each subtask processing module during pipeline parallel computing: module S1 processes subtask Tn,1 of task Tn; S2 processes Tn-1,2 of task Tn-1; S3 processes Tn-2,3 of task Tn-2; S4 processes Tn-3,4 of task Tn-3; S5 processes Tn-4,5 of task Tn-4; and S6 processes Tn-5,6 of task Tn-5.

In the pipeline parallel computing shown in Fig. 2(b), S1 processes subtask Tn+1,1 of task Tn+1; S2 processes Tn,2 of task Tn; S3 processes Tn-1,3 of task Tn-1; S4 processes Tn-2,4 of task Tn-2; S5 processes Tn-3,5 of task Tn-3; and S6 processes Tn-4,6 of task Tn-4.

In the pipeline parallel computing shown in Fig. 2(c), S1 processes Tn+2,1; S2 processes Tn+1,2; S3 processes Tn,3; S4 processes Tn-1,4; S5 processes Tn-2,5; and S6 processes Tn-3,6.

In the pipeline parallel computing shown in Fig. 2(d), S1 processes Tn+3,1; S2 processes Tn+2,2; S3 processes Tn+1,3; S4 processes Tn,4; S5 processes Tn-1,5; and S6 processes Tn-2,6.

In the pipeline parallel computing shown in Fig. 2(e), S1 processes Tn+4,1; S2 processes Tn+3,2; S3 processes Tn+2,3; S4 processes Tn+1,4; S5 processes Tn,5; and S6 processes Tn-1,6.

In the pipeline parallel computing shown in Fig. 2(f), S1 processes Tn+5,1; S2 processes Tn+4,2; S3 processes Tn+3,3; S4 processes Tn+2,4; S5 processes Tn+1,5; and S6 processes Tn,6.
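The six snapshots above all follow the formula given earlier: at any instant, stage module Si is processing subtask i of task T(N-(i-1)), where N is the newest task in the pipeline. A small sketch (names and indexing are assumptions based on that formula) reproduces each Fig. 2 row:

```python
def schedule(newest_task, stages=6):
    """Return the (task number, subtask number) pair processed by each of
    modules S1..S_stages when `newest_task` has just entered the pipeline."""
    return [(newest_task - (i - 1), i) for i in range(1, stages + 1)]
```

For example, with the newest task numbered 10, module S1 handles subtask 1 of task 10 while S6 handles subtask 6 of task 5, matching the diagonal pattern of Fig. 2.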
In a traditional pipeline, the subtask processing modules of each stage have their own separate independent data spaces, and after a subtask processing module finishes executing its subtask, it transfers the data to the next-stage subtask processing module by copying. As shown in Fig. 5, each subtask of the parallel pipeline has a fixed independent data space, and data transfer between subtasks is realized by copying. The fixed independent data space of subtask Ti,1 of task Ti is MS1; that of subtask Ti,2 is MS2; that of Ti,3 is MS3; that of Ti,4 is MS4; that of Ti,5 is MS5; and that of Ti,6 is MS6, where i = 0, 1, 2, .... The shortcoming is that realizing data transfer between subtasks by copying takes unnecessary data-copy time; when the amount of data transferred between subtasks is large, the time overhead is significant and program performance suffers, while performing these data copies also incurs unnecessary power-consumption overhead.
With the buffering method and system for multi-stage pipeline parallel computation provided by the invention, each task of the pipeline is divided in advance into multiple stages of subtasks and a fixed independent data space is set for each pipeline task; all subtask stages of the same task share that task's independent data space, and the subtask processing module of a later-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby realizing data transfer. Because the same-stage subtasks of different pipeline tasks are handled by the same subtask processing module, each pipeline task has a fixed independent data space, and each independent data space has a fixed address, each subtask processing module can update the address of the pointer to the memory space it references, allowing the pending later-stage subtask to inherit the data in the independent data space of the previous-stage subtask of the same task and thereby realizing data transfer. No copying is needed in this process, which effectively improves buffering efficiency and reduces power consumption.
For a better understanding of the technical solution and technical effects of the invention, a detailed description with reference to the flow chart and specific embodiments follows; the flow chart is shown in Fig. 3.
Step S01: divide each task of the pipeline into multiple stages of subtasks in advance, with the same-stage subtasks of different pipeline tasks handled by the same subtask processing module.

In this embodiment, how many stages of subtasks each pipeline task is divided into depends on actual needs. Let ttaskmax be the maximum duration of processing one pipeline task and Ttaskperiod_min the minimum time interval between the arrivals of two adjacent tasks; the computation time of each subtask should then be less than Ttaskperiod_min, which ensures that each task is processed in parallel in real time and minimizes the computation congestion and waiting caused by any one subtask taking too long to compute. Each pipeline task can therefore be divided into multiple stages of subtasks according to the time required to compute each subtask. The same-stage subtasks of different pipeline tasks are handled by the same subtask processing module.
Further, any layer of pipeline task can be divided into multiple next-layer subtasks as required. That is, the computation inside a subtask can also use the parallel pipelining method, thereby reducing the subtask's computation response time. Correspondingly, the same-stage subtasks of same-layer pipeline tasks are handled by the same subtask processing module.
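The hierarchical splitting just described can be pictured as a recursive labeling of subtasks; the sketch below is only an illustration with an assumed binary split per level, not the patent's division rule.

```python
def split(task, levels):
    """Recursively split a task label into `levels` further layers of two
    subtasks each, returning the leaf subtask labels (illustrative only)."""
    if levels == 0:
        return [task]
    leaves = []
    for part in (1, 2):
        leaves += split(f"{task}.{part}", levels - 1)
    return leaves
```

Splitting task "T1" one level deep yields leaves "T1.1" and "T1.2", each of which could itself be processed by its own inner pipeline.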
Step S02: set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address.

In this embodiment, the number of independent data spaces may equal the number of pipeline tasks, or may be greater than it. Specifically, the memory space corresponding to the current subtask Ti,j is MSk, where k = i % N when i % N is not 0, and k = N when i % N is 0; N is the number of independent data spaces, and % denotes the remainder operation. The sizes of the independent data spaces should be equal, i.e., their byte counts should be equal. Specifically, when there are 6 pipeline tasks, the number of independent data spaces may be no fewer than 6; the following description takes 12 independent data spaces as an example: MS01, MS02, MS03, MS04, MS05, MS06, MS07, MS08, MS09, MS10, MS11, MS12.
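The task-to-space rule stated above (k = i % N, with k = N when the remainder is 0) can be written directly; the helper name is an assumption.

```python
def space_index(i, n):
    """Map task T_i to space MS_k by the rule above: k = i % n,
    except k = n when i % n == 0 (spaces are numbered 1..n)."""
    k = i % n
    return n if k == 0 else k
```

With n = 12 this assigns task T13 back to MS01, which is exactly the wrap-around behavior the ring of spaces relies on.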
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces. In this way, a simple pointer-address update, such as address + 1 or address - 1, yields the address of the memory space holding the data needed by the next pending subtask, which is simple, efficient and not error-prone.
In a specific embodiment, each independent data space serves one task, the independent data spaces of adjacent tasks are logically adjacent, and MS01, MS02, MS03, MS04, MS05, MS06, MS07, MS08, MS09, MS10, MS11 and MS12 form a ring of independent data spaces, as shown in Fig. 4.
Step S03: all subtask stages of the same pipeline task share that task's independent data space, and the subtask processing module of a later-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby realizing data transfer.

In this embodiment, the subtask processing module of a later-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby realizing data transfer. Fig. 5 shows a logical schematic of prior-art subtask processing modules that transfer data to the next-stage subtask by copying. The benefit of the invention is that the copying between subtasks is avoided: by updating the pointer address of the memory space of each subtask processing module, data transfer is realized, so the unnecessary data-copy time is eliminated, the performance of the program is improved, and unnecessary power-consumption overhead is avoided.
In a specific embodiment, the independent data spaces corresponding to Fig. 2(a) are shown in Fig. 6(a): the independent data space of task Tn is MS01, and the data handled by module S1 for subtask Tn,1 is stored in MS01; the space of task Tn-1 is MS12, and the data handled by S2 for subtask Tn-1,2 is stored in MS12; the space of Tn-2 is MS11, and the data handled by S3 for Tn-2,3 is stored in MS11; the space of Tn-3 is MS10, and the data handled by S4 for Tn-3,4 is stored in MS10; the space of Tn-4 is MS09, and the data handled by S5 for Tn-4,5 is stored in MS09; the space of Tn-5 is MS08, and the data handled by S6 for Tn-5,6 is stored in MS08.

The spaces corresponding to Fig. 2(b) are shown in Fig. 6(b): the space of Tn+1 is MS02, holding the data handled by S1 for Tn+1,1; the space of Tn is MS01, holding the data handled by S2 for Tn,2; the space of Tn-1 is MS12, holding the data handled by S3 for Tn-1,3; the space of Tn-2 is MS11, holding the data handled by S4 for Tn-2,4; the space of Tn-3 is MS10, holding the data handled by S5 for Tn-3,5; the space of Tn-4 is MS09, holding the data handled by S6 for Tn-4,6.

The spaces corresponding to Fig. 2(c) are shown in Fig. 6(c): the space of Tn+2 is MS03, holding the data handled by S1 for Tn+2,1; the space of Tn+1 is MS02, holding the data handled by S2 for Tn+1,2; the space of Tn is MS01, holding the data handled by S3 for Tn,3; the space of Tn-1 is MS12, holding the data handled by S4 for Tn-1,4; the space of Tn-2 is MS11, holding the data handled by S5 for Tn-2,5; the space of Tn-3 is MS10, holding the data handled by S6 for Tn-3,6.

The spaces corresponding to Fig. 2(d) are shown in Fig. 6(d): the space of Tn+3 is MS04, holding the data handled by S1 for Tn+3,1; the space of Tn+2 is MS03, holding the data handled by S2 for Tn+2,2; the space of Tn+1 is MS02, holding the data handled by S3 for Tn+1,3; the space of Tn is MS01, holding the data handled by S4 for Tn,4; the space of Tn-1 is MS12, holding the data handled by S5 for Tn-1,5; the space of Tn-2 is MS11, holding the data handled by S6 for Tn-2,6.

The spaces corresponding to Fig. 2(e) are shown in Fig. 6(e): the space of Tn+4 is MS05, holding the data handled by S1 for Tn+4,1; the space of Tn+3 is MS04, holding the data handled by S2 for Tn+3,2; the space of Tn+2 is MS03, holding the data handled by S3 for Tn+2,3; the space of Tn+1 is MS02, holding the data handled by S4 for Tn+1,4; the space of Tn is MS01, holding the data handled by S5 for Tn,5; the space of Tn-1 is MS12, holding the data handled by S6 for Tn-1,6.

The spaces corresponding to Fig. 2(f) are shown in Fig. 6(f): the space of Tn+5 is MS06, holding the data handled by S1 for Tn+5,1; the space of Tn+4 is MS05, holding the data handled by S2 for Tn+4,2; the space of Tn+3 is MS04, holding the data handled by S3 for Tn+3,3; the space of Tn+2 is MS03, holding the data handled by S4 for Tn+2,4; the space of Tn+1 is MS02, holding the data handled by S5 for Tn+1,5; the space of Tn is MS01, holding the data handled by S6 for Tn,6.
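Throughout the steps above, a task's data never moves between the spaces; only which module references the space changes. The contrast with the prior-art copying scheme can be sketched as follows (an illustrative model with assumed names, not the patented implementation):

```python
def handoff_by_copy(src, dst):
    """Prior-art handoff: copy the stage output into the next stage's
    private space -- O(len(src)) work at every stage boundary."""
    dst[:] = src
    return dst

def handoff_by_pointer(shared):
    """Handoff described here: the next stage simply receives a reference
    to the task's shared space -- O(1), no data movement."""
    return shared

payload = bytearray(b"task data")
copied = handoff_by_copy(payload, bytearray(len(payload)))   # new object, bytes moved
aliased = handoff_by_pointer(payload)                        # same object, nothing moved
```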
In the buffering method for multi-stage pipeline parallel computation provided by the present invention, each task of the pipeline is divided in advance into multi-stage subtasks, and a fixed independent data space is set for each task of the pipeline. In actual use, the subtasks at all stages of the same pipeline task share the independent data space of that task; the subtask processing module handling the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby achieving data transfer. Because data transfer is achieved by updating the pointer address of the subtask processing module, copying is avoided, saving both time and power.
Fig. 7 shows another flowchart of the buffering method for multi-stage pipeline parallel computation provided by an embodiment of the present invention. The method includes:
Step S71: dividing each task of the pipeline in advance into multi-stage subtasks, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
Step S72: for each task of the pipeline, optimizing the execution durations of the subtasks at all stages so that the absolute value of the difference between the execution durations of the subtasks at different stages of a pipeline task is as small as possible.
In this embodiment, the computing subtasks at all stages of the pipeline are optimized so that the absolute value of the difference between the per-stage computing times of the pipeline is as small as possible.
Fig. 8(a) shows the run-time length tsub1 of the first-stage subtask (subtask 1; the others follow by analogy), the run-time length tsub2 of subtask 2, tsub3 of subtask 3, tsub4 of subtask 4, tsub5 of subtask 5, and tsub6 of subtask 6. Because the execution times of the subtasks differ, subtask 1 of task TN may catch up with subtask 6 of task TN-x, so that subtask 1 of task TN in the first pipeline stage must wait for subtask 6 of task TN-x in the sixth pipeline stage. This limits the performance of the pipeline and also makes the design of the per-stage data spaces more difficult.
Therefore, the subtasks at all stages of the pipeline are adjusted and optimized so that the absolute value of the difference between the per-stage computing times is as small as possible; for example, the differences between the execution durations of the subtasks may be kept within a given range. Of course, the execution durations of the subtasks may also be set equal, e.g. tsub1opt = tsub2opt = tsub3opt = tsub4opt = tsub5opt = tsub6opt = tsubmean = (tsub1 + tsub2 + tsub3 + tsub4 + tsub5 + tsub6)/6, as shown in Fig. 8(b). It should be noted that the total execution durations of the subtasks still differ somewhat, and a lock/mutual-exclusion mechanism is used in the concurrent program to avoid conflicts caused by the inconsistent durations.
Further, the computing time of the first-stage subtask of the pipeline is optimized so that it is less than the minimum generating period of the tasks. In this way the pipeline can respond in real time to the next task (its first subtask), better improving the parallel speed. As shown in Fig. 9, task TN-1 occurs at time tN-1, task TN occurs at time tN, and task TN+1 occurs at time tN+1; the interval between tasks TN-1 and TN is tN - tN-1, and the interval between tasks TN and TN+1 is tN+1 - tN, so the minimum generating period of the tasks is Ttaskperiod = min(tN - tN-1, tN+1 - tN). The per-stage subtask execution time of the pipeline satisfies tsubmean < Ttaskperiod. The total execution times of the pipeline for each task are ttask1, ttask2, ttask3, ..., whose maximum is ttaskmax = max(ttask1, ttask2, ttask3, ...). Preferably, the minimum number of pipeline stages Npipelines is the smallest positive integer not less than ttaskmax/tsubmean.
Step S73: setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address.
Step S74: sharing the independent data space of a task among the subtasks at all stages of the same pipeline task, wherein the subtask processing module handling the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby achieving data transfer.
When the execution times of the subtasks differ, the first-stage subtask of task TN may catch up with the last-stage subtask of task TN-x, so that the pipeline, while processing the first-stage subtask of task TN, must wait for the last-stage subtask of task TN-x. By optimizing the execution durations of the subtasks at all stages so that the absolute value of the difference between their execution durations is as small as possible, the present invention effectively solves this problem.
Correspondingly, the present invention also provides a buffer system for multi-stage pipeline parallel computation, as shown in Fig. 10, including:
a task dividing module 101, configured to divide each task of the pipeline in advance into multi-stage subtasks, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
a memory space setting module 102, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data transfer module 103, configured so that the subtasks at all stages of the same pipeline task share the independent data space of that task, and the subtask processing module handling the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby achieving data transfer.
Fig. 11 shows another structural schematic of the buffer system for multi-stage pipeline parallel computation provided by the present invention. The system further includes an execution duration optimization module 114, configured to, after the task dividing module 101 has divided each task of the pipeline in advance into multi-stage subtasks, optimize the execution durations of the subtasks at all stages for each pipeline task so that the absolute value of the difference between the execution durations of the subtasks of a task is as small as possible. This ensures that tasks are processed in parallel in real time and minimizes, to the greatest extent, the computing congestion and waiting caused by any one subtask taking too long.
Preferably, the independent data spaces of the tasks of adjacent pipeline slots are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space. The present invention can then obtain the address of the memory space holding the data needed by the next pending subtask through a simple pointer address update, such as address +1 or -1, which is simple, efficient, and not error-prone.
In addition, when a pipeline task needs to be divided into multiple layers of subtasks, the task dividing module 101 is specifically configured to divide any layer of a pipeline task into multiple next-layer subtasks as required, the same-stage subtasks of same-layer pipeline tasks being handled by the same subtask processing module.
Of course, the system may further include a memory module (not shown), which may store the number of stages, the task execution durations, and so on, and may of course also store task data, such as the output data of the first-stage tasks. This facilitates automatic computer processing of the pending items and allows the output results and related information of each pipeline task to be stored.
In the buffering method and system for multi-stage pipeline parallel computation provided by the embodiments of the present invention, the task dividing module 101 divides each task of the pipeline in advance into multi-stage subtasks, and the memory space setting module 102 then sets a fixed independent data space for each task of the pipeline. In actual use, the subtasks at all stages of the same pipeline task share the independent data space of that task, and the data transfer module 103 causes the subtask processing module handling the next-stage subtask to inherit, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby achieving data transfer. Because data transfer is achieved by updating the pointer address of the subtask processing module, copying is avoided, saving both time and power.
The embodiments in this specification are described in a progressive manner, and identical or similar parts of the embodiments may be referred to one another. As the system portion corresponds to the method provided by the present invention, its description is relatively simple, and the relevant parts may refer to the description of the method portion. The embodiments described above are merely illustrative, and those of ordinary skill in the art can understand and implement them without creative effort.
Although the present invention is disclosed above with preferred embodiments, they are not intended to limit the present invention. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution of the present invention, or revise it into equivalent embodiments of equivalent variation. Therefore, any simple modification, equivalent change, or revision made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.
Claims (10)
1. A buffering method for multi-stage pipeline parallel computation, characterized by comprising the steps of:
dividing each task of the pipeline in advance into multi-stage subtasks, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
sharing the independent data space of a task among the subtasks at all stages of the same pipeline task, wherein the subtask processing module handling the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby achieving data transfer.
2. The method according to claim 1, characterized in that the method further comprises:
after each task of the pipeline is divided in advance into multi-stage subtasks, for each task of the pipeline, optimizing the execution durations of the subtasks at all stages so that the absolute value of the difference between the execution durations of the subtasks of a pipeline task is as small as possible.
3. The method according to claim 2, characterized in that the execution duration of the first-stage subtask of each pipeline task is less than the minimum generating period of the tasks of all pipelines.
4. The method according to claim 3, characterized in that the minimum number of stages of the subtasks of the pipeline tasks is the smallest positive integer not less than the ratio of the maximum execution duration among the current pipeline tasks to the average execution duration of the subtasks at all stages.
5. The method according to any one of claims 1 to 4, characterized in that the independent data spaces of the tasks of adjacent pipeline slots are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space.
6. The method according to any one of claims 1 to 4, characterized in that any layer of a pipeline task is divided into multiple next-layer subtasks as required.
7. A buffer system for multi-stage pipeline parallel computation, characterized by comprising:
a task dividing module, configured to divide each task of the pipeline in advance into multi-stage subtasks, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
a memory space setting module, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data transfer module, configured so that the subtasks at all stages of the same pipeline task share the independent data space of that task, and the subtask processing module handling the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby achieving data transfer.
8. The system according to claim 7, characterized in that the system further comprises:
an execution duration optimization module, configured to, after the task dividing module has divided each task of the pipeline in advance into multi-stage subtasks, optimize, for each task of the pipeline, the execution durations of the subtasks at all stages so that the absolute value of the difference between the execution durations of the subtasks of a pipeline task is as small as possible.
9. The system according to claim 7 or 8, characterized in that the independent data spaces of the tasks of adjacent pipeline slots are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space.
10. The system according to claim 7 or 8, characterized in that the task dividing module is specifically configured to divide any layer of a pipeline task into multiple next-layer subtasks as required, the same-stage subtasks of same-layer pipeline tasks being handled by the same subtask processing module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610331646.7A CN107402805A (en) | 2016-05-18 | 2016-05-18 | Buffering method and system for multi-stage pipeline parallel computation
Publications (1)
Publication Number | Publication Date |
---|---|
CN107402805A true CN107402805A (en) | 2017-11-28 |
Family
ID=60394358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610331646.7A Pending CN107402805A (en) | 2016-05-18 | 2016-05-18 | A kind of way to play for time and system of multi-stage pipeline parallel computation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402805A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110078692A1 (en) * | 2009-09-25 | 2011-03-31 | Nickolls John R | Coalescing memory barrier operations across multiple parallel threads |
CN102402493A (en) * | 2010-09-07 | 2012-04-04 | 国际商业机器公司 | System and method for a hierarchical buffer system for a shared data bus |
US20130054938A1 (en) * | 2007-04-20 | 2013-02-28 | The Regents Of The University Of Colorado | Efficient pipeline parallelism using frame shared memory |
CN104753533A (en) * | 2013-12-26 | 2015-07-01 | 中国科学院电子学研究所 | Staged shared double-channel assembly line type analog to digital converter |
Non-Patent Citations (2)
Title |
---|
QJZLDO10: "Chapter 3: Pipeline Technology" ("第3章流水线技术"), Baidu Wenku (《百度文库》) * |
Liang Qiang et al.: "Research on Data Interaction Between Multiple Processes in Simulation Software" ("仿真软件多进程间数据交互实现研究"), Journal of System Simulation (《系统仿真学报》) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271245A (en) * | 2018-09-13 | 2019-01-25 | Tencent Technology (Shenzhen) Co., Ltd. | Control method and device for block processing task |
CN110457123A (en) * | 2018-09-13 | 2019-11-15 | Tencent Technology (Shenzhen) Co., Ltd. | Control method and device for block processing task |
CN109271245B (en) * | 2018-09-13 | 2021-04-27 | 腾讯科技(深圳)有限公司 | Control method and device for block processing task |
CN110457123B (en) * | 2018-09-13 | 2021-06-15 | 腾讯科技(深圳)有限公司 | Control method and device for block processing task |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10884795B2 (en) | Dynamic accelerator scheduling and grouping for deep learning jobs in a computing cluster | |
US9584430B2 (en) | Traffic scheduling device | |
Wang et al. | Load balancing task scheduling based on genetic algorithm in cloud computing | |
WO2017185394A1 (en) | Device and method for performing reversetraining of fully connected layers of neural network | |
CN105450618B (en) | A kind of operation method and its system of API server processing big data | |
CN103309738B (en) | User job dispatching method and device | |
JP2017050001A (en) | System and method for use in efficient neural network deployment | |
US20150199216A1 (en) | Scheduling and execution of tasks | |
CN113051053B (en) | Heterogeneous resource scheduling method, heterogeneous resource scheduling device, heterogeneous resource scheduling equipment and computer readable storage medium | |
CN110889510B (en) | Online scheduling method and device for distributed machine learning task | |
US11663461B2 (en) | Instruction distribution in an array of neural network cores | |
Li et al. | Leveraging endpoint flexibility when scheduling coflows across geo-distributed datacenters | |
Yin et al. | Two-agent single-machine scheduling with unrestricted due date assignment | |
CN103914556A (en) | Large-scale graph data processing method | |
CN106550042B (en) | Multithreading method for down loading and device and calculating equipment | |
Xu et al. | An improved binary PSO-based task scheduling algorithm in green cloud computing | |
CN104125166A (en) | Queue scheduling method and computing system | |
WO2022062648A1 (en) | Automatic driving simulation task scheduling method and apparatus, device, and readable medium | |
WO2017185248A1 (en) | Apparatus and method for performing auto-learning operation of artificial neural network | |
CN109710372A (en) | A kind of computation-intensive cloud workflow schedule method based on cat owl searching algorithm | |
CN107402805A (en) | A kind of way to play for time and system of multi-stage pipeline parallel computation | |
CN112862083B (en) | Deep neural network inference method and device in edge environment | |
CN110958192B (en) | Virtual data center resource allocation system and method based on virtual switch | |
KR102332523B1 (en) | Apparatus and method for execution processing | |
CN109976873A (en) | The scheduling scheme acquisition methods and dispatching method of containerization distributed computing framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20171128 |