CN107402805A - Buffering method and system for multi-stage pipeline parallel computation - Google Patents

Buffering method and system for multi-stage pipeline parallel computation

Info

Publication number
CN107402805A
CN107402805A (application CN201610331646.7A / CN201610331646A)
Authority
CN
China
Prior art keywords
task
subtask
pipeline
data space
independent data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610331646.7A
Other languages
Chinese (zh)
Inventor
吴玉平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microelectronics of CAS
Original Assignee
Institute of Microelectronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Microelectronics of CAS
Priority to CN201610331646.7A
Publication of CN107402805A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/48: Indexing scheme relating to G06F9/48
    • G06F2209/483: Multiproc
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a buffering method and system for multi-stage pipeline parallel computation. The method comprises the steps of: dividing each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module; setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and sharing the task's independent data space among all subtask stages of the same task of the pipeline, so that the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data. Because data are passed by updating a pointer address, copying is avoided, together with the time and power it wastes.

Description

Buffering method and system for multi-stage pipeline parallel computation
Technical field
The present invention relates to the field of data processing, and in particular to a buffering method and system for multi-stage pipeline parallel computation.
Background art
In traditional serial computation, serial tasks are processed one by one in sequence. The processing of each task consists of several subtasks, and each subtask has its own processing module. The processing of a task Tn passes through each subtask processing module in turn, and only after all subtask stages of Tn have been processed does the processing of task Tn+1 begin. If the end time of task Tn's processing is earlier than the arrival time of task Tn+1, this serial computation method is feasible.
But if the end time of task Tn's processing is later than the arrival time of task Tn+1, this serial computation method is infeasible; parallel computation is then needed so that task Tn+1 can be processed as soon as it arrives, even though the processing of task Tn has not yet finished. After finishing the subtask of its current task, each subtask processing module passes its data to the next-stage subtask processing module on the pipeline, and then starts processing the subtask of the next task.
In a traditional pipeline, each subtask processing module of the pipeline has its own independent data space. When a subtask processing module finishes executing its subtask, it passes the data to the next-stage subtask processing module of the pipeline by copying.
This causes the following shortcomings: data are passed between subtask processing modules by copying, which takes unnecessary data-copy time; when the amount of data passed between subtasks is large, the time overhead is obvious and the performance of the program suffers. Meanwhile, performing these data copies causes unnecessary power consumption.
Summary of the invention
The present invention provides a buffering method and system for multi-stage pipeline parallel computation, which solve the prior-art problem that passing data between subtask processing modules by copying wastes time and power.
The invention provides a buffering method for multi-stage pipeline parallel computation, comprising the steps of:
dividing each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module;
setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
sharing the task's independent data space among all subtask stages of the same task of the pipeline, the subtask processing module handling a later-stage subtask inheriting, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data.
Preferably, the method further comprises:
after each task of the pipeline has been divided into multiple stages of subtasks in advance, optimizing, for each task of the pipeline, the execution durations of the subtask stages so that the absolute values of the differences between the execution durations of the subtask stages of the task are as small as possible.
Preferably, the execution duration of the first-stage subtask of each task of the pipeline is less than the minimum generation period of the pipeline tasks.
Preferably, the minimum number of subtask stages of a pipeline task is not less than the smallest positive integer not below the ratio of the maximum execution duration among the current pipeline tasks to the average execution duration of the subtask stages of each task.
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space.
Preferably, any layer of task of the pipeline is divided into multiple subtasks of the next layer as required.
The invention also provides a buffer system for multi-stage pipeline parallel computation, comprising:
a task dividing module, configured to divide each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module;
a memory space setting module, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data passing module, configured such that all subtask stages of the same task of the pipeline share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data.
Preferably, the system further comprises:
an execution duration optimization module, configured to optimize, after the task dividing module has divided each task of the pipeline into multiple stages of subtasks in advance and for each task of the pipeline, the execution durations of the subtask stages so that the absolute values of the differences between the execution durations of the subtask stages of the task are as small as possible.
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space.
Preferably, the task dividing module is specifically configured to divide any layer of task of the pipeline into multiple subtasks of the next layer as required, subtasks at the same stage of tasks of the same layer of the pipeline being handled by the same subtask processing module.
In the buffering method and system for multi-stage pipeline parallel computation provided by the invention, each task of the pipeline is divided into multiple stages of subtasks in advance and a fixed independent data space is set for each task of the pipeline. In actual use, all subtask stages of the same task of the pipeline share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data. Because data are passed by updating the pointer address of a subtask processing module, copying is avoided, together with the time and power it wastes.
Further, when the execution time lengths of the subtasks differ, the first-stage subtask of task TN may catch up with the last-stage subtask of task TN-x, so that the first-stage subtask of pipeline task TN has to wait for the last-stage subtask of pipeline task TN-x. The invention optimizes the execution durations of the subtask stages so that the absolute values of the differences between the execution durations of the subtask stages of a task are as small as possible, which effectively solves this problem.
Further, by requiring that the execution duration of the first-stage subtask of each pipeline task be less than the minimum generation period of the pipeline tasks, the invention ensures that the pipeline can respond in real time to the processing of the next task (its first subtask), better improving the parallel speed.
Further, the invention gives an optimized number of stages, which ensures that each task is processed in parallel in real time and minimizes the computational congestion and waiting caused by a subtask taking too long to compute.
Further, the invention defines the memory space as a ring-shaped memory space, so that a simple pointer address update, such as address +1 or -1, yields the address of the memory space holding the data needed by the next pending subtask, which is simple, efficient and not error-prone.
Brief description of the drawings
In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the accompanying drawings needed in the embodiments are briefly described below. Obviously, the drawings described below are only some of the embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them.
Fig. 1(a) is a logical schematic of multi-stage pipeline serial computation in the prior art;
Fig. 1(b) is a logical schematic of multi-stage pipeline parallel computation in the prior art;
Fig. 2(a) to Fig. 2(f) are logical schematics of the subtasks handled by each subtask processing module in the prior art;
Fig. 3 is a flow chart of a buffering method for multi-stage pipeline parallel computation provided by the invention;
Fig. 4 is a structural schematic of a ring-shaped independent data space with 12 memory spaces provided by the invention;
Fig. 5 is a logical schematic of prior-art subtask processing modules passing data to the next-stage subtask by copying;
Fig. 6(a) to Fig. 6(f) are schematics of the independent data spaces, provided by the invention, corresponding to Fig. 2(a) to Fig. 2(f);
Fig. 7 is another flow chart of the buffering method for multi-stage pipeline parallel computation provided by an embodiment of the invention;
Fig. 8(a) is a schematic of the execution durations of the subtask stages of parallel computation in the prior art;
Fig. 8(b) is a schematic of the execution durations of the subtask stages of parallel computation provided by an embodiment of the invention;
Fig. 9 shows the occurrence times of the tasks of each pipeline during parallel computation provided by an embodiment of the invention;
Fig. 10 is a structural schematic of the buffer system for multi-stage pipeline parallel computation provided by an embodiment of the invention;
Fig. 11 is another structural schematic of the buffer system for multi-stage pipeline parallel computation provided by an embodiment of the invention.
Detailed description of the embodiments
The embodiments of the invention are described in detail below; examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only intended to explain the invention, and shall not be construed as limiting it.
In traditional serial computation, serial tasks are processed one by one in sequence; the processing of each task consists of several subtasks, and each subtask has its own processing module. The following takes the case where each task of the pipeline is divided into 6 stages of subtasks as an example. As shown in Fig. 1(a), the processing of a task Tn passes in turn through subtask processing module S1, subtask processing module S2, subtask processing module S3, subtask processing module S4, subtask processing module S5 and subtask processing module S6, and only after Tn has been processed by S1, S2, S3, S4, S5 and S6 does the processing of task Tn+1 begin. If the end time of task Tn's processing is earlier than the arrival time of task Tn+1, this serial computation method is feasible.
If the end time of task Tn's processing is later than the arrival time of task Tn+1, this serial computation method is infeasible; parallel computation is then needed so that Tn+1 can be processed as soon as it arrives, even though the processing of Tn has not yet finished. As shown in Fig. 1(b), subtask processing modules S1, S2, S3, S4, S5 and S6 work concurrently: after finishing the subtask of its current task, each subtask processing module passes its data to the next-stage subtask processing module on the pipeline and then starts processing the subtask of the next task. The subtask currently being processed is TN-(i-1),i, where N-(i-1) is the current task number and i is the current subtask number, i = 1, 2, 3, ..., N = 1, 2, 3, ....
During the pipeline parallel computation shown in Fig. 2(a), subtask processing module S1 handles subtask Tn,1 of task Tn, S2 handles subtask Tn-1,2 of task Tn-1, S3 handles subtask Tn-2,3 of task Tn-2, S4 handles subtask Tn-3,4 of task Tn-3, S5 handles subtask Tn-4,5 of task Tn-4, and S6 handles subtask Tn-5,6 of task Tn-5.
In the pipeline parallel computation shown in Fig. 2(b), S1 handles subtask Tn+1,1 of task Tn+1, S2 handles Tn,2 of Tn, S3 handles Tn-1,3 of Tn-1, S4 handles Tn-2,4 of Tn-2, S5 handles Tn-3,5 of Tn-3, and S6 handles Tn-4,6 of Tn-4.
In the pipeline parallel computation shown in Fig. 2(c), S1 handles Tn+2,1 of Tn+2, S2 handles Tn+1,2 of Tn+1, S3 handles Tn,3 of Tn, S4 handles Tn-1,4 of Tn-1, S5 handles Tn-2,5 of Tn-2, and S6 handles Tn-3,6 of Tn-3.
In the pipeline parallel computation shown in Fig. 2(d), S1 handles Tn+3,1 of Tn+3, S2 handles Tn+2,2 of Tn+2, S3 handles Tn+1,3 of Tn+1, S4 handles Tn,4 of Tn, S5 handles Tn-1,5 of Tn-1, and S6 handles Tn-2,6 of Tn-2.
In the pipeline parallel computation shown in Fig. 2(e), S1 handles Tn+4,1 of Tn+4, S2 handles Tn+3,2 of Tn+3, S3 handles Tn+2,3 of Tn+2, S4 handles Tn+1,4 of Tn+1, S5 handles Tn,5 of Tn, and S6 handles Tn-1,6 of Tn-1.
In the pipeline parallel computation shown in Fig. 2(f), S1 handles Tn+5,1 of Tn+5, S2 handles Tn+4,2 of Tn+4, S3 handles Tn+3,3 of Tn+3, S4 handles Tn+2,4 of Tn+2, S5 handles Tn+1,5 of Tn+1, and S6 handles Tn,6 of Tn.
In a traditional pipeline, each subtask processing module of the pipeline has its own independent data space, and after a subtask processing module finishes executing its subtask it passes the data to the next-stage subtask processing module of the pipeline by copying. As shown in Fig. 5, each subtask of pipeline parallel computation has a fixed independent data space, and data are passed between subtasks by copying. The fixed independent data space of subtask Ti,1 of task Ti is MS1, that of subtask Ti,2 is MS2, that of subtask Ti,3 is MS3, that of subtask Ti,4 is MS4, that of subtask Ti,5 is MS5, and that of subtask Ti,6 is MS6, where i = 0, 1, 2, .... The shortcoming is that passing data between subtasks by copying takes unnecessary data-copy time; when the amount of data passed is large the time overhead is obvious and hurts the performance of the program, and performing these data copies also causes unnecessary power consumption.
In the buffering method and system for multi-stage pipeline parallel computation provided by the invention, each task of the pipeline is divided into multiple stages of subtasks in advance, a fixed independent data space is set for each task of the pipeline, all subtask stages of the same task share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data. Because subtasks at the same stage of different tasks of the pipeline are handled by the same subtask processing module, each task of the pipeline has a fixed independent data space, and each independent data space has a fixed address, each subtask processing module can simply update the address of the pointer to the memory space it refers to, letting the pending later-stage subtask inherit the data in the independent data space of the previous-stage subtask of the same task. Data are thus passed without any copying, which effectively improves buffering efficiency and reduces power consumption.
For a better understanding of the technical solutions and technical effects of the invention, a detailed description is given below with reference to the flow chart and specific embodiments; the flow chart is shown in Fig. 3.
Step S01: divide each task of the pipeline into multiple stages of subtasks in advance, subtasks at the same stage of different tasks of the pipeline being handled by the same subtask processing module.
In this embodiment, how many stages of subtasks each pipeline task is divided into depends on actual needs. Let ttaskmax be the maximum duration of processing one pipeline task and Ttaskperiod_min be the minimum time interval between the arrivals of two adjacent tasks; the computation time of each subtask should then be less than Ttaskperiod_min, which ensures that the tasks are processed in parallel in real time and minimizes the computational congestion and waiting caused by a subtask taking too long to compute. Each pipeline task can therefore be divided into multiple stages of subtasks according to the computation time each subtask needs. Subtasks at the same stage of different pipeline tasks are handled by the same subtask processing module.
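As an illustrative sketch of the stage-count constraint above (the numeric values are assumptions for illustration, not values from the patent): if every subtask must compute in less than Ttaskperiod_min, the task must be split into at least ceil(ttaskmax / Ttaskperiod_min) stages.

```python
import math

def min_stage_count(ttaskmax: float, ttaskperiod_min: float) -> int:
    """Smallest stage count such that an evenly split task's per-stage
    compute time stays below the minimum task arrival period."""
    return math.ceil(ttaskmax / ttaskperiod_min)

# e.g. a 55 ms task with a new task arriving every 10 ms needs >= 6 stages
print(min_stage_count(55.0, 10.0))  # -> 6
```

This matches the preferred stage count stated in the claims: the smallest positive integer not below the ratio of the longest task duration to the per-stage time budget.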
Further, any layer of task of the pipeline can be divided into multiple subtasks of the next layer as required. That is, the computation inside a subtask can itself use the parallel pipelining method, so as to reduce the response time of the subtask computation. Correspondingly, subtasks at the same stage of tasks of the same layer of the pipeline are handled by the same subtask processing module.
Step S02: set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address.
In the present embodiment, the number of independent data spaces can be equal to, or greater than, the number of pipeline tasks. Specifically, the memory space corresponding to the current subtask Ti,j is MSk, where k = i % N when i % N is not 0 and k = N when i % N is 0, N being the number of independent data spaces and % denoting the remainder operation. The independent data spaces should be of equal size, i.e. the memory spaces should have equal byte counts. Specifically, when there are 6 pipeline tasks the number of independent data spaces can be no less than 6; the following takes 12 independent data spaces as an example: MS01, MS02, MS03, MS04, MS05, MS06, MS07, MS08, MS09, MS10, MS11 and MS12.
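The mapping from task number to memory space stated above (k = i % N, with k = N when the remainder is 0) can be sketched as follows; the function name is illustrative:

```python
def space_index(i: int, n_spaces: int = 12) -> int:
    """k for memory space MSk of task Ti: k = i % N, except k = N
    when i % N == 0 (% is the remainder operation)."""
    r = i % n_spaces
    return n_spaces if r == 0 else r

# With N = 12 spaces, tasks 1..12 occupy MS01..MS12; task 13 reuses MS01
assert space_index(1) == 1
assert space_index(12) == 12
assert space_index(13) == 1
```

The special case for a zero remainder simply keeps the index 1-based, matching the MS01..MS12 naming.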
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring-shaped independent data space. This allows a simple pointer address update, such as address +1 or -1, to yield the address of the memory space holding the data needed by the next pending subtask, which is simple, efficient and not error-prone.
In a specific embodiment, each independent data space serves one task, the independent data spaces of adjacent tasks are logically adjacent, and MS01, MS02, MS03, MS04, MS05, MS06, MS07, MS08, MS09, MS10, MS11 and MS12 form a ring-shaped independent data space, as shown in Fig. 4.
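A minimal sketch of the ring addressing described above, where an address update of +1 or -1 wraps around the 12 spaces (function names are illustrative, not from the patent):

```python
N_SPACES = 12  # MS01..MS12 arranged in a ring (Fig. 4)

def next_space(k: int, step: int = 1) -> int:
    """Advance a 1-based space index around the ring; MS12 + 1 wraps
    to MS01 and MS01 - 1 wraps to MS12."""
    return (k - 1 + step) % N_SPACES + 1

assert next_space(12) == 1      # address + 1 past the last space wraps
assert next_space(1, -1) == 12  # address - 1 before the first space wraps
```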
Step S03: all subtask stages of the same task of the pipeline share the task's independent data space, and the subtask processing module handling a later-stage subtask inherits, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby passing the data.
In the present embodiment, the subtask processing module handling a later-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby passing the data. Fig. 5 shows the logical schematic of prior-art subtask processing modules that pass data to the next-stage subtask by copying. The benefit of the invention is that the copying between subtasks is avoided: simply by updating the pointer address of the memory space of each subtask processing module, the data can be passed, which eliminates the unnecessary data-copy time, improves the performance of the program, and at the same time removes unnecessary power consumption.
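As a minimal, non-authoritative sketch of the pointer-update handoff of step S03 (the class and buffer names are assumptions for illustration; the patent does not specify an implementation language): each stage module holds only a reference to the current task's fixed data space and "inherits" it by rebinding that reference, never copying bytes.

```python
class StageModule:
    """One subtask processing module; holds only a reference (pointer)
    to the current task's independent data space."""
    def __init__(self, name: str):
        self.name = name
        self.buf = None

    def take_over(self, prev: "StageModule") -> None:
        # Inherit the previous stage's data space by pointer update, no copy.
        self.buf = prev.buf

buffers = [bytearray(4) for _ in range(12)]            # ring MS01..MS12
stages = [StageModule(f"S{j}") for j in range(1, 7)]   # S1..S6

stages[0].buf = buffers[0]   # S1 starts task Tn in MS01
stages[0].buf[0] = 42        # S1 leaves its result in place
for j in range(1, 6):
    stages[j].take_over(stages[j - 1])

# All six stages now reference the same space; no bytes were copied.
assert all(s.buf is buffers[0] for s in stages)
```

Contrast with the prior art of Fig. 5, where each `take_over` would have to copy the contents of one per-stage buffer into the next.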
In a specific embodiment, the independent data spaces corresponding to Fig. 2(a) are shown in Fig. 6(a). The independent data space of task Tn is MS01, so the data handled by subtask processing module S1 for subtask Tn,1 of pipeline task Tn are stored in MS01; the independent data space of task Tn-1 is MS12, so the data handled by S2 for subtask Tn-1,2 are stored in MS12; the independent data space of task Tn-2 is MS11, so the data handled by S3 for subtask Tn-2,3 are stored in MS11; the independent data space of task Tn-3 is MS10, so the data handled by S4 for subtask Tn-3,4 are stored in MS10; the independent data space of task Tn-4 is MS09, so the data handled by S5 for subtask Tn-4,5 are stored in MS09; and the independent data space of task Tn-5 is MS08, so the data handled by S6 for subtask Tn-5,6 are stored in MS08.
The independent data spaces corresponding to Fig. 2(b) are shown in Fig. 6(b). The independent data space of task Tn+1 is MS02, so the data handled by S1 for subtask Tn+1,1 are stored in MS02; that of task Tn is MS01, so the data handled by S2 for subtask Tn,2 are stored in MS01; that of Tn-1 is MS12, so the data handled by S3 for Tn-1,3 are stored in MS12; that of Tn-2 is MS11, so the data handled by S4 for Tn-2,4 are stored in MS11; that of Tn-3 is MS10, so the data handled by S5 for Tn-3,5 are stored in MS10; and that of Tn-4 is MS09, so the data handled by S6 for Tn-4,6 are stored in MS09.
The independent data spaces corresponding to Fig. 2(c) are shown in Fig. 6(c). The independent data space of task Tn+2 is MS03, so the data handled by S1 for subtask Tn+2,1 are stored in MS03; that of Tn+1 is MS02, so the data handled by S2 for Tn+1,2 are stored in MS02; that of Tn is MS01, so the data handled by S3 for Tn,3 are stored in MS01; that of Tn-1 is MS12, so the data handled by S4 for Tn-1,4 are stored in MS12; that of Tn-2 is MS11, so the data handled by S5 for Tn-2,5 are stored in MS11; and that of Tn-3 is MS10, so the data handled by S6 for Tn-3,6 are stored in MS10.
The independent data spaces corresponding to Fig. 2(d) are shown in Fig. 6(d). The independent data space of task Tn+3 is MS04, so the data handled by S1 for subtask Tn+3,1 are stored in MS04; that of Tn+2 is MS03, so the data handled by S2 for Tn+2,2 are stored in MS03; that of Tn+1 is MS02, so the data handled by S3 for Tn+1,3 are stored in MS02; that of Tn is MS01, so the data handled by S4 for Tn,4 are stored in MS01; that of Tn-1 is MS12, so the data handled by S5 for Tn-1,5 are stored in MS12; and that of Tn-2 is MS11, so the data handled by S6 for Tn-2,6 are stored in MS11.
The independent data spaces corresponding to Fig. 2(e) are shown in Fig. 6(e). The independent data space of task Tn+4 is MS05, so the data handled by S1 for subtask Tn+4,1 are stored in MS05; that of Tn+3 is MS04, so the data handled by S2 for Tn+3,2 are stored in MS04; that of Tn+2 is MS03, so the data handled by S3 for Tn+2,3 are stored in MS03; that of Tn+1 is MS02, so the data handled by S4 for Tn+1,4 are stored in MS02; that of Tn is MS01, so the data handled by S5 for Tn,5 are stored in MS01; and that of Tn-1 is MS12, so the data handled by S6 for Tn-1,6 are stored in MS12.
The independent data spaces corresponding to Fig. 2(f) are shown in Fig. 6(f). The independent data space of task Tn+5 is MS06, so the data handled by S1 for subtask Tn+5,1 are stored in MS06; that of Tn+4 is MS05, so the data handled by S2 for Tn+4,2 are stored in MS05; that of Tn+3 is MS04, so the data handled by S3 for Tn+3,3 are stored in MS04; that of Tn+2 is MS03, so the data handled by S4 for Tn+2,4 are stored in MS03; that of Tn+1 is MS02, so the data handled by S5 for Tn+1,5 are stored in MS02; and that of Tn is MS01, so the data handled by S6 for Tn,6 are stored in MS01.
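The walkthrough above follows a simple rule: at cycle c (with c = 0 for Fig. 2(a)/Fig. 6(a)), stage Sj handles subtask T(n+c-(j-1)),j, and each task keeps its single fixed space throughout. A small sketch (with an assumed concrete task number, since n is symbolic in the patent) can reproduce the assignments:

```python
def space_index(i: int, n_spaces: int = 12) -> int:
    r = i % n_spaces
    return n_spaces if r == 0 else r

def schedule(cycle: int, n: int):
    """(stage j, task number, space k) for S1..S6 at a given cycle,
    where stage Sj handles subtask T(n+cycle-(j-1)),j."""
    return [(j, n + cycle - (j - 1), space_index(n + cycle - (j - 1)))
            for j in range(1, 7)]

# Take n = 13 so that Tn sits in MS01, as in the walkthrough above.
rows = schedule(1, 13)         # the Fig. 2(b) / Fig. 6(b) cycle
assert rows[0] == (1, 14, 2)   # S1: Tn+1 in MS02
assert rows[1] == (2, 13, 1)   # S2: Tn still in MS01
assert rows[5] == (6, 9, 9)    # S6: Tn-4 in MS09
```

Note how task Tn stays in MS01 for every one of its six subtask stages; only the stage module referencing MS01 changes from cycle to cycle.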
In the buffering method for multi-stage pipeline parallel computation provided by the present invention, each task of the pipeline is divided into multiple stages of subtasks in advance, and a fixed independent data space is set for each task of the pipeline. In actual use, the subtasks at all stages of the same pipeline task share that task's independent data space: the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer. Because the data transfer is achieved merely by updating the address of a subtask processing module's pointer, the time and power consumption of copying are avoided.
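The pointer-based handoff described above can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; all names (`DataSpace`, `stage`, `payload`) are hypothetical, and the six stages mirror the S1..S6 example in the figures.

```python
# Minimal sketch: each task owns a fixed, independent data space; successive
# subtask stages hand over a *reference* to that space instead of copying it.

class DataSpace:
    """A task's fixed, independent data space (e.g. MS01..MS06)."""
    def __init__(self, name):
        self.name = name
        self.payload = {}

def stage(space, level):
    """Subtask processing module S<level>: works in place on the task's space."""
    space.payload[f"stage{level}"] = f"result-of-S{level}"
    return space              # hand the same space on; no copy is made

ms01 = DataSpace("MS01")
p = ms01                      # 'p' plays the role of a stage's pointer
for level in range(1, 7):     # six subtask stages S1..S6
    p = stage(p, level)       # only the pointer is passed between stages

assert p is ms01              # every stage worked on the very same space
```

The `assert` at the end is the point of the sketch: no stage ever duplicated the data space, it only received a reference to it.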
Fig. 7 shows another flow chart of the buffering method for multi-stage pipeline parallel computation provided by an embodiment of the present invention. The method includes:
Step S71: each task of the pipeline is divided into multiple stages of subtasks in advance, and the same-stage subtasks of different pipeline tasks are handled by the same subtask processing module;
Step S72: for each task of the pipeline, the execution durations of the subtasks at all stages are optimized so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible.
In this embodiment, the computation subtasks at all stages of the pipeline are optimized so that the absolute values of the differences between the stage computation times are as small as possible.
As shown in Fig. 8(a), the run time of the first-stage subtask (subtask 1; the others follow by analogy) is tsub1, that of subtask 2 is tsub2, that of subtask 3 is tsub3, that of subtask 4 is tsub4, that of subtask 5 is tsub5, and that of subtask 6 is tsub6. Because the subtasks' execution times differ, subtask 1 of task TN may catch up with subtask 6 of task TN-x, so that subtask 1 of task TN in the first pipeline stage has to wait for subtask 6 of task TN-x in the sixth pipeline stage. This limits the performance of the pipeline and also complicates the design of the pipeline's per-stage data spaces.
Therefore, the subtasks at all stages of the pipeline are adjusted so that the absolute values of the differences between the stage computation times are as small as possible, for example by keeping the differences between the subtasks' execution durations within a given range. The execution durations of the subtasks may of course also be made equal, e.g. tsub1opt = tsub2opt = tsub3opt = tsub4opt = tsub5opt = tsub6opt = tsubmean = (tsub1 + tsub2 + tsub3 + tsub4 + tsub5 + tsub6)/6, as shown in Fig. 8(b). It should be noted that the total execution durations of the subtasks are not exactly equal, and a lock/mutex mechanism is used in the concurrent program to avoid the conflicts caused by the inconsistent durations.
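The balance criterion above (each subtask duration kept close to tsubmean) can be sketched as follows; this is an illustrative Python sketch, and the tolerance value and example durations are assumptions, not values from the patent.

```python
# Sketch of the balancing criterion: every subtask duration tsub_i should
# stay within a tolerance of the mean tsubmean = sum(tsub_i) / n.

def is_balanced(durations, tolerance):
    """True if every |tsub_i - tsubmean| is within the given tolerance."""
    tsubmean = sum(durations) / len(durations)
    return all(abs(t - tsubmean) <= tolerance for t in durations)

unbalanced = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]    # stage 6 lags far behind
balanced   = [4.0, 4.1, 3.9, 4.0, 4.05, 3.95]  # after optimization

assert not is_balanced(unbalanced, 0.5)  # mean 4.0, stage 1 deviates by 3.0
assert is_balanced(balanced, 0.5)        # mean 4.0, max deviation 0.1
```

An unbalanced pipeline stalls at its slowest stage, which is exactly the catch-up/waiting problem described above.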
Further, the computation time of the first-stage subtask of the pipeline is optimized so that it is smaller than the minimum task generation period. In this way the pipeline can take on the processing of the next task (its first subtask) in real time, which further improves the parallel speed. As shown in Fig. 9, task TN-1 appears at time tN-1, task TN appears at time tN, and task TN+1 appears at time tN+1. The interval between tasks TN-1 and TN is tN − tN-1, and the interval between tasks TN and TN+1 is tN+1 − tN, so the minimum task generation period is Ttaskperiod = min(tN − tN-1, tN+1 − tN). The per-stage subtask execution time satisfies tsubmean < Ttaskperiod. The pipeline execution times of the individual tasks are ttask1, ttask2, ttask3, …, whose maximum is ttaskmax = max(ttask1, ttask2, ttask3, …). Preferably, the minimum number of pipeline stages Npipelines is the smallest positive integer not less than ttaskmax/tsubmean.
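The two sizing rules just stated, tsubmean < Ttaskperiod and Npipelines = smallest integer not less than ttaskmax/tsubmean, can be sketched as follows; the example times are assumptions chosen only for illustration.

```python
import math

def min_task_period(emit_times):
    """Ttaskperiod = minimum interval between adjacent task emission times."""
    return min(b - a for a, b in zip(emit_times, emit_times[1:]))

def min_stage_count(ttask_list, tsubmean):
    """Npipelines: smallest integer not less than ttaskmax / tsubmean."""
    return math.ceil(max(ttask_list) / tsubmean)

emit_times = [0.0, 3.0, 5.0, 9.0]   # t_{N-1}, t_N, t_{N+1}, ... (assumed)
tsubmean = 1.5                      # balanced per-stage time (assumed)

# Rule 1: the per-stage time must beat the minimum task generation period.
assert tsubmean < min_task_period(emit_times)        # 1.5 < 2.0

# Rule 2: stage count covers the slowest task end to end.
assert min_stage_count([8.0, 9.0, 7.5], tsubmean) == 6   # ceil(9.0 / 1.5)
```

With fewer than Npipelines stages the slowest task could not drain through the pipeline before the next tasks pile up behind it.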
Step S73: a fixed independent data space is set for each task of the pipeline, and each independent data space has a fixed address.
Step S74: the subtasks at all stages of the same pipeline task share that task's independent data space; the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
When the subtasks' execution times differ, the first-stage subtask of task TN may catch up with the last-stage subtask of task TN-x, so that the pipeline has to wait for the last-stage subtask of task TN-x while processing the first-stage subtask of task TN. By optimizing the execution durations of the subtasks at all stages so that the absolute values of the differences between the execution durations of a task's subtasks are as small as possible, the present invention effectively solves this problem.
Correspondingly, the present invention also provides a buffer system for multi-stage pipeline parallel computation. As shown in Fig. 10, it includes:
a task division module 101, configured to divide each task of the pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
a storage space setting module 102, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data transfer module 103, configured so that the subtasks at all stages of the same pipeline task share that task's independent data space, and the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
Fig. 11 shows another schematic structure of the buffer system for multi-stage pipeline parallel computation provided by the present invention. The system further includes an execution duration optimization module 114, configured to optimize, after the task division module 101 has divided each task of the pipeline into multiple stages of subtasks in advance, the execution durations of the subtasks at all stages for each task of the pipeline, so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible. This ensures that the tasks are processed in parallel in real time and minimizes the computation congestion and waiting caused by any one subtask taking too long.
Preferably, the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces. The address of the storage space holding the data needed by the next pending subtask can then be obtained through a simple pointer-address update, such as address + 1 or address − 1, which is simple, efficient, and not error-prone.
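The ring of data spaces and the address + 1 update can be sketched as follows; this is an illustrative Python sketch in which the ring size of six is an assumption matching the six-stage example above.

```python
# Sketch: the tasks' independent data spaces form a logical ring, so the
# "next" space is reached by a simple address increment modulo the ring size.

N_SPACES = 6
ring = [f"MS{i:02d}" for i in range(1, N_SPACES + 1)]  # MS01..MS06

def next_space(index):
    """Address + 1, wrapping around the ring."""
    return (index + 1) % N_SPACES

idx = ring.index("MS06")
idx = next_space(idx)      # wraps from the last space back to the first
assert ring[idx] == "MS01"
```

The wrap-around is what makes the fixed set of data spaces reusable indefinitely: each new task simply takes over the space whose previous owner has drained out of the pipeline.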
In addition, when a task of the pipeline needs to be divided into multiple layers of subtasks, the task division module 101 is specifically configured to divide any layer of pipeline tasks into multiple next-layer subtasks as required, the same-stage subtasks of same-layer pipeline tasks being handled by the same subtask processing module.
Of course, the system may further include a storage module (not shown). The storage module may store the number of stages, the task execution durations, and so on, and may of course also store task data, such as the output data of the first-stage subtasks. This facilitates the automatic computer processing of pending tasks and allows information such as the output results of each pipeline task to be stored.
In the buffering method and system for multi-stage pipeline parallel computation provided by the embodiments of the present invention, the task division module 101 divides each task of the pipeline into multiple stages of subtasks in advance, and the storage space setting module 102 then sets a fixed independent data space for each task of the pipeline. In actual use, the subtasks at all stages of the same pipeline task share that task's independent data space, and the data transfer module 103 causes the subtask processing module that handles the next-stage subtask to inherit, by updating a pointer address, the data in the independent data space of the previous-stage subtask of the same task, thereby accomplishing the data transfer. Because the data transfer is achieved by updating the address of a subtask processing module's pointer, the time and power consumption of copying are avoided.
The embodiments in this specification are described in a progressive manner, and for identical or similar parts the embodiments may refer to one another. Since the system is formed according to the method provided by the present invention, its description is relatively simple, and the relevant parts may refer to the description of the method. The embodiments described above are merely illustrative, and those of ordinary skill in the art can understand and implement them without creative effort.
Although the present invention is disclosed above with preferred embodiments, they are not intended to limit the present invention. Any person skilled in the art can, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution of the present invention, or revise it into equivalent embodiments of equivalent variations. Therefore, any simple modification, equivalent variation, or refinement made to the above embodiments according to the technical spirit of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A buffering method for multi-stage pipeline parallel computation, characterized by comprising the steps of:
dividing each task of a pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
setting a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
sharing, among the subtasks at all stages of the same pipeline task, that task's independent data space, wherein the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
2. The method according to claim 1, characterized in that the method further comprises:
after each task of the pipeline has been divided into multiple stages of subtasks in advance, optimizing, for each task of the pipeline, the execution durations of the subtasks at all stages so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible.
3. The method according to claim 2, characterized in that the execution duration of the first-stage subtask of each pipeline task is smaller than the minimum generation period of the tasks of all pipelines.
4. The method according to claim 3, characterized in that the minimum number of stages of the pipeline tasks' subtasks is the smallest positive integer not less than the ratio of the maximum execution duration among the current pipeline's tasks to the average execution duration of the subtasks at all stages.
5. The method according to any one of claims 1 to 4, characterized in that the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces.
6. The method according to any one of claims 1 to 4, characterized in that any layer of pipeline tasks is divided into multiple next-layer subtasks as required.
7. A buffer system for multi-stage pipeline parallel computation, characterized by comprising:
a task division module, configured to divide each task of a pipeline into multiple stages of subtasks in advance, the same-stage subtasks of different pipeline tasks being handled by the same subtask processing module;
a storage space setting module, configured to set a fixed independent data space for each task of the pipeline, each independent data space having a fixed address; and
a data transfer module, configured so that the subtasks at all stages of the same pipeline task share that task's independent data space, and the subtask processing module that handles the next-stage subtask inherits the data in the independent data space of the previous-stage subtask of the same task by updating a pointer address, thereby accomplishing the data transfer.
8. The system according to claim 7, characterized in that the system further comprises:
an execution duration optimization module, configured to optimize, after the task division module has divided each task of the pipeline into multiple stages of subtasks in advance, the execution durations of the subtasks at all stages for each task of the pipeline, so that the absolute values of the differences between the execution durations of the task's subtasks are as small as possible.
9. The system according to claim 7 or 8, characterized in that the independent data spaces of adjacent pipeline tasks are logically adjacent, and the independent data spaces of the pipeline tasks to be processed form a ring of independent data spaces.
10. The system according to claim 7 or 8, characterized in that the task division module is specifically configured to divide any layer of pipeline tasks into multiple next-layer subtasks as required, the same-stage subtasks of same-layer pipeline tasks being handled by the same subtask processing module.
CN201610331646.7A 2016-05-18 2016-05-18 A buffering method and system for multi-stage pipeline parallel computation Pending CN107402805A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610331646.7A CN107402805A (en) A buffering method and system for multi-stage pipeline parallel computation


Publications (1)

Publication Number Publication Date
CN107402805A true CN107402805A (en) 2017-11-28

Family

ID=60394358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610331646.7A Pending CN107402805A (en) 2016-05-18 2016-05-18 A kind of way to play for time and system of multi-stage pipeline parallel computation

Country Status (1)

Country Link
CN (1) CN107402805A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271245A (en) * 2018-09-13 2019-01-25 腾讯科技(深圳)有限公司 A kind of control method and device of block processes task

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078692A1 (en) * 2009-09-25 2011-03-31 Nickolls John R Coalescing memory barrier operations across multiple parallel threads
CN102402493A (en) * 2010-09-07 2012-04-04 国际商业机器公司 System and method for a hierarchical buffer system for a shared data bus
US20130054938A1 (en) * 2007-04-20 2013-02-28 The Regents Of The University Of Colorado Efficient pipeline parallelism using frame shared memory
CN104753533A (en) * 2013-12-26 2015-07-01 中国科学院电子学研究所 Staged shared double-channel assembly line type analog to digital converter


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QJZLDO10: "Chapter 3: Pipeline Technology", Baidu Wenku *
LIANG, Qiang et al.: "Research on Implementation of Data Interaction Between Multiple Processes of Simulation Software", Journal of System Simulation *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271245A (en) * 2018-09-13 2019-01-25 腾讯科技(深圳)有限公司 A kind of control method and device of block processes task
CN110457123A (en) * 2018-09-13 2019-11-15 腾讯科技(深圳)有限公司 A kind of control method and device of block processes task
CN109271245B (en) * 2018-09-13 2021-04-27 腾讯科技(深圳)有限公司 Control method and device for block processing task
CN110457123B (en) * 2018-09-13 2021-06-15 腾讯科技(深圳)有限公司 Control method and device for block processing task


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171128