EP4298513A1 - Method for software task scheduling on at least one heterogeneous computing system
Info
- Publication number
- EP4298513A1 (application EP21719102.2A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- tasks
- data
- processing
- scheduler
- processing system
- Prior art date
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
Definitions
- the invention is concerned with a method for scheduling software tasks on at least one processing system.
- Each processing system is situated in a respective vehicle and the tasks are executed in different driving situations, such that their behavior with regard to at least one predefined performance attribute may vary.
- Each processing system executes its own instances of the tasks.
- the invention is also concerned with a control framework for at least one vehicle for executing the tasks according to a scheduling plan.
- an electronic control unit or a network of several electronic control units may constitute a so-called heterogeneous processing system of the vehicle, i.e. a processing system that comprises several different processing units of different technology.
- a processing system may comprise at least one CPU (central processing unit) with one or more processing kernels and/or at least one GPU (graphics processing unit) and/or one or more TPUs (tensor processing units) and/or DLAs (deep learning accelerators) and/or at least one DMA controller (DMA - direct memory access) and/or a NIC (network interface controller) and/or a RAM memory controller (RAM - random access memory), just to name examples.
- the processing units may run autonomously or asynchronously, that is, the processing units may perform or execute their respective task in parallel or at the same time. However, some or all of the tasks may belong to one specific application program of the processing system, i.e. these tasks in combination fulfill or perform a specific function as defined by the application program.
- the tasks interact in that the output data of one task constitute the input data of another task.
- the tasks are concatenated and their execution needs to be coordinated such that the described concatenation of the tasks is obtained in that one task produces its output data fast enough to supply the next task in the concatenation with input data.
- a task should not produce its output data faster than they can be taken in by the next task.
- the order in which the tasks are triggered or executed on the respective processing units of a processing system can be coordinated or defined by a scheduling plan that associates the tasks to the available processing units and defines the condition for triggering the tasks or executing the tasks (i.e. triggering a task whenever new input data is available in a buffer).
- the data stream may provide a continuous flow of stream data, as can be the case for processing camera images of a video camera for surveilling or observing the environment of the vehicle, or processing sensor data for controlling an engine of a vehicle, both of which are controlling application programs that may last or continue throughout a whole driving phase of a vehicle.
- a processing system may process the data stream in chunks or frames, that is a subset of the stream data is received and processed and forwarded at a time wherein the throughput or processing speed or data rate of this processing of chunks or frames must meet real-time conditions or online-conditions, that is the rate of processing the stream data must be the same or larger than the data rate of the data stream itself in order to prevent an overrun.
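The real-time condition described above amounts to a rate comparison. A minimal sketch in Python, where the function name and the byte-per-second units are assumptions made purely for illustration and do not appear in the source:

```python
def meets_real_time(frame_bytes: int, processing_seconds: float,
                    stream_bytes_per_second: float) -> bool:
    """Return True if a frame is processed at least as fast as the stream delivers data."""
    # Rate at which the processing system consumes stream data.
    processing_rate = frame_bytes / processing_seconds
    # An overrun is prevented only if this rate is the same as or larger than the stream rate.
    return processing_rate >= stream_bytes_per_second
```

For example, a 1000-byte frame processed in 0.5 s just satisfies a 2000 byte/s stream, while the same frame processed in 1.0 s does not.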
- the tasks are executed repeatedly in a cyclic execution (e.g. one cycle per frame), which introduces the additional complexity that a task that produces output data for consumption by the next task as input data should not overwrite its output data of the previous cycle if the next task has not yet read in or accepted that previous output data.
- Document WO 2009 / 029549 A2 describes a method for performance management of a computer system. The method assigns a specific time slot to individual processing tasks executed by a processing unit. At the end of each time slot, the next task is assigned to that processing unit for execution.
- tasks may be interrupted while processing stream data, if they take longer than originally planned. This may have an unforeseen side effect on the coordination of the tasks.
- Document WO 2014 / 207759 A2 describes a scheduler that may schedule the same task on one out of several different processing units depending on a respective constraint of a processing unit, for example, a constraint regarding bandwidth, energy, computation capability.
- the method is applied to a network of computing units that may comprise smartphones and other so-called edge devices. Instead of processing a data stream, this method is used for processing data that may be processed independently on one of the edge devices. The processed data may then be re-collected by a master node and combined into an overall processing result.
- the invention provides a method for scheduling software tasks on at least one processing system of a given common system type, wherein each processing system is situated in a respective vehicle and the tasks are executed in different or varying driving situations.
- the tasks may be executed on a respective processing unit.
- Each processing system may have several processing units of different technology or type.
- Such a processing unit can be, for example, a CPU kernel, a hardware encoder, a hardware decoder, a memory storage controller, GPU-x, DLA-x, NIC (network interface controller) or one of the already described exemplary processing units, just to name examples.
- a processing unit is a hardware module or an integrated circuit that may perform or execute the task asynchronously or independently from the other processing units.
- a task can be, for example, a thread or a job for a DMA transfer or a matrix operation or an SIMD instruction (single instruction multiple data), just to name examples.
- Data transfer (DMA / interconnects) and setup or context switches of said processing units add further costs on top of executing a task. These costs can also be considered part of the task execution time. They are not so much part of the task itself but rather something the scheduler must also consider for the sake of deterministic execution on such a processing unit.
- Execution time of an individual task would vary more when setup/teardown are considered to be part of it, because that depends on what other tasks are running simultaneously. That is something the scheduler might want to measure independently.
- the method considers one single processing system or several processing systems that each are of the same common system type.
- the system type may be defined as the specific combination of processing units built into that processing system.
- the respective processing system is preferably a heterogeneous system as has already been described.
- the tasks may belong to one software application program or several software application programs.
- An example of such a software application program is an object recognition based on an image data stream of a camera or an engine control for a combustion engine or an electric engine based on sensor data of at least one sensor of the engine.
- a scheduler determines a concatenation of the tasks on the basis of task data, wherein
- the task data at least describe respective input data needed by some or all of the tasks for being executed and respective output data produced by some or all of the tasks when executed, wherein
- the tasks are concatenated in that the output data of at least some of the tasks correspond to the input data of at least one respective other task.
- the task data may be provided with each task or each software application program.
- the task data may be provided by a developer of the respective software application program.
- the task data may be part of the source code and/or the binary code of the software application program.
- the task data may provide a data type and/or a buffer size of the input data and/or output data as they may be processed by the tasks in one execution cycle.
- the scheduler determines the scheduling plan that allocates the tasks to processing units of the respective system for executing the tasks, wherein the allocation corresponds to the concatenation of the tasks.
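How a concatenation and an allocation might be derived from task data can be sketched as follows. This is only an illustration under assumptions: the `TaskData` fields, the task names and the unit names are hypothetical, and the preferred unit is simply taken from the task data rather than computed by a real scheduler.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskData:
    name: str
    inputs: frozenset    # names of the data items the task reads
    outputs: frozenset   # names of the data items the task produces
    unit: str            # hypothetical: processing unit the task is allocated to

def concatenate(tasks):
    """Link producer to consumer wherever output data match input data."""
    edges = []
    for producer in tasks:
        for consumer in tasks:
            if producer is not consumer and producer.outputs & consumer.inputs:
                edges.append((producer.name, consumer.name))
    return edges

def scheduling_plan(tasks):
    """Allocate each task to a processing unit (here: taken as given)."""
    return {task.name: task.unit for task in tasks}

tasks = [
    TaskData("ingest", frozenset(), frozenset({"raw"}), "DMA"),
    TaskData("encode", frozenset({"raw"}), frozenset({"avi"}), "Vid"),
    TaskData("output", frozenset({"avi"}), frozenset(), "C"),
]
edges = concatenate(tasks)
plan = scheduling_plan(tasks)
```

Here `edges` chains ingest to encode to output, and `plan` maps each task to one unit.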
- the first version or an initial version of the scheduling plan can be derived on the basis of a method according to the prior art. However, in order to ensure that the scheduling plan runs the tasks such that they provide their output data reliably in different or changing driving situations, the scheduling plan is provided to the at least one processing system. Each processing system (i.e. each vehicle) then executes a respective instance of the tasks according to the scheduling plan. From the at least one processing system respective analysis data are received. These analysis data describe a respective performance value of at least one predefined performance attribute (e.g.
- a respective performance value for at least one performance attribute is measured or determined for the respective tasks.
- a performance attribute can be, for example, an execution time or duration for executing the task and/or an energy consumption.
- the analysis data describe how much of the respective performance attribute is required by the respective task, for example, a performance attribute of time duration or power consumption or access collisions when two tasks try to access the same resource at the same time.
- the scheduler then generates an update of the scheduling plan by applying at least one mitigation measure that adapts the allocation of the tasks to the processing units in dependence on the analysis data, wherein the respective mitigation measure is applied if the at least one performance attribute is improved according to a predefined optimization criterion as compared to the received analysis data. Therefore, on the basis of the analysis data, it is observed or monitored if or how well the scheduling plan fits a performance requirement or a respective threshold for the performance value of the at least one predefined performance attribute for the one driving situation or the several driving situations that have been experienced by the respective processing system while the scheduling plan was in use.
- the respective mitigation measure is applied to the scheduling data of the scheduling plan such that when the scheduling plan is updated and used in the respective processing system, for a future execution of the tasks in the respective processing system, an improvement with regard to the at least one predefined performance attribute is achieved.
- the method provides the advantage that the scheduling plan is adapted or improved while the at least one vehicle experiences different driving situations (e.g. driving in different air temperatures) such that a scheduling plan is iteratively developed or derived that will cope with an increased number of driving situations.
- a driving situation may be defined by a respective interval for values of one or more driving parameters, for example, driving speed, engine speed, air temperature, number of traffic participants around the vehicle, just to name examples.
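A driving situation defined by intervals can be represented as a mapping from parameter name to a (low, high) pair. The sketch below assumes hypothetical parameter names and interval values; none of them are taken from the source.

```python
def in_situation(situation, parameters):
    """Return True if every constrained driving parameter lies inside its interval."""
    return all(low <= parameters[name] <= high
               for name, (low, high) in situation.items())

# Hypothetical situation: fast driving in cold air.
cold_highway = {
    "driving_speed_kmh": (100, 160),
    "air_temperature_c": (-30, 0),
}

current = {"driving_speed_kmh": 120, "air_temperature_c": -5, "engine_speed_rpm": 2500}
matches = in_situation(cold_highway, current)
```

Parameters that the situation does not constrain (here the engine speed) are simply ignored.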
- the invention also comprises embodiments that provide features which afford additional technical advantages.
- One embodiment comprises that, as one mitigation measure, the scheduler links at least two tasks through a queue buffer and/or a double-buffer for providing the output data of one of these tasks as input data to the other one of these tasks, if an access conflict (so-called hazard) is detected.
- an access conflict for example, a so-called FIFO (first in first out)
- Introducing a buffer provides the advantage that a first task can be already re-executed or repeated for generating new output data while the next, second task is still reading or processing the previous input data. There is no need to wait for the first task until the second task has finished reading or processing the previous input data.
- a scheduling plan results that is optimized or minimized in terms of waiting time for the first task.
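The double-buffer idea can be illustrated with a single-threaded sketch. The class and method names are assumptions for illustration, and a real implementation on asynchronous processing units would additionally need synchronization.

```python
class DoubleBuffer:
    """Two slots: the consumer reads one while the producer fills the other."""

    def __init__(self):
        self._slots = [None, None]
        self._read_index = 0

    def write(self, data):
        # The producer fills the slot the consumer is NOT reading,
        # so the previous output data are not overwritten.
        self._slots[1 - self._read_index] = data

    def swap(self):
        # Called once the consumer has finished with the current slot.
        self._read_index = 1 - self._read_index

    def read(self):
        return self._slots[self._read_index]

buf = DoubleBuffer()
buf.write("frame0")
buf.swap()
buf.write("frame1")           # first task already produces the next frame
still_previous = buf.read()   # second task still sees the previous frame
buf.swap()
latest = buf.read()
```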
- One embodiment comprises that the at least one processing attribute comprises a power consumption and/or temperature and as one mitigation measure the scheduler transfers the execution of at least one of the tasks from a first processing unit to another second processing unit for reducing a load on the first processing unit and/or for switching off the first processing unit.
- two processing units are available that provide the same functionality (although they may have different performance power)
- it can be of advantage to transfer a task from one processing unit to another one, if this allows energy consumption and/or dissipation of heat to be reduced by deactivating the first processing unit.
- a scheduling plan results that is optimized or minimized in terms of power consumption and/or temperature generation or heat generation.
- the at least one processing attribute comprises a memory bandwidth for a data throughput (duration of data transfer) between at least two tasks and as one mitigation measure the scheduler introduces memory remapping for avoiding copy operations of data between these tasks.
- a task may read input data via a memory pointer that may be re-assigned to the respective new input data whenever they are available. This saves the necessity of copying these data into a specific input data buffer. Instead, the pointer is set to the respective data that shall be used as new input data.
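Python has no raw pointers, but a `memoryview` gives a comparable zero-copy reference into a shared buffer. The following sketch is an analogy under that assumption; the buffer layout and the helper name `remap` are hypothetical.

```python
def remap(shared: bytearray, offset: int, length: int) -> memoryview:
    """Return a view onto the new input data instead of copying them."""
    return memoryview(shared)[offset:offset + length]

shared = bytearray(b"frame0frame1")   # hypothetical shared memory with two frames

input_view = remap(shared, 0, 6)      # the task reads the first frame
input_view = remap(shared, 6, 6)      # "pointer" re-assigned to the new input data

# Changing the shared memory is visible through the view, so no copy was made.
shared[6] = ord("F")
seen_by_task = bytes(input_view)
```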
- the respective processing attribute may be monitored or measured using at least one hardware component of the processing system, for example, a temperature sensor and/or a timing clock and/or a register for counting clock ticks of a clock of the processing system and/or an interrupt counter (for example, for detecting hazards).
- one embodiment comprises that confines for the analysis data are estimated on the basis of a predefined system model for the system type that is used in the at least one vehicle. On the basis of the received analysis data at least one task is identified that is running outside the confines. In order to identify the reason, input data exchanged between the tasks and/or situation data describing the current driving situation (e.g. air temperature and/or engine speed) is recorded.
- the input data that caused this behavior of the task are provided in at least one mitigation measure. If in a specific processing system a task results in an outlier regarding the range of values for the respective performance value, the input data and/or information about the driving situation that caused this outlier, are available and may be considered in the update of the scheduling plan. Thus, experience or information gained from different driving situations can be considered in the scheduling plan.
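Identifying tasks that run outside the confines could look like the sketch below. The record layout (`task`, `value`, `situation`) and the example numbers are assumptions chosen for illustration only.

```python
def tasks_outside_confines(analysis_data, confines):
    """Return the records whose performance value lies outside the estimated confines."""
    flagged = []
    for record in analysis_data:
        low, high = confines[record["task"]]
        if not low <= record["value"] <= high:
            # Keep the whole record so the recorded situation data stay attached.
            flagged.append(record)
    return flagged

confines = {"encode": (5.0, 15.0)}   # hypothetical execution-time confines in ms
analysis_data = [
    {"task": "encode", "value": 9.0, "situation": {"air_temperature_c": 20}},
    {"task": "encode", "value": 21.0, "situation": {"air_temperature_c": 41}},
]
outliers = tasks_outside_confines(analysis_data, confines)
```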
- the scheduler is a master scheduler in a backend computing system.
- a distributed scheduler comprising the master scheduler and a respective local scheduler in each of the processing systems is provided.
- Several processing systems in different vehicles may be provided with the same scheduling plan and corresponding analysis data are received from several vehicles, such that the update of the scheduling plan considers driving situations that not all of the vehicles have experienced.
- This provides the advantage that an update of the scheduling plan may also be based on experience or information, i.e. a specific driving situation that a specific vehicle may not have experienced as the corresponding analysis data were received from another vehicle.
- the scheduling plan will thus converge faster to a scheduling plan considering different driving situations, as several vehicles may be operated in different driving situations at the same time.
- the optimization criterion comprises that in comparison to the analysis data a number of outliers and/or an average of the performance values of at least one performance attribute are reduced. For example, if the analysis data describe that the execution duration of a task is twice as long as expected due to a reduced clock frequency of an over-heated processing unit, by applying the mitigation measure of transferring the task to a cooler processing unit, it can be expected that with this new or updated scheduling plan, the task will not be executed on the hot processing unit the next time such that the outlier is prevented.
- the respective effect or outcome of each mitigation measure can be estimated on the basis of a model of a processing system of the known given system type. Such models are available in the prior art.
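The optimization criterion can be read as a comparison between the received analysis data and the model-based estimate for the updated plan. A minimal sketch, assuming that an outlier is any value above a hypothetical threshold and that the estimate is already available as a list of values:

```python
from statistics import mean

def count_outliers(values, threshold):
    """Number of performance values exceeding the threshold."""
    return sum(value > threshold for value in values)

def improves(measured, estimated, threshold):
    """Apply a mitigation measure only if outliers and/or the average are reduced."""
    fewer_outliers = count_outliers(estimated, threshold) < count_outliers(measured, threshold)
    lower_average = mean(estimated) < mean(measured)
    return fewer_outliers or lower_average

# Hypothetical execution times in ms: one measurement is twice as long as expected.
measured = [10.0, 10.0, 20.0]
estimated_after_transfer = [10.0, 10.0, 11.0]
apply_measure = improves(measured, estimated_after_transfer, threshold=15.0)
```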
- One embodiment comprises that a process model of the executed tasks is generated that describes or is adapted to the analysis data.
- the model may provide a statistical description (e.g. mean value and variance) for the respective performance value of at least one performance attribute (e.g. performance duration and/or power consumption and/or number of hazards) for the respective task.
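A statistical process model of this kind can be fitted with the Python standard library. The input format, a list of (task, value) pairs, is an assumption for illustration.

```python
from statistics import mean, pvariance

def fit_process_model(analysis_data):
    """Summarize each task's performance values by mean and population variance."""
    values_by_task = {}
    for task, value in analysis_data:
        values_by_task.setdefault(task, []).append(value)
    return {
        task: {"mean": mean(values), "variance": pvariance(values)}
        for task, values in values_by_task.items()
    }

# Hypothetical execution durations in ms.
model = fit_process_model([("encode", 10.0), ("encode", 14.0), ("ingest", 3.0)])
```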
- a software developer may be supported in developing an update of the at least one software application program.
- a simulator of the system type is configured using the analysis data and a simulation of the execution of the tasks is performed and estimates for the analysis data are generated for at least one predefined situation data set that describes a respective driving situation.
- the simulator can simulate a driving situation that, for example, has not been experienced by one of the real processing systems or the real processing system (if only one is used). This allows the scheduling plan to be optimized, adapted or updated even for driving situations that the respective vehicle has not experienced so far. This prevents difficulties, as the scheduling plan can be prepared for such a driving situation before the vehicle actually experiences it.
- the scheduling plan is generated as an acyclic graph from the task data.
- Such an acyclic graph has proven to be a very reliable means for defining the scheduling plan.
- the edges of the graph can comprise weights that are set according to the performance values as they are described by the analysis data.
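A weighted acyclic graph can be ordered and evaluated with `graphlib` from the Python standard library (Python 3.9 or later). The sketch below assumes that each edge weight is a measured duration in milliseconds and computes the longest weighted path as one possible use of the weights; the source does not prescribe this computation.

```python
from graphlib import TopologicalSorter

def order_and_critical_path(edges):
    """edges maps (producer, consumer) to a weight taken from the analysis data."""
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(dst, set()).add(src)
        predecessors.setdefault(src, set())
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    order = list(TopologicalSorter(predecessors).static_order())
    longest = {node: 0.0 for node in order}
    for node in order:
        for src in predecessors[node]:
            longest[node] = max(longest[node], longest[src] + edges[(src, node)])
    return order, max(longest.values())

edges = {("ingest", "encode"): 4.0, ("encode", "output"): 2.0, ("ingest", "output"): 1.0}
order, critical = order_and_critical_path(edges)
```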
- One embodiment comprises that the tasks are executed asynchronously on the respective processing unit. This allows a heterogeneous processing system to be handled on the basis of the described method.
- the execution is thus triggered to execute asynchronously (i.e. independently from a central CPU) but the scheduler may receive feedback from the respective processing unit and/or task when it has completed, for example in order to tell whether the time constraints were met, or to tell that the processing unit / task did not crash (thus the scheduler might also function as a watchdog or provide information to a central watchdog). A function to query the task with "are you still working" is therefore beneficial, too.
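The feedback and watchdog role can be sketched as a status check on timestamps. The status names and the use of plain floats for time are assumptions; the source does not define such an interface.

```python
def check_feedback(started_at, completed_at, deadline, now):
    """Classify a task from the scheduler's point of view.

    completed_at is None while no completion feedback has been received.
    """
    if completed_at is None:
        # No feedback yet: still within the time constraint, or possibly crashed.
        return "running" if now - started_at <= deadline else "stalled"
    # Feedback received: check whether the time constraint was met.
    return "ok" if completed_at - started_at <= deadline else "late"

status_ok = check_feedback(started_at=0.0, completed_at=8.0, deadline=10.0, now=12.0)
status_stalled = check_feedback(started_at=0.0, completed_at=None, deadline=10.0, now=12.0)
```

A "stalled" result is the point at which a scheduler acting as a watchdog might query the task or inform a central watchdog.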
- One embodiment comprises that the at least one performance attribute comprises: latency, throughput, power consumption, temperature, access collisions. Monitoring one or more or all of these performance attributes has proven to result in a reliable or robust scheduling plan that yields a robust or real-time performance ability in a processing system of a vehicle in varying or different driving situations.
- One embodiment comprises that different operating modes, in particular a low power mode and/or a high performance mode, are pre-defined for the given system type as respective processing constraints and at least one version of the scheduling plan is evaluated in regard of its suitability for the different processing constraints on the basis of the analysis data and if a specific version of the scheduling plan fulfills a predefined suitability criterion (with respect to the processing constraint), plan data describing that version of the scheduling plan are stored and later used as scheduling plan, when the corresponding operating mode is activated in a processing system.
- even if a current version of a scheduling plan does not meet the requirements of a current driving situation (for example, the performance of the tasks might be too slow), the scheduling plan might have the advantage that another goal or condition is met, for example, the power consumption might be lower than for other versions of the scheduling plan.
- if the plan data of the scheduling plan are saved, the scheduling plan can be reused or applied in a different driving situation, when a specific processing constraint, for example, a low power mode, is needed.
- different scheduling plans can be derived on the basis of observing the tasks in different driving situations and collecting the corresponding analysis data. If one of the operating modes is demanded, for example, by a corresponding mode signal, the corresponding scheduling plan can be implemented or activated in the respective processing system.
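Storing plan versions per operating mode and activating them on a mode signal might be sketched as follows. The suitability criterion is assumed here to be a simple upper limit on one measured value, and all names are hypothetical.

```python
stored_plans = {}

def evaluate_and_store(mode, plan, measured_value, limit):
    """Store the plan for a mode if the measured value fulfills the (assumed) limit."""
    suitable = measured_value <= limit
    if suitable:
        stored_plans[mode] = plan
    return suitable

def plan_for_mode(mode, default_plan):
    """Return the stored plan for the demanded mode, else fall back to the default."""
    return stored_plans.get(mode, default_plan)

# Hypothetical: this plan draws 4.0 W, within the 5.0 W limit of the low power mode.
stored = evaluate_and_store("low_power", {"encode": "Vid"}, measured_value=4.0, limit=5.0)
active = plan_for_mode("low_power", default_plan={"encode": "C"})
```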
- One embodiment comprises that a cyclic execution of the tasks is performed and for some or each cycle or loop of the cyclic execution, individual analysis data are provided and the scheduling plan is iteratively adapted to a respective current driving situation by performing the update after each cycle or after each predefined number of cycles.
- the scheduling plan is adapted or updated while the data stream continues streaming into the processing system.
- the scheduling plan is therefore adapted to the data stream and therefore dynamically adapts to the current driving situation.
- the cyclic updating of the scheduling plan may be continued throughout the whole driving situation and/or the whole data stream or it may be stopped once a predefined convergence criterion is fulfilled for the scheduling plan, for example, a change that is achieved in the optimization of the at least one performance attribute (for example, a reduction of task execution time is lower than a specific percentage value, for example, less than 10% change or less than 5% change of execution time reduction is achieved).
- the iterative updating can then be interrupted.
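The convergence criterion can be expressed as a check on the relative reduction of execution time between two updates. The function name and the default of 5 % are assumptions based on the example percentages given above.

```python
def converged(previous_time, current_time, min_relative_gain=0.05):
    """Return True once an update reduces execution time by less than the given fraction."""
    gain = (previous_time - current_time) / previous_time
    return gain < min_relative_gain

stop_updating = converged(previous_time=100.0, current_time=97.0)   # 3 % reduction
keep_updating = not converged(previous_time=100.0, current_time=80.0)  # 20 % reduction
```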
- One embodiment comprises that the tasks are designed for processing stream data of a data stream (i.e. a flow of data, especially video data, measurement data) and the tasks process the data stream in real-time, that is at the speed that new stream data arrive and processed stream data are passed on, wherein the processing is done in each cycle on a subset of the overall stream data at a time.
- exemplary tasks for processing a data stream are an ingest task, a processing task, and an output task.
- the invention also provides a control framework for at least one vehicle, wherein for each vehicle a processing system comprising processing units is provided and the control framework comprises at least one processor circuitry that is designed to provide a local scheduler in the respective processing system of the vehicles or a master scheduler in a stationary backend computing system or a distributed scheduler comprising the respective local scheduler and the master scheduler, wherein the at least one processor circuitry is designed to perform an embodiment of the inventive method.
- the control framework can be designed for one single vehicle, that is all the components are included in the vehicle, for example, in the processor circuitry of an electronic control unit and/or a central computing unit (head unit) of the vehicle.
- the vehicle may be connected to a stationary backend computing system, for example, a cloud computer or a computer server that may be operated in the internet.
- a link between the vehicle and the backend computer system may be provided on the basis of an internet connection and/or a wireless connection based on, for example, WiFi technology or mobile communication technology, e.g. 5G.
- the control framework may also comprise several vehicles that may be linked to the stationary backend computing system, such that the vehicles may be operated on the basis of a common scheduling plan and the analysis data of each of the vehicles may be gathered or combined by the master scheduler in the backend computing system to update the scheduling plan for all of the vehicles.
- the respective vehicle is preferably designed as a motor vehicle, in particular as a passenger vehicle or a truck, or as a bus or a motorcycle.
- the invention also comprises the combinations of the features of the different embodiments.
- Fig. 1 a schematic illustration of an embodiment of the inventive control framework with a respective processing system in at least one vehicle and a backend computing system;
- Fig. 2 a schematic illustration of the control framework after an update of a scheduling plan based on a mitigation measure;
- Fig. 3 a sketch for illustrating the effect of the mitigation measure.
- Fig. 1 shows a control framework 10 that may comprise at least one vehicle 11 and a stationary backend computing system 12.
- the backend computing system 12 can be an internet server and/or a cloud server operated in the internet 13.
- the vehicle 11 can be, for example, a motor vehicle, e.g. a passenger vehicle or a truck.
- a processing system 14 may be provided for processing a data stream 15 that may be provided in chunks or frames 16 to the processing system 14 and for each frame 16, a predefined application program 17 (Prog) is applied to the stream data 18 of the frames 16 such that an output data stream 19 is generated.
- the data stream may comprise raw image data (.raw) and from the raw image data an encoded image data stream (.avi) may be generated as output data stream 19.
- the stream data 18 may be stored in a storage 20, for example, based on the technology NVMe (non-volatile memory express).
- the application program 17 may define or demand several tasks 21 that make up or result in the overall functionality of the application program 17.
- the tasks 21 may be performed by processing resources or processing units 22 that are available in the processing system 14.
- Fig. 1 shows as an example a CPU with processing units 22 in the form of kernels C, an encryption module Cryp and a video encoder Vid, which are exemplary processing units 22.
- At least one processing unit 22 may be provided in the form of a CPU and/or a GPU.
- Shared memory SFIM and dedicated memory DEM for the GPU may also be processing units 22 or resources of the processing system 14.
- a direct memory access controller for transferring data from the shared memory SFIM to the dedicated memory DEM and in the opposite direction may also be a processing unit 22.
- a memory controller for transferring data between the storage 20 and the shared memory SFIM may also be a processing unit.
- the processing units 22 may perform a respective task 21 once they have been initiated or programmed based on software data from the application program 17. For processing the incoming stream data 18 in the correct order and/or one after another, a specific order or concatenation of the tasks 21 needs to be considered.
- a local scheduler 23 of the processing system 14 may gather or receive task data 24 regarding the application program 17 and may set up a scheduling plan 25 indicating, how the tasks 21 shall interact and/or when the tasks 21 shall be triggered, in other words, which task 21 shall be associated or assigned to which processing unit 22.
- the tasks 21 may be linked or interconnected by respective buffers 26 for using output data 27 of one task 21 as corresponding input data 28 of the next or following task 21.
- the tasks 21 are executed in a cyclic execution, i.e. they are repeated for each chunk or frame 16.
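The chaining of tasks through buffers and their cyclic execution per frame can be illustrated with a minimal sketch. It assumes that every task consumes exactly one item per cycle and that the tasks run strictly one after another; the names `Task`, `build_pipeline` and `process_stream` are hypothetical and do not come from the application.

```python
from collections import deque
from typing import Callable, Deque, List


class Task:
    """A task that consumes one item from its input buffer per cycle."""

    def __init__(self, name: str, func: Callable, inbuf: Deque, outbuf: Deque):
        self.name = name
        self.func = func
        self.inbuf = inbuf
        self.outbuf = outbuf

    def run(self) -> None:
        # The input data of this task is the output data of the previous task.
        item = self.inbuf.popleft()
        self.outbuf.append(self.func(item))


def build_pipeline(funcs: List[Callable]):
    """Chain the given functions with one buffer between neighbouring tasks."""
    buffers = [deque() for _ in range(len(funcs) + 1)]
    tasks = [
        Task(f"task{i}", func, buffers[i], buffers[i + 1])
        for i, func in enumerate(funcs)
    ]
    return tasks, buffers[0], buffers[-1]


def process_stream(frames, funcs):
    """Run every task once per frame, in the planned order."""
    tasks, source, sink = build_pipeline(funcs)
    for frame in frames:  # cyclic execution: one cycle per frame
        source.append(frame)
        for task in tasks:
            task.run()
    return list(sink)
```

For example, `process_stream([1, 2, 3], [lambda x: x * 2, lambda x: x + 1])` passes each frame through both stand-in tasks in order.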
- respective analysis data 30 may be measured or observed or provided by the backend computing system 12.
- each of the vehicles 11 may send analysis data 30 to the backend computing system 12.
- the scheduling plan 25 may additionally or alternatively to the local scheduler 23 be derived by a master scheduler 31 based on task data 24. The scheduling plan 25 may then be provided to each of the vehicles 11.
- the scheduling plan 25 may be updated if it becomes apparent from the analysis data 30 that at least one performance attribute 32 results in a performance value for at least one task 21 that indicates a non-real-time performance of the application program 17 in at least one driving situation 33.
- a mitigation measure MM may be applied to the scheduling plan 25 and the updated scheduling plan may be performed or used in the processing system 14 instead of the previous scheduling plan 25.
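One way to decide from analysis data whether an update is needed is sketched below. It rests on assumptions that are not in the application: the analysis data is a mapping from driving situation to per-task durations in milliseconds, the tasks of one cycle run sequentially, and "non-real-time" means that the summed duration exceeds the frame period. The function name `find_violations` is hypothetical.

```python
def find_violations(analysis_data, frame_period_ms):
    """Return the driving situations whose cycle time exceeds the frame period.

    analysis_data: {situation: {task_name: duration_ms}}
    Result: {situation: (total_duration_ms, slowest_task_name)}
    """
    violations = {}
    for situation, durations in analysis_data.items():
        total = sum(durations.values())
        if total > frame_period_ms:
            # The slowest task is reported as a candidate for a mitigation measure.
            slowest = max(durations, key=durations.get)
            violations[situation] = (total, slowest)
    return violations
```

A non-empty result would then be the trigger for applying a mitigation measure to the scheduling plan for the affected driving situation.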
- Fig. 2 and Fig. 3 illustrate possible mitigation measures.
- the following comparison shows the scheduling plan 25 in its original version as shown in Fig. 1 and the updated scheduling plan.
- the exemplary performance attributes bandwidth and duration may be measured, resulting in task-specific performance values x, y that may be provided as part of the analysis data 30.
- Fig. 3 illustrates that the mitigation measure may avoid access collisions or hazards 35 that may be indicated by the analysis data 30 for the buffers 26.
- the initial scheduling plan 25 is applied, and after an update 36 the updated scheduling plan 25’ with introduced mitigation measures MM is shown.
- an additional remap task for introducing, for example, a double buffer logic can be applied such that the output data 27 may already be written to a buffer 26 while the next task 21 reads in the input data 28 from the previous cycle.
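The double buffer logic can be sketched as follows. This is a single-threaded illustration under the assumption that the slots are swapped once per cycle; the class `DoubleBuffer` and its methods are hypothetical names, and a real implementation would also need synchronization between the processing units.

```python
class DoubleBuffer:
    """Two slots: one is written in the current cycle, the other is read."""

    def __init__(self):
        self._slots = [None, None]
        self._write_index = 0

    def write(self, data):
        # The producing task writes its output data of the current cycle.
        self._slots[self._write_index] = data

    def read(self):
        # The consuming task reads the data written in the previous cycle.
        return self._slots[1 - self._write_index]

    def swap(self):
        # Called once at the end of each cycle to exchange the two roles.
        self._write_index = 1 - self._write_index
```

Because writer and reader never touch the same slot within a cycle, the access collision is avoided at the cost of one cycle of additional latency.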
- Central processing units (CPUs) and additional processing units may handle e.g. 3D graphics and video (multimedia in general) customer applications and at the same time the increased demand for mobile computing (laptops and smartphones) with limited battery capacity.
- Application developers manage the input, processing and output of their applications, running in the so-called user-space of an operating system on the CPU; an environment with fewer privileges than the operating system's kernel and drivers.
- Data that is written to, or read from, input/output peripherals or special purpose processors is copied from one device's address space to the other, even when using the same physical memory. This may or may not take place efficiently, depending on the operating system's subsystems for the management of the peripherals, which may or may not make use of hardware accelerated transfers. Hence, performance is heavily reliant on the system configuration at hand.
- the lack of standardized interconnects and subsystems for peripheral memory management may come at the price of very slow memory transfers between devices through the user-space / application on the CPU.
- a scheduler is needed to manage resources, resource usage and mitigate problems but will allow for a simultaneous execution of multiple programs.
- the execution of a program's individual tasks in such a heterogeneous system will be managed by the scheduler and take place asynchronously, i.e. will not need the scheduler's attention while executing.
- FIFOs buffers
- the statistics of the execution of tasks, programs and the schedule/pipelines derived and intermediate buffers needed, shall be sent to a backend. Data exchanged (input and output) between tasks can be recorded to find edge cases if the processing of a task took exceptionally long, i.e. longer than expected.
- a schedule can be derived in which multiple programs can schedule and execute tasks in parallel and asynchronously in a heterogeneous system whilst providing a high reliability with regard to execution time and power consumption (if needed).
- the precision is further increased by aggregating this data in a backend from multiple installations of the same system. System- and application developers can utilize this information to further tweak individual tasks or programs.
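Aggregating measurements from several installations in the backend might look like the following sketch. The report format (one mapping from task name to measured duration per installation) and the function name `aggregate` are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def aggregate(reports):
    """Combine per-installation reports into per-task statistics.

    reports: iterable of {task_name: duration} mappings, one per installation.
    """
    samples = defaultdict(list)
    for report in reports:
        for task, duration in report.items():
            samples[task].append(duration)
    return {
        task: {"mean": mean(values), "max": max(values), "count": len(values)}
        for task, values in samples.items()
    }
```

The more installations report a task, the larger `count` becomes and the more representative the mean and the worst case are for that task.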
- a preferred implementation of the control framework comprises the following components:
- multiprocessor, multicore, specialized processing units such as GPUs, TPU/DLAs, video encoders/decoders, encryption/decryption accelerators, compression/decompression accelerators
- Tasks are managed by the scheduler and run asynchronously in said heterogeneous system.
- Applications are managed by the scheduler or by the operating system.
- the scheduler may be embedded in the operating system or be an independent user-space application (daemon), and may or may not be embedded into a framework providing further functionality to application developers.
- Furthermore, said scheduler provides means to measure the execution duration of tasks and the duration to exchange data between executing tasks, and also to measure attributes such as the power consumption / scaling of processors or processing units.
- the scheduler may insert buffers automatically to decouple the execution time of concatenated tasks, when access conflicts are detected (hazards)
- said scheduler sends the measured attributes to a backend.
- the attributes can be used to derive a schedule when all programs (concatenation of tasks) are known, or programs are added.
- the derived schedule will provide the estimated execution time and power consumption of a number of programs/tasks running distributed in the heterogeneous system (in parallel and asynchronously) but managed through the scheduler.
- tasks running outside of the confines of the estimated schedule can be analyzed in the backend by transmitting the tasks input data and the generated schedule to a backend.
- the estimations aggregated in the backend can then be adapted or the task can be optimized to better handle certain conditions / exceptions.
- aggregated attribute-data is used to provide an accurate simulation of the execution within the system.
- Tasks have dependencies on each other and on the resources needed to execute.
- the scheduler maintains a list of consecutive tasks, i.e. the order of execution is determined by the precondition of input data. Any data that is the output of another task must be available before the task can be executed in every cycle.
- Tasks carry information about the needed/reserved hardware resources (execution units, bandwidth).
- the scheduler will only trigger the execution of a task if the required resources are available, and will reserve these resources until the execution of the task is complete.
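The two trigger conditions, input data available and resources available, can be sketched as a small event-driven simulation. The task description format (`deps`, `needs`, `duration`), the resource names and the function `simulate` are hypothetical; the sketch assumes known, fixed durations and greedy triggering in declaration order.

```python
def simulate(tasks, capacity):
    """Return {task_name: (start, end)} for one cycle.

    tasks: {name: {"deps": [names], "needs": {resource: amount}, "duration": t}}
    capacity: {resource: available amount}
    """
    free = dict(capacity)
    finished = set()
    running = {}           # name -> end time
    schedule = {}          # name -> (start, end)
    pending = list(tasks)  # keeps declaration order
    now = 0
    while pending or running:
        # Release the resources of tasks whose execution is complete.
        for name in [n for n, end in running.items() if end <= now]:
            for res, amount in tasks[name]["needs"].items():
                free[res] += amount
            finished.add(name)
            del running[name]
        # Trigger every pending task whose inputs and resources are ready.
        for name in list(pending):
            spec = tasks[name]
            inputs_ready = all(dep in finished for dep in spec["deps"])
            resources_ready = all(
                free[res] >= amount for res, amount in spec["needs"].items()
            )
            if inputs_ready and resources_ready:
                for res, amount in spec["needs"].items():
                    free[res] -= amount  # reserved until completion
                end = now + spec["duration"]
                running[name] = end
                schedule[name] = (now, end)
                pending.remove(name)
        if running:
            now = min(running.values())  # jump to the next completion
        elif pending:
            raise RuntimeError("pending tasks can never be triggered")
    return schedule
```

The returned start points and end times correspond to the description of a schedule by start point of execution and execution time.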
- the schedule is described by the start point of execution as well as the execution time. Metrics such as CPU, memory bandwidth and power consumption are also tracked.
- the measured data becomes more representative for an individual task's execution time and contribution to resource usage. This is especially true if applications and their tasks are executed in different combinations with other applications.
- Resulting schedules may be uploaded to a backend. It shall be possible to narrow / filter data specifically for worst case results measured or an arbitrary deviation from average values. Input data used for the execution of an individual task, or chain of tasks, may also be uploaded to the backend, to recreate the situation encountered in the field.
- Mitigation measures for updating a dynamic scheduling plan may include at least one of the following:
  o Scoreboarding,
  o Tomasulo algorithm,
  o Elimination of task processing/data transfers when task inputs were unchanged (propagation of unchanged outputs),
  o Re-ordering of tasks / attempts to optimize a schedule by applying known patterns to it whilst considering predefined profiles/constraints (e.g. optimize task order, or task variants according to the “Low Power Usage” profile).
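The elimination of task processing for unchanged inputs can be sketched with a wrapper that remembers the last input and output of a task. The class `CachedTask` is a hypothetical name, and comparing inputs by equality is an assumption; a real system might compare hashes or change flags instead.

```python
class CachedTask:
    """Skips the wrapped function when the input equals the previous input."""

    def __init__(self, func):
        self.func = func
        self.has_run = False
        self.last_input = None
        self.last_output = None
        self.executions = 0

    def run(self, data):
        """Return (output, executed); executed is False when the task was skipped."""
        if self.has_run and data == self.last_input:
            # Unchanged input: propagate the unchanged output without processing.
            return self.last_output, False
        self.last_input = data
        self.last_output = self.func(data)
        self.has_run = True
        self.executions += 1
        return self.last_output, True
```

When such tasks are chained, an unchanged output of one task is an unchanged input of the next, so the skip propagates down the chain without extra logic.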
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Traffic Control Systems (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2021/059568 WO2022218510A1 (en) | 2021-04-13 | 2021-04-13 | Method for scheduling software tasks on at least one heterogeneous processing system using a backend computer system, wherein each processing system is situated in a respective vehicle of a vehicle fleet, and control framework for at least one vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4298513A1 (de) | 2024-01-03 |
Family
ID=75539325
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21719102.2A Pending EP4298513A1 (de) | 2021-04-13 | 2021-04-13 | Verfahren zur software-task-planung auf zumindest einem heterogenen rechnersystem |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4298513A1 (de) |
WO (1) | WO2022218510A1 (de) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118334914B (zh) * | 2024-06-13 | 2024-08-13 | 中国航空工业集团公司沈阳飞机设计研究所 | Integrated multi-task waypoint guidance method for aircraft formations |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4972314A (en) | 1985-05-20 | 1990-11-20 | Hughes Aircraft Company | Data flow signal processor method and apparatus |
WO2009029549A2 (en) | 2007-08-24 | 2009-03-05 | Virtualmetrix, Inc. | Method and apparatus for fine grain performance management of computer systems |
WO2014207759A2 (en) | 2013-06-20 | 2014-12-31 | Tata Consultancy Services Limited | System and method for distributed computation using heterogeneous computing nodes |
US11934865B2 (en) * | 2018-06-13 | 2024-03-19 | Hitachi, Ltd. | Vehicle control system for dynamically updating system components |
US20210049757A1 (en) * | 2019-08-14 | 2021-02-18 | Nvidia Corporation | Neural network for image registration and image segmentation trained using a registration simulator |
-
2021
- 2021-04-13 EP EP21719102.2A patent/EP4298513A1/de active Pending
- 2021-04-13 WO PCT/EP2021/059568 patent/WO2022218510A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2022218510A1 (en) | 2022-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20230928 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) |