CN113806044A - Heterogeneous platform task bottleneck elimination method for computer vision application - Google Patents

Info

Publication number
CN113806044A
CN113806044A
Authority
CN
China
Prior art keywords
task
tasks
bottleneck
execution
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111008450.1A
Other languages
Chinese (zh)
Other versions
CN113806044B (en)
Inventor
王祎
刘志磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202111008450.1A
Publication of CN113806044A
Application granted
Publication of CN113806044B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a heterogeneous platform task bottleneck elimination method for computer vision applications, comprising the following steps: splitting a computer vision application into a plurality of semantically independent tasks, each of which must implement a predefined uniform interface; connecting the tasks through queues, organizing the application as a directed graph; discovering bottleneck tasks; processing bottleneck tasks; and wrapping the tasks as coroutine tasks submitted to a coroutine scheduler for execution.

Description

Heterogeneous platform task bottleneck elimination method for computer vision application
Technical field:
The invention relates to the fields of computer vision, streaming computing systems, and dynamic scheduling, and in particular to a heterogeneous-platform-oriented task bottleneck elimination method based on a streaming computing model.
Background art:
With the rapid development of the computer vision field, a large body of basic research has made computer vision methods applicable in fields such as security and logistics, and computer vision algorithm engineering is becoming a crucial link in moving algorithms into production. The training process of a computer vision algorithm is usually more complex than the inference process; inference typically contains no complex iterative algorithm and is regarded as the easier part. Research in the field has therefore focused on improving the efficiency or accuracy of model training, computer vision frameworks likewise center on training, and the inference side offers only simple basic functions. As computer vision applications multiply, a highly available, easily extensible computer vision inference framework that helps algorithms reach production quickly is becoming a rapidly growing demand.
As shown in fig. 1, a computer vision application is generally formed by connecting several network models in series, with a definite order and dependency relationships, and is usually executed on dedicated computing devices: the image or video stream to be analyzed passes through the network models in turn, assisted by some logic control, and the analysis result is finally returned to the user. Such a processing pipeline is well suited to abstraction under a streaming computing model, which contains three concepts: graph, task, and edge. A task is an abstraction of a unit of work, typically model inference or logic control in a computer vision application. An edge is an abstraction of the dependency and communication pattern between tasks, usually a queue; one task may attach to multiple edges, and one edge may connect multiple tasks. The graph is an abstraction of the task pipeline: the tasks and the edges between them form a directed graph. Abstracting the application into this directed-graph structure expresses its streaming structure well and fully exposes the latent parallelism of the computer vision application.
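For illustration, the three abstractions can be written down as minimal Rust types; the names and the u64 payload here are illustrative only, not the patent's definitions:

// Edge: abstraction of the dependency/communication between tasks, usually a queue.
struct Edge { queue: std::collections::VecDeque<u64> }

// Task node: a unit of work, typically model inference or logic control.
struct Node { name: &'static str, inputs: Vec<usize>, outputs: Vec<usize> } // indexes into `edges`

// Graph: the tasks plus the edges between them form a directed graph.
struct Graph { nodes: Vec<Node>, edges: Vec<Edge> }

fn main() {
    // decode -> detect -> track, connected by two queue edges
    let graph = Graph {
        edges: vec![Edge { queue: Default::default() }, Edge { queue: Default::default() }],
        nodes: vec![
            Node { name: "decode", inputs: vec![],  outputs: vec![0] },
            Node { name: "detect", inputs: vec![0], outputs: vec![1] },
            Node { name: "track",  inputs: vec![1], outputs: vec![] },
        ],
    };
    for n in &graph.nodes {
        println!("{} has {} input edge(s)", n.name, n.inputs.len());
    }
    assert_eq!(graph.edges.len(), 2);
}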
In deploying computer vision applications, cost control under high load is a core engineering requirement, and there are generally two key performance indicators: throughput and latency. The two are fundamentally in tension: all else being equal, measures that raise throughput tend to raise latency. The invention focuses on eliminating task bottlenecks of computer vision applications on heterogeneous platforms, and provides a comprehensive scheduling method that is transparent to the specific application and adapts at runtime to both throughput and latency, thereby improving deployment efficiency and reducing enterprise development cost.
Summary of the invention:
The invention aims to provide a method for heterogeneous platforms that is transparent to the specific application and can eliminate throughput and latency bottlenecks at runtime. The technical scheme is implemented in the following steps:
a heterogeneous platform task bottleneck elimination method for computer vision applications, comprising the steps of:
(1) splitting a computer vision application into a plurality of semantically independent tasks, each of which must implement a predefined uniform interface;
(2) connecting the tasks through queues, where each queue provides index-based splitting, broadcasting, aggregation, and order-preserving functions; each task has several input queues and several output queues, shared between tasks, so that the application is organized as a directed graph;
(3) discovering bottleneck tasks, as follows:
1) if the outflow rate of a task's input queue is lower than its inflow rate, the task is considered to be on a bottleneck path, and the last task on the bottleneck path is considered a bottleneck task; for any task, its execution overhead is judged from the change in the amount of data remaining in its input queues before and after a single execution, calculating the task execution overhead C:
C = max(N_1 - N_2, 1)
Cost_{n+1} = Cost_n × S + C × (1 - S)
where N_1 is the minimum amount of data remaining across all input queues before the single execution, N_2 is that minimum after the execution, S is a smoothing coefficient (the method uses 0.7), Cost_n is the previous execution cost, initialized to a large value, and Cost_{n+1} is the current execution cost;
2) to judge the current application's bottleneck task more accurately, the task's own execution overhead is combined with the execution overheads of the tasks it depends on before and after it, giving the context execution cost: the Cost_{n+1} value is multiplied by a dependency cost coefficient, where N_3 is the minimum amount of data remaining across all output queues after a single execution of the task (the coefficient's formula is given only as an image in the original publication);
(4) processing the bottleneck task, as follows: all tasks are traversed in arbitrary order and their context execution costs are calculated; the task with the largest context execution cost is the global bottleneck task; if the bottleneck task is a computing task on the CPU, it is parallelized, by plain replication or by index-based replication depending on whether it is stateful, eliminating the bottleneck; if the bottleneck task is a computing task on the GPU, the timeout for batching input data is increased to improve the bottleneck task's throughput; all tasks whose context execution cost is below 0.5 times that of the bottleneck task are called low-overhead tasks; if a low-overhead task is a stateful computing task on the CPU, its index-based replicas are aggregated into a single computing task with the same unique identifier, reducing system load; if the low-overhead task is a computing task on the GPU, the batching timeout is reduced so that the current batch executes as soon as possible, reducing the task's latency;
(5) wrapping the tasks as coroutine tasks and submitting them to a coroutine scheduler for execution;
(6) when the coroutine scheduler invokes a coroutine task, the task's input queues are traversed in arbitrary order and dynamic batching is applied in turn, determining each input's current batch size and the specific input data in the batch according to the remaining waiting time, after which the task is executed; the remaining waiting time T_r is calculated as:
W_before = Σ_{i=1}^{n-1} W_i
T_before = Σ_{i=1}^{n-1} T_i
T_r = T_e × (W_before + W_n) - T_before - T_n
where T_n is the average execution time of the n-th task, T_total is the average delay of all data through the whole flow, W_n is the delay weight of the n-th task, updated from T_n and T_total whenever T_n is updated (its update formula is given only as an image in the original publication), T_e is the expected delay, W_before is the cumulative delay-weight sum of tasks 1 through n-1, i.e. the delay weights a datum has accumulated on reaching the n-th task, and T_before is the sum of the delays a datum has experienced before reaching the n-th task.
Exploiting the good decoupling and ready parallelism of the streaming computing model, the invention provides a comprehensive scheduling algorithm that is transparent to the specific computer vision application and adapts at runtime to both throughput and latency, thereby improving deployment efficiency and reducing enterprise development cost.
Description of the drawings:
FIG. 1: a common streaming abstraction of computer vision applications.
FIG. 2: schematic diagram of the automatic parallelization strategy of the invention.
FIG. 3: schematic diagram of the automatic batching strategy of the invention.
FIG. 4: effect in a real computer vision project; the numbers on each edge denote the amount of data in the queue / input rate / output rate.
Detailed description of embodiments:
The specific steps of the method are as follows:
(1) The computer vision application is split into a plurality of semantically independent tasks, each of which must implement a predefined uniform interface. The interface is defined as follows:
fn id(&self) -> usize;                            // obtain the task's unique identifier
fn exec(&mut self);                               // task execution interface
fn set_input(&mut self, i: usize, edge: Queue);   // set an input edge
fn get_input(&mut self, i: usize) -> &mut Queue;  // get an input edge
fn set_output(&mut self, i: usize, edge: Queue);  // set an output edge
fn get_output(&mut self, i: usize) -> &mut Queue; // get an output edge
fn cost(&mut self) -> f64;                        // update and return the last execution cost
fn indexes(&self) -> Vec<usize>;                  // indexes held by the task; empty means the task has no context
fn clone(&self) -> Self;                          // copy the current task
fn clone_by_index(&self, index: usize) -> Self;   // copy the current task by index
fn collect_by_index(&mut self, other: Self);      // aggregate tasks by index
fn device(&self) -> Device;                       // get the task's execution device
fn cost_time(&mut self) -> usize;                 // update and return the average execution time
fn weight(&self) -> f64;                          // delay weight
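Gathered into a Rust trait, a subset of the interface can be exercised as below; Queue and Device here are simplified stand-ins (the patent does not publish their definitions), and the pass-through task is a minimal stateless implementor for illustration:

use std::collections::VecDeque;

pub struct Queue(pub VecDeque<u64>);              // simplified stand-in for the queue type

#[derive(Clone, Copy, PartialEq)]
pub enum Device { Cpu, Gpu }                      // simplified stand-in for the device type

pub trait Task {
    fn id(&self) -> usize;
    fn exec(&mut self);
    fn cost(&mut self) -> f64;
    fn indexes(&self) -> Vec<usize>;              // empty: the task is stateless
    fn device(&self) -> Device;
    fn weight(&self) -> f64;
}

// A trivially stateless task that moves one item from its input to its output.
pub struct PassThrough { pub input: Queue, pub output: Queue }

impl Task for PassThrough {
    fn id(&self) -> usize { 0 }
    fn exec(&mut self) {
        if let Some(x) = self.input.0.pop_front() { self.output.0.push_back(x); }
    }
    fn cost(&mut self) -> f64 { 1.0 }
    fn indexes(&self) -> Vec<usize> { Vec::new() }
    fn device(&self) -> Device { Device::Cpu }
    fn weight(&self) -> f64 { 1.0 }
}

fn main() {
    let mut t = PassThrough { input: Queue(VecDeque::from([42])), output: Queue(VecDeque::new()) };
    t.exec();
    assert_eq!(t.output.0.pop_front(), Some(42));
    assert!(t.indexes().is_empty()); // stateless
}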
(2) The tasks are connected through queues; each task has several input queues and several output queues, shared between tasks, so that the application is organized as a directed graph. The queue is implemented on top of concurrent_queue in Intel TBB but, unlike concurrent_queue, additionally provides index-based splitting, broadcasting, aggregation, and order-preserving functions, supporting the different forms of task dependency.
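The patent does not disclose the queue's internals beyond its basis on concurrent_queue; the following is a minimal sketch of how the split/broadcast/index-dispatch modes could sit on top of per-consumer buffers, with the concrete routing rule by index an assumption:

use std::collections::VecDeque;

// How an edge delivers data downstream; the mode names follow the description above.
pub enum DispatchMode {
    Split,      // distribute items round-robin across consumers
    Broadcast,  // every consumer sees every item
    ByIndex,    // route by an index key, preserving per-index order (assumed rule)
}

pub struct Edge<T> {
    mode: DispatchMode,
    consumers: Vec<VecDeque<T>>, // one buffer per downstream task
    next: usize,                 // round-robin cursor for Split
}

impl<T: Clone> Edge<T> {
    pub fn new(mode: DispatchMode, n_consumers: usize) -> Self {
        Edge { mode, consumers: (0..n_consumers).map(|_| VecDeque::new()).collect(), next: 0 }
    }

    // `index` is the item's routing key (e.g., a video-stream id); used only in ByIndex mode.
    pub fn push(&mut self, item: T, index: usize) {
        match self.mode {
            DispatchMode::Split => {
                let i = self.next;
                self.next = (self.next + 1) % self.consumers.len();
                self.consumers[i].push_back(item);
            }
            DispatchMode::Broadcast => {
                for q in self.consumers.iter_mut() { q.push_back(item.clone()); }
            }
            DispatchMode::ByIndex => {
                let i = index % self.consumers.len(); // stable key -> same consumer, order kept
                self.consumers[i].push_back(item);
            }
        }
    }

    pub fn pop(&mut self, consumer: usize) -> Option<T> {
        self.consumers[consumer].pop_front()
    }
}

fn main() {
    let mut e = Edge::new(DispatchMode::ByIndex, 2);
    e.push("frame-a", 7);
    assert_eq!(e.pop(7 % 2), Some("frame-a"));
}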
(3) Bottleneck tasks are discovered by the following bottleneck detection method:
as shown in fig. 2a), if the output flow rate of the input queue of a task is less than the input flow rate, the task can be considered to be on the bottleneck path, and the last task on the bottleneck path can be considered to be a bottleneck task, which needs to be executed by more physical threads to eliminate the performance bottleneck. One task may be stateful or stateless, for stateless tasks, runtime parallelization is relatively easy, and the key challenge of runtime dynamic parallelization of one task is how to handle stateful tasks. Regarding to the problem, as shown in fig. 2b), we have invented a run-time index-based splitting scheme, a stateful task can only process a full data set related to some indexes, a run-time global scheduling algorithm can dynamically adjust the existing indexes of the task, exchange index-related cache data, and further allocate more physical threads to the stateful task to process the indexed data streams in parallel, so as to improve the throughput rate of the stateful task.
For bottleneck detection, the execution overhead of any task is judged from the change in the amount of data remaining in its input queues before and after a single execution. The execution overhead is calculated as:
C = max(N_1 - N_2, 1)
Cost_{n+1} = Cost_n × S + C × (1 - S)
where N_1 is the minimum amount of data remaining across all input queues before the single execution, N_2 is that minimum after the execution, S is a smoothing coefficient (the method uses 0.7), Cost_n is the previous execution cost, initialized to a large value, and Cost_{n+1} is the current execution cost.
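A minimal sketch of this smoothed overhead estimate; the concrete choice of the initial "maximum value" for the cost is an assumption:

const S: f64 = 0.7; // smoothing coefficient used by the method

// Update the smoothed execution cost after one execution of a task.
// `n1` / `n2`: minimum remaining data across all input queues before / after the execution.
fn update_cost(prev_cost: f64, n1: usize, n2: usize) -> f64 {
    let c = (n1 as f64 - n2 as f64).max(1.0); // C = max(N_1 - N_2, 1)
    prev_cost * S + c * (1.0 - S)             // Cost_{n+1} = Cost_n * S + C * (1 - S)
}

fn main() {
    let mut cost = f64::MAX / 2.0; // "a maximum value" as the initial cost (assumed choice)
    for (n1, n2) in [(10, 7), (9, 5), (8, 8)] {
        cost = update_cost(cost, n1, n2);
        println!("cost = {cost:.3}");
    }
}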
To judge the current application's bottleneck task more accurately, a task's own execution overhead is combined with the execution overheads of its predecessor and successor tasks: the Cost_{n+1} value above is multiplied by a dependency cost coefficient to obtain the context execution cost, where N_3 is the minimum amount of data remaining across all output queues after a single execution of the task (the coefficient's formula is given only as an image in the original publication).
(4) All tasks are traversed in arbitrary order and their context execution costs are calculated; the task with the largest context execution cost is the global bottleneck task. If the bottleneck task is a computing task on the CPU, it is parallelized, by plain replication or by index-based replication depending on whether it is stateful, eliminating the bottleneck. If the bottleneck task is a computing task on the GPU, then, to avoid creating video-memory fragmentation, the timeout for collecting batched input data is increased so that batches execute as large as possible, improving the bottleneck task's throughput.
All tasks whose context execution cost is below 0.5 times that of the bottleneck task are called low-overhead tasks. If a low-overhead task is a stateful computing task on the CPU, its index-based replicas are aggregated into a single computing task with the same unique identifier, reducing system load. If the low-overhead task is a computing task on the GPU, the batching timeout is reduced so that the current batch executes as soon as possible, reducing the task's latency.
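A sketch of this scheduling decision; the trait and its method names (parallelize, merge_replicas, the batch-timeout adjustments) are assumed stand-ins for the mechanisms described above, not the patent's actual API:

// Hypothetical scheduling-facing view of a task.
trait SchedTask {
    fn context_cost(&self) -> f64;
    fn device(&self) -> Device;
    fn is_stateful(&self) -> bool;
    fn parallelize(&mut self);          // replicate (by index if stateful)
    fn merge_replicas(&mut self);       // aggregate index-based replicas back together
    fn grow_batch_timeout(&mut self);   // larger batches -> higher throughput
    fn shrink_batch_timeout(&mut self); // smaller batches -> lower latency
}

#[derive(Clone, Copy, PartialEq)]
enum Device { Cpu, Gpu }

fn rebalance(tasks: &mut [Box<dyn SchedTask>]) {
    // 1. The task with the largest context execution cost is the global bottleneck.
    let bottleneck = (0..tasks.len())
        .max_by(|&a, &b| tasks[a].context_cost().total_cmp(&tasks[b].context_cost()))
        .expect("non-empty graph");
    let max_cost = tasks[bottleneck].context_cost();

    // 2. Speed up the bottleneck; relax tasks whose cost is below 0.5x the bottleneck's.
    for (i, t) in tasks.iter_mut().enumerate() {
        if i == bottleneck {
            match t.device() {
                Device::Cpu => t.parallelize(),
                Device::Gpu => t.grow_batch_timeout(),
            }
        } else if t.context_cost() < 0.5 * max_cost {
            match t.device() {
                Device::Cpu if t.is_stateful() => t.merge_replicas(),
                Device::Gpu => t.shrink_batch_timeout(),
                _ => {}
            }
        }
    }
}

fn main() {} // SchedTask implementations omitted; this is a scheduling fragment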
(5) The tasks are wrapped as coroutine tasks and submitted to a coroutine scheduler for execution. Because computer vision applications contain many light logic-control tasks, coroutine scheduling avoids the larger overhead of threads.
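The patent does not name a coroutine runtime; the following sketch assumes the tokio crate to show a task's exec loop wrapped as a cheap cooperative coroutine instead of an OS thread:

// assumed dependency: tokio = { version = "1", features = ["full"] }

// Wrap one pipeline task's execution loop as a coroutine.
async fn run_task(mut exec_once: impl FnMut()) {
    loop {
        exec_once();                    // one call to the task's exec() interface
        tokio::task::yield_now().await; // cooperatively yield to other coroutine tasks
    }
}

#[tokio::main]
async fn main() {
    // Each pipeline task becomes one cheap coroutine task rather than one OS thread.
    let handle = tokio::spawn(run_task(|| { /* task.exec() would go here */ }));
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    handle.abort(); // stop the demo coroutine
}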
(6) When the coroutine scheduler invokes a coroutine task, the task's input queues are traversed and the dynamic batching algorithm is applied in turn, determining each input's current batch size and the specific input data in the batch; the task's exec interface is then called. The dynamic batching algorithm works as follows:
As shown in FIG. 3, given a specified global delay expectation, the algorithm computes, for each datum, its remaining waiting time in the current task from that delay expectation and the time already spent. If the remaining waiting time is 0, the datum is executed immediately. If it is not 0 and the current task has not yet accumulated enough input data for a batch, the datum waits for the next round. If it is not 0 but enough input data has accumulated, batches are executed in ascending order of remaining waiting time, and the remaining data waits for the next round.
The remaining waiting time T_r is calculated as:
W_before = Σ_{i=1}^{n-1} W_i
T_before = Σ_{i=1}^{n-1} T_i
T_r = T_e × (W_before + W_n) - T_before - T_n
where T_n is the average execution time of the n-th task, T_total is the average delay of all data through the whole flow, W_n is the delay weight of the n-th task, updated from T_n and T_total whenever T_n is updated (its update formula is given only as an image in the original publication), T_e is the expected delay, W_before is the cumulative delay-weight sum of tasks 1 through n-1, i.e. the delay weights a datum has accumulated on reaching the n-th task, and T_before is the sum of the delays a datum has experienced before reaching the n-th task.
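A sketch of the remaining-wait-time computation; the weight rule W_n = T_n / T_total is an assumption (the original formula survives only as an image), while the cumulative sums follow the stated definitions:

// Per-task average execution time measured at runtime (T_n).
struct Stage { avg_exec: f64 }

// Remaining waiting time of a datum that has reached stage `n` (0-based).
// `t_total`: average end-to-end delay; `t_e`: the global delay expectation.
fn remaining_wait(stages: &[Stage], n: usize, t_total: f64, t_e: f64) -> f64 {
    let w = |i: usize| stages[i].avg_exec / t_total;              // W_i = T_i / T_total (assumed)
    let w_before: f64 = (0..n).map(w).sum();                      // W_before = sum of W_i, i < n
    let t_before: f64 = (0..n).map(|i| stages[i].avg_exec).sum(); // T_before = sum of T_i, i < n
    let t_r = t_e * (w_before + w(n)) - t_before - stages[n].avg_exec;
    t_r.max(0.0) // T_r == 0 means: execute immediately; otherwise the datum may wait for a fuller batch
}

fn main() {
    let stages = [Stage { avg_exec: 5.0 }, Stage { avg_exec: 20.0 }, Stage { avg_exec: 5.0 }];
    let t_total = 30.0; // average delay across the whole flow
    let t_e = 45.0;     // global delay expectation
    for n in 0..stages.len() {
        println!("stage {n}: T_r = {:.1}", remaining_wait(&stages, n, t_total, t_e));
    }
}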
(7) Steps (3) to (5) are repeated.

Claims (1)

1. A heterogeneous platform task bottleneck elimination method for computer vision applications, comprising the steps of:
(1) splitting a computer vision application into a plurality of semantically independent tasks, each of which must implement a predefined uniform interface;
(2) connecting the tasks through queues, where each queue provides index-based splitting, broadcasting, aggregation, and order-preserving functions; each task has several input queues and several output queues, shared between tasks, so that the application is organized as a directed graph;
(3) discovering bottleneck tasks, as follows:
1) if the outflow rate of a task's input queue is lower than its inflow rate, the task is considered to be on a bottleneck path, and the last task on the bottleneck path is considered a bottleneck task; for any task, its execution overhead is judged from the change in the amount of data remaining in its input queues before and after a single execution, calculating the task execution overhead C:
C = max(N_1 - N_2, 1)
Cost_{n+1} = Cost_n × S + C × (1 - S)
where N_1 is the minimum amount of data remaining across all input queues before the single execution, N_2 is that minimum after the execution, S is a smoothing coefficient, S = 0.7, Cost_n is the previous execution cost, initialized to a large value, and Cost_{n+1} is the current execution cost;
2) to judge the current application's bottleneck task more accurately, the task's own execution overhead is combined with the execution overheads of the tasks it depends on before and after it, giving the context execution cost: the Cost_{n+1} value is multiplied by a dependency cost coefficient, where N_3 is the minimum amount of data remaining across all output queues after a single execution of the task (the coefficient's formula is given only as an image in the original publication);
(4) processing the bottleneck task, as follows: all tasks are traversed in arbitrary order and their context execution costs are calculated; the task with the largest context execution cost is the global bottleneck task; if the bottleneck task is a computing task on the CPU, it is parallelized, by plain replication or by index-based replication depending on whether it is stateful, eliminating the bottleneck; if the bottleneck task is a computing task on the GPU, the timeout for batching input data is increased to improve the bottleneck task's throughput; all tasks whose context execution cost is below 0.5 times that of the bottleneck task are called low-overhead tasks; if a low-overhead task is a stateful computing task on the CPU, its index-based replicas are aggregated into a single computing task with the same unique identifier, reducing system load; if the low-overhead task is a computing task on the GPU, the batching timeout is reduced so that the current batch executes as soon as possible, reducing the task's latency;
(5) wrapping the tasks as coroutine tasks and submitting them to a coroutine scheduler for execution;
(6) when the coroutine scheduler invokes a coroutine task, the task's input queues are traversed in arbitrary order and dynamic batching is applied in turn, determining each input's current batch size and the specific input data in the batch according to the remaining waiting time, after which the task is executed; the remaining waiting time T_r is calculated as:
W_before = Σ_{i=1}^{n-1} W_i
T_before = Σ_{i=1}^{n-1} T_i
T_r = T_e × (W_before + W_n) - T_before - T_n
where T_n is the average execution time of the n-th task, T_total is the average delay of all data through the whole flow, W_n is the delay weight of the n-th task, updated from T_n and T_total whenever T_n is updated (its update formula is given only as an image in the original publication), T_e is the expected delay, W_before is the cumulative delay-weight sum of tasks 1 through n-1, i.e. the delay weights a datum has accumulated on reaching the n-th task, and T_before is the sum of the delays a datum has experienced before reaching the n-th task.
CN202111008450.1A 2021-08-31 2021-08-31 Heterogeneous platform task bottleneck eliminating method for computer vision application Active CN113806044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111008450.1A CN113806044B (en) 2021-08-31 2021-08-31 Heterogeneous platform task bottleneck eliminating method for computer vision application

Publications (2)

Publication Number Publication Date
CN113806044A (en) 2021-12-17
CN113806044B CN113806044B (en) 2023-11-07

Family

ID=78941988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111008450.1A Active CN113806044B (en) 2021-08-31 2021-08-31 Heterogeneous platform task bottleneck eliminating method for computer vision application

Country Status (1)

Country Link
CN (1) CN113806044B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160171330A1 (en) * 2014-12-15 2016-06-16 Reflex Robotics, Inc. Vision based real-time object tracking system for robotic gimbal control
CN105869117A (en) * 2016-03-28 2016-08-17 上海交通大学 Method for accelerating GPU directed at deep learning super-resolution technology
CN111782355A (en) * 2020-06-03 2020-10-16 上海交通大学 Cloud computing task scheduling method and system based on mixed load
CN112612615A (en) * 2020-12-28 2021-04-06 中孚安全技术有限公司 Data processing method and system based on multithreading memory allocation and context scheduling
CN112905317A (en) * 2021-02-04 2021-06-04 西安电子科技大学 Task scheduling method and system under rapid reconfigurable signal processing heterogeneous platform
CN113191945A (en) * 2020-12-03 2021-07-30 陕西师范大学 High-energy-efficiency image super-resolution system and method for heterogeneous platform
CN113238837A (en) * 2020-07-10 2021-08-10 北京旷视科技有限公司 Computing flow chart construction method, computing efficiency optimization method, computing efficiency construction device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高原; 顾文杰; 丁雨恒; 彭晖; 陈泊宇; 顾雯轩: "Design and implementation of CPU-GPU cooperative scheduling algorithm in heterogeneous clusters" (异构集群中CPU与GPU协同调度算法的设计与实现), Computer Engineering and Design (计算机工程与设计), no. 02

Also Published As

Publication number Publication date
CN113806044B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
US10373053B2 (en) Stream-based accelerator processing of computational graphs
Ousterhout et al. Monotasks: Architecting for performance clarity in data analytics frameworks
US10089142B2 (en) Dynamic task prioritization for in-memory databases
CN110717574B (en) Neural network operation method and device and heterogeneous intelligent chip
US10366084B2 (en) Optimizing pipelining result sets with fault tolerance in distributed query execution
CN112052081B (en) Task scheduling method and device and electronic equipment
WO2017005115A1 (en) Adaptive optimization method and device for distributed dag system
Song et al. Bridging the semantic gaps of GPU acceleration for scale-out CNN-based big data processing: Think big, see small
CN114217966A (en) Deep learning model dynamic batch processing scheduling method and system based on resource adjustment
WO2016041126A1 (en) Method and device for processing data stream based on gpu
CN104346220A (en) Task scheduling method and system
CN114217930A (en) Accelerator system resource optimization management method based on mixed task scheduling
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
Chakrabarti et al. Resource scheduling for parallel database and scientific applications
US10592473B2 (en) Method for improving energy efficiency of map-reduce system and apparatus thereof
CN113806044B (en) Heterogeneous platform task bottleneck eliminating method for computer vision application
CN108710640B (en) Method for improving search efficiency of Spark SQL
Tang et al. A network load perception based task scheduler for parallel distributed data processing systems
Nasr et al. Task scheduling algorithm for high performance heterogeneous distributed computing systems
CN113076181B (en) Data processing flow optimization method, system and storage medium
JP2023544911A (en) Method and apparatus for parallel quantum computing
CN113094155B (en) Task scheduling method and device under Hadoop platform
Zheng et al. Conch: A cyclic mapreduce model for iterative applications
Liu et al. Multivariate modeling and two-level scheduling of analytic queries
CN111984398A (en) Method and computer readable medium for scheduling operations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant