CN112181620B - Big data workflow scheduling method for sensing service capability of virtual machine in cloud environment - Google Patents


Info

Publication number
CN112181620B
CN112181620B (application CN202011031844.4A)
Authority
CN
China
Prior art keywords
task
virtual machine
data
service
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011031844.4A
Other languages
Chinese (zh)
Other versions
CN112181620A (en)
Inventor
曹洁
张志锋
桑永宜
王博
崔霄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202011031844.4A priority Critical patent/CN112181620B/en
Publication of CN112181620A publication Critical patent/CN112181620A/en
Application granted granted Critical
Publication of CN112181620B publication Critical patent/CN112181620B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention relates to the technical field of big data, and in particular to a big data workflow scheduling method that senses the service capability of virtual machines in a cloud environment. It provides a measurement method for evaluating both a task's service-capability requirements on resources and a virtual machine's service-capability guarantee for tasks, supplying a necessary reference for the scheduled execution of big data parallel tasks, and proposes a service dynamic level scheduling algorithm that matches task service-capability requirements with virtual-machine service-capability guarantees. The method comprises the following steps: first, modeling and assumptions for the big data workflow and the cloud system; second, computation of service capability; third, dynamic scheduling of the big data workflow service.

Description

Big data workflow scheduling method for sensing service capability of virtual machine in cloud environment
Technical Field
The invention relates to the technical field of big data, in particular to a big data workflow scheduling method for sensing service capability of a virtual machine in a cloud environment.
Background
Massive big data must be processed in commercial and scientific fields such as banking, fraud detection, medical health, demand prediction, and scientific exploration, so big data processing technology has become very important in these industries. Most big data applications are composed of multiple interdependent compute-intensive and data-intensive tasks.
Big data tasks can be divided into CPU-intensive, data-intensive (involving a large amount of data analysis and processing), and I/O-intensive (transferring a large amount of data to subsequent tasks) types.
Huang et al. proposed a dynamic distributed scheduling algorithm, CASA, targeting the characteristics of resource distribution in cloud environments: each scheduling node runs a meta-scheduler that receives tasks submitted by local users and distributes them, while the scheduling nodes also share information to achieve load balancing among nodes. Mezmaz et al. took minimum completion time and minimum energy consumption as scheduling targets for dependent tasks in a cloud environment and proposed a hybrid scheduling algorithm based on a genetic algorithm and energy-consumption awareness, achieving energy optimization through dynamic voltage scaling. Other work centers scheduling on quality of service; in cloud computing, users' QoS requirements include task completion response time, scheduling budget, and system reliability and availability. John et al. proposed a cloud task scheduling and resource allocation method based on particle swarm optimization that fully considers the two QoS constraints of deadline and scheduling budget and has good universality and scalability. Li et al. proposed a task scheduling model based on QoS-level division for different QoS constraint requirements; for application tasks with lower QoS requirements, the strategy schedules tasks with an optimized Chord algorithm, ensuring that any application task with a QoS constraint obtains a satisfactory completion time. There are also scheduling methods that take economics as their principle and model: the ultra-large scale of cloud computing and its commercial operation mode make economic factors a key scheduling index. Wei et al., targeting cloud service providers, offered users various configuration combinations of virtual machines at different prices and proposed a big data processing workflow scheduling algorithm that minimizes operation cost while meeting the completion deadline.
Scholars at home and abroad have also conducted extensive research on the scheduling of big data processing tasks. In a cloud computing environment, Gaith et al. proposed a virtual-machine trust-aware big data task scheduling method comprising three stages: evaluating virtual machine trust levels, determining task priorities, and trust-aware scheduling onto virtual machines. Yanling et al. proposed an efficient job scheduling method for energy-consumption-aware big data applications, modeling the energy-aware fair scheduling problem as a multidimensional knapsack problem. Ruitao et al. proposed a big data transmission scheduling method that maximizes big data computing throughput in a cloud environment while minimizing application data retrieval time. Feilong et al. proposed a big data application scheduling method that maximizes network throughput through dynamic load balancing of a cloud data center; considering the correlation among input data and data locality, it establishes the correspondence among data, tasks, and nodes, takes minimizing data migration cost as its performance-improvement target, designs and optimizes a data placement strategy and scheduling mechanism, and constitutes a correlation-driven big data processing task scheduling scheme.
The documents above study task scheduling in cloud environments from different angles, but they mostly assume that tasks are compute-intensive and that the computational performance of resources is fixed; they consider neither the matching between task types and resource types nor how to measure the degree of that match. Compared with parallel task scheduling in traditional high-performance and grid computing, scheduling big data parallel tasks in the cloud is more complex: traditional parallel tasks are mostly compute-intensive, while big data parallel tasks may be compute-intensive, data-intensive, or I/O-intensive; the performance of computing resources in a traditional environment is relatively fixed, whereas the physical resources behind a virtual machine in a cloud environment change dynamically, so virtual machine performance is dynamic; cloud service providers are numerous, and the actual performance of their virtual machines often deviates from the declared performance; and virtual machines with different configurations differ greatly in their effect on different types of task processing.
Disclosure of Invention
To solve the above technical problems, the invention provides a measurement method for evaluating a task's service-capability requirements on resources and a virtual machine's service-capability guarantee for tasks, supplies a necessary reference for the scheduled execution of big data parallel tasks, and proposes a virtual-machine service-capability-aware big data workflow scheduling method in a cloud environment built on a service dynamic level scheduling algorithm that matches task service-capability requirements with virtual-machine service-capability guarantees.
The invention discloses a big data workflow scheduling method that senses the service capability of virtual machines in a cloud environment, comprising the following steps: first, modeling and assumptions for the big data workflow and the cloud system; second, computation of service capability; third, dynamic scheduling of the big data workflow service;
the first step further comprises the following steps:
Definition 1 (big data workflow): a big data workflow can be abstracted as a DAG graph, i.e. a 6-tuple DAG = (V, R, Q, D, O, C), where the concrete meaning of each element of the tuple is as follows:
(1) V = {v_1, v_2, ..., v_n} denotes the set of subtasks of the big data workflow, n the number of subtasks, and v_i the i-th subtask;
(2) R = {r_ij | v_i, v_j ∈ V} ⊆ V × V denotes the execution-order precedence and data-dependency relations among subtasks;
(3) Q = {q_1, q_2, ..., q_n} is the set of computation amounts of the subtasks, q_i ∈ Q denoting the computation amount of subtask v_i;
(4) D = {d_1, d_2, ..., d_n} is the set of data-processing amounts of the subtasks, d_i ∈ D denoting the data-processing amount of subtask v_i;
(5) O = {o_1, o_2, ..., o_n} denotes the set of I/O data amounts of the subtasks, o_i the I/O data amount of subtask v_i; the data to be communicated between a task and its successor must undergo I/O processing before transmission, such as encoding and encryption;
(6) C = {c_ij | v_i, v_j ∈ V} ⊆ V × V is the set of communication volumes between subtasks, c_ij denoting the volume of data communicated from subtask v_i to subtask v_j;
Definition 2 (cloud platform): a real cloud system can be abstractly described as a cluster of different types of virtual machines virtualized from multiple physical servers through virtualization technology, the virtual machines being connected by a network into a graph cloud computing system; it can be represented as a 6-tuple Cloud = (VM, CS, DS, OS, E, B), where the concrete meaning of each element of the tuple is as follows:
(1) VM = {vm_1, vm_2, ..., vm_m} denotes the set of virtual machines, m the total number of virtual machines, and vm_i the i-th virtual machine;
(2) CS = {cs_1, cs_2, ..., cs_m} denotes the set of virtual machine computing speeds, cs_i being the computing speed of vm_i, i.e. the computation amount processed per unit time;
(3) DS = {ds_1, ds_2, ..., ds_m} denotes the set of virtual machine data-processing speeds, ds_i being the data-processing speed of vm_i, i.e. the amount of data processed per unit time;
(4) OS = {os_1, os_2, ..., os_m} denotes the set of virtual machine I/O processing speeds, os_i being the I/O processing speed of vm_i, i.e. the amount of I/O data processed per unit time;
(5) E = {e_ij | vm_i, vm_j ∈ VM} denotes the set of communication links between virtual machines;
(6) B = {b_ij | vm_i, vm_j ∈ VM, e_ij ∈ E} is the set of communication bandwidths of the edges in E; b_ij ∈ B is the time taken by the inter-virtual-machine communication link e_ij = (vm_i, vm_j) ∈ E to transmit one unit of data.
The service capability of a virtual machine is evaluated through three evaluation indexes: its computing capability, data-processing capability, and I/O processing capability. Let X = {x_ij}_{m×3} be the sample data matrix of these 3 evaluation indexes obtained from m service transactions of cloud users. Because the indexes differ greatly in dimension, order of magnitude, and orientation, the initial data must be made dimensionless; the data are normalized in min-max fashion, with the following calculation formula:
$$y_{ij} = \frac{x_{ij} - \min_{1 \le i \le m}\{x_{ij}\}}{\max_{1 \le i \le m}\{x_{ij}\} - \min_{1 \le i \le m}\{x_{ij}\}}$$
where max{x_ij} and min{x_ij} denote the maximum and minimum values of the j-th index, respectively. Through this conversion the values of all indexes become forward-increasing values in the range [0, 1], so that larger index values are better. With the dimensionless normalized matrix Y = {y_ij}_{m×3}, the information entropy of each index is:
$$e_j = -k \sum_{i=1}^{m} y_{ij} \ln y_{ij}, \qquad j = 1, 2, 3$$
where the constant k is related to the number of samples m of the system. For a system whose information is completely disordered, the degree of order is zero and its entropy is maximal, e = 1; when the m samples are in a completely disordered distribution state, y_ij = 1/m, so:
$$e = -k \sum_{i=1}^{m} \frac{1}{m} \ln \frac{1}{m} = k \ln m = 1$$
thus, we obtain: k ═ 1 (lnm) -1;
since the information entropy ej can be used to measure the utility value of the information of the jth evaluation index, when the information entropy ej is completely unordered, ej is 1, at this time, the utility value of the information of ej to the comprehensive evaluation is zero, and therefore, the information utility value of a certain index depends on the difference hj between the information entropy ej of the index and 1: estimating the weight of each index by using an entropy method, wherein the weight is essentially calculated by using the utility value of index information, and the higher the utility value is, the greater the importance of the evaluation is, so that the weight of the j evaluation index is:
$$w_j = \frac{h_j}{\sum_{j=1}^{3} h_j} = \frac{1 - e_j}{\sum_{j=1}^{3} (1 - e_j)} \qquad (1)$$
where 0 ≤ w_j ≤ 1 and Σ_{j=1}^{3} w_j = 1.
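The entropy-weight derivation above can be sketched in Python; the helper name, the tiny epsilon, and the conversion of each column to proportions (so the entropies are well defined) are illustrative assumptions, not part of the patent:

```python
import numpy as np

def entropy_weights(X):
    """Entropy-weight method over an m x 3 sample matrix X
    (columns: computing, data-processing, I/O capability scores)."""
    X = np.asarray(X, dtype=float)
    m, _ = X.shape
    # Min-max normalization to forward-increasing values in [0, 1].
    Y = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # Treat each column as proportions; a tiny epsilon avoids ln(0).
    P = (Y + 1e-12) / (Y + 1e-12).sum(axis=0)
    K = 1.0 / np.log(m)                      # k = (ln m)^-1
    e = -K * (P * np.log(P)).sum(axis=0)     # information entropy e_j
    h = 1.0 - e                              # information utility h_j
    return h / h.sum()                       # weights w_j, summing to 1
```

The returned vector plays the role of (w_1, w_2, w_3) in formula (1).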
in the second step, based on the weight of each evaluation index obtained by user evaluation and the normalization processing of the calculation speed, data processing speed and I/O speed of each virtual machine, the service guarantee capability of the virtual machine and vm of the virtual machine are calculated in a linear weighting mode i SA (vm) for service capability guarantee i ) The definition is as follows:
Figure GDA0003791584800000055
wherein the content of the first and second substances,
Figure GDA0003791584800000056
are respectively cs i 、ds i 、os i Normalized value of (w) 1 、w 2 、w 3 Weights of evaluation indexes of cs, ds and os calculated by formula (1) respectively;
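The linear weighting of the normalized virtual-machine speeds can be sketched as follows; the helper name and the use of min-max normalization across the virtual machines are assumptions for illustration:

```python
import numpy as np

def service_guarantee(cs, ds, os_, w):
    """SA(vm_i) = w1*cs_i' + w2*ds_i' + w3*os_i', with each speed vector
    min-max normalized over the set of virtual machines."""
    S = np.column_stack([cs, ds, os_]).astype(float)
    S_norm = (S - S.min(axis=0)) / (S.max(axis=0) - S.min(axis=0))
    return S_norm @ np.asarray(w)   # one SA value per virtual machine
```

With equalized speed ratios the fastest machine receives SA = 1 and the slowest SA = 0, matching the forward-increasing [0, 1] convention above.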
in the second step, assuming that a big data parallel computing task includes n subtasks, the task computation amount, the data processing amount, and the I/O data processing amount of 3 dimensions of the n subtasks can be represented by An n row 3 column matrix An × 3 ═ aij) n × 3, z-score normalization processing is performed on each dimension, and the processed data will conform to the standard normal distribution by using the mean value and standard deviation of the original data, as shown in the following formula,
Figure GDA0003791584800000061
where μ_j is the mean of the j-th dimension of service-capability requirement in matrix A_{n×3} and σ_j its standard deviation. The matrix A'_{n×3} = (a_ij')_{n×3} obtained after z-score standardization has mean 0 and standard deviation 1 in each dimension. Through this conversion each dimension of the task's service-capability requirement is rescaled so that a larger value along a dimension indicates a larger demand for the corresponding type of service capability. The service-capability demand SD(v_i) of task v_i is defined as follows:
$$SD(v_i) = w_1\, a_{i1}' + w_2\, a_{i2}' + w_3\, a_{i3}'$$
where w_1, w_2, w_3 are the weights of the evaluation indexes cs, ds, os computed by formula (1);
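The service-capability demand of the subtasks can likewise be sketched; the helper name and the n x 3 matrix layout (computation, data volume, I/O volume per row) are assumptions:

```python
import numpy as np

def service_demand(A, w):
    """SD(v_i) = w1*a_i1' + w2*a_i2' + w3*a_i3' with z-score-standardized
    columns; A is n x 3 (computation, data, I/O amount per subtask)."""
    A = np.asarray(A, dtype=float)
    A_std = (A - A.mean(axis=0)) / A.std(axis=0)   # each column: mean 0, std 1
    return A_std @ np.asarray(w)                   # one SD value per subtask
```

Because every standardized column has zero mean, the SD values are centered around zero, measuring each subtask's demand relative to the average subtask.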
in step three, in order to fully consider the service guarantee capability of the virtual machine, the definition of the service dynamic level is as follows:
Figure GDA0003791584800000065
wherein, SA (v) i ,vm j ) Represent a task v i Scheduling to virtual machines vm j Service guarantee capability of the virtual machine when the virtual machine is executed; SD (v) i ) Representing subtasks v i Requirement for service capability, max { SD (v) } k ) Represents the maximum value of the service capability requirements of all the subtasks of the parallel task, for the task-virtual machine pair (v) i ,vm j ) When SD (v) i ) When increasing, i.e. task v i When the service capacity requirement of the virtual machine is increased, the scheduling level is correspondingly reduced, SL (vi) is a task static level, the maximum value of the average computing time sum of all tasks from the task vi to all reachable paths of the parallel task exit task is represented, and the importance of a task node on the execution priority level is implied;
Figure GDA0003791584800000071
indicating that task vi reflects virtual machine and communication resources at the time virtual machine vmj begins executionAvailability, penalizing pairs of task nodes that incur large communication costs, wherein
Figure GDA0003791584800000072
Indicating the time when the required input data is available when task vi is scheduled onto virtual machine vmj,
Figure GDA0003791584800000073
represents the time at which virtual machine vmj is idle and thus available to perform task vi;
Figure GDA0003791584800000074
reflecting the computing performance of the virtual machines, increasing the priority for the virtual machines with higher processing speed and decreasing the priority for the virtual machines with lower processing speed, wherein
Figure GDA0003791584800000075
Representing the average of the time required for task vi to execute on all machines,
Figure GDA0003791584800000076
indicating the time required for task vi to execute on virtual machine vmj.
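The per-pair computation can be sketched as a small function. The additive combination of the static level, the service-match term SA − SD/max SD, the earliest-start penalty, and the execution-time bonus is an assumption pieced together from the surrounding description, not the patent's verbatim formula:

```python
def service_dynamic_level(sl, sa, sd, sd_max, data_ready, vm_idle,
                          exec_avg, exec_time):
    """One plausible service dynamic level for a task-VM pair:
    static level SL, plus service-match bonus SA - SD/max(SD),
    minus the earliest-start penalty max(data_ready, vm_idle),
    plus the speed bonus E*(v) - E(v, vm)."""
    return (sl
            + (sa - sd / sd_max)
            - max(data_ready, vm_idle)
            + (exec_avg - exec_time))
```

A scheduler built on this would, at each step, pick the ready task and virtual machine maximizing this level, as in classic dynamic level scheduling.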
Drawings
FIG. 1 is an example diagram of a parallel task DAG graph;
FIG. 2 is a graphical block diagram of a cloud platform;
FIG. 3 is an average completion time for different numbers of subtasks;
FIG. 4 is an average completion time for different numbers of virtual machines;
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example, as shown in fig. 1 to 4:
1. big data workflow and cloud system modeling and assumptions
1.1 DAG graph modeling of big data workflows
A big data workflow can be represented as a directed acyclic graph (DAG), in which nodes represent tasks and the edges between nodes represent data dependencies between tasks; the specific definition is as follows:
Definition 1 (big data workflow): a big data workflow can be abstractly represented as a DAG graph, i.e. a 6-tuple DAG = (V, R, Q, D, O, C), where the concrete meaning of each element of the tuple is as follows:
(1) V = {v_1, v_2, ..., v_n} denotes the set of subtasks of a big data workflow (also referred to herein as parallel tasks), n the number of subtasks, and v_i the i-th subtask.
(2) R = {r_ij | v_i, v_j ∈ V} ⊆ V × V denotes the execution-order precedence and data-dependency relations among subtasks.
(3) Q = {q_1, q_2, ..., q_n} is the set of computation amounts of the subtasks, q_i ∈ Q denoting the computation amount of subtask v_i.
(4) D = {d_1, d_2, ..., d_n} is the set of data-processing amounts of the subtasks, d_i ∈ D denoting the data-processing amount of subtask v_i.
(5) O = {o_1, o_2, ..., o_n} denotes the set of I/O data amounts of the subtasks, o_i the I/O data amount of subtask v_i; the data to be communicated between a task and its successor must undergo I/O processing before transmission, such as encoding and encryption.
(6) C = {c_ij | v_i, v_j ∈ V} ⊆ V × V is the set of communication volumes between subtasks, c_ij denoting the volume of data communicated from subtask v_i to subtask v_j.
FIG. 1 is a diagram of a parallel computing task comprising 8 subtasks: within each circle, v_i is the task number, q_i the computation amount of the task, d_i the data-processing amount of task v_i, and o_i the I/O amount of task v_i; the numbers next to the edges represent the communication volume between nodes. By preprocessing, the parallel task DAG graph can be assumed to have exactly one entry task and one exit task.
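The 6-tuple of Definition 1 maps directly onto a small data structure; the class and field names below are illustrative, not from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class BigDataWorkflow:
    """The 6-tuple DAG = (V, R, Q, D, O, C), tasks indexed 0..n-1."""
    n: int                                    # |V|, number of subtasks
    edges: set = field(default_factory=set)   # R: (i, j) precedence pairs
    q: list = field(default_factory=list)     # computation amount per task
    d: list = field(default_factory=list)     # data-processing amount per task
    o: list = field(default_factory=list)     # I/O data amount per task
    c: dict = field(default_factory=dict)     # C: (i, j) -> communication volume

    def predecessors(self, k):
        """pred(v_k): tasks whose output v_k depends on."""
        return {i for (i, j) in self.edges if j == k}
```

A scheduler would treat a task as ready once every member of `predecessors(k)` has finished.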
1.2 graphical modeling of cloud Environment
Server virtualization provides the virtual server with hardware resource abstractions that support its operation, including a virtual BIOS, virtual processors, virtual memory, and virtual devices and I/O, while also providing the virtual server with good isolation and security.
Definition 2 (cloud platform): a real cloud system can be abstractly described as a cluster of different types of virtual machines virtualized from multiple physical servers through virtualization technology, the virtual machines being connected by a network into a graph cloud computing system; it can be represented as a 6-tuple Cloud = (VM, CS, DS, OS, E, B), where the concrete meaning of each element of the tuple is as follows:
(1) VM = {vm_1, vm_2, ..., vm_m} denotes the set of virtual machines, m the total number of virtual machines, and vm_i the i-th virtual machine.
(2) CS = {cs_1, cs_2, ..., cs_m} denotes the set of virtual machine computing speeds, cs_i being the computing speed of vm_i, i.e. the computation amount processed per unit time.
(3) DS = {ds_1, ds_2, ..., ds_m} denotes the set of virtual machine data-processing speeds, ds_i being the data-processing speed of vm_i, i.e. the amount of data processed per unit time.
(4) OS = {os_1, os_2, ..., os_m} denotes the set of virtual machine I/O processing speeds, os_i being the I/O processing speed of vm_i, i.e. the amount of I/O data processed per unit time.
(5) E = {e_ij | vm_i, vm_j ∈ VM} denotes the set of communication links between virtual machines.
(6) B = {b_ij | vm_i, vm_j ∈ VM, e_ij ∈ E} is the set of communication bandwidths of the edges in E; b_ij ∈ B is the time taken by the inter-virtual-machine communication link e_ij = (vm_i, vm_j) ∈ E to transmit one unit of data.
FIG. 2 is a cloud platform graph topology containing 6 resource nodes: within each circle, vm_i denotes the virtual machine number, cs_i the computing speed of vm_i, ds_i its data-processing speed, and os_i its I/O processing speed; the numbers on the edges indicate the communication bandwidth of each link.
1.3 assumptions for big data workflow task scheduling problem
Parallel task scheduling on a graph cloud platform is the process of distributing each subtask of the parallel task DAG graph to resource nodes for parallel cooperative computing while fully respecting the dependencies among tasks. On the graph cloud platform the following is assumed: tasks execute non-preemptively and are not migrated; tasks are managed uniformly by a central scheduler, which assigns each subtask to a suitable virtual machine according to some strategy; the scheduler and the virtual machines operate independently; communication is controlled and executed by a communication subsystem, communication operations can execute concurrently, and communication conflicts are not considered for now. If two tasks with a dependency r_ij = (v_i, v_j) are assigned to the same virtual machine, the communication overhead between them is ignored; if they are assigned to two different virtual machines vm_s and vm_d, the communication overhead between them is the sum of the communication times of the data on each link.
Assume ct_ij denotes the communication completion time for task v_i to transmit data to task v_j through the graph cloud platform; t_s(v_k, vm_j) denotes the time at which virtual machine vm_j hosting task v_k becomes free and thus available to execute v_k; T_comm(v_k) = max{ct_ik | v_i ∈ pred(v_k)} denotes the time at which the data of all parent tasks of v_k have arrived, with pred(v_k) the set of predecessor tasks of v_k; t_e(v_k, vm_j) denotes the completion time of task v_k on virtual machine vm_j; and o_k denotes the I/O data amount of v_k. Then t_e(v_k, vm_j) is calculated as:

$$t_e(v_k, vm_j) = \max\{t_s(v_k, vm_j),\ T_{comm}(v_k)\} + \frac{q_k}{cs_j} + \frac{d_k}{ds_j} + \frac{o_k}{os_j}$$
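A hedged sketch of this completion-time computation; the decomposition of the running time into separate compute, data-processing, and I/O terms over the virtual machine's three speeds is an assumption consistent with the speed definitions of Definition 2:

```python
def completion_time(q_k, d_k, o_k, cs_j, ds_j, os_j, t_start, t_comm):
    """t_e(v_k, vm_j): the task starts once vm_j is free (t_start) and all
    parent data has arrived (t_comm), then pays its compute, data-processing
    and I/O costs at vm_j's respective speeds."""
    return max(t_start, t_comm) + q_k / cs_j + d_k / ds_j + o_k / os_j
```

For example, a task with q=10, d=20, o=5 on a VM with speeds cs=2, ds=4, os=1 that is free at t=3 but whose input data arrives at t=7 finishes at 7 + 5 + 5 + 5 = 22.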
2. computation of service capabilities
A cloud computing environment composed of different cloud virtual machines is generally called a heterogeneous cloud computing system. Because virtual machines with widely different performance coexist in it, there are many ways to allocate the subtasks of a parallel task to its virtual machines, and different allocations produce different computing effects. Whether a parallel task executes efficiently thus depends not only on the computing speed, data-processing speed, and communication data-processing speed of the virtual machines and the transmission speed of the communication links, but also on the degree of matching between the parallel task and the heterogeneous cloud computing system: a subtask that works well on one kind of virtual machine does not necessarily work well on another, and may instead perform poorly. The service-capability match between tasks and virtual machines must therefore be considered when studying the optimized execution of parallel tasks on a heterogeneous cloud computing system.
2.1 concept of service capabilities
Definition 3 (service capability): service capability refers to the degree to which a service system can provide services, and is generally defined as the maximum throughput rate of the service system.
From this definition, the service capability of a virtual machine is a value between 0 and 1: the larger the value, the higher the virtual machine's service capability. To discuss the concept of service-capability matching further, service capability is divided into service-capability demand and service-capability guarantee. Service-capability demand is relative to a parallel computing task, i.e. the strength of the task's demand for a service function; service-capability guarantee is relative to a computing resource (here, a virtual machine), i.e. the degree to which the computing resource can provide the service function.
2.2 service capability guarantee of cloud platform virtual machine
When virtual machines are allocated to tasks with different service capability demands, the service guarantee capability of the allocated virtual machine must match the task's service capability demand as closely as possible to ensure the task runs in the expected processing mode. The satisfaction of cloud users with previously submitted task executions is the most direct and obvious indicator of how well the service capability of a virtual machine matches, and whether a cloud user is satisfied with a service is obtained by combining the evaluation indexes of that service. In cloud computing, virtual machines differ greatly in performance, for example in computing capability, data processing capability, I/O capability, reliability and availability. Here the service capability of a virtual machine is evaluated through three evaluation indexes: its computing capability, data processing capability and I/O processing capability.
When a cloud user judges satisfaction with a service, the weights of the indexes generally differ, so an objective, entropy-weight-based method is adopted to determine the weight of each evaluation index. In information theory, entropy measures the degree of uncertainty of the state of an information source, so it is natural to use information entropy to evaluate the degree of order of system information and the utility value of that information.
Let the sample data matrix of the 3 evaluation indexes obtained from m service transactions of a cloud user be X = (x_ij)_{m×3}. Because the indexes differ in dimension, order of magnitude and orientation, the initial data must first be made dimensionless. For forward-increasing indexes such as computing capability, data processing capability and I/O capability, users prefer larger values, so the data are normalized by min-max standardization:

$$y_{ij} = \frac{x_{ij} - \min_{1 \le k \le m}\{x_{kj}\}}{\max_{1 \le k \le m}\{x_{kj}\} - \min_{1 \le k \le m}\{x_{kj}\}}$$
where max_k{x_kj} and min_k{x_kj} denote the maximum and minimum values of index j, respectively. This conversion turns the values of all indexes into forward-increasing values in the range [0, 1], so that a larger value of each index is better. Let the dimensionless normalized matrix be Y = (y_ij)_{m×3}; the information entropy of each index is then

$$e_j = -k \sum_{i=1}^{m} y_{ij} \ln y_{ij}, \qquad j = 1, 2, 3$$
where the constant k is related to the number of samples m. For a system whose information is completely disordered, the degree of order is zero and the entropy is maximal, e = 1. When the m samples are in a completely disordered distribution, y_ij = 1/m, so

$$e = -k \sum_{i=1}^{m} \frac{1}{m} \ln \frac{1}{m} = k \ln m = 1$$

and thus k = (ln m)^{-1}.
The information entropy e_j measures the utility value of the information of the j-th evaluation index. In the completely disordered case e_j = 1, and the information of the j-th index (i.e., its data) then contributes nothing to the comprehensive evaluation. The information utility value of an index therefore depends on the difference h_j = 1 − e_j between 1 and its entropy. The entropy method estimates the index weights from these utility values; the higher the utility value, the greater the importance in the evaluation. The weight of the j-th evaluation index is

$$w_j = \frac{h_j}{\sum_{k=1}^{3} h_k} = \frac{1 - e_j}{\sum_{k=1}^{3} (1 - e_k)} \tag{1}$$
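As a concrete illustration, the entropy-weight method above can be sketched in Python. The 5×3 sample matrix is an illustrative assumption, and the entropy step first rescales each normalized column into proportions before taking logarithms, a common practical refinement of the formulas above:

```python
import math

def min_max_normalize(X):
    """Column-wise min-max normalization of an m x n sample matrix to [0, 1]."""
    m, n = len(X), len(X[0])
    Y = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        lo, hi = min(col), max(col)
        for i in range(m):
            Y[i][j] = (X[i][j] - lo) / (hi - lo) if hi > lo else 0.0
    return Y

def entropy_weights(X):
    """Entropy weights w_j with e_j = -k * sum_i p_ij ln p_ij and k = 1/ln m."""
    Y = min_max_normalize(X)
    m, n = len(Y), len(Y[0])
    k = 1.0 / math.log(m)
    h = []
    for j in range(n):
        s = sum(Y[i][j] for i in range(m)) or 1.0
        p = [Y[i][j] / s for i in range(m)]            # column as proportions
        e_j = -k * sum(pi * math.log(pi) for pi in p if pi > 0.0)
        h.append(1.0 - e_j)                            # information utility value
    total = sum(h) or 1.0
    return [h_j / total for h_j in h]

# Illustrative ratings of (cs, ds, os) over m = 5 service transactions
X = [[20, 15, 9], [35, 12, 14], [50, 28, 11], [42, 20, 16], [28, 25, 10]]
w = entropy_weights(X)
```

The resulting weights sum to 1; indexes whose ratings are more differentiated across transactions receive larger weights.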
based on the weight of each evaluation index obtained by user evaluation and the normalization processing of the calculation speed, data processing speed and I/O speed of each virtual machine, the service guarantee capability of the virtual machine and the vm of the virtual machine are calculated in a linear weighting mode i SA (vm) for service capability guarantee i ) The definition is as follows:
$$SA(vm_i) = w_1 \, cs_i' + w_2 \, ds_i' + w_3 \, os_i'$$

where cs_i', ds_i', os_i' are the normalized values of cs_i, ds_i, os_i, and w_1, w_2, w_3 are the weights of the evaluation indexes cs, ds and os computed by formula (1).
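A minimal sketch of this linear weighting follows, assuming three virtual machines with illustrative speeds and index weights already obtained from the entropy method (min-max normalization is applied across the virtual machines before weighting):

```python
def normalize(col):
    """Min-max normalize one speed column across all virtual machines."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]

def service_assurance(cs, ds, os_, w):
    """SA(vm_i) = w1*cs_i' + w2*ds_i' + w3*os_i' for every virtual machine i."""
    csn, dsn, osn = normalize(cs), normalize(ds), normalize(os_)
    return [w[0] * c + w[1] * d + w[2] * o
            for c, d, o in zip(csn, dsn, osn)]

# Illustrative speeds of three VMs and assumed index weights (sum to 1)
cs, ds, os_ = [20, 40, 60], [10, 30, 20], [10, 15, 30]
w = [0.4, 0.35, 0.25]
sa = service_assurance(cs, ds, os_, w)
```

Since every normalized speed lies in [0, 1] and the weights sum to 1, SA(vm_i) also lies in [0, 1], matching the 0-to-1 range of service capability used above.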
2.3 service capability requirements for tasks in DAG graphs
A big data parallel computing task can consist of multiple subtasks. Each subtask has a different function, role and task type within the whole task, so the degree of service capability each subtask requires must be treated individually. The demand of a subtask of a big data parallel computing task on the service capability of a virtual machine is quantified as the linear weighted sum of its task computation demand, data processing demand and I/O processing demand.
To eliminate the differences between dimensions and units and make the dimensions comparable, the data of each dimension must be standardized. Assume the big data parallel computing task contains n subtasks; the task computation amounts, data processing amounts and I/O data processing amounts of the n subtasks in these 3 dimensions form an n-row, 3-column matrix A_{n×3} = (a_ij)_{n×3}. Each dimension is z-score standardized using the mean and standard deviation of the raw data, so that the processed data follow the standard normal distribution:

$$a_{ij}' = \frac{a_{ij} - \mu_j}{\sigma_j}$$

where

$$\mu_j = \frac{1}{n} \sum_{i=1}^{n} a_{ij}, \qquad \sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_{ij} - \mu_j)^2}$$

are the mean and standard deviation of the j-th service-capability-demand dimension of A_{n×3}. The z-score standardization yields the matrix A' = (a_ij')_{n×3}, in which each dimension has mean 0 and standard deviation 1 and is forward-increasing, so the larger a dimension value, the greater the demand for the corresponding type of service capability. The service capability demand SD(v_i) of task v_i is defined as

$$SD(v_i) = w_1 \, a_{i1}' + w_2 \, a_{i2}' + w_3 \, a_{i3}'$$

where w_1, w_2, w_3 are the weights of the evaluation indexes cs, ds and os computed by formula (1).
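The z-score standardization and the weighted sum defining SD(v_i) can be sketched as follows; the 4×3 matrix of subtask (computation, data, I/O) amounts and the index weights are illustrative assumptions:

```python
import math

def zscore_columns(A):
    """Column-wise z-score standardization: (a_ij - mu_j) / sigma_j."""
    n, d = len(A), len(A[0])
    Z = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [A[i][j] for i in range(n)]
        mu = sum(col) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (A[i][j] - mu) / sigma if sigma > 0 else 0.0
    return Z

def service_demand(A, w):
    """SD(v_i) = w1*a'_i1 + w2*a'_i2 + w3*a'_i3 on the z-scored matrix."""
    return [sum(w_j * z for w_j, z in zip(w, row)) for row in zscore_columns(A)]

# Illustrative (computation, data, I/O) amounts for n = 4 subtasks
A = [[120, 40, 50], [260, 180, 100], [180, 90, 75], [150, 60, 60]]
w = [0.4, 0.35, 0.25]
sd = service_demand(A, w)
```

Because each z-scored column has mean 0, the SD values sum to approximately 0; a subtask that is above average in every dimension, like the second row here, receives the largest service capability demand.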
3. Dynamic scheduling algorithm for big data workflow service
Based on the above cloud computing model of virtual machine service guarantee and task service capability demand, the DLS scheduling algorithm is modified so that the heterogeneity of virtual machines is fully exploited to reduce the completion time of parallel tasks while the quality-of-service requirements of the subtasks on the virtual machines' service guarantee capability are met, making DLS parallel task scheduling based on a Directed Acyclic Graph (DAG) more reasonable.
The DLS algorithm is a compile-time, heuristic list-scheduling algorithm for allocating DAG-based applications onto a set of heterogeneous resources so as to reduce application execution time. If, at each scheduling step, the ready subtask v_i and the idle virtual machine vm_j are computed to have the highest matching dynamic level, the DLS algorithm schedules task v_i to run on virtual machine vm_j. The dynamic level DL(v_i, vm_j) of a task-virtual machine pair (v_i, vm_j) is defined as

$$DL(v_i, vm_j) = SL(v_i) - \max\{DA(v_i, vm_j),\; TF(vm_j)\} + \Delta(v_i, vm_j)$$

where SL(v_i) is the static level of the task: the maximum, over all reachable paths from task v_i to the exit task of the parallel task, of the sum of the average computation times of the tasks on the path; it implies the execution priority of the task node. The term max{DA(v_i, vm_j), TF(vm_j)} is the time at which task v_i begins execution on virtual machine vm_j; it reflects the availability of the virtual machine and of communication resources, and penalizes task-machine pairs that incur large communication cost. DA(v_i, vm_j) denotes the time at which the input data required by task v_i is available when it is scheduled onto vm_j, and TF(vm_j) denotes the time at which vm_j is idle and thus available to execute v_i. The term

$$\Delta(v_i, vm_j) = \bar{t}(v_i) - t(v_i, vm_j), \qquad \bar{t}(v_i) = \frac{1}{m} \sum_{j=1}^{m} t(v_i, vm_j)$$

reflects the computing performance of the virtual machines, increasing the priority of virtual machines with higher processing speed and decreasing it for those with lower processing speed, where $\bar{t}(v_i)$ is the average of the time task v_i requires on all machines and t(v_i, vm_j) is the time task v_i requires on virtual machine vm_j.
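The dynamic level of a single task-virtual machine pair can be sketched directly from the definition above; all timing inputs in the example are illustrative assumptions:

```python
def dynamic_level(sl, data_avail, vm_free, exec_times, j):
    """DL(v_i, vm_j) = SL(v_i) - max{DA, TF} + (mean exec time - exec time on vm_j).

    sl         -- static level SL(v_i) of the task
    data_avail -- DA(v_i, vm_j): when the task's input data is ready on vm_j
    vm_free    -- TF(vm_j): when vm_j becomes idle
    exec_times -- t(v_i, vm_k) for every virtual machine k
    j          -- index of the candidate virtual machine
    """
    start = max(data_avail, vm_free)            # earliest possible start time
    mean_t = sum(exec_times) / len(exec_times)  # average over all machines
    delta = mean_t - exec_times[j]              # rewards faster machines
    return sl - start + delta

# Same ready task evaluated on a fast VM (index 0) and a slow VM (index 2)
dl_fast = dynamic_level(sl=50.0, data_avail=4.0, vm_free=6.0,
                        exec_times=[10.0, 20.0, 30.0], j=0)
dl_slow = dynamic_level(sl=50.0, data_avail=4.0, vm_free=6.0,
                        exec_times=[10.0, 20.0, 30.0], j=2)
```

The faster machine receives the higher dynamic level (54 versus 34 here), so DLS prefers it when the two candidates are otherwise equivalent.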
When making scheduling decisions, the DLS algorithm considers machine heterogeneity, which lets it adapt to the heterogeneous computing speeds of resources in a cloud computing environment, but it does not consider the comprehensive service guarantee capability of the computing resources in the cloud computing system. When a task is scheduled onto a target virtual machine, the service guarantee capability reflects the target machine's combined ability to provide computing capability, data processing capability and I/O processing capability. To take the service guarantee capability of virtual machines fully into account, a Service Dynamic Level Scheduling algorithm SDLS is proposed here on the basis of the DLS algorithm; the service dynamic level is defined as follows:
$$SDL(v_i, vm_j) = DL(v_i, vm_j) + SA(v_i, vm_j) - \frac{SD(v_i)}{\max_k\{SD(v_k)\}}$$
where SA(v_i, vm_j) is the service guarantee capability of the virtual machine when task v_i is scheduled onto virtual machine vm_j, SD(v_i) is the service capability demand of subtask v_i, and max{SD(v_k)} is the maximum service capability demand over all subtasks of the parallel task. For a task-virtual machine pair (v_i, vm_j), when SD(v_i) increases, i.e., when the task's demand on the virtual machine's service capability grows, the scheduling level decreases correspondingly. The proposed service dynamic level scheduling algorithm for cloud computing environments is therefore highly flexible: virtual machines are matched according to each subtask's particular demand for service capability, so that the differing demands of the subtasks are all satisfied.
The pseudocode of the parallel task service dynamic level scheduling algorithm SDLS is given below.
Algorithm: SDLS, the dynamic scheduling algorithm for parallel task services
Input: big data workflow DAG = (V, R, Q, D, O, C); cloud platform Cloud = (VM, CS, DS, OS, E, B)
Output: subtask-virtual machine assignment sequence Assign = {(v_i, vm_j)}
(The SDLS pseudocode appears as figures in the original document.)
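Since the pseudocode survives only as figures, the core SDLS selection step can be sketched as follows. It assumes the additive service dynamic level SDL = DL + SA − SD/max SD described above; all DL, SA and SD values in the example are illustrative:

```python
def sdls_assign(ready_tasks, idle_vms, dl, sa, sd):
    """One greedy SDLS step: return the (task, vm) pair with the highest SDL."""
    sd_max = max(sd.values())
    best, best_level = None, float("-inf")
    for t in ready_tasks:
        for v in idle_vms:
            level = dl[(t, v)] + sa[v] - sd[t] / sd_max
            if level > best_level:
                best, best_level = (t, v), level
    return best, best_level

# Two ready subtasks, two idle VMs, illustrative levels and capabilities
ready = ["v1", "v2"]
vms = ["vm1", "vm2"]
dl = {("v1", "vm1"): 10.0, ("v1", "vm2"): 9.0,
      ("v2", "vm1"): 8.0, ("v2", "vm2"): 11.0}
sa = {"vm1": 0.6, "vm2": 0.9}       # service guarantee capability per VM
sd = {"v1": 2.0, "v2": 4.0}         # service capability demand per task
pair, level = sdls_assign(ready, vms, dl, sa, sd)
```

A full scheduler would repeat this step, updating the ready set and the virtual machines' free times after each assignment, exactly as the DLS loop does.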
4. Simulation experiment and result analysis
The effect of the proposed service dynamic level scheduling algorithm SDLS is tested through simulation. The experiments use the cloud simulation toolkit CloudSim 3.0, a Java-based discrete-event simulation toolkit that supports resource management and task scheduling simulation for cloud computing; the simulation is driven by virtual time and tasks and is therefore not affected by the performance of the host machine.
The main flow of the CloudSim simulation is: initialize each discrete object with the set parameters → start the simulation → resource registration → the broker agent queries the information center for resources → compute the service capability demand of each parallel-task subtask → compute the service guarantee capability of each virtual machine → allocate service-matched resources to tasks according to the configured scheduling strategy → cloud resources execute the tasks → task execution completes → return the final result → end the simulation. The simulation program is written in Java; the development environment is Eclipse, an extensible, open-source, Java-based integrated development platform.
To test how well the proposed algorithm solves the parallel task scheduling problem, the following two groups of experiments were conducted.
4.1 average completion time for different task counts
To evaluate the performance of the proposed algorithm, the service dynamic level scheduling algorithm SDLS is compared with the dynamic level scheduling algorithm DLS for different numbers of parallel-task subtasks. In this experiment the number of computing resource nodes is 200 and the number of links is 300; the computing speed, data processing speed and I/O speed of the computing resources are generated randomly in [20,60], [10,30] and [10,30], respectively, and the communication speed between computing resources in [10,30]. Parallel-task DAG graphs with 30 to 130 subtasks are generated randomly; the computation amount and data processing amount of the subtasks and the maximum communication amount between a subtask and its immediate successors are generated randomly in [120,260], [40,180] and [50,100], respectively. For each task scale the scheduling algorithm is executed multiple times and the parallel task completion times are averaged. Figure 3 compares the average completion time for different numbers of subtasks.
As Figure 3 shows, the average scheduling length of both algorithms grows as the number of subtasks increases, and the scheduling length of SDLS is smaller than that of DLS. This is because, when allocating a virtual machine to a task, SDLS considers not only the combined computing, data processing and I/O processing capability of the virtual machine, but also the subtask's demand for service capability against the virtual machine's service capability guarantee. When the dynamic levels of a task on two resources are equal, the virtual machine with the larger service guarantee capability has the larger service dynamic level, and SDLS selects the resource with the larger service dynamic level when allocating virtual machines to subtasks, which raises the overall processing speed of task execution; hence the scheduling length of SDLS is always smaller than that of DLS.
4.2 average completion time for different virtual machine counts
In this experiment a parallel-task DAG graph with 400 subtasks is generated randomly; the computation amount and data processing amount of the subtasks and the maximum communication amount between a subtask and its immediate successors are generated randomly in [120,260], [40,180] and [50,100], respectively. Between 200 and 500 virtual machines are generated randomly with 500 links; the computing speed, data processing speed and I/O speed of the computing resources are generated randomly in [20,60], [10,30] and [10,30], respectively, and the communication speed between virtual machines in [10,30]. For each scale the scheduling algorithm is executed multiple times and the parallel task completion times are averaged. Figure 4 compares the average completion time for different numbers of virtual machines.
As Figure 4 shows, as the number of virtual machines increases the behavior of the average completion time is consistent with experiment 4.1: the scheduling length of SDLS is always smaller than that of DLS.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be also considered as the protection scope of the present invention.

Claims (1)

1. A big data workflow scheduling method for sensing virtual machine service capability in a cloud environment, characterized by comprising the following steps: first, modeling and assumptions for the big data workflow and the cloud system; second, service capability calculation; third, dynamic scheduling of the big data workflow service;
the first step further comprises the following steps:
definition 1 (big data workflow): a big data workflow can be abstractly represented as a DAG graph, i.e., a 6-tuple DAG = (V, R, Q, D, O, C), where the concrete meaning of each element of the tuple is as follows:
(1) V = {v_1, v_2, ..., v_n} denotes the set of subtasks of the big data workflow, n the number of subtasks, and v_i the i-th subtask;
(2) R = {r_ij | v_i, v_j ∈ V} ⊆ V × V represents the execution-order precedence and data-dependency relations between subtasks;
(3) Q = {q_1, q_2, ..., q_n} is the set of computation amounts of the subtasks, q_i ∈ Q being the computation amount of subtask v_i;
(4) D = {d_1, d_2, ..., d_n} is the set of data processing amounts of the subtasks, d_i ∈ D being the data processing amount of subtask v_i;
(5) O = {o_1, o_2, ..., o_n} denotes the set of I/O data processing amounts of the subtasks; o_i represents the amount of data of subtask v_i that must undergo I/O processing before it can be transmitted to successor tasks, such as coding or encryption before data transmission;
(6) C = {c_ij | v_i, v_j ∈ V} ⊆ V × V is the set of communication amounts between subtasks, c_ij being the amount of communication data from subtask v_i to subtask v_j;
definition 2 (cloud platform): a real cloud system can be abstractly described as a cluster system composed of different types of virtual machines virtualized from multiple physical servers through virtualization technology; the virtual machines are connected through a network to form a cloud computing system, which can be represented as a 6-tuple Cloud = (VM, CS, DS, OS, E, B), where the concrete meaning of each element of the tuple is as follows:
(1) VM = {vm_1, vm_2, ..., vm_m} denotes the set of virtual machines, m the total number of virtual machines, and vm_i the i-th virtual machine;
(2) CS = {cs_1, cs_2, ..., cs_m} denotes the set of virtual machine computing speeds; cs_i, the computing speed of vm_i, represents the computation amount processed per unit time;
(3) DS = {ds_1, ds_2, ..., ds_m} denotes the set of virtual machine data processing speeds; ds_i, the data processing speed of vm_i, represents the amount of data processed per unit time;
(4) OS = {os_1, os_2, ..., os_m} denotes the set of virtual machine I/O processing speeds; os_i, the I/O processing speed of vm_i, represents the amount of I/O data processed per unit time;
(5) E = {e_ij | vm_i, vm_j ∈ VM} represents the set of communication links between the virtual machines;
(6) B = {b_ij | vm_i, vm_j ∈ VM, e_ij ∈ E} is the set of communication bandwidths of the edges in E; b_ij ∈ B is the time to transmit a unit of data over the communication link e_ij = (vm_i, vm_j) ∈ E; the service capability of a virtual machine is evaluated through the three evaluation indexes of its computing capability, data processing capability and I/O processing capability; let the sample data matrix of these 3 evaluation indexes obtained from m service transactions of a cloud user be X = (x_ij)_{m×3}; because the indexes differ greatly in dimension, order of magnitude and orientation, the initial data must be made dimensionless, and the data are normalized by min-max standardization:

$$y_{ij} = \frac{x_{ij} - \min_{1 \le k \le m}\{x_{kj}\}}{\max_{1 \le k \le m}\{x_{kj}\} - \min_{1 \le k \le m}\{x_{kj}\}}$$
where max_k{x_kj} and min_k{x_kj} denote the maximum and minimum values of index j, respectively; the conversion turns the values of all indexes into forward-increasing values in the range [0, 1], so that a larger value of each index is better; with the dimensionless normalized matrix Y = (y_ij)_{m×3}, the information entropy of each index is:

$$e_j = -k \sum_{i=1}^{m} y_{ij} \ln y_{ij}, \qquad j = 1, 2, 3$$
where the constant k is related to the number of samples m; for a system whose information is completely disordered, the degree of order is zero and the entropy is maximal, e = 1; when the m samples are in a completely disordered distribution, y_ij = 1/m, then:

$$e = -k \sum_{i=1}^{m} \frac{1}{m} \ln \frac{1}{m} = k \ln m = 1$$

thus k = (ln m)^{-1};
since the information entropy e_j measures the utility value of the information of the j-th evaluation index, and e_j = 1 when completely disordered, the information of the j-th index then contributes nothing to the comprehensive evaluation; the information utility value of an index therefore depends on the difference h_j = 1 − e_j between 1 and its entropy; the entropy method estimates the weight of each index from these utility values, the weight being essentially computed from the utility value of the index information, and the higher the utility value, the greater the importance in the evaluation, so the weight of the j-th evaluation index is:

$$w_j = \frac{h_j}{\sum_{k=1}^{3} h_k} = \frac{1 - e_j}{\sum_{k=1}^{3} (1 - e_k)} \tag{1}$$

where 0 ≤ w_j ≤ 1 and $\sum_{j=1}^{3} w_j = 1$;
in the second step, based on the weight of each evaluation index obtained from user evaluation, and after normalizing the computing speed, data processing speed and I/O speed of each virtual machine, the service capability guarantee SA(vm_i) of virtual machine vm_i is computed by linear weighting and defined as follows:

$$SA(vm_i) = w_1 \, cs_i' + w_2 \, ds_i' + w_3 \, os_i'$$

where cs_i', ds_i', os_i' are the normalized values of cs_i, ds_i, os_i, and w_1, w_2, w_3 are the weights of the evaluation indexes cs, ds and os computed by formula (1);
in the second step, assume the big data parallel computing task contains n subtasks; the task computation amounts, data processing amounts and I/O data processing amounts of the n subtasks in these 3 dimensions form an n-row, 3-column matrix A_{n×3} = (a_ij)_{n×3}; each dimension is z-score standardized using the mean and standard deviation of the raw data, so that the processed data follow the standard normal distribution:

$$a_{ij}' = \frac{a_{ij} - \mu_j}{\sigma_j}$$

where

$$\mu_j = \frac{1}{n} \sum_{i=1}^{n} a_{ij}, \qquad \sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_{ij} - \mu_j)^2}$$

μ_j is the mean and σ_j the standard deviation of the j-th service-capability-demand dimension of A_{n×3}; the z-score standardization yields the matrix A' = (a_ij')_{n×3}, in which each dimension has mean 0 and standard deviation 1 and is forward-increasing, so the larger a dimension value, the greater the demand for the corresponding type of service capability; the service capability demand SD(v_i) of task v_i is defined as follows:

$$SD(v_i) = w_1 \, a_{i1}' + w_2 \, a_{i2}' + w_3 \, a_{i3}'$$

where w_1, w_2, w_3 are the weights of the evaluation indexes cs, ds and os;
in step three, to fully consider the service guarantee capability of the virtual machine, the service dynamic level is defined as follows:

$$SDL(v_i, vm_j) = DL(v_i, vm_j) + SA(v_i, vm_j) - \frac{SD(v_i)}{\max_k\{SD(v_k)\}}$$

where SA(v_i, vm_j) is the service guarantee capability of the virtual machine when task v_i is scheduled onto vm_j; SD(v_i) is the service capability demand of subtask v_i; max{SD(v_k)} is the maximum service capability demand over all subtasks of the parallel task; for a task-virtual machine pair (v_i, vm_j), when SD(v_i) increases, i.e., when the task's demand on the virtual machine's service capability grows, the scheduling level decreases correspondingly; SL(v_i) is the static level of the task, the maximum, over all reachable paths from task v_i to the exit task of the parallel task, of the sum of the average computation times of the tasks on the path, which implies the execution priority of the task node; the dynamic level is

$$DL(v_i, vm_j) = SL(v_i) - \max\{DA(v_i, vm_j),\; TF(vm_j)\} + \Delta(v_i, vm_j)$$

where max{DA(v_i, vm_j), TF(vm_j)} is the time at which task v_i begins execution on virtual machine vm_j, reflecting the availability of virtual machine and communication resources and penalizing task node pairs with large communication cost; DA(v_i, vm_j) is the time at which the input data required by task v_i is available when it is scheduled onto vm_j; TF(vm_j) is the time at which vm_j is idle and thus available to execute v_i; the term

$$\Delta(v_i, vm_j) = \bar{t}(v_i) - t(v_i, vm_j)$$

reflects the computing performance of the virtual machine, increasing the priority of virtual machines with higher processing speed and decreasing it for those with lower processing speed, where $\bar{t}(v_i)$ is the average of the time required for task v_i to execute on all machines and t(v_i, vm_j) is the time required for task v_i to execute on virtual machine vm_j.
CN202011031844.4A 2020-09-27 2020-09-27 Big data workflow scheduling method for sensing service capability of virtual machine in cloud environment Active CN112181620B (en)

Publications: CN112181620A, published 2021-01-05; CN112181620B (granted), published 2022-09-20.

Family

ID=73944102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011031844.4A Active CN112181620B (en) 2020-09-27 2020-09-27 Big data workflow scheduling method for sensing service capability of virtual machine in cloud environment

Country Status (1)

Country Link
CN (1) CN112181620B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568747B (en) * 2021-07-27 2024-04-12 上海交通大学 Cloud robot resource scheduling method and system based on task classification and time sequence prediction
CN113821308B (en) * 2021-09-29 2023-11-24 上海阵量智能科技有限公司 System on chip, virtual machine task processing method and device and storage medium
CN114077498B (en) * 2021-11-20 2023-03-28 郑州轻工业大学 Method and system for selecting and transferring calculation load facing to mobile edge calculation
CN114168353A (en) * 2022-01-13 2022-03-11 中国联合网络通信集团有限公司 Task joint execution method and system based on end edge resource scheduling
CN117527881A (en) * 2023-11-20 2024-02-06 广东省电子商务认证有限公司 Dynamic cipher machine dispatching system and dispatching method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970609A (en) * 2014-04-24 2014-08-06 南京信息工程大学 Cloud data center task scheduling method based on improved ant colony algorithm
CN106056294A (en) * 2016-06-06 2016-10-26 四川大学 Hybrid cloud scientific workflow scheduling strategy based on task probability clustering and multi-constraint workflow division
CN106951330A (en) * 2017-04-10 2017-07-14 郑州轻工业学院 A kind of maximized virtual machine distribution method of cloud service center service utility
CN107038070A (en) * 2017-04-10 2017-08-11 郑州轻工业学院 The Parallel Task Scheduling method that reliability is perceived is performed under a kind of cloud environment
CN110232085A (en) * 2019-04-30 2019-09-13 中国科学院计算机网络信息中心 A kind of method of combination and system of big data ETL task
CN111522637A (en) * 2020-04-14 2020-08-11 重庆邮电大学 Storm task scheduling method based on cost benefit

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3226133A1 (en) * 2016-03-31 2017-10-04 Huawei Technologies Co., Ltd. Task scheduling and resource provisioning system and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970609A (en) * 2014-04-24 2014-08-06 南京信息工程大学 Cloud data center task scheduling method based on improved ant colony algorithm
CN106056294A (en) * 2016-06-06 2016-10-26 四川大学 Hybrid cloud scientific workflow scheduling strategy based on task probability clustering and multi-constraint workflow division
CN106951330A (en) * 2017-04-10 2017-07-14 郑州轻工业学院 A kind of maximized virtual machine distribution method of cloud service center service utility
CN107038070A (en) * 2017-04-10 2017-08-11 郑州轻工业学院 The Parallel Task Scheduling method that reliability is perceived is performed under a kind of cloud environment
CN110232085A (en) * 2019-04-30 2019-09-13 中国科学院计算机网络信息中心 A kind of method of combination and system of big data ETL task
CN111522637A (en) * 2020-04-14 2020-08-11 重庆邮电大学 Storm task scheduling method based on cost benefit

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A dynamic resource allocation and task scheduling strategy with uncertain task runtime on IaaS clouds";Shaowei Liu等;《2016 Sixth International Conference on Information Science and Technology (ICIST)》;20160602;第174-180页 *
"云环境下服务信任感知的可信动态级调度方法";曹洁等;《通信学报》;20141130;第35卷(第11期);第39-49页 *
异构云系统中预算成本约束下高效的工作流调度算法;张龙信等;《小型微型计算机系统》;20200529(第06期);全文 *

Also Published As

Publication number Publication date
CN112181620A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112181620B (en) Big data workflow scheduling method for sensing service capability of virtual machine in cloud environment
Zhou et al. Cost and makespan-aware workflow scheduling in hybrid clouds
Selvarani et al. Improved cost-based algorithm for task scheduling in cloud computing
Polo et al. Performance management of accelerated mapreduce workloads in heterogeneous clusters
Chen et al. A multi-objective optimization for resource allocation of emergent demands in cloud computing
Konjaang et al. Multi-objective workflow optimization strategy (MOWOS) for cloud computing
Li et al. An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters
Thaman et al. Green cloud environment by using robust planning algorithm
Lagwal et al. Load balancing in cloud computing using genetic algorithm
Javadpour et al. An intelligent energy-efficient approach for managing IoE tasks in cloud platforms
Emara et al. Genetic-Based Multi-objective Task Scheduling Algorithm in Cloud Computing Environment.
CN117032902A (en) Load-aware cloud task scheduling method based on an improved discrete particle swarm algorithm
Gopu et al. Energy-efficient virtual machine placement in distributed cloud using NSGA-III algorithm
Ben Hafaiedh et al. A model-based approach for formal verification and performance analysis of dynamic load-balancing protocols in cloud environment
Chhabra et al. Qualitative parametric comparison of load balancing algorithms in parallel and distributed computing environment
Franklin et al. A general matrix iterative model for dynamic load balancing
Gąsior et al. A Sandpile cellular automata-based scheduler and load balancer
Yassir et al. Graph-based model and algorithm for minimising big data movement in a cloud environment
Alatawi et al. Hybrid load balancing approach based on the integration of QoS and power consumption in cloud computing
Moussa et al. Comprehensive study on machine learning-based container scheduling in cloud
Chen et al. A two-level virtual machine self-reconfiguration mechanism for the cloud computing platforms
Filippini et al. Hierarchical Scheduling in on-demand GPU-as-a-Service Systems
Chhabra et al. Qualitative Parametric Comparison of Load Balancing Algorithms in Distributed Computing Environment
JP2022531353A (en) Equipment and methods for dynamically optimizing parallel computing
Manekar et al. Optimizing cost and maximizing profit for multi-cloud-based big data computing by deadline-aware optimize resource allocation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant