CN109684070B - Scheduling method in cloud computing parallel operation - Google Patents

Scheduling method in cloud computing parallel operation

Info

Publication number
CN109684070B
Authority
CN
China
Prior art keywords
task
virtual machine
virtual machines
time
tasks
Prior art date
Legal status
Active
Application number
CN201810997296.7A
Other languages
Chinese (zh)
Other versions
CN109684070A (en)
Inventor
马建峰
张兆一
李辉
张世哲
李金库
姚青松
宁建斌
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201810997296.7A priority Critical patent/CN109684070B/en
Publication of CN109684070A publication Critical patent/CN109684070A/en
Application granted granted Critical
Publication of CN109684070B publication Critical patent/CN109684070B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5011Pool
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a scheduling method for parallel operation in cloud computing. After the virtual machine configuration is determined, 1, 2, 3, …, n virtual machines are started simultaneously, the average startup time per virtual machine t_1, t_2, t_3, …, t_n is recorded for each test, and the maximum-likelihood estimates of a and b are obtained from the formula t_i = a·i + b. The task time τ is then calculated, and starting m virtual machines to execute the task in parallel constitutes the optimal scheme. Next, a task window divides the task into n approximately equal parts (n > m) to form a task set, which is placed in a task pool; a task scheduler creates m virtual machines in turn and starts the parallel task-processing procedure, allocating a task to a virtual machine whenever a new virtual machine is started or a task on a virtual machine is completed, until all virtual machines have completed all tasks and returned the task results. The optimal scheduling strategy, designed around the characteristics of virtual machine startup time, achieves a near-optimal speed-up ratio, ensures that the tasks on all virtual machines finish at almost the same time, greatly reduces task execution time, improves task efficiency, and reduces the waste of system resources.

Description

Scheduling method in cloud computing parallel operation
Technical Field
The invention belongs to the technical field of computers, and further relates to a scheduling method in cloud computing parallel operation in the technical field of cloud computing.
Background
Thanks to advantages such as convenient storage services and flexible charging models, more and more enterprises and individuals store their local data on cloud servers to reduce local storage burden and management overhead, and cloud services have seen considerable adoption by businesses and individuals in recent years. Users no longer need to maintain their own hardware and software resources; they can send computing jobs to the cloud at any time. When a large-scale computing or transmission task is executed, virtual machines (VMs) on the cloud platform can be started to perform parallel computing or parallel transmission. For example, to protect data privacy, a large amount of plaintext is encrypted into ciphertext and the ciphertext is transmitted from the server to the client, which takes a long time to complete. In general, a task can be broken into multiple subtasks, and a large number of virtual machines can be started to execute the subtasks so as to shorten the completion time.
However, many problems remain to be solved. Since starting a virtual machine itself takes time, more virtual machines do not necessarily mean a shorter total completion time. How many virtual machines should be started, and how should tasks be assigned to them? Determining an optimal scheduling scheme, so that the cloud platform maps tasks onto virtual machines in the best possible way and thereby shortens the total task completion time, has become a hot topic in the cloud computing field.
The patent application "A parallel computing method and system" (application number 201310591160.3, publication number CN103617086B), filed by Donghou Group Co., discloses a parallel computing method comprising the following steps: monitoring each computing node to obtain node monitoring data; calculating the load capacity of each computing node from the monitoring data; and distributing the tasks to be assigned according to the load capacity of the computing nodes. The method has the following disadvantage: only platform resources are scheduled, and the task scheduling method itself is not optimized.
The patent application "A dynamic load balancing method for a Linux-based parallel computing platform" (application number 201310341592.9, publication number CN103399800B), filed by Shandong University, discloses a dynamic load balancing method whose content is as follows: during parallel computation, the total computing task is divided into several stages of equal execution time that are executed in turn. Using the routine job-scheduling facilities of the system, before the parallel computation of each stage begins, the current resource utilization of each node is read, and the computing tasks of the nodes are dynamically allocated according to each node's computing performance and computational complexity, so that the computation time of each node in each stage is roughly equal and the delay of synchronization waiting is reduced. Through this dynamic adjustment strategy, the total computing task can be completed with higher resource utilization. The method has the following drawbacks: it is limited to the Linux platform, and the influence of system startup time on execution efficiency is not considered.
The main strategies currently studied and practiced, such as dynamically adjusting CPU frequency, shutting down or putting idle machines to sleep, and migrating and consolidating virtual machines, essentially maximize energy utilization by dynamically scheduling and consolidating data-center resources. However, none of them provides an optimal acceleration strategy that minimizes the total task time when the task volume and the system resources are fixed.
Therefore, with fixed system resources, how to shorten the total task completion time as far as possible during parallel execution, and how to choose an appropriate number of virtual machines so as to achieve the optimal speed-up ratio, has become an urgent problem to solve.
Disclosure of Invention
In view of the above deficiencies of the prior art, the invention aims to provide a scheduling method for parallel operation in cloud computing in which, before a task is executed, the number of virtual machines that achieves optimal acceleration is determined from the system resources and the task volume, the task is then divided, and the virtual machines are started in turn to execute it.
The invention is realized by the following technical scheme:
a scheduling method in parallel operation of cloud computing comprises the following steps:
s1, calculating cloud platform parameters:
after the virtual machine configuration is determined, 1, 2, 3, …, n virtual machines are started simultaneously, the average startup time per virtual machine t_1, t_2, t_3, …, t_n is recorded for each test, the n data points are substituted into t_i = a·i + b, and the maximum-likelihood estimates of a and b are calculated and recorded;
wherein t_i is the average time to start each virtual machine when i virtual machines are started simultaneously, and a and b are system characteristic parameters;
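As an illustration only (not part of the patent text), the parameter estimation in step S1 could be sketched as follows; assuming independent Gaussian noise on each measured t_i, the maximum-likelihood estimates of a and b coincide with an ordinary least-squares fit. The function name and the timing values are hypothetical.

```python
# Sketch of step S1 (illustrative): fit t_i = a*i + b to measured average startup times.
# Under i.i.d. Gaussian noise, the maximum-likelihood estimate of (a, b) equals the
# ordinary least-squares solution.
def fit_startup_model(avg_times):
    """avg_times[i-1]: average per-VM startup time when i VMs are started together."""
    n = len(avg_times)
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_t = sum(avg_times) / n
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, avg_times))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var              # slope: extra startup cost per additional concurrent VM
    b = mean_t - a * mean_x    # intercept: fixed startup overhead
    return a, b

# Illustrative (not measured) timings in seconds for 1..5 concurrent VMs.
a_hat, b_hat = fit_startup_model([12.1, 13.9, 16.2, 18.0, 20.1])
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")   # roughly a = 2.01, b = 10.03
```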
s2, determining an optimal acceleration strategy:
the task time τ is measured by executing the task in one virtual machine; starting m virtual machines to execute the task in parallel then constitutes the optimal scheme, where
m = √(τ/a);
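As a sketch of step S2 (an illustration under assumed numbers, not the patent's own data), the optimal integer number of virtual machines could be chosen by evaluating the total time T_all(m) = am + b + τ/m at the integers neighbouring the continuous optimum √(τ/a):

```python
import math

def optimal_vm_count(tau, a, b):
    """Minimize T_all(m) = a*m + b + tau/m over positive integers m."""
    total = lambda m: a * m + b + tau / m
    m_star = math.sqrt(tau / a)                                   # continuous optimum
    candidates = {max(1, math.floor(m_star)), max(1, math.ceil(m_star))}
    m = min(candidates, key=total)
    return m, total(m)

# Hypothetical example: task time tau = 3600 s, fitted parameters a = 2 s, b = 10 s.
m, t_all = optimal_vm_count(3600.0, a=2.0, b=10.0)
print(m, round(t_all, 1))   # -> 42 179.7 (about 3 minutes instead of roughly an hour on one VM)
```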
s3, splitting a task:
when the task reaches the task window, the task window divides it into n approximately equal parts (n > m) to form a task set, which is placed in the task pool;
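A minimal sketch of the task window in step S3, under the assumption (made only for illustration) that the overall task can be represented as a list of independent work items; the patent itself only requires an approximately even split into n > m parts:

```python
from collections import deque

def split_into_parts(items, n):
    """Divide a list of work items into n approximately equal sub-tasks."""
    q, r = divmod(len(items), n)
    parts, start = [], 0
    for k in range(n):
        size = q + (1 if k < r else 0)   # the first r parts receive one extra item
        parts.append(items[start:start + size])
        start += size
    return parts

# 1000 hypothetical work items split into n = 100 sub-tasks (n > m) and placed in a pool.
task_pool = deque(split_into_parts(list(range(1000)), n=100))
```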
s4, creating a virtual machine:
the task scheduler sequentially creates m virtual machines and simultaneously starts a parallel processing task process;
s5, task allocation:
when a new virtual machine is started or a task on the virtual machine is completed, a task scheduler allocates a task to the virtual machine;
s6, completing a task:
all tasks are completed and the task results are returned.
Preferably, in step S5 the task scheduler allocates tasks as follows: the task scheduler takes one task out of the task pool, maps it onto the virtual machine, and the virtual machine starts to execute it; after the task on the virtual machine has been executed, the task pool is checked; if the task pool is not empty, another task is allocated to the virtual machine, and if the task pool is empty, the virtual machine returns to its initial state.
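The allocation rule of step S5 could be simulated in a single process as sketched below; threads stand in for virtual machines and a fixed sleep stands in for the boot time, both being simplifying assumptions made purely to illustrate the pull-from-pool behaviour rather than an implementation of the cloud platform itself.

```python
import queue
import threading
import time

def vm_worker(task_pool, startup_delay, run_subtask):
    time.sleep(startup_delay)                 # stands in for the VM boot time
    while True:
        try:
            subtask = task_pool.get_nowait()  # take one sub-task from the pool
        except queue.Empty:
            return                            # pool empty: back to the initial state
        run_subtask(subtask)

def run_parallel(subtasks, m, startup_delay, run_subtask):
    pool = queue.Queue()
    for s in subtasks:
        pool.put(s)
    workers = [threading.Thread(target=vm_worker,
                                args=(pool, startup_delay, run_subtask))
               for _ in range(m)]
    for w in workers:
        w.start()    # the scheduler creates the m "virtual machines" in turn
    for w in workers:
        w.join()     # all sub-tasks finished; results were handled by run_subtask

# Hypothetical usage: 100 sub-tasks, 4 workers, 0.1 s simulated boot time.
run_parallel(range(100), m=4, startup_delay=0.1, run_subtask=lambda s: None)
```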
Compared with the prior art, the invention has the following beneficial technical effects:
1) The optimal scheduling strategy is designed around the characteristics of virtual machine startup time; it achieves a near-optimal speed-up ratio, ensures that the tasks on all virtual machines finish at almost the same time, greatly reduces task execution time, improves task efficiency, and reduces the waste of system resources.
2) The optimal acceleration strategy of the invention is realized by controlling the number of virtual machines and planning the tasks. Once a computing task is given, the near-optimal speed-up system can calculate the optimal number of virtual machines and plan the tasks automatically. It has a wide range of application and can be used together with existing dynamic resource scheduling schemes, greatly improving system efficiency.
Drawings
FIG. 1 is a block flow diagram of a scheduling method;
FIG. 2 is a graph of total time for startup of multiple virtual machines;
FIG. 3 is a schematic diagram of the task time τ determination process;
FIG. 4 is a task allocation diagram.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
Fig. 1 is a schematic diagram of the steps of the optimal scheme for parallel operation in cloud computing according to the invention; the specific flow of the system is described as follows:
the method comprises the following steps: calculating cloud platform parameters: after the configuration of the virtual machines is determined, 1,2,3, \ 8230, n virtual machines are started simultaneously, and the average time t used for starting each virtual machine in each test is counted 1 ,t 2 ,t 3 ,…,t n Substituting the two sets of data into t i In = ai + b, (i =1,2, \8230;, n), the likelihood values of a and b are calculated and recorded;
step two: determining the optimal acceleration strategy: the task time τ is measured by executing the task in one virtual machine; starting m = √(τ/a) virtual machines to execute the task in parallel is the optimal scheme;
step three: splitting the task: when the task reaches the task window, the task window divides it into n approximately equal parts (n > m) to form a task set, which is placed in the task pool;
step four: creating a virtual machine: the task scheduler sequentially creates m virtual machines and simultaneously starts a parallel processing task process;
step five: task allocation: when a new virtual machine is started or a task on a virtual machine is completed, the task scheduler allocates a task to that virtual machine;
step six: completing the task: all tasks are completed and the task results are returned.
In step one, when multiple virtual machines are started simultaneously, the average startup time per virtual machine follows T = A·N + B (N is the number of virtual machines and T is the average startup time); this formula was obtained through a large number of experimental tests. The experiment is as follows: m virtual machines are started simultaneously on the OpenStack platform, the startup time T_min of the first virtual machine and the total startup time T_max are recorded, and the average startup time per virtual machine T_avg is calculated. As shown in fig. 2, the startup time of multiple virtual machines can be seen to approximately conform to the curve T = A·N + B.
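A sketch of this timing experiment is given below; `boot_vm` is a hypothetical placeholder for whatever call the platform offers to create an instance and block until it is ready (the experiment above uses OpenStack, but no specific API is assumed here).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_startup(boot_vm, m):
    """Start m VMs at once; return (T_min, T_max, T_avg) of their boot times in seconds."""
    t0 = time.monotonic()

    def timed_boot(i):
        boot_vm(i)                    # hypothetical call: blocks until VM i is ready
        return time.monotonic() - t0

    with ThreadPoolExecutor(max_workers=m) as executor:
        finish_times = list(executor.map(timed_boot, range(m)))
    return min(finish_times), max(finish_times), sum(finish_times) / m
```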
In step two, the process of determining the task time τ is shown schematically in fig. 3. When the started virtual machines continuously fetch tasks from the task pool and execute them, it can be considered that all virtual machines finish their tasks at approximately the same time, so the total task completion time equals the task completion time of each virtual machine, which in turn equals the sum of the average virtual machine startup time and the average task execution time, i.e. T_all = T_avg + τ/m = am + b + τ/m. It can be seen that when m is taken as m = √(τ/a), the total time reaches its minimum.
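The minimization can be spelled out directly from the expression above:

```latex
T_{\mathrm{all}}(m) = am + b + \frac{\tau}{m}, \qquad
\frac{dT_{\mathrm{all}}}{dm} = a - \frac{\tau}{m^{2}} = 0
\;\Longrightarrow\; m = \sqrt{\frac{\tau}{a}}, \qquad
T_{\mathrm{all}}^{\min} = 2\sqrt{a\tau} + b .
```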
In step five, fig. 4 is a schematic diagram of task allocation, and specifically includes the following steps:
The first step: when a virtual machine has started, one sub-task is taken out of the task pool, mapped onto the virtual machine, and execution begins;
The second step: after the task on the virtual machine has been executed, the task pool is checked; if the task pool is not empty, the first step is repeated, and if the task pool is empty, the virtual machine returns to its initial state.
The above description is only one specific example of the invention and should not be construed as limiting it in any way. After understanding the content and principles of the invention, those skilled in the art may make various modifications and improvements to the algorithm without departing from its principles and structure, but such modifications and improvements remain within the scope of the invention as defined by the appended claims.

Claims (2)

1. A scheduling method in parallel operation of cloud computing is characterized by comprising the following steps:
s1, calculating cloud platform parameters:
after the virtual machine configuration is determined, 1, 2, 3, …, n virtual machines are started simultaneously, the average startup time per virtual machine t_1, t_2, t_3, …, t_n is recorded for each test, the n statistics are substituted into t_i = a·i + b, and the maximum-likelihood estimates of a and b are calculated;
wherein t_i is the average time to start each virtual machine when i virtual machines are started simultaneously, and a and b are system characteristic parameters;
s2, determining an optimal acceleration strategy:
the task time τ is measured by executing the task in one virtual machine, and starting m virtual machines to execute the task in parallel constitutes the optimal scheme;
when the started virtual machines continuously take tasks from the task pool to execute, all the virtual machines finish the tasks approximately at the same time, so that the total task completion time is equal to the task completion time of each virtual machine and also equal to the sum of the average starting time of each virtual machine and the average task execution time, and the formula is as follows:
T_all = T_avg + τ/m = am + b + τ/m
when m is taken as m = √(τ/a), the total time reaches its minimum value;
wherein T_avg is the average time to start each virtual machine and T_all is the total task completion time;
s3, splitting tasks:
when the task reaches the task window, the task window divides it into n approximately equal parts (n > m) to form a task set, which is placed in the task pool;
s4, creating a virtual machine:
the task scheduler sequentially creates m virtual machines and simultaneously starts a parallel processing task process;
s5, task allocation:
when a new virtual machine is started or a task on the virtual machine is completed, a task scheduler allocates a task to the virtual machine;
s6, completing a task:
all the virtual machines complete all the tasks and return the task results.
2. The scheduling method for parallel operation in cloud computing according to claim 1, wherein in step S5 the task scheduler allocates tasks as follows: the task scheduler takes one task out of the task pool, maps it onto the virtual machine, and the virtual machine starts to execute it; after the task on the virtual machine has been executed, the task pool is checked; if the task pool is not empty, another task is allocated to the virtual machine, and if the task pool is empty, the virtual machine is restored to its initial state.
CN201810997296.7A 2018-08-29 2018-08-29 Scheduling method in cloud computing parallel operation Active CN109684070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810997296.7A CN109684070B (en) 2018-08-29 2018-08-29 Scheduling method in cloud computing parallel operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810997296.7A CN109684070B (en) 2018-08-29 2018-08-29 Scheduling method in cloud computing parallel operation

Publications (2)

Publication Number Publication Date
CN109684070A CN109684070A (en) 2019-04-26
CN109684070B true CN109684070B (en) 2022-12-13

Family

ID=66184433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810997296.7A Active CN109684070B (en) 2018-08-29 2018-08-29 Scheduling method in cloud computing parallel operation

Country Status (1)

Country Link
CN (1) CN109684070B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012104898A1 (en) * 2011-01-31 2012-08-09 トヨタ自動車株式会社 Safety control device and safety control method
JP5632403B2 (en) * 2012-01-24 2014-11-26 日本電信電話株式会社 Task management system, task management server, task management method, and task management program
CN104536806B (en) * 2014-12-26 2017-11-03 东南大学 A kind of workflow application flexible resource Supply Method under cloud environment
CN105159752B (en) * 2015-09-22 2018-03-30 中国人民解放军国防科学技术大学 Virtualize the real-time task and resource regulating method of machine startup Time Perception in cloud
KR101707601B1 (en) * 2015-12-31 2017-02-16 숭실대학교산학협력단 Virtual machine monitor and schduling method of virtual machine monitor
CN105912406B (en) * 2016-05-05 2018-01-12 中国人民解放军国防科学技术大学 The Independent Task Scheduling and resource allocation method of a kind of low energy consumption
CN107357641A (en) * 2017-06-21 2017-11-17 西安电子科技大学 Method for scheduling task in a kind of cloud computing

Also Published As

Publication number Publication date
CN109684070A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
Aslam et al. Load balancing algorithms in cloud computing: A survey of modern techniques
Di et al. Dynamic optimization of multiattribute resource allocation in self-organizing clouds
EP3281359B1 (en) Application driven and adaptive unified resource management for data centers with multi-resource schedulable unit (mrsu)
US20140282503A1 (en) Weight-based collocation management
Mishra et al. Time efficient dynamic threshold-based load balancing technique for Cloud Computing
Mosa et al. Dynamic virtual machine placement considering CPU and memory resource requirements
Domanal et al. Load balancing in cloud environment using a novel hybrid scheduling algorithm
Nair et al. Efficient resource arbitration and allocation strategies in cloud computing through virtualization
Liu et al. Dynamically negotiating capacity between on-demand and batch clusters
Wu et al. Dynamically adjusting scale of a kubernetes cluster under qos guarantee
Katangur et al. Priority weighted round robin algorithm for load balancing in the cloud
CN111813558B (en) Resource scheduling method for hybrid cloud
CN109684070B (en) Scheduling method in cloud computing parallel operation
CN109032779A (en) Task processing method, device, computer equipment and readable storage medium storing program for executing
CN110069319B (en) Multi-target virtual machine scheduling method and system for cloud resource management
CN109189581B (en) Job scheduling method and device
Koneru et al. Resource allocation method using scheduling methods for parallel data processing in cloud
CN117112222A (en) Request processing method and device, electronic equipment and storage medium
CN109144664B (en) Dynamic migration method of virtual machine based on user service quality demand difference
Swarnakar et al. A novel improved hybrid model for load balancing in cloud environment
EP3982258A1 (en) Method and apparatus for reducing power consumption of virtual machine cluster
Darji et al. Dynamic load balancing for cloud computing using heuristic data and load on server
Chen et al. A new heuristic scheduling strategy lbmm in cloud computing
Zhang et al. Speeding up vm startup by cooperative vm image caching
Anjiki et al. Performance Improvement by Controlling VM Migration between Edge Nodes in a Multi-Stage Information Processing System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant