CN116567084A - Heterogeneous hybrid cloud scheduling method based on task classification and privacy protection

Info

Publication number: CN116567084A
Application number: CN202310438165.6A
Authority: CN
Inventors: 周娜琴; 陈晓博; 田志宏; 殷丽华; 李添喆; 邱路鹏
Original assignee: Guangzhou University
Current assignee: Guangzhou University
Priority date: 2023-04-20
Filing date: 2023-04-20
Publication date: 2023-08-08

Abstract

The invention relates to the field of cloud computing and discloses a heterogeneous hybrid cloud scheduling method based on task classification and privacy protection, comprising the following steps: S1, task classification; S2, private cloud resource classification; S3, task sequencing; S4, task sub-expectation allocation; S5, private cloud processor selection; S6, public cloud reassignment. The method targets hybrid cloud scheduling across a plurality of private cloud clusters and a plurality of public cloud platforms. Tasks and private cloud resources are classified, and each task is assigned to a processor type suited to it while the privacy and security of its data are protected, so that resources are used fully and reasonably. Public cloud platforms are selected by combining price and geographic distance. The result is a hybrid cloud scheduling scheme that minimizes cost while meeting the deadline specified by the user.

Description

Heterogeneous hybrid cloud scheduling method based on task classification and privacy protection

Technical Field

The invention relates to the field of cloud computing, in particular to a heterogeneous hybrid cloud scheduling method based on task classification and privacy protection.

Background

In recent years, cloud computing has become popular in industry and academia as both a computing model and a business service, characterized by abundant resources, high flexibility, and pay-as-you-go pricing. By pooling a large number of computing resources into a shared resource pool, cloud computing achieves efficient large-scale resource utilization over the internet with minimal platform management and maintenance costs.

With the arrival of the data era, large amounts of computing resources are needed to process large volumes of data. More and more enterprises are building their own private clouds to provide internal users with high-quality security and management control. However, enterprises often need elastic computing resources. If an enterprise builds a private cloud cluster large enough to meet peak demand during holidays, it incurs a huge cost, and when demand is low the utilization of the private cloud is extremely low, so a large amount of resources is wasted. Therefore, when the processing capacity of an enterprise's private cloud is limited, an architecture that combines public and private clouds to process large amounts of data is a cost-effective choice. This combination of public cloud and private cloud is called a hybrid cloud.

Aiming at the workflow scheduling problem in the hybrid cloud, the prior art has the following problems:

1. Hybrid cloud scheduling across multiple private cloud clusters and multiple public cloud platforms is a problem that still needs to be solved.

2. When multiple public cloud platforms are available, price alone is not enough to distinguish and select among them.

3. Many workflow scheduling algorithms based on list scheduling heuristics sort tasks directly without classifying them in advance, so tasks cannot be assigned to the processor type that suits them and resources are not fully and reasonably utilized.

4. Enterprises build private clouds to protect the security and privacy of their internal data. When a workflow submitted by an enterprise is scheduled across public and private clouds, protecting data privacy and security is therefore the primary concern. However, existing hybrid cloud scheduling methods do not pay attention to data privacy and security.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides a heterogeneous hybrid cloud scheduling method based on task classification and privacy protection. The method targets hybrid cloud scheduling across a plurality of private cloud clusters and a plurality of public cloud platforms. Tasks and private cloud resources are classified, and each task is assigned to a processor type suited to it while the privacy and security of its data are protected, so that resources are used fully and reasonably. Public cloud platforms are selected by combining price and geographic distance. The result is a hybrid cloud scheduling scheme that minimizes cost while meeting the deadline specified by the user.

The invention provides the following technical scheme: a heterogeneous hybrid cloud scheduling method based on task classification and privacy protection comprises the following steps:

S1, task classification: the tasks in a workflow are first classified into two major categories, namely computationally intensive and communication intensive; they are then classified into private tasks and non-private tasks according to whether the current task involves private data; finally, the tasks in the workflow fall into four minor categories: (1) computationally intensive-private, (2) computationally intensive-non-private, (3) communications intensive-private, (4) communications intensive-non-private.

A communication-intensive task consumes little CPU, and most of its execution time is spent on input and output operations. Because communication-intensive tasks frequently access shared memory, memory contention can degrade system performance and thereby lengthen the execution time of the whole workflow. A computation-intensive task, by contrast, involves a large amount of computation and requires substantial CPU resources; on a multi-core architecture it can achieve high performance by using multiple cores. Dividing the tasks in the workflow into communication-intensive and computation-intensive categories therefore reduces memory contention as far as possible and improves system performance and resource utilization, and assigning each type of task to the corresponding type of server resource pool reduces response time.
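The four-way classification of S1 can be sketched as follows. The text does not state how a task is judged computation-intensive or communication-intensive, so the ratio test below, along with the names `Task`, `TaskClass`, `classify`, and `ratio_threshold`, is a hypothetical illustration rather than the method's actual criterion.

```python
from dataclasses import dataclass
from enum import Enum


class TaskClass(Enum):
    COMPUTE_PRIVATE = "computation-intensive-private"
    COMPUTE_NON_PRIVATE = "computation-intensive-non-private"
    COMM_PRIVATE = "communication-intensive-private"
    COMM_NON_PRIVATE = "communication-intensive-non-private"


@dataclass
class Task:
    name: str
    compute_cost: float      # hypothetical estimate of CPU work
    comm_cost: float         # hypothetical estimate of I/O and data-transfer work
    uses_private_data: bool  # whether the task involves private data


def classify(task: Task, ratio_threshold: float = 1.0) -> TaskClass:
    """Assumed rule: a task counts as computation-intensive when its
    compute cost exceeds ratio_threshold times its communication cost."""
    compute_bound = task.compute_cost > ratio_threshold * task.comm_cost
    if compute_bound:
        if task.uses_private_data:
            return TaskClass.COMPUTE_PRIVATE
        return TaskClass.COMPUTE_NON_PRIVATE
    if task.uses_private_data:
        return TaskClass.COMM_PRIVATE
    return TaskClass.COMM_NON_PRIVATE
```

Under this assumed rule, a task with `compute_cost=10` and `comm_cost=2` that touches private data lands in the computation-intensive-private category.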

S2, private cloud resource classification: the processor resources in the private cloud are divided, according to physical factors, into main computing type resources and main communication type resources, forming two types of resource pools.
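A minimal sketch of the two resource pools in S2 is given below. The physical factors used (CPU speed, memory performance, disk input/output performance) are the ones the method names, but the comparison rule and the names `Processor` and `split_pools` are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    cpu_speed: float     # hypothetical normalized score, higher is better
    memory_perf: float   # hypothetical normalized score
    disk_io_perf: float  # hypothetical normalized score


def split_pools(processors):
    """Assumed rule: a processor joins the main computing pool when its CPU
    score is at least the mean of its memory and disk I/O scores; otherwise
    it joins the main communication pool."""
    compute_pool, comm_pool = [], []
    for p in processors:
        io_score = (p.memory_perf + p.disk_io_perf) / 2
        if p.cpu_speed >= io_score:
            compute_pool.append(p)
        else:
            comm_pool.append(p)
    return compute_pool, comm_pool
```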

S3, task sequencing: the computation-intensive tasks and the communication-intensive tasks are sorted separately to form two ordered lists, using the upward ranking value as the sorting criterion.

The upward ranking value of a task is the length of the critical path from that task to the exit task, including the computation cost of the task itself. Each list is sorted in descending order of upward ranking value, and tasks with larger values are scheduled first.
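The upward ranking value can be computed recursively over the workflow graph, as in the sketch below. Adding edge communication costs to the path length follows the usual list-scheduling definition (as in HEFT) and is an assumption here, since the text mentions only the computation cost. The function names and the dictionary-based inputs are hypothetical.

```python
from functools import lru_cache


def upward_ranks(compute_cost, comm_cost, successors):
    """compute_cost: {task: cost}
    comm_cost: {(task, successor): cost}, missing edges count as 0
    successors: {task: [successor, ...]}, exit tasks may be omitted"""

    @lru_cache(maxsize=None)
    def rank(task):
        # Longest remaining path through any successor; 0 for an exit task.
        tail = max(
            (comm_cost.get((task, s), 0.0) + rank(s)
             for s in successors.get(task, [])),
            default=0.0,
        )
        return compute_cost[task] + tail

    return {t: rank(t) for t in compute_cost}


def order_by_rank(tasks, ranks):
    """Sort one category of tasks in descending order of upward rank."""
    return sorted(tasks, key=lambda t: ranks[t], reverse=True)
```

Calling `order_by_rank` once on the computation-intensive tasks and once on the communication-intensive tasks yields the two ordered lists of S3.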

S4, task sub-expectation allocation: each task is assigned a sub-expectation according to the ratio of its upward ranking value to the sum of the upward ranking values of all tasks.

The user submits the workflow and specifies a deadline. The deadline of the whole workflow is subdivided into a deadline for each task, called its sub-expectation.
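One literal reading of S4 is that each task receives a share of the workflow deadline proportional to its upward ranking value. The sketch below implements that reading only; the text does not say whether a sub-expectation is a time budget or an absolute time point, so treat the formula and the name `sub_deadlines` as assumptions.

```python
def sub_deadlines(ranks, deadline):
    """ranks: {task: upward ranking value}; deadline: workflow deadline.
    Assumed formula: share(task) = deadline * rank(task) / sum of all ranks."""
    total = sum(ranks.values())
    if total <= 0:
        raise ValueError("upward ranking values must sum to a positive number")
    return {task: deadline * r / total for task, r in ranks.items()}
```

With this formula the shares always add up to the workflow deadline, which makes the split easy to check by hand.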

S5, private cloud processor selection: tasks are taken in order from the two ordered lists of computation-intensive tasks and communication-intensive tasks, and each type of task is assigned to the corresponding resource pool, so that the private cloud processors that can meet the task sub-expectation are screened out from the current resource pool.

S6, public cloud reassignment: if no private cloud processor meets the task sub-expectation, one of two choices is made depending on whether the current task is a private task. If the current task is a private task, it remains in the private cloud resource pool, the private cloud processor with the earliest completion time is selected from that pool, and the current task is deployed on that processor. If the current task is a non-private task, it is reassigned to the public cloud.

For example, if the data of the current task has already been transmitted to the current private cloud processor but that processor is still executing other tasks, the execution start time of the current task is the execution end time of the last task on that processor.

That is, compute-intensive-private tasks and communications-intensive-private tasks must always select a processor in the corresponding private cloud resource pool, whereas compute-intensive-non-private tasks and communications-intensive-non-private tasks are reassigned to the public cloud when no private cloud processor meets their task sub-expectations.
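The branching between S5 and S6 can be summarized in a short sketch. It assumes that a finish time and a cost have already been estimated for each private cloud processor in the matching pool, and it picks the cheapest feasible processor, which the method describes as a preferred option. The names `Candidate` and `choose_placement` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    processor: str
    finish_time: float  # estimated completion time of the task on this processor
    cost: float         # estimated cost of running the task on this processor


def choose_placement(is_private, sub_deadline, private_candidates):
    """Return ("private", Candidate) or ("public", None).
    Assumes private_candidates is non-empty."""
    feasible = [c for c in private_candidates if c.finish_time <= sub_deadline]
    if feasible:
        # S5: some private processor meets the sub-expectation; take the cheapest.
        return "private", min(feasible, key=lambda c: c.cost)
    if is_private:
        # S6, private task: stay in the private pool, earliest completion wins.
        return "private", min(private_candidates, key=lambda c: c.finish_time)
    # S6, non-private task: hand over to public cloud selection.
    return "public", None
```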

Preferably, the physical factors include CPU speed, memory performance, and storage disk input-output performance.

Preferably, when selecting a processor for a task, the task sub-expectation is satisfied first, and other factors, such as cost optimization, are considered afterwards.

Preferably, in S5, the private cloud processor is selected based on the execution end time of the last task of the current private cloud processor and the transmission end time of the data of the current task to the current private cloud processor.

Preferably, an insertion policy is given priority when selecting the private cloud processor: the current task is inserted into the earliest free time slot between two tasks that have already been scheduled on the current private cloud processor.

The length of a free time slot is the difference between the execution start time of the later task and the execution end time of the earlier task, for two tasks scheduled consecutively on the same private cloud processor. Furthermore, scheduling in a free time slot must respect the precedence constraints between tasks.
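The insertion policy can be sketched as a scan over the intervals already occupied on one processor. Here `ready_time` stands for the moment the task's input data has finished arriving, which is how the sketch enforces precedence: the task never starts before its predecessors' data is available. The function name and the interval representation are hypothetical, and treating idle time before the first scheduled task as a usable slot is an added assumption.

```python
def earliest_start(busy, ready_time, duration):
    """busy: list of (start, end) intervals already scheduled on one
    processor, sorted by start time and non-overlapping.
    Returns the earliest time at which a task of the given duration can
    start without overlapping any scheduled interval."""
    prev_end = 0.0
    for start, end in busy:
        candidate = max(prev_end, ready_time)
        if candidate + duration <= start:
            # The task fits in the gap before this scheduled interval.
            return candidate
        prev_end = end
    # No gap is large enough: start after the last scheduled task,
    # or when the data arrives, whichever is later.
    return max(prev_end, ready_time)
```

When no gap fits, the result reduces to the later of the processor's last execution end time and the data transmission end time, matching the rule stated for processor selection.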

Preferably, from the screened private cloud processors that meet the task sub-expectation, the one with the lowest cost is selected for the final deployment of the task.

Preferably, in S6, servers of different types and prices on different public cloud platforms are selected for the current task according to the geographic distance to the server where its direct predecessor task is deployed.

Preferably, on the premise of meeting the task sub-expectation, the server with the lowest public cloud cost is selected, and if the server meeting the task sub-expectation does not exist, the server with the fastest completion is selected for the task to be allocated.
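The public cloud choice in S6 can be sketched as a two-stage selection: cheapest among the servers that meet the sub-expectation, otherwise the fastest. The method does not say how geographic distance enters the estimate, so the linear transfer delay (`delay_per_km`), the cost model, and the names `PublicServer`, `estimate`, and `pick_public_server` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PublicServer:
    platform: str
    server_type: str
    price_per_hour: float
    distance_km: float  # distance to the server hosting the direct predecessor
    speed: float        # work units processed per hour


def estimate(server, workload, ready_time, delay_per_km):
    """Assumed model: transfer delay grows linearly with distance, and
    cost is the hourly price times the execution time."""
    transfer = server.distance_km * delay_per_km
    run = workload / server.speed
    return ready_time + transfer + run, server.price_per_hour * run


def pick_public_server(servers, workload, ready_time, sub_deadline,
                       delay_per_km=0.001):
    scored = [(s, *estimate(s, workload, ready_time, delay_per_km))
              for s in servers]
    feasible = [x for x in scored if x[1] <= sub_deadline]
    if feasible:
        # Lowest cost among servers that meet the sub-expectation.
        return min(feasible, key=lambda x: x[2])[0]
    # No server meets it: fall back to the fastest completion.
    return min(scored, key=lambda x: x[1])[0]
```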

The invention has the following beneficial effects:

1. The invention provides an algorithm designed specifically for hybrid cloud scheduling across a plurality of private cloud clusters and a plurality of public cloud platforms.

2. According to the method, the tasks are classified according to the computing and communication bias of each task, and meanwhile, the private cloud resources are classified by main computing and main communication. On the premise of ensuring the privacy safety of task data, different types of tasks are distributed to corresponding types of server resource pools, so that the system performance and the resource utilization rate are improved, and the response time is reduced.

3. According to the method, processors with different types and different prices on different public cloud platforms can be selected according to the distance between the current task and the geographic position of the server where the direct predecessor task is deployed.

Drawings

FIG. 1 is a flow chart of a heterogeneous hybrid cloud scheduling method based on task classification and privacy protection;

FIG. 2 is a task classification model diagram of one embodiment;

FIG. 3 is a private cloud resource classification model diagram of one embodiment;

FIG. 4 is a hybrid cloud scheduling flow diagram of one embodiment.

Detailed Description

The invention is described in further detail below with reference to the following examples, which illustrate the invention and do not limit its scope. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention, and all such equivalent technical solutions are therefore within the scope of the present invention.

Referring to fig. 1, a heterogeneous hybrid cloud scheduling method based on task classification and privacy protection includes:

S6, public cloud reassignment: if no private cloud processor meets the task sub-expectation, one of two choices is made depending on whether the current task is a private task. If the current task is a private task, it remains in the private cloud resource pool, the private cloud processor with the earliest completion time is selected from that pool, and the current task is deployed on that processor. If the current task is a non-private task, it is reassigned to the public cloud.

In one embodiment, as shown in FIG. 2, the tasks in the workflow are first classified into two major classes, namely communication intensive and computation intensive; they are then classified into private tasks and non-private tasks according to whether the current task involves private data; finally, the tasks in the workflow fall into four minor classes: (1) computationally intensive-private, (2) computationally intensive-non-private, (3) communications intensive-private, (4) communications intensive-non-private.

In one embodiment, as shown in fig. 3, the processor resources in the private cloud are divided into main computing type resources and main communication type resources according to the CPU speed, the memory performance and the storage disk input/output performance, so as to form two types of resource pools.

In one embodiment, as shown in FIG. 4, tasks are taken in order from the two ordered lists of computation-intensive tasks and communication-intensive tasks, and each type of task is assigned to the corresponding resource pool. At this stage, the private cloud processors that can meet the task sub-expectation are screened out from the current resource pool, and a private cloud processor is selected based on the execution end time of the last task on that processor and the transmission end time of the current task's data to that processor. This stage gives priority to an insertion policy, that is, the current task is inserted into the earliest free time slot between two tasks already scheduled on the current private cloud processor. The length of a free time slot is the difference between the execution start time of the later task and the execution end time of the earlier task, for two tasks scheduled consecutively on the same private cloud processor. Furthermore, scheduling in a free time slot must respect the precedence constraints between tasks. From the screened private cloud processors that meet the task sub-expectation, the one with the lowest cost is selected for the final deployment of the task.

If no private cloud processor meets the task sub-expectation, one of two choices is made depending on whether the current task is a private task. First, if the current task is a private task, it must remain in the private cloud resource pool in order to protect the privacy and security of its data; the processor with the earliest completion time is selected from the private cloud resource pool, and the current task is deployed on that processor. Second, if the current task is a non-private task, it is reassigned to the public cloud. That is, compute-intensive-private tasks and communications-intensive-private tasks must always select a processor in the corresponding private cloud resource pool, whereas compute-intensive-non-private tasks and communications-intensive-non-private tasks are reassigned to the public cloud when no private cloud processor meets their task sub-expectations.

For non-private tasks whose sub-expectations cannot be met in the private cloud, servers of different types and prices on different public cloud platforms are selected according to the geographic distance to the server where the direct predecessor task of the current task is deployed. Among the servers that meet the task sub-expectation, the one with the lowest public cloud cost is selected; if no public cloud resource meets the task sub-expectation, the server with the fastest completion is selected for the task.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection is characterized by comprising the following steps of:

S1, task classification: classifying the tasks in a workflow into two major categories, namely computationally intensive and communication intensive, then classifying the tasks in the workflow into private tasks and non-private tasks according to whether the current task involves private data, and finally classifying the tasks in the workflow into four minor categories: (1) computationally intensive-private, (2) computationally intensive-non-private, (3) communications intensive-private, (4) communications intensive-non-private;

S2, private cloud resource classification: dividing the processor resources in the private cloud, according to physical factors, into main computing type resources and main communication type resources to form two types of resource pools;

S3, task sequencing: sorting the computation-intensive tasks and the communication-intensive tasks separately to form two ordered lists, using the upward ranking value as the sorting criterion;

S4, task sub-expectation allocation: assigning each task a sub-expectation according to the ratio of its upward ranking value to the sum of the upward ranking values of all tasks;

S5, private cloud processor selection: taking tasks in order from the two ordered lists of computation-intensive tasks and communication-intensive tasks, and assigning each type of task to the corresponding resource pool, so that the private cloud processors that can meet the task sub-expectation are screened out from the current resource pool;

2. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection of claim 1, wherein the physical factors include CPU speed, memory performance, and storage disk input/output performance.

3. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection of claim 1, wherein when selecting a processor for a task, task sub-expectations are first met and then other factors are considered, including optimization costs.

4. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection according to claim 1, wherein in S5, a private cloud processor capable of executing the current task earliest is selected based on an execution end time of a last task of the current private cloud processor and a transmission end time of data of the current task to the current private cloud processor.

5. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection of claim 4, wherein the insertion policy is prioritized when selecting the private cloud processor, and the current task is inserted in the earliest free time slot between two tasks that have been scheduled on the current private cloud processor.

6. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection according to claim 1, wherein one private cloud processor with the lowest cost is selected as the final deployment of the task from among the screened private cloud processors meeting the task sub-expectations.

7. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection according to claim 1, wherein in S6, servers of different types and prices on different public cloud platforms are selected for the current task according to the geographic distance to the server where the direct predecessor task of the current task is deployed.

8. The heterogeneous hybrid cloud scheduling method based on task classification and privacy protection according to claim 7, wherein on the premise of meeting task sub-expectations, a server with lowest public cloud cost is selected, and if no server meeting task sub-expectations exists, a server with fastest completion is selected for the task to be allocated.