CN116700920A

CN116700920A - Cloud native hybrid deployment cluster resource scheduling method and device

Info

Publication number: CN116700920A
Application number: CN202310542318.1A
Authority: CN
Inventors: 梁永康; 马宗垚; 吴延生; 周新衡
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-05-15
Filing date: 2023-05-15
Publication date: 2023-09-05

Abstract

The invention discloses a cloud native hybrid deployment cluster resource scheduling method and device. The method comprises the following steps: acquiring cloud platform resource usage data and application performance index data of each node in a cloud platform; inputting the cloud platform resource usage data and application performance index data of each node into a trained decision tree model, which outputs the resource information that each node in the cloud platform can use and the application information that each node can run, wherein the decision tree model is trained on historical cloud platform resource usage data and historical application performance index data of each node in the cloud platform; and allocating and scheduling resources for each node in the cloud platform according to the usable resource information and the runnable application information. The invention can improve resource utilization in a hybrid deployment cluster environment.

Description

Cloud native hybrid deployment cluster resource scheduling method and device

Technical Field

The invention relates to the technical field of cloud computing, and in particular to a cloud native hybrid deployment cluster resource scheduling method and device.

Background

This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

The rapid development and adoption of cloud computing technology has led enterprises to rely increasingly on cloud services to run and manage their business. As cloud service providers continue to emerge and compete, users can choose, from many cloud platforms, the cloud services best suited to their own business requirements. However, this also poses new challenges for enterprises: how to coordinate and uniformly manage resources across cloud platforms, and how to use cloud resources more efficiently and reduce cost.

In order to solve these problems, a concept of hybrid deployment has emerged in recent years, that is, a unified hybrid cluster environment is formed by integrating a plurality of cloud computing resources such as public cloud, private cloud and local data center, so as to support business application and data processing of enterprises. The hybrid clusters can help enterprises to manage resources and applications more flexibly, improve resource utilization and response capacity, and reduce cost and risk.

However, in a hybrid cluster environment, the resource characteristics and performance indexes of different cloud platforms may differ, so dynamically scheduling and managing resources across cloud platforms is an important problem in the field of hybrid deployment. Moreover, because a hybrid deployment cluster environment contains a large number of cloud services, applications and data, intelligently analyzing and scheduling resources is a key challenge in improving the resource utilization and performance of hybrid clusters.

Disclosure of Invention

The embodiment of the invention provides a cloud native hybrid deployment cluster resource scheduling method, which is used for improving the utilization rate of resources in a hybrid deployment cluster environment and comprises the following steps:

acquiring cloud platform resource use data and application program performance index data of each node in a cloud platform;

inputting cloud platform resource use data and application program performance index data of each node in the cloud platform into a trained decision tree model, and outputting resource information which can be used by each node in the cloud platform and application program information which can be operated; the decision tree model is obtained through training according to cloud platform resource use historical data and application program performance index historical data of each node in the cloud platform;

and distributing and scheduling the resources of each node in the cloud platform according to the resource information which can be used by each node in the cloud platform and the application program information which can be operated.

The embodiment of the invention also provides a cloud native hybrid deployment cluster resource scheduling device, which is used for:

monitoring cloud platform resource use data and application program performance index data of each node in the cloud platform in real time;

inputting real-time monitoring cloud platform resource use data and application program performance index data of each node in the cloud platform into a trained decision tree model;

when the resource information which can be used by each node and the application information which can be operated in the cloud platform output by the decision tree model are changed, the resource allocation and the scheduling of each node in the cloud platform are carried out again according to the changed resource information which can be used by each node and the changed application information which can be operated in the cloud platform.

The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the cloud native hybrid deployment cluster resource scheduling method is realized when the processor executes the computer program.

The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the cloud native hybrid deployment cluster resource scheduling method when being executed by a processor.

The embodiment of the invention also provides a computer program product, which comprises a computer program, and the computer program realizes the cloud native hybrid deployment cluster resource scheduling method when being executed by a processor.

In the prior art, the large volume of cloud services, applications and data in a hybrid deployment cluster environment cannot be intelligently analyzed and scheduled, and resource utilization is low. By contrast, in the embodiment of the invention, cloud platform resource usage data and application performance index data of each node in the cloud platform are acquired; these data are input into a trained decision tree model, which outputs the resource information that each node in the cloud platform can use and the application information that each node can run, the decision tree model being trained on historical cloud platform resource usage data and historical application performance index data of each node in the cloud platform; and resources are allocated and scheduled for each node in the cloud platform according to the usable resource information and the runnable application information. Resource utilization in the hybrid deployment cluster environment is thereby improved.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:

Fig. 1 is a flow chart of a cloud native hybrid deployment cluster resource scheduling method in an embodiment of the invention;

Fig. 2 is a flow chart illustrating a cloud native hybrid deployment cluster resource scheduling method according to an embodiment of the invention;

Fig. 3 is a schematic diagram of a hybrid deployment cluster resource scheduling result in an embodiment of the invention;

Fig. 4 is a schematic diagram of a cloud native hybrid deployment cluster resource scheduling device in an embodiment of the invention;

Fig. 5 is a schematic diagram of a cloud native hybrid deployment cluster resource scheduling device according to another embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.

In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are open-ended terms, meaning including, but not limited to. The description of the reference terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The order of steps involved in the embodiments is illustrative of the practice of the invention, and is not limited and may be suitably modified as desired.

The terms involved in the embodiments of the present invention are explained as follows:

Decision tree algorithm: a classification and regression method based on a tree structure.

Cloud native: a software development and deployment approach that aims to support agile, scalable and elastic application development and management in a cloud computing environment. Cloud native is based on a microservice architecture, uses containerization technology for packaging and deployment, and combines automated management and orchestration tools to provide efficiency and reliability.

Hybrid deployment: cloud nodes of different types are scheduled onto the same physical resources and, by means of control measures such as scheduling and resource isolation, idle resources are used to meet the running requirements of the nodes' applications while the service level objectives are guaranteed, so that resource capacity is fully utilized and cost is greatly reduced.

In the prior art, intelligent analysis and resource scheduling cannot be performed on a large amount of cloud services, applications and data in a hybrid deployment cluster environment, and the resource utilization rate is low.

Based on the above problems, an embodiment of the invention provides a cloud native hybrid deployment cluster resource scheduling method for improving resource utilization in a hybrid deployment cluster environment. Fig. 1 is a schematic flow chart of the method in an embodiment of the invention. As shown in Fig. 1, the method includes:

Step 101: acquiring cloud platform resource usage data and application performance index data of each node in a cloud platform;

Step 102: inputting the cloud platform resource usage data and application performance index data of each node in the cloud platform into a trained decision tree model, and outputting the resource information that each node in the cloud platform can use and the application information that each node can run; the decision tree model is trained on historical cloud platform resource usage data and historical application performance index data of each node in the cloud platform;

Step 103: allocating and scheduling resources for each node in the cloud platform according to the resource information that each node can use and the application information that each node can run.
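The three steps can be sketched as a small pipeline. This is a minimal illustration only: the `NodeMetrics` and `Decision` types, the field names, and the hand-written `stub_model` (standing in for the trained decision tree) are all hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NodeMetrics:
    """Step 101 input: one node's resource usage and application metrics."""
    node: str
    cpu_util: float      # fraction of CPU in use, 0.0-1.0
    mem_util: float      # fraction of memory in use, 0.0-1.0
    response_ms: float   # request response duration in milliseconds


@dataclass
class Decision:
    """Step 102 output: usable resources and runnable applications."""
    usable_cpu: float
    runnable_apps: List[str]


def schedule(metrics: List[NodeMetrics],
             model: Callable[[NodeMetrics], Decision]) -> Dict[str, Decision]:
    """Steps 102-103: query the model per node and collect the allocation plan."""
    return {m.node: model(m) for m in metrics}


def stub_model(m: NodeMetrics) -> Decision:
    """Hypothetical stand-in for the trained decision tree (hand-written rule)."""
    if m.cpu_util < 0.5 and m.response_ms < 200:
        return Decision(usable_cpu=1.0 - m.cpu_util, runnable_apps=["batch-job"])
    return Decision(usable_cpu=0.0, runnable_apps=[])


plan = schedule(
    [NodeMetrics("node-a", 0.25, 0.40, 80.0),
     NodeMetrics("node-b", 0.90, 0.70, 450.0)],
    stub_model,
)
```

In this sketch the lightly loaded `node-a` is offered spare CPU and a runnable application, while the busy `node-b` is offered nothing.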

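In one embodiment, after resource allocation and scheduling for each node in the cloud platform, the method further includes: monitoring, in real time, the cloud platform resource usage data and application performance index data of each node in the cloud platform; inputting the data monitored in real time into the trained decision tree model; and, when the usable resource information and runnable application information output by the decision tree model for the nodes change, performing resource allocation and scheduling for each node in the cloud platform again according to the changed information.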

In one embodiment, for the decision tree model in step 102, the decision tree model is trained as follows:

acquiring cloud platform resource use history data and application program performance index history data of each node in a cloud platform;

performing feature extraction on the cloud platform resource use history data and the application program performance index history data of each node in the cloud platform, and constructing a history data set by using the extracted feature variables;

dividing a historical data set to obtain a training set and a testing set;

training the decision tree model by using the training set, evaluating the trained decision tree model by using the testing set, and obtaining the trained decision tree model when the evaluation passes.

The feature variables include:

CPU utilization: CPU utilization is an important indicator for measuring system performance, and can be used for judging whether the system is overloaded or not and whether resources need to be added or not.

Memory utilization rate: the memory utilization is an important index for measuring the utilization of system resources, and can be used for judging whether the system needs to increase the memory capacity or not so as to improve the system performance.

Disk space utilization: disk space usage is an important indicator of storage resource usage and can be used to determine if the system needs to increase storage capacity.

Network bandwidth utilization: the network bandwidth utilization is an important index for measuring the use condition of network resources, and can be used for judging whether the network is overloaded or not and whether the bandwidth needs to be increased or not.

Request response duration: the request response time is an important index for measuring the performance of the application program, and can be used for judging whether the application program needs to be optimized or not so as to improve the response speed.

Number of concurrent connections: the number of concurrent connections is an important indicator for measuring the performance of an application program, and can be used to determine whether the application program needs to add resources to improve the concurrent performance.

Application error rate: the application error rate is an important index for measuring the stability of an application, and can be used for judging whether the application needs tuning or not so as to improve the stability.

Request throughput: request throughput is an important measure of the performance of an application and can be used to determine whether the processing power of the application is sufficient or whether an increase in resources is required.

Load balancing: load balancing means that the load is balanced among a plurality of servers, and the condition that one server is overloaded is avoided. The indicators of load balancing may include request amount, connection number, CPU usage, etc.

Network delay: network delay refers to the time required from sending a request to receiving a response. The indicators of network delay may include round trip time, packet loss rate, etc.

Cloud platform reliability: cloud platform reliability refers to the stability and fault tolerance of the cloud platform. The indexes of the reliability of the cloud platform can comprise the error rate, the crash rate, the recovery time length and the like of the cloud platform.
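The eleven feature variables above can be assembled into a fixed-length row for the historical data set. The sketch below is illustrative: the snake_case field names, the column order, and the `to_feature_vector` helper are assumptions, not part of the embodiment.

```python
from typing import Dict, List

# The eleven feature variables listed above, in an assumed column order.
FEATURES: List[str] = [
    "cpu_utilization", "memory_utilization", "disk_space_utilization",
    "network_bandwidth_utilization", "request_response_duration",
    "concurrent_connections", "application_error_rate", "request_throughput",
    "load_balancing", "network_delay", "cloud_platform_reliability",
]


def to_feature_vector(sample: Dict[str, float]) -> List[float]:
    """Order one node's metrics into a fixed-length row; reject missing fields."""
    missing = [name for name in FEATURES if name not in sample]
    if missing:
        raise KeyError(f"missing feature variables: {missing}")
    return [float(sample[name]) for name in FEATURES]


# Hypothetical sample: every metric zero except CPU utilization.
sample = {name: 0.0 for name in FEATURES}
sample["cpu_utilization"] = 0.42
row = to_feature_vector(sample)
```

Fixing the column order up front keeps training rows and real-time rows comparable when both are fed to the same model.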

In a specific implementation, the historical data set is divided by cross-validation to obtain the training set and the test set; when the decision tree model is trained with the training set, the maximum information gain entropy is taken as the judgment condition of the decision tree, the information gain entropy being the difference between the information entropy and the conditional entropy; and the trained decision tree model is evaluated with the test set on one or any combination of accuracy, recall and F1 value.
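One common form of cross-validation is k-fold splitting. The embodiment does not state which variant it uses, so the sketch below is an assumption; `k_fold_splits` and its parameters are hypothetical names.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each sample is tested once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k near-equal folds
    splits = []
    for i, test in enumerate(folds):
        # Training indices are all folds except the i-th one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_splits(n_samples=10, k=5)
```

Each pair holds out one fold as the test set and trains on the remaining folds, so every historical sample is used for evaluation exactly once.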

In the decision tree algorithm, the information gain entropy (commonly called the information gain) is the key index for selecting feature variables. It measures how much information a feature variable brings to the classification system: the more information it brings, the more important the feature variable and the larger its information gain entropy. The information entropy represents the complexity of a random variable, and the conditional entropy represents the complexity of a random variable under a given condition. The information gain entropy equals the information entropy minus the conditional entropy, and thus represents how much the complexity of the information is reduced under that condition.

The decision tree algorithm uses the information gain entropy as its selection measure: the feature variable that yields the largest information gain entropy is selected.

The information entropy is calculated as follows:

$$H(p) = -\sum_{i=1}^{2} p_i \log p_i$$

where $H(p)$ denotes the information entropy, $p$ denotes the probability that a cloud native node can be stably supported in hybrid deployment, and $p_i$ takes two values, $p_1$ and $p_2$: $p_1$ corresponds to stable support and $p_2$ corresponds to unstable support.
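A minimal numeric sketch of the information entropy for the two-outcome case (stable / not stable). The base-2 logarithm, which gives the result in bits, is an assumption; the text does not specify the base.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy in bits; outcomes with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


h_even = entropy([0.5, 0.5])      # stable and unstable equally likely
h_certain = entropy([1.0, 0.0])   # always stable
```

The entropy is largest when stable and unstable support are equally likely, and zero when the outcome is certain.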

The conditional entropy is calculated as follows:

$$H(Y \mid X) = \sum_{i=1}^{m} p_i \, H(Y \mid X = x_i)$$

where $H(Y \mid X)$ denotes the conditional entropy given the feature variable $X$; $Y$ denotes whether a cloud native node can be stably supported in hybrid deployment; $X$ denotes the feature variable; $x_i$ denotes the feature variable $X$ with dimension $i$; $m$ denotes the total number of dimensions of the feature variable $X$ (for example, if the dimensions of the CPU feature variable are 1C, 2C, 4C, 8C and 16C, its total number of dimensions is 5); and $p_i$ denotes the probability that deployment can be stable based on the feature variable $X$ with dimension $i$.
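As a worked illustration with hypothetical numbers, assume that each dimension's entropy is weighted by the share of samples falling in that dimension and that logarithms are base 2. Suppose there are 8 historical samples and the CPU feature variable has two dimensions: 2C (4 samples, 1 of them stable) and 4C (4 samples, all stable). Then:

```latex
\begin{aligned}
H(Y \mid X) &= \tfrac{4}{8}\, H\!\left(\tfrac{1}{4}, \tfrac{3}{4}\right)
             + \tfrac{4}{8}\, H(1, 0) \\
            &= 0.5 \times 0.8113 + 0.5 \times 0 \\
            &\approx 0.4056 \text{ bits}
\end{aligned}
```

The 4C dimension contributes nothing because its outcome is certain; all remaining uncertainty comes from the 2C dimension.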

The information gain entropy equals the difference between the information entropy and the conditional entropy, and the maximum information gain entropy obtained is used as the judgment condition of the decision tree. The information gain entropy is calculated repeatedly for the different feature variables until every branch of the decision tree has been computed and a complete decision tree is formed; the decision tree is pruned to prevent overfitting.
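The selection of the split feature by maximum information gain entropy can be sketched as follows. The toy data, the feature names `cpu_spec` and `zone`, and the base-2 logarithm are assumptions, and the sketch selects only the root split; recursive branch construction and pruning are omitted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(Y): Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(values: Sequence[Hashable],
                        labels: Sequence[Hashable]) -> float:
    """H(Y|X): label entropy within each feature value, weighted by frequency."""
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Information entropy minus conditional entropy."""
    return entropy(labels) - conditional_entropy(values, labels)


def best_feature(columns: Dict[str, Sequence[Hashable]],
                 labels: Sequence[Hashable]) -> str:
    """Pick the feature variable with the largest information gain."""
    return max(columns, key=lambda name: information_gain(columns[name], labels))


# Hypothetical history: 8 samples, two candidate feature variables.
columns = {
    "cpu_spec": ["2C", "2C", "2C", "2C", "4C", "4C", "4C", "4C"],
    "zone": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
labels = ["stable", "unstable", "unstable", "unstable",
          "stable", "stable", "stable", "stable"]
root = best_feature(columns, labels)
```

Here `cpu_spec` separates stable from unstable samples far better than `zone`, so it is chosen as the judgment condition at the root.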

The cloud native hybrid deployment cluster resource scheduling method according to an embodiment of the invention is described in detail below with reference to Fig. 2. Fig. 2 is a flow chart of the method according to an embodiment of the invention. As shown in Fig. 2, the method includes:

S101: data preparation, comprising: acquiring cloud platform resource usage history data and application performance index history data of each node in a cloud platform;

S102: feature extraction, comprising: performing feature extraction on the cloud platform resource usage history data and application performance index history data of each node in the cloud platform, wherein the feature variables comprise: CPU utilization, memory utilization, disk space utilization, network bandwidth utilization, request response duration, number of concurrent connections, application error rate, request throughput, load balancing, network delay and cloud platform reliability;

S103: data partitioning, comprising: constructing a historical data set from the extracted feature variables, and dividing the historical data set by cross-validation to obtain a training set and a test set;

S104: decision tree construction, comprising: training the decision tree model with the training set, and taking the maximum information gain entropy obtained as the judgment condition of the decision tree; the information gain entropy is the difference between the information entropy and the conditional entropy;

S105: model evaluation, comprising: evaluating the accuracy, recall and F1 value of the trained decision tree model with the test set, and obtaining the trained decision tree model when the evaluation passes;

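S106: model application, comprising: acquiring cloud platform resource usage data and application performance index data of each node in a cloud platform;

inputting the cloud platform resource usage data and application performance index data of each node in the cloud platform into the trained decision tree model, and outputting the resource information that each node in the cloud platform can use and the application information that each node can run;

allocating and scheduling resources for each node in the cloud platform according to the resource information that each node can use and the application information that each node can run.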
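The model evaluation in S105 can be sketched with the three metrics it names. The binary labelling (1 = stable), the `passes` acceptance rule, and its 0.8 threshold are assumptions; the embodiment does not say what counts as passing.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "recall": recall, "f1": f1}


def passes(scores: Dict[str, float], threshold: float = 0.8) -> bool:
    """Hypothetical acceptance rule: every metric must reach the threshold."""
    return all(value >= threshold for value in scores.values())


scores = evaluate(y_true=[1, 1, 1, 0, 0], y_pred=[1, 1, 0, 0, 0])
accepted = passes(scores)
```

In this example the model misses one stable node, so recall falls below the assumed threshold and the model would be retrained rather than accepted.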

Fig. 3 is a schematic diagram of a hybrid deployment cluster resource scheduling result in an embodiment of the invention. As shown in Fig. 3, Pod denotes a node in the cloud platform and Node denotes a machine resource. Before hybrid deployment, the resource occupancy of each machine resource is low. Through hybrid deployment, the 6 Pods are consolidated from 3 machine resources onto 2 machine resources, so that the resources of a single machine are used as fully as possible and the resource utilization of the hybrid deployment cluster is improved.
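The consolidation effect shown in Fig. 3 can be illustrated with a simple packing heuristic. Note that this swaps in first-fit decreasing bin packing purely for illustration; it is not the decision-tree-driven scheduling of the embodiment, and the pod CPU requests and the node capacity of 8 cores are hypothetical.

```python
from typing import Dict, List


def first_fit_decreasing(pod_cpu: Dict[str, float],
                         node_capacity: float) -> List[List[str]]:
    """Pack pods onto as few equal-sized nodes as the heuristic finds."""
    nodes: List[List[str]] = []   # pod names placed on each node
    free: List[float] = []        # remaining CPU on each node
    # Place the largest pods first (ties keep their original order).
    for pod, cpu in sorted(pod_cpu.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > node_capacity:
            raise ValueError(f"{pod} does not fit on any node")
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                nodes[i].append(pod)
                free[i] -= cpu
                break
        else:  # no existing node has room: open a new one
            nodes.append([pod])
            free.append(node_capacity - cpu)
    return nodes


# Six pods that previously ran two per node on three nodes.
pods = {"pod-1": 2, "pod-2": 2, "pod-3": 2, "pod-4": 2, "pod-5": 3, "pod-6": 3}
placement = first_fit_decreasing(pods, node_capacity=8)
```

With these numbers the six pods need 14 cores in total and fit on two 8-core nodes, freeing the third machine.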

The embodiment of the invention also provides a cloud native hybrid deployment cluster resource scheduling device, which is described in the following embodiment. Because the principle by which the device solves the problem is similar to that of the cloud native hybrid deployment cluster resource scheduling method, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.

Fig. 4 is a schematic diagram of a cloud native hybrid deployment cluster resource scheduling device in an embodiment of the present invention, where, as shown in fig. 4, the device includes:

the data acquisition module 01 is used for acquiring cloud platform resource use data and application program performance index data of each node in the cloud platform;

the information output module 02 is used for inputting the cloud platform resource use data and the application program performance index data of each node in the cloud platform into the trained decision tree model, and outputting the resource information which can be used by each node in the cloud platform and the application program information which can be operated; the decision tree model is obtained through training according to cloud platform resource use historical data and application program performance index historical data of each node in the cloud platform;

the resource processing module 03 is configured to allocate and schedule resources for each node in the cloud platform according to the resource information that can be used by each node in the cloud platform and the application information that can be operated.

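In an embodiment, the cloud native hybrid deployment cluster resource scheduling device in the embodiment of the invention further includes a monitoring module, configured to monitor, in real time, the cloud platform resource usage data and application performance index data of each node in the cloud platform;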

the information output module is further used for: inputting real-time monitoring cloud platform resource use data and application program performance index data of each node in the cloud platform into a trained decision tree model;

the resource processing module is further used for: when the resource information which can be used by each node and the application information which can be operated in the cloud platform output by the decision tree model are changed, the resource allocation and the scheduling of each node in the cloud platform are carried out again according to the changed resource information which can be used by each node and the changed application information which can be operated in the cloud platform.
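The monitor, re-evaluate and re-schedule cycle can be sketched as a loop that triggers scheduling only when the model output changes. The `monitor_and_reschedule` function, the dictionary shapes, and the `stub_model` rule are hypothetical stand-ins for the monitoring module, the trained decision tree, and the resource processing module.

```python
from typing import Callable, Dict, Iterable, List, Optional

Metrics = Dict[str, float]
Decision = Dict[str, object]


def monitor_and_reschedule(
    samples: Iterable[Metrics],
    model: Callable[[Metrics], Decision],
    reschedule: Callable[[Decision], None],
) -> int:
    """Feed each monitored sample to the model; reschedule only on a change."""
    last: Optional[Decision] = None
    changes = 0
    for sample in samples:
        decision = model(sample)
        if decision != last:       # output changed (or first allocation)
            reschedule(decision)
            last = decision
            changes += 1
    return changes


def stub_model(sample: Metrics) -> Decision:
    # Hypothetical rule standing in for the trained decision tree.
    return {"usable_cpu": 4 if sample["cpu_util"] < 0.5 else 0}


applied: List[Decision] = []
stream = [{"cpu_util": 0.2}, {"cpu_util": 0.3},
          {"cpu_util": 0.9}, {"cpu_util": 0.1}]
n_changes = monitor_and_reschedule(stream, stub_model, applied.append)
```

The second sample produces the same decision as the first, so no rescheduling is triggered for it; the count includes the initial allocation.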

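In one embodiment, the cloud native hybrid deployment cluster resource scheduling device in the embodiment of the invention further includes a decision tree model training module, configured to train the decision tree model as follows:

acquiring cloud platform resource usage history data and application performance index history data of each node in the cloud platform;

performing feature extraction on the cloud platform resource usage history data and application performance index history data of each node in the cloud platform, and constructing a historical data set from the extracted feature variables;

dividing the historical data set to obtain a training set and a test set;

training the decision tree model with the training set, evaluating the trained decision tree model with the test set, and obtaining the trained decision tree model when the evaluation passes.

The feature variables include one or any combination of: CPU utilization, memory utilization, disk space utilization, network bandwidth utilization, request response duration, number of concurrent connections, application error rate, request throughput, load balancing, network delay and cloud platform reliability.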

In specific implementation, the decision tree model training module is used for:

dividing the historical data set by cross-validation to obtain a training set and a test set; when training the decision tree model with the training set, taking the maximum information gain entropy as the judgment condition of the decision tree, the information gain entropy being the difference between the information entropy and the conditional entropy; and evaluating, with the test set, one or any combination of the accuracy, recall and F1 value of the trained decision tree model.

Fig. 5 is a schematic diagram of a cloud native hybrid deployment cluster resource scheduling apparatus according to an embodiment of the present invention, as shown in fig. 5, where the apparatus includes:

The M201 node identification module is used for acquiring the cloud platform resource usage data and application performance index data of each node in the cloud platform. The application performance index data of a node are acquired through an API (application programming interface) provided by the cloud platform and include indicators such as the configuration, capacity and load of the node, on the basis of which the nodes are classified and identified.

The M202 node decision module is used for evaluating the identified nodes with the trained decision tree model and outputting the resource information that each node in the cloud platform can use and the application information that each node can run, thereby predicting and analyzing resource usage.

The M203 node scheduling module is used for, according to the output of the M202 node decision module (the resource information that each node in the cloud platform can use and the application information that each node can run), dynamically allocating and scheduling nodes to suitable machine resources through adaptive scheduling and dynamic resource allocation techniques, and deploying applications on the optimal nodes, so as to achieve optimal resource utilization and application performance.

The M204 node monitoring module is used for monitoring, in real time, the cloud platform resource usage data and application performance index data of each node in the cloud platform and feeding the data back to the M202 node decision module for optimization and adjustment, so as to maintain the resource utilization and performance stability of the whole hybrid deployment cluster.

In summary, the embodiment of the invention uses a decision tree algorithm to intelligently analyze and predict resource usage, enabling dynamic scheduling and optimized resource use and thereby improving resource utilization and service performance in a hybrid deployment cluster environment. The method can also learn and optimize adaptively to suit different resource environments and service requirements, which gives it strong adaptability and flexibility.

The technical scheme provided by the embodiment of the invention can be applied to various hybrid deployment cluster scenarios, including the integration and management of public cloud, private cloud, local data center and other resources. Through intelligent analysis and optimization of resources, it can improve an enterprise's resource utilization and responsiveness, reduce cost and risk, and promote the enterprise's digital transformation and innovation.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

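1. A cloud native hybrid deployment cluster resource scheduling method, characterized by comprising the following steps:

acquiring cloud platform resource usage data and application performance index data of each node in a cloud platform;

inputting the cloud platform resource usage data and application performance index data of each node in the cloud platform into a trained decision tree model, and outputting the resource information that each node in the cloud platform can use and the application information that each node can run; the decision tree model is trained on historical cloud platform resource usage data and historical application performance index data of each node in the cloud platform;

allocating and scheduling resources for each node in the cloud platform according to the resource information that each node can use and the application information that each node can run.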

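2. The method of claim 1, further comprising, after resource allocation and scheduling for each node in the cloud platform:

monitoring, in real time, the cloud platform resource usage data and application performance index data of each node in the cloud platform;

inputting the cloud platform resource usage data and application performance index data monitored in real time into the trained decision tree model;

when the usable resource information and runnable application information output by the decision tree model for the nodes in the cloud platform change, performing resource allocation and scheduling for each node in the cloud platform again according to the changed usable resource information and runnable application information.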

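3. The method of claim 1, further comprising training to obtain the decision tree model as follows:

acquiring cloud platform resource usage history data and application performance index history data of each node in the cloud platform;

performing feature extraction on the cloud platform resource usage history data and application performance index history data of each node in the cloud platform, and constructing a historical data set from the extracted feature variables;

dividing the historical data set to obtain a training set and a test set;

training the decision tree model with the training set, evaluating the trained decision tree model with the test set, and obtaining the trained decision tree model when the evaluation passes.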

4. A method as claimed in claim 3, wherein the feature variables comprise:

CPU usage, memory usage, disk space usage, network bandwidth usage, request response time, concurrent connection number, application error rate, request throughput, load balancing, network delay, cloud platform reliability, or any combination thereof.

5. The method of claim 3, wherein partitioning the historical dataset into training and testing sets comprises:

and dividing the historical data set by adopting a cross-validation method to obtain a training set and a testing set.

6. The method of claim 3, wherein training the decision tree model with the training set comprises:

when training the decision tree model by using the training set, acquiring the maximum information gain entropy as a judgment condition of the decision tree; the information gain entropy is the difference between the information entropy and the conditional entropy.

7. The method of claim 3, wherein evaluating the trained decision tree model in the test set comprises:

evaluating one or any combination of the accuracy, recall and F1 value of the trained decision tree model with the test set.

8. A cloud native hybrid deployment cluster resource scheduling device, characterized by comprising:

the data acquisition module is used for acquiring cloud platform resource use data and application program performance index data of each node in the cloud platform;

the information output module is used for inputting the cloud platform resource use data and the application program performance index data of each node in the cloud platform into the trained decision tree model and outputting the resource information which can be used by each node in the cloud platform and the application program information which can be operated; the decision tree model is obtained through training according to cloud platform resource use historical data and application program performance index historical data of each node in the cloud platform;

the resource processing module is used for distributing and scheduling the resources of each node in the cloud platform according to the resource information which can be used by each node in the cloud platform and the application program information which can be operated.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 7 when executing the computer program.

10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 7.

11. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any of claims 1 to 7.