CN109445936A

CN109445936A - Cloud computing load clustering method, system, and electronic device

Info

Publication number: CN109445936A
Application number: CN201811188871.5A
Authority: CN
Inventors: 叶可江; 陈文艳; 须成忠
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2018-10-12
Filing date: 2018-10-12
Publication date: 2019-03-08
Anticipated expiration: 2038-10-12
Also published as: CN109445936B

Abstract

The present application relates to a cloud computing load clustering method, system, and electronic device. The method comprises: step a: collecting cluster monitoring data and extracting feature vectors from the cluster monitoring data; step b: calculating the average silhouette coefficient of each feature vector and determining, for each feature vector, the K value at which the average silhouette coefficient is maximal; step c: substituting the K value of each feature vector into the K-Means clustering algorithm to obtain a clustering result for each dimension; step d: combining the clustering results of all dimensions to form load classes with similar features. In a production cloud computing environment, the application clusters each feature vector based on a silhouette-coefficient evaluation of the feature set and then combines all feature vectors to form load types with higher similarity, so that the resulting clusters are more internally similar and the accuracy of the clustering is ensured.

Description

Cloud computing load clustering method, system, and electronic device

Technical field

The present application belongs to the field of cloud computing technology, and in particular relates to a cloud computing load clustering method, system, and electronic device.

Background

Cloud computing technology is widely used because of its high availability, on-demand service, and low cost. A cloud environment is a distributed cluster (resource pool) composed of a large number of physical machines that lets users obtain computing power, storage space, and information services on demand. The key factors affecting its performance are job scheduling and resource allocation across nodes. In-depth analysis of the characteristics of the workloads running in the cluster, so that physical resources can be allocated more reasonably and effective decisions can be made about which nodes execute each task, is therefore of great importance to cluster management.

A cloud computing environment aggregates a large number of physical and virtual resources and uses virtualization technology to scale elastically and provide services on demand. The workloads running on it exhibit different load characteristics because they differ in their demand for server resources (CPU, memory, disk, etc.) and in their execution time. A deep understanding of load characteristics helps cluster management with job scheduling and resource allocation. How to use unsupervised clustering to group workloads with similar operating characteristics into the same type, so that server resources can be allocated to them in a way that improves resource utilization, remains a major challenge in load characteristic analysis.

Effective clustering of workload types can be used for load prediction, and load prediction results provide a reliable basis for cluster resource planning and task scheduling, which improves physical resource utilization and saves cloud service providers considerable cost. Load clustering has therefore become a research hotspot in load analysis in both industry and academia.

In the prior art, the common load classes are CPU-intensive, I/O-intensive, memory-intensive, network-intensive, and others. The most common load classification approach assumes known class labels: the features of a load object are analyzed mathematically, and the load is assigned to the known type whose features it most resembles. However, there is still no well-established similarity measure for this purpose. The other approach divides loads by unsupervised learning, in which the number of classes and their labels are unknown before classification. Widely used clustering algorithms include K-Means, mean shift, agglomerative hierarchical clustering, density-based clustering, and graph community detection.

Ren Xiaodan, in "Research on task type division and load balancing based on resource and real-time requirements", starts from the fact that user task requests differ in type and, according to resource requirement type and degree of real-time demand, divides tasks into a real-time task queue, an I/O-consuming task queue, and a CPU-consuming task queue. Guo et al., in "A load balancing scheduling algorithm based on classification of server load conditions", propose a method that classifies servers into different types according to the characteristics of the loads running on them. The patent "A method for classifying loads running on virtual machines in a cloud computing environment", published by Yin Jianwei et al. in 2016, proposes using a TSRSVM (Training Sets Refresh SVM) classifier to classify the loads running in a cluster. The monitored loads are divided into four classes (CPU-intensive, memory-intensive, I/O-intensive, and network-intensive), and a customized optimization strategy is provided for operating systems running each of the four intensive types.

In summary, existing load classification methods empirically divide loads into common load types using coarse-grained load characteristics and then determine the class membership of collected data samples according to the labels of the known load types. Most of them classify loads by supervised learning into a handful of common types. They do not take into account that, with the development of technology and the diversification of users' demands on network services, load characteristics have become more complex and are difficult to assign to one specific type. Furthermore, most current load classification models use supervised machine learning methods such as support vector machines (SVM), or statistical analysis methods. K-Means, the most widely used algorithm in cluster analysis, is in most cases run with a manually chosen K value, and all selected feature vectors are fed into K-Means together. There is then no way to measure whether the chosen K is optimal, that is, no guarantee that the objects within each resulting cluster are strongly similar to one another.

Summary of the invention

The present application provides a cloud computing load clustering method, system, and electronic device, intended to solve, at least to some extent, one of the above technical problems in the prior art.

To solve the above problems, the present application provides the following technical solutions:

A cloud computing load clustering method, comprising the following steps:

Step a: collecting cluster monitoring data and extracting feature vectors from the cluster monitoring data;

Step b: calculating the average silhouette coefficient of each feature vector and determining, for each feature vector, the K value at which the average silhouette coefficient is maximal;

Step c: substituting the K value of each feature vector into the K-Means clustering algorithm to obtain a clustering result for each dimension;

Step d: combining the clustering results of all dimensions to form load classes with similar features.

The technical solution adopted in the embodiments of the present application further includes: step a further comprises determining whether a load was already executing when the sampling period began, or had not finished executing when sampling ended; if so, the task start time in the corresponding cluster monitoring data is set to a negative value.

The technical solution adopted in the embodiments of the present application further includes: in step a, extracting the feature vectors from the cluster monitoring data further comprises applying min-max normalization to the extracted feature vectors.

The technical solution adopted in the embodiments of the present application further includes: step b further comprises determining whether the K value is large and whether the gap between its average silhouette coefficient and those of K-1 and K-2 is small; if so, the smaller value K-1 or K-2 is taken as the new K value.

The technical solution adopted in the embodiments of the present application further includes: step d further comprises determining whether the number of load classes is excessive and whether some load classes differ only slightly; if so, the feature similarity between the load classes is analyzed and load classes with similar features are merged.

Another technical solution adopted in the embodiments of the present application is a cloud computing load clustering system, comprising:

Data acquisition module: for collecting cluster monitoring data;

Feature extraction module: for extracting feature vectors from the cluster monitoring data;

Silhouette coefficient calculation module: for calculating the average silhouette coefficient of each feature vector and determining, for each feature vector, the K value at which the average silhouette coefficient is maximal;

Clustering module: for substituting the K value of each feature vector into the K-Means clustering algorithm to obtain a clustering result for each dimension;

Class division module: for combining the clustering results of all dimensions to form load classes with similar features.

The technical solution adopted in the embodiments of the present application further includes a task judgment module, which determines whether a load was already executing when the sampling period began, or had not finished executing when sampling ended; if so, the task start time in the corresponding cluster monitoring data is set to a negative value.

The technical solution adopted in the embodiments of the present application further includes: the feature extraction module is also used to apply min-max normalization to the extracted feature vectors.

The technical solution adopted in the embodiments of the present application further includes a K value judgment module, which determines whether the K value is large and whether the gap between its average silhouette coefficient and those of K-1 and K-2 is small; if so, the smaller value K-1 or K-2 is taken as the new K value.

The technical solution adopted in the embodiments of the present application further includes a class merging module, which determines whether the number of load classes is excessive and whether some load classes differ only slightly; if so, the feature similarity between the load classes is analyzed and load classes with similar features are merged.

A further technical solution adopted in the embodiments of the present application is an electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the operations of the cloud computing load clustering method described above.

Compared with the prior art, the embodiments of the present application have the following beneficial effect: the cloud computing load clustering method, system, and electronic device combine the advantages of supervised and unsupervised learning. In a production cloud computing environment, each feature vector is clustered based on a silhouette-coefficient evaluation of the feature set, and all feature vectors are then combined to form load types with higher similarity. The resulting clusters are therefore more internally similar and the accuracy of the clustering is ensured, which maximizes server resource utilization and provides a more accurate basis for resource planning and load scheduling.

Brief description of the drawings

Fig. 1 is a flowchart of the cloud computing load clustering method of an embodiment of the present application;

Fig. 2 is a schematic structural diagram of the cloud computing load clustering system of an embodiment of the present application;

Fig. 3 is a schematic diagram of the hardware structure of a device for performing the cloud computing load clustering method provided by an embodiment of the present application.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and are not intended to limit it.

Referring to Fig. 1, which is a flowchart of the cloud computing load clustering method of an embodiment of the present application. The cloud computing load clustering method of the embodiment comprises the following steps:

Step 100: collect cluster monitoring data through cluster monitoring software;

In step 100, the collected cluster monitoring data includes attribute values such as CPU utilization, memory utilization, task start time, and task end time. The sampling frequency is once every 60 seconds and can be set according to the actual application.

Step 200: determine whether a load was already executing when the sampling period began, or had not finished executing when sampling ended. If so, execute step 300; otherwise, execute step 400;

Step 300: set the task start time in the cluster monitoring data to a negative value (-1);

In step 300, the sampling period may contain loads that were already running when sampling began and loads that had not finished when sampling ended. For these special cases, the application sets the task start time in the corresponding cluster monitoring data to a negative value to mark the load as special. When loads are classified, the cluster monitoring data whose task start time is negative can be removed, which guarantees the accuracy and integrity of the analysis.
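A minimal sketch of steps 200 and 300, assuming each monitoring record carries a task start and end time on the same clock as the sampling window. The `TaskRecord` type, the function names, and the window parameters are hypothetical and only illustrate the marking and filtering described above.

```python
from dataclasses import dataclass, replace
from typing import List

MARKER = -1.0  # negative start time used to flag partially observed tasks


@dataclass(frozen=True)
class TaskRecord:  # hypothetical record layout
    task_id: str
    start_time: float
    end_time: float


def mark_partial_tasks(records: List[TaskRecord],
                       window_start: float,
                       window_end: float) -> List[TaskRecord]:
    """Set start_time to -1 for tasks that began before sampling started
    or were still running when sampling ended."""
    marked = []
    for rec in records:
        partial = rec.start_time < window_start or rec.end_time > window_end
        marked.append(replace(rec, start_time=MARKER) if partial else rec)
    return marked


def drop_marked(records: List[TaskRecord]) -> List[TaskRecord]:
    """Remove records flagged with a negative start time before clustering."""
    return [rec for rec in records if rec.start_time >= 0]
```

Keeping the marking and the filtering separate mirrors the text: the negative value only flags a record, and the decision to discard it is taken later, at classification time.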

Step 400: extract the most statistically significant feature vectors in the cluster monitoring data as the clustering dimensions, and apply min-max normalization to the extracted feature vectors;

In step 400, because the raw cluster monitoring data cannot be used directly as feature vectors for clustering, the most statistically significant feature vectors must be selected from the feature set, which also reduces dimensionality. According to related research, CPU, memory, disk usage, I/O, network traffic, and load execution time have the greatest influence on the load class. Feature vectors can be selected from these as appropriate to the collected cluster monitoring data and used as the clustering dimensions.

Normalizing the extracted feature values prevents columns whose orders of magnitude differ greatly from causing variation in large values to mask variation in small values. The application uses the typical min-max normalization method. For a data object x, the normalized value is calculated as $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature to which x belongs.
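The min-max normalization of step 400 can be sketched as follows. The function name is hypothetical, and mapping a constant column to 0.0 is an assumption, since the text does not say how a feature with identical minimum and maximum is handled.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale one feature column to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.0.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]
```

For example, a CPU column `[10, 20, 30]` becomes `[0.0, 0.5, 1.0]`, so it can be compared on the same scale as a duration column measured in seconds.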

Step 500: calculate the average silhouette coefficient of each feature vector and determine, for each feature vector, the K value at which the average silhouette coefficient is maximal;

In step 500, the optimal K value of each feature vector is determined using the average silhouette coefficient. The silhouette coefficient evaluates a clustering by the dispersion within and between clusters, combining cohesion and separation. Cohesion is the average distance between the i-th object and the other points in its own cluster, denoted $a_i$. Separation is the average distance between the i-th object and all points in the clusters that do not contain it, denoted $b_i$. The silhouette coefficient of the i-th object is $s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$. Its value lies in [-1, 1], and a larger value indicates a better clustering. The average silhouette coefficient of the k-th feature vector feature_k is calculated as $S_k = \frac{1}{n}\sum_{i=1}^{n} s_i$, where n is the number of objects clustered on feature_k. By evaluating the clustering result of each feature vector with the average silhouette coefficient, the application avoids the uncertainty and inaccuracy of manually choosing the K value for K-Means.
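The selection of K in step 500 can be sketched in pure Python for a single one-dimensional feature. Several choices here are assumptions rather than part of the disclosure: the quantile-based initialization of K-Means, the convention that a singleton cluster scores 0, the candidate range of K, and computing $b_i$ as the smallest average distance to any other cluster (the common convention). All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> List[int]:
    """Lloyd's algorithm on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic start with centroids at evenly spaced quantiles.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def average_silhouette(values: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all objects."""
    clusters: Dict[int, List[float]] = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        b = min(sum(abs(v - u) for u in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(values: Sequence[float],
             candidates: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Return the candidate K with the largest average silhouette coefficient."""
    scores = {k: average_silhouette(values, kmeans_1d(values, k)) for k in candidates}
    return max(scores, key=scores.get), scores
```

Running `select_k` once per normalized feature gives each dimension its own K, which is the per-feature behavior the step describes. Every candidate K should not exceed the number of distinct values in the feature.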

Step 600: determine whether the K value is large and whether the gap between its average silhouette coefficient and those of K-1 and K-2 is small. If so, execute step 700; otherwise, execute step 800;

Step 700: take the smaller value K-1 or K-2 as the new K value;
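Steps 600 and 700 do not state what counts as a "large" K or a "small" gap, so the following sketch introduces two hypothetical thresholds, `large_k` and `tolerance`, and tries K-2 before K-1. These are illustrative assumptions, not values from the application.

```python
from typing import Dict


def prefer_smaller_k(scores: Dict[int, float], best_k: int,
                     large_k: int = 5, tolerance: float = 0.02) -> int:
    """Fall back to K-2 or K-1 when the best K is large and the smaller K
    scores almost as well.

    scores maps each candidate K to its average silhouette coefficient.
    large_k and tolerance are assumed thresholds.
    """
    if best_k < large_k:
        return best_k
    for candidate in (best_k - 2, best_k - 1):  # smallest alternative first
        if candidate in scores and scores[best_k] - scores[candidate] <= tolerance:
            return candidate
    return best_k
```

With scores `{4: 0.80, 5: 0.805, 6: 0.81}` and a best K of 6, the sketch returns 4, trading a negligible loss in silhouette score for fewer clusters per dimension.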

Step 800: substitute the K value of each feature vector into the K-Means clustering algorithm to obtain a clustering result for each dimension;

In step 800, because each feature vector is one-dimensional, the boundary values of each cluster can be determined once the K value has been fixed. Clusters are generally labeled by magnitude as large, medium, and small, or long, medium, and short. Applying the unsupervised clustering method K-Means to cluster load clustering determines the boundary values of the feature-vector clusters and yields workload types that better reflect the complexity of demand.
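Because one-dimensional K-Means assigns each value to its nearest centroid, the boundary between two adjacent clusters is the midpoint of their centroids. The sketch below assumes the centroids for a feature have already been computed and derives the boundary values and magnitude labels from them. The function names and label strings are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def cluster_boundaries(centroids: Sequence[float]) -> List[float]:
    """Midpoints between adjacent sorted centroids separate 1-D clusters."""
    ordered = sorted(centroids)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def label_value(value: float, boundaries: Sequence[float],
                names: Sequence[str]) -> str:
    """Map a value to the label of the interval it falls in.

    names must list one label per cluster, ordered from smallest to largest.
    """
    return names[bisect_right(boundaries, value)]
```

For centroids `[0.1, 0.5, 0.9]` the boundaries are approximately 0.3 and 0.7, so a normalized CPU utilization of 0.8 would be labeled "large" under the names `["small", "medium", "large"]`.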

Step 900: combine the clustering results of all dimensions to form load classes with similar features;
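One way to read step 900 is that each task's per-dimension labels are joined into a tuple, and tasks sharing the same tuple form one load class. The sketch below follows that reading, which is an assumption about how the combination is carried out. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def combine_dimensions(
    per_dimension_labels: Dict[str, List[str]]
) -> Tuple[List[str], Dict[Tuple[str, ...], List[int]]]:
    """Group task indices by the tuple of their per-feature cluster labels.

    per_dimension_labels maps a feature name to one label per task,
    with tasks in the same order for every feature.
    """
    features = sorted(per_dimension_labels)
    n_tasks = len(per_dimension_labels[features[0]])
    classes: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i in range(n_tasks):
        key = tuple(per_dimension_labels[f][i] for f in features)
        classes[key].append(i)
    return features, dict(classes)
```

With CPU labels `["small", "large", "small"]` and duration labels `["short", "long", "short"]`, tasks 0 and 2 fall into the class `("small", "short")` and task 1 into `("large", "long")`.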

Step 1000: determine whether the number of load classes is excessive and whether some load classes differ only slightly. If so, execute step 1100; otherwise, execute step 1200;

Step 1100: analyze the feature similarity between the load classes and merge load classes with similar features to reduce the number of load classes;
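The application does not specify how feature similarity between load classes is measured in step 1100. The sketch below assumes Euclidean distance between class centroids in the normalized feature space and greedily merges the closest pair while there are too many classes. `max_classes` and `threshold` are hypothetical parameters.

```python
from itertools import combinations
from math import dist
from typing import Dict, Tuple


def merge_similar(centroids: Dict[str, Tuple[float, ...]],
                  sizes: Dict[str, int],
                  max_classes: int,
                  threshold: float):
    """Merge the closest pair of classes until few enough remain or no pair
    is closer than threshold. The merged class keeps the first name."""
    centroids, sizes = dict(centroids), dict(sizes)
    while len(centroids) > max_classes:
        a, b = min(combinations(sorted(centroids), 2),
                   key=lambda p: dist(centroids[p[0]], centroids[p[1]]))
        if dist(centroids[a], centroids[b]) > threshold:
            break  # the remaining classes are all clearly different
        total = sizes[a] + sizes[b]
        # The size-weighted mean keeps the merged centroid representative.
        centroids[a] = tuple((x * sizes[a] + y * sizes[b]) / total
                             for x, y in zip(centroids[a], centroids[b]))
        sizes[a] = total
        del centroids[b], sizes[b]
    return centroids, sizes
```

Stopping at the distance threshold means the class count is reduced only where classes really are close, so a target count that would force dissimilar classes together is not enforced.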

Step 1200: load clustering ends.

Referring to Fig. 2, which is a schematic structural diagram of the cloud computing load clustering system of an embodiment of the present application. The cloud computing load clustering system of the embodiment includes a data acquisition module, a task judgment module, a feature extraction module, a silhouette coefficient calculation module, a K value judgment module, a clustering module, a class division module, and a class merging module.

Data acquisition module: for collecting cluster monitoring data through cluster monitoring software. The collected cluster monitoring data includes attribute values such as CPU utilization, memory utilization, task start time, and task end time. The sampling frequency is once every 60 seconds and can be set according to the actual application.

Task judgment module: for determining whether a load was already executing when the sampling period began, or had not finished executing when sampling ended, and if so, setting the task start time in the cluster monitoring data to a negative value (-1). The sampling period may contain loads that were already running when sampling began and loads that had not finished when sampling ended. For these special cases, the application sets the task start time in the corresponding cluster monitoring data to a negative value to mark the load as special. When loads are classified, the cluster monitoring data whose task start time is negative can be removed, which guarantees the accuracy and integrity of the analysis.

Feature extraction module: for extracting the most statistically significant feature vectors in the cluster monitoring data as the clustering dimensions and applying min-max normalization to the extracted feature vectors. Because the raw cluster monitoring data cannot be used directly as feature vectors for clustering, the most statistically significant feature vectors must be selected from the feature set, which also reduces dimensionality. According to related research, CPU, memory, disk usage, I/O, network traffic, and load execution time have the greatest influence on the load class. Feature vectors can be selected from these as appropriate to the collected cluster monitoring data and used as the clustering dimensions.

Silhouette coefficient calculation module: for calculating the average silhouette coefficient of each feature vector and determining, for each feature vector, the K value at which the average silhouette coefficient is maximal. The optimal K value of each feature vector is determined using the average silhouette coefficient. The silhouette coefficient evaluates a clustering by the dispersion within and between clusters, combining cohesion and separation. Cohesion is the average distance between the i-th object and the other points in its own cluster, denoted $a_i$. Separation is the average distance between the i-th object and all points in the clusters that do not contain it, denoted $b_i$. The silhouette coefficient of the i-th object is $s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$. Its value lies in [-1, 1], and a larger value indicates a better clustering. The average silhouette coefficient of the k-th feature vector feature_k is calculated as $S_k = \frac{1}{n}\sum_{i=1}^{n} s_i$, where n is the number of objects clustered on feature_k. By evaluating the clustering result of each feature vector with the average silhouette coefficient, the application avoids the uncertainty and inaccuracy of manually choosing the K value for K-Means.

K value judgment module: for determining whether the K value is large and whether the gap between its average silhouette coefficient and those of K-1 and K-2 is small, and if so, taking the smaller value K-1 or K-2 as the new K value;

Clustering module: for substituting the K value of each feature vector into the K-Means clustering algorithm to obtain a clustering result for each dimension. Because each feature vector is one-dimensional, the boundary values of each cluster can be determined once the K value has been fixed. Clusters are generally labeled by magnitude as large, medium, and small, or long, medium, and short. Applying the unsupervised clustering method K-Means to cluster load clustering determines the boundary values of the feature-vector clusters and yields workload types that better reflect the complexity of demand.

Class division module: for combining the clustering results of all dimensions to form load classes with similar features;

Class merging module: for determining whether the number of load classes is excessive and whether some load classes differ only slightly, and if so, analyzing the feature similarity between the load classes and merging load classes with similar features to reduce the number of load classes.

Fig. 3 is a schematic diagram of the hardware structure of a device for performing the cloud computing load clustering method provided by an embodiment of the present application. As shown in Fig. 3, the device includes one or more processors and a memory. Taking one processor as an example, the device may also include an input system and an output system.

The processor, memory, input system, and output system may be connected by a bus or in other ways. Fig. 3 takes connection by a bus as an example.

The memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules. The processor runs the non-transitory software programs, instructions, and modules stored in the memory to execute the various functional applications and data processing of the electronic device, that is, to implement the processing method of the method embodiments above.

The memory may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function. The data storage area can store data and the like. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, for example at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory can be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The input system can receive input numeric or character information and generate signal input. The output system may include a display device such as a display screen.

The one or more modules are stored in the memory and, when executed by the one or more processors, perform the operations of any of the method embodiments above.

The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present application.

The embodiments of the present application provide a non-transitory (non-volatile) computer storage medium storing computer-executable instructions that can perform the operations of the method embodiments above.

The embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to perform the operations of the method embodiments above.

The cloud computing load clustering method, system, and electronic device of the embodiments of the present application combine the advantages of supervised and unsupervised learning. In a production cloud computing environment, each feature vector is clustered based on a silhouette-coefficient evaluation of the feature set, and all feature vectors are then combined to form load types with higher similarity. The resulting clusters are therefore more internally similar and the accuracy of the clustering is ensured, which maximizes server resource utilization and provides a more accurate basis for resource planning and load scheduling.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the present application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed in this application.

Claims

1. A cloud computing load clustering method, characterized by comprising the following steps:

Step a: collecting cluster monitoring data and extracting feature vectors from the cluster monitoring data;

Step b: calculating the average silhouette coefficient of each feature vector and determining, for each feature vector, the K value at which the average silhouette coefficient is maximal;

Step c: substituting the K value of each feature vector into the K-Means clustering algorithm to obtain a clustering result for each dimension;

Step d: combining the clustering results of all dimensions to form load classes with similar features.

2. The cloud computing load clustering method according to claim 1, characterized in that step a further comprises: determining whether a load was already executing when the sampling period began, or had not finished executing when sampling ended; if so, setting the task start time in the cluster monitoring data to a negative value.

3. The cloud computing load clustering method according to claim 1, characterized in that, in step a, extracting the feature vectors from the cluster monitoring data further comprises: applying min-max normalization to the extracted feature vectors.

4. The cloud computing load clustering method according to claim 3, characterized in that step b further comprises: determining whether the K value is large and whether the gap between its average silhouette coefficient and those of K-1 and K-2 is small; if so, taking the smaller value K-1 or K-2 as the new K value.

5. The cloud computing load clustering method according to claim 4, characterized in that step d further comprises: determining whether the number of load classes is excessive and whether some load classes differ only slightly; if so, analyzing the feature similarity between the load classes and merging load classes with similar features.

6. A cloud computing load clustering system, characterized by comprising:

Data acquisition module: for collecting cluster monitoring data;

Feature extraction module: for extracting feature vectors from the cluster monitoring data;

Silhouette coefficient calculation module: for calculating the average silhouette coefficient of each feature vector and determining, for each feature vector, the K value at which the average silhouette coefficient is maximal;

Clustering module: for substituting the K value of each feature vector into the K-Means clustering algorithm to obtain a clustering result for each dimension;

Class division module: for combining the clustering results of all dimensions to form load classes with similar features.

7. The cloud computing load clustering system according to claim 6, characterized by further comprising a task judgment module for determining whether a load was already executing when the sampling period began, or had not finished executing when sampling ended, and if so, setting the task start time in the cluster monitoring data to a negative value.

8. The cloud computing load clustering system according to claim 6, characterized in that the feature extraction module is also used to apply min-max normalization to the extracted feature vectors.

9. The cloud computing load clustering system according to claim 8, characterized by further comprising a K value judgment module for determining whether the K value is large and whether the gap between its average silhouette coefficient and those of K-1 and K-2 is small, and if so, taking the smaller value K-1 or K-2 as the new K value.

10. The cloud computing load clustering system according to claim 9, characterized by further comprising a class merging module for determining whether the number of load classes is excessive and whether some load classes differ only slightly, and if so, analyzing the feature similarity between the load classes and merging load classes with similar features.

11. An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the operations of the cloud computing load clustering method according to any one of claims 1 to 5.