CN110647379A - Method for automatic scaling deployment of Hadoop clusters and Plugin deployment based on OpenStack cloud - Google Patents


Info

Publication number
CN110647379A
CN110647379A (application CN201810682329.9A)
Authority
CN
China
Prior art keywords
cluster
deployment
node
hadoop
automatic scaling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810682329.9A
Other languages
Chinese (zh)
Other versions
CN110647379B (en)
Inventor
吕智慧
吴杰
强浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201810682329.9A priority Critical patent/CN110647379B/en
Publication of CN110647379A publication Critical patent/CN110647379A/en
Application granted granted Critical
Publication of CN110647379B publication Critical patent/CN110647379B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G06F2009/45583 Memory management, e.g. access or allocation
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of cloud computing, and specifically relates to a method for automatic scaling deployment of Hadoop clusters and Plugin deployment based on OpenStack cloud. The method comprises, in the Hadoop cluster deployment stage, an automatic cluster scaling strategy based on resource utilization and a node replacement mechanism based on task success rate. The invention enables OpenStack to provide better support for the cluster: the cluster scale can be adjusted according to the service load in different time periods, and the service processing speed is guaranteed.

Description

Method for automatic scaling deployment of Hadoop clusters and Plugin deployment based on OpenStack cloud
Technical Field
The invention belongs to the technical field of cloud computing, and relates to a method for automatic scaling deployment of Hadoop clusters and Plugin deployment based on OpenStack cloud.
Background
The prior art discloses that Sahara can be integrated with third-party management tools (such as Apache Ambari and the Cloudera management console) via a plugin mechanism. The core part of Sahara is responsible for interaction with users and provisions OpenStack resources (such as virtual machines, servers and security groups) through the Heat component; the Plugin is responsible for installing and configuring Hadoop clusters in pre-allocated virtual machines, and can also serve as a tool for cluster deployment management and monitoring. Sahara provides a unified mechanism for a Plugin to work in pre-allocated virtual machines: on the one hand, a Plugin has to inherit the sahara.plugins.provisioning.ProvisioningPluginBase class and implement all necessary methods/interfaces; on the other hand, the virtual machine object provided by Sahara has a remote property, which can be used to interact with the virtual machine, and the virtual machine is operated by remotely calling commands on the instance (the available commands can be found in sahara.clients.remote.InstanceInteropHelper).
Based on this state of the prior art, the inventors propose a method for automatic scaling deployment of Hadoop clusters and Plugin deployment based on OpenStack cloud, which supplements and optimizes the automatic scaling mechanism for Hadoop cluster deployment; the automatic scaling mechanism adjusts the cluster scale to delete redundant nodes, replace problem nodes and deploy new nodes.
Disclosure of Invention
The invention aims to provide a method for automatic scaling deployment of Hadoop clusters and Plugin deployment based on OpenStack cloud which, building on the current state of the prior art, supplements and optimizes the automatic scaling mechanism for Hadoop cluster deployment; the automatic scaling mechanism adjusts the cluster scale to delete redundant nodes, replace problem nodes and deploy new nodes.
The purpose of the invention is realized by the following technical scheme:
the invention provides an OpenStack-cloud-based method for automatic scaling deployment of Hadoop clusters: based on the Sahara module in the OpenStack cloud, the Sahara module is integrated with third-party management tools through the Plugin mechanism; with reference to the requirements and in combination with the automatic scaling deployment method, an appropriate number of virtual machines is allocated for the required Hadoop cluster, and the Hadoop cluster is installed and configured in the pre-allocated virtual machines.
Specifically, the OpenStack-cloud-based method for automatic scaling deployment of Hadoop clusters completes the automatic scaling deployment of a Hadoop cluster in the cloud environment according to predictions and real-time conditions; it specifically comprises the following:
(1) utilization-based automatic scaling strategy
The invention introduces E_C, E_R and E_D to denote the user's expected values for the utilization of the cluster CPU, RAM and hard disk respectively, and u_C, u_R and u_D to denote the actual utilization of the three resources. Since users attach different degrees of importance to different resources, λ_C, λ_R and λ_D are introduced into the system as the weights of the three terms, so that the following is obtained:

η_i = |u_i − E_i|, i ∈ {C, R, D} (Definition 1)

φ = λ_C·η_C + λ_R·η_R + λ_D·η_D (Definition 2)

Wherein Definition 1 represents the difference between the actual utilization and the expected utilization under each index and reflects the gap on each index specifically; Definition 2 represents the overall difference of the three items from expectation. Both values lie within [0, 1): the closer to 0, the closer to the expected value and the more the utilization of the cluster matches the expectation;
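As a minimal sketch, the two definitions can be written out in Python (the function and variable names are my own, and Definition 1 is taken as the absolute difference between actual and expected utilization, consistent with the description above):

```python
# Hedged sketch of Definitions 1 and 2: per-resource utilization gap (eta)
# and its weighted aggregate (phi). Names are illustrative, not from the patent.

def eta(actual, expected):
    """Definition 1: gap between actual and expected utilization, in [0, 1)."""
    return abs(actual - expected)

def phi(actual, expected, weights):
    """Definition 2: weighted sum of the per-resource gaps (CPU, RAM, disk)."""
    return sum(w * eta(a, e)
               for a, e, w in zip(actual, expected, weights))

# Example: CPU/RAM/disk actual vs. expected utilization, CPU weighted highest.
deviation = phi(actual=(0.70, 0.50, 0.30),
                expected=(0.60, 0.50, 0.40),
                weights=(0.5, 0.3, 0.2))
# The closer the result is to 0, the closer the cluster is to the expectation.
```
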
based on the above, the invention provides an automatic scaling strategy based on utilization rate;
(2) Rapid auto-scaling deployment strategy based on task success rate
The invention introduces a variable θ in this strategy. θ is a percentage in [0, 1] and represents the proportion of tasks that can run successfully on a single node. In the ideal case every node has θ = 1: all tasks execute successfully and the results are output smoothly. Since task-execution failures on a node are unavoidable, a node cannot be replaced immediately whenever it makes a mistake; that would be unreasonable and would also increase the system overhead. Therefore θ is compared with a pre-estimated threshold θ_0: if θ approaches 0, the task success rate of the node is too low; in this case the continued use of the node reduces the operating efficiency of the cluster, so a node replacement strategy is started;
based on the above, the invention provides an automatic telescopic rapid deployment strategy based on the task success rate.
In the invention, the automatic Hadoop cluster telescopic deployment based on the OpenStack cloud comprises the following two processes:
1. automatic scaling deployment mechanism
(1) Utilization-based automatic scaling strategy
The final goal of the automatic scaling strategy algorithm based on cloud-platform resource utilization is: by combining the cloud-platform resources with the requirements of the current application scenario, to use the computing resources of the cloud platform more reasonably and to optimize the automatic scaling mechanism for Hadoop cluster deployment, so that the cluster runs more efficiently;
The usage of the three indexes CPU, memory and hard disk generally plays an important role in implementing the utilization-based automatic scaling function, and the invention mainly measures utilization by these three indexes;
introduction of the invention
Figure BDA0001710865990000031
Respectively representing the expected values of the utilization rates of the cluster CPU, the RAM and the hard Disk by the user, and the utilization rates of the three are respectively the expected values in practical situation
Figure BDA0001710865990000034
Since users have different degrees of importance for different resources, lambda is introduced into the systemC、λR、λDThe three terms are respectively used as the weights of the three terms, so that the following data are obtained,
Figure BDA0001710865990000032
φ=λC·ηCR·ηRD·ηD(definition 2)
Defining 1 to represent a difference value between an actual utilization rate and an expected utilization rate under each index, wherein the index specifically reflects the difference between each index, and the platform adjusts the cluster resource configuration condition according to the difference value; in the actual situation,
Figure BDA0001710865990000033
is a possible interval range, so the calculation result of η is a value interval, and since the degree of coincidence with the user's expectation needs to be calculated, a region is the minimum value included in the sub-interval range of the [0,1) interval; definition 2 represents the difference between the three items and the expectation, the range of the value is also within [0,1), the closer to 0, the closer to the expectation value, the more consistent the utilization rate of the cluster is;
In the algorithm, the user's expected values for the three utilizations of the cluster CPU, memory (RAM) and hard disk are obtained first and compared with the actual utilization on the platform. If an actual utilization is smaller than the minimum of the corresponding expected value, the DataNode and NameNode services are stopped and the virtual machine is shut down; if the actual utilization is larger than the minimum and smaller than the maximum of the corresponding expected value, a virtual machine is started, the Hadoop cluster is deployed on it, and the Hadoop cluster is started;
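This decision rule can be sketched as follows (an illustrative sketch only: the action names are my own, and a real implementation would call OpenStack/Sahara APIs to stop or start virtual machines and Hadoop services rather than return strings):

```python
# Hedged sketch of the utilization-based scaling decision described above.
def scaling_action(actual, expected_range):
    """Return the action for one resource, given its actual utilization
    and the user's expected (min, max) utilization range."""
    lo, hi = expected_range
    if actual < lo:
        # below the expected minimum: stop DataNode/NameNode, shut the VM down
        return "scale-in"
    if actual < hi:
        # between the expected minimum and maximum: start a VM, deploy and
        # start Hadoop on it
        return "scale-out"
    # at or above the expected maximum: the text prescribes no action here
    return "no-op"
```

Note that the rule follows the text literally: utilization below the expected minimum triggers scale-in, while utilization inside the expected range triggers deployment of additional nodes.
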
(2) Rapid auto-scaling deployment strategy based on task success rate
In the design of Hadoop it is assumed that any node may encounter various problems that cause distributed tasks to fail; replacing nodes with a high failure rate avoids failures caused by physical factors of the node itself;
Meanwhile, before a node that replaces another to provide computing service formally serves the cluster, point-to-point data-block copying is needed to ensure that the data is stored correctly. This strategy adjusts the state of the whole cluster and avoids the situation where the failure of a single node increases the load on the other nodes or even produces a cluster-wide effect;
To implement this strategy, the invention first introduces a variable θ. θ is a percentage in [0, 1] and represents the proportion of tasks that can run successfully on a single node. In the ideal case every node has θ = 1: all tasks execute successfully and the results are output smoothly. Since task-execution failures on a node are unavoidable, a node cannot be replaced immediately whenever it makes a mistake; that would be unreasonable and would also increase the system overhead. Therefore θ is compared with a pre-estimated threshold θ_0: if θ approaches 0, the task success rate of the node is too low; continuing to use the node would reduce the operating efficiency of the cluster, so a node replacement strategy is started;
the process steps of the algorithm of the present invention are described as follows:
step1 selecting an appropriate one
Figure BDA0001710865990000046
Value as the minimum standard for measuring task success rate
step2 calculating the success rate of node task in a certain time periodValue of
step3 mixingValue and
Figure BDA0001710865990000049
comparing the values, if so, continuing to start step2 by the next node; if the ratio is less than the preset value, the next step is carried out
step4 applying for a new node
step5, deploying Hadoop application on the newly applied node and copying the data on the original node
step6 starting service on new node and suspending service of original node
step7 step2 starting service on new node, terminate original node, and enter next node
Through the replacement of nodes, an optimized task-execution effect for the whole cluster is finally achieved; in the replacement process, failures caused by the physical state of the virtual machine are avoided to the greatest extent.
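The replacement procedure can be sketched as a single loop (an illustrative sketch only: apply_for_node, deploy_and_copy and the other operations are hypothetical stand-ins for the OpenStack/Sahara calls the text describes):

```python
def replace_low_success_nodes(nodes, success_rate, theta_0, cluster_ops):
    """Hedged sketch of the task-success-rate replacement strategy.

    nodes:        iterable of node identifiers.
    success_rate: callable node -> theta, the measured success rate in [0, 1].
    theta_0:      pre-estimated minimum acceptable success rate.
    cluster_ops:  object with the hypothetical operations apply_for_node(),
                  deploy_and_copy(new, old), start_service(node),
                  suspend_service(node), terminate(node).
    Returns the list of (old_node, new_node) replacements performed.
    """
    replaced = []
    for node in nodes:                               # loop over nodes
        theta = success_rate(node)                   # measure over a period
        if theta >= theta_0:                         # healthy: next node
            continue
        new_node = cluster_ops.apply_for_node()      # apply for a new node
        cluster_ops.deploy_and_copy(new_node, node)  # deploy Hadoop, copy data
        cluster_ops.start_service(new_node)          # bring new node online
        cluster_ops.suspend_service(node)            # suspend old node service
        cluster_ops.terminate(node)                  # terminate old node
        replaced.append((node, new_node))
    return replaced
```
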
2. Cluster automatic deployment Plugin implementation
(1) Cluster Plugin realization interface
The cluster exists as an independent plugin, in the form of an independent directory under the sahara/sahara/plugins directory; its main directory structure is shown in FIG. 1;
wherein:
1. As shown in fig. 2, the v2_7_1 directory holds the content specific to version 2.7.1 of the sandbox plugin, and hadoop2 holds the general content.
2. The outermost versionfactory.py is the core responsible for implementing all necessary interfaces; the interfaces that specifically need to be implemented are shown in table 1:
3. As shown in fig. 3, the functional implementation is mainly divided into two parts, namely configuration and startup of the cluster. config_helper.py is the core module for configuration; in it, the paths of the relevant configuration files are set and the environment variables are configured;
4. As shown in fig. 5, versionhandler.py specifically completes the configuration and startup of the sandbox according to the current plugin version;
5. run_script.py/starting_script.py specifically implement the startup of the cluster; the remote() method is used to connect via ssh to the started virtual machine and execute the corresponding Linux commands, thereby specifically controlling the startup of processes, nodes, the cluster, and so on.
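A bare-bones illustration of the plugin shape described above (the stub base class stands in for the real sahara.plugins.provisioning.ProvisioningPluginBase so the sketch is self-contained; the method names and commands are plausible examples, not the full interface of Table 1):

```python
# Hedged sketch of a Sahara plugin skeleton. In a real deployment the base
# class would be imported from Sahara; a stub is defined here so the sketch
# stands alone.
class ProvisioningPluginBase:            # stand-in for the Sahara base class
    pass

class SandboxPlugin(ProvisioningPluginBase):
    def get_versions(self):
        return ["2.7.1"]                 # versions this plugin can deploy

    def configure_cluster(self, cluster):
        # write config files and environment variables on each instance
        for instance in cluster.instances:
            with instance.remote() as r:  # 'remote' property from the text
                r.execute_command("echo configure")  # placeholder command

    def start_cluster(self, cluster):
        # ssh into the started VMs and run the start scripts remotely
        for instance in cluster.instances:
            with instance.remote() as r:
                r.execute_command("bash start_script.sh")  # illustrative
```
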
(2) Cluster image packaging and build-tool implementation
1. OpenStack virtual machine image
In the embodiment of the invention, the CentOS operating system is taken as an example to briefly introduce the process and principle of building an OpenStack virtual machine image;
2. Cluster image
In the invention, diskimage-builder is used to build the cluster image.
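The image-building step might look roughly like the following shell sketch (dry-run form: the commands are only composed and printed, and the qemu-img/disk-image-create invocations, element names and image size are illustrative assumptions, not taken from the patent):

```shell
# Hedged sketch: composing the commands to create a 10G qcow2 base image and
# to build a cluster image with diskimage-builder. Dry-run only (echoed).
IMAGE="centos-hadoop.qcow2"
CMD_CREATE="qemu-img create -f qcow2 $IMAGE 10G"
CMD_BUILD="disk-image-create centos7 hadoop -o ${IMAGE%.qcow2}"
echo "$CMD_CREATE"
echo "$CMD_BUILD"
```
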
The invention carried out automatic Hadoop cluster scaling-deployment experiments.
One group of deployments is selected as a representative for the experiments and tested several times; on this basis, index data before and after optimization are compared to analyze the deployment service. Table 2 shows the cluster configurations.
TABLE 2 Cluster configurations

Cluster    vCPU     RAM     Disk    Nodes
Cluster 1  4 cores  10GB    5GB     8
Cluster 2  2 cores  100GB   100GB   16
Cluster 3  1 core   5GB     80GB    8
Cluster 4  1 core   5GB     80GB    16
Cluster 5  1 core   5GB     80GB    24
Cluster 6  1 core   5GB     80GB    48
Test results show that deployment after optimization is considerably faster than before. When the number of cluster nodes to be deployed is small, the optimization effect is not obvious; as the number of nodes grows, the deployment time before optimization increases markedly, while the optimized deployment time grows only moderately with cluster scale. The six cluster deployment times are relatively close and stable within the range of 10 to 20 minutes, showing that the optimized cluster deployment time is clearly improved and more stable. Compared with the deployment service before optimization, the optimized service shows its effect even for small-scale cluster deployment, and the success rate is significantly improved, indicating that the deployment-service optimization is successful. The automatic Hadoop cluster scaling-deployment strategy provided by the invention optimizes automatic Hadoop cluster deployment, making the deployment service more stable and efficient.
Drawings
FIG. 1 shows a directory structure diagram of a cluster as a stand-alone plugin, existing in the form of a stand-alone directory under the sahara/sahara/plugins directory.
Fig. 2 shows that the v2_7_1 directory holds the content specific to version 2.7.1 of the sandbox plugin, hadoop2 holds the general content, and versionfactory.py sits at the outermost level.
Fig. 3 shows that the functional implementation is mainly divided into two parts, configuration and startup; config_helper.py is the core configuration module, in which the paths of the relevant configuration files are set and the environment variables are configured, and it is also responsible for generating, from the user configuration, the corresponding configuration files for the sandbox to be started.
Fig. 4 is a partial view of the work done by config_helper.py when configuring the spark environment variables, for a small portion of the system shown in fig. 3.
Fig. 5 shows versionhandler.py, in which the sandbox is configured and started according to the current plugin version; the start_cluster method describes the whole flow of sandbox startup.
Fig. 6 shows the creation of a qcow2-format virtual machine image of size 10G.
Fig. 7 is a schematic diagram of a cluster.
FIG. 8 shows the web page of the cluster-creation service, on which the requirements for cluster creation are submitted, including selection of the Hadoop version, node configuration as listed in Table 1, and selection of the image; after the specification is completed, the cluster enters the deployment phase.
Fig. 9 shows that the speed of deployment after optimization is considerably improved compared with that before optimization, the deployment time of cluster deployment after optimization is obviously improved, and the required time is more stable.
Fig. 10 shows that, compared to the deployment service before optimization, even in the process of cluster deployment with a smaller scale, the optimized deployment service still exhibits its optimization effect, and in terms of success rate, the optimized effect is significantly improved, which represents the success of the deployment service optimization of the present invention.
Detailed Description
The technical solution of the present invention is specifically described below with reference to the accompanying drawings and examples.
The invention aims to provide an OpenStack-cloud-based method for automatic scaling deployment of Hadoop clusters. As shown in fig. 1, the invention is based on the Sahara module in the OpenStack cloud, integrates with third-party management tools through the Plugin mechanism, allocates an appropriate number of virtual machines for the required Hadoop cluster with reference to the requirements and in combination with the automatic scaling deployment method, and installs and configures the Hadoop cluster in the pre-allocated virtual machines.
In the invention, automatic Hadoop cluster scaling deployment is carried out based on the OpenStack cloud, specifically comprising the following two processes:
1. automatic scaling deployment mechanism
(1) Utilization-based automatic scaling strategy
The final realization target of the automatic scaling strategy algorithm based on the cloud platform resource utilization rate is as follows: by combining the cloud platform resources and the current application scene requirements, the computing resources of the cloud platform are more reasonably utilized, and the cluster deployment automatic stretching mechanism related to the Hadoop cluster is optimized, so that the operation efficiency of the cluster reaches a better result.
When allocating the resources occupied by a virtual machine, three aspects are generally of concern: CPU, memory and hard disk; the usage of these three indexes is therefore inevitably an important aspect in implementing the utilization-based automatic scaling function. In this embodiment, utilization is mainly measured by these three indexes.
The invention introduces E_C, E_R and E_D to denote the user's expected values for the utilization of the cluster CPU, RAM and hard disk respectively, and u_C, u_R and u_D to denote the actual utilization of the three resources. Since users attach different degrees of importance to different resources, λ_C, λ_R and λ_D are introduced into the system as the weights of the three terms. The following can thus be obtained:

η_i = |u_i − E_i|, i ∈ {C, R, D} (Definition 1)

φ = λ_C·η_C + λ_R·η_R + λ_D·η_D (Definition 2)

Definition 1 represents the difference between the actual utilization and the expected utilization under each index; the platform adjusts the cluster resource configuration according to this difference. In practice the expected utilization E_i is generally a range of possible values rather than a single number, so the calculation of η_i also yields a value interval; since the degree of agreement with the user's expectation must be computed, η_i is taken as the minimum value of that subinterval of the [0, 1) interval. Definition 2 represents the overall difference of the three items from expectation; its value also lies within [0, 1), and the closer it is to 0, the closer the cluster's utilization is to the expected value;
the algorithm firstly obtains expected values of three utilization rates of a user for a cluster CPU, an internal memory RAM and a hard Disk, compares the actual utilization rates of the three utilization rates in a platform, and closes a Datanone and a Namenode service and a virtual machine if the actual utilization rate is smaller than the minimum value of the corresponding expected values; if the actual utilization value is greater than the minimum value of the corresponding expected value and less than the maximum value of the corresponding expected value, starting the virtual machine, deploying the Hadoop cluster, starting,
(2) Rapid auto-scaling deployment strategy based on task success rate
In the design of Hadoop it is assumed that any node may encounter various problems that cause distributed tasks to fail, and a failed task is run again. If the probability of node failure increases, then although Hadoop has its own handling standards for distributed tasks, the running time of the whole cluster increases, and the higher a node's task failure rate, the larger the increase. Even though such a node keeps performing task computation and its resource utilization may look reasonable, it actually affects the operation of the whole job. There are many reasons for task failure; in order to affect the other nodes of the cluster as little as possible, the invention replaces nodes with a high failure rate, thereby avoiding failures caused by physical factors of the node itself;
meanwhile, on the principle that mobile computing is higher in economic benefit than mobile data, Hadoop can allocate computing tasks to nodes with data blocks needed by computing as much as possible, the nodes finish storage of needed data in a security mode before, if the data are deleted directly, certain influence is inevitably caused to the whole cluster, the cluster possibly enters the security mode again and needs to wait, therefore, before the nodes replacing the nodes for providing computing service serve the cluster formally, point-to-point data block copying needs to be carried out, data can be stored correctly, the realization of the strategy can better adjust the state of the whole cluster, and the conditions of weight increment of loads of other nodes caused by failure of a single node and even scale effect are avoided;
in order to implement the above strategy, the present invention first introduces a variable
Figure BDA0001710865990000091
Figure BDA0001710865990000092
Is one between [0,1]Percentage of (d), representing the proportion of a task that can successfully run on a single node; under the optimal condition, each node
Figure BDA0001710865990000093
All tasks can be successfully executed, the result can be smoothly output, and because the execution failure of the node tasks is inevitable, the node cannot be immediately replaced as long as a mistake is made, which is unreasonable and can also cause the increase of the system overhead; in the invention, the
Figure BDA0001710865990000094
With a pre-estimated value
Figure BDA0001710865990000095
If the value approaches to 0, the task success rate of the node is too small, and at the moment, the running efficiency of the cluster is reduced by the continuous use of the node, so that the node replacement strategy is started;
the process steps of the algorithm are described as follows:
step1 select an appropriate threshold value p0 as the minimum standard for measuring the task success rate
step2 calculate the task success rate p of a node over a certain time period
step3 compare p with p0; if p ≥ p0, move on to the next node and return to step2; if p < p0, proceed to the next step
step4 apply for a new node
step5 deploy the Hadoop application on the newly applied node and copy the data from the original node
step6 start the service on the new node and suspend the service of the original node
step7 terminate the original node, then move on to the next node and return to step2
Through the replacement of the nodes, the optimized task execution effect of the whole cluster is finally realized, and in the replacement process, the failure caused by the physical aspect of the virtual machine can be avoided to the greatest extent.
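The seven steps above can be sketched as follows; this is a minimal illustration, in which the helper callables (provision_node, deploy_and_copy, switch_service, terminate_node) and the success-rate bookkeeping are assumptions for the sketch, not interfaces defined in the patent:

```python
# Sketch of the node-replacement strategy (steps 1-7 above).
# All helper callables are hypothetical stand-ins for the cluster's
# provisioning, deployment and service-control operations.

def replace_low_success_nodes(nodes, p0, task_stats,
                              provision_node, deploy_and_copy,
                              switch_service, terminate_node):
    """For each node, compare its task success rate p with the threshold
    p0 (step 3); nodes below the threshold are replaced (steps 4-7)."""
    replaced = []
    for node in nodes:
        succeeded, total = task_stats[node]          # step 2: stats over a time period
        p = succeeded / total if total else 1.0      # success rate in [0, 1]
        if p >= p0:                                  # step 3: healthy, next node
            continue
        new_node = provision_node()                  # step 4: apply for a new node
        deploy_and_copy(src=node, dst=new_node)      # step 5: deploy Hadoop + copy data
        switch_service(old=node, new=new_node)       # step 6: start new, suspend old
        terminate_node(node)                         # step 7: terminate original node
        replaced.append((node, new_node))
    return replaced
```

Point-to-point copying before the switch (step 5) is what keeps the cluster out of a second safe-mode wait, as argued above.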
3. Cluster automatic deployment Plugin implementation
(3) Cluster Plugin interface implementation
The cluster exists as an independent plug-in, in the form of a separate directory under the sahara/sahara/plugins directory; its main directory structure is shown in figure 1,
wherein:
as shown in fig. 2, the v2_7_1 directory is specifically made content of the sandbox plugin version 2.7.1, the hadoop2 is general content, and the outermost version factor.
py is the core responsible for implementing all necessary interfaces, and the interfaces specifically needed to be implemented are shown in table 1:
TABLE 1
[Table 1 is reproduced as images in the original document; among the listed interfaces are methods taking (cluster, instances) arguments.]
Fig. 3 shows that, in terms of functional implementation, the cluster is mainly divided into two parts, configuration and startup; config_helper.py is the core module of the configuration, in which the paths of the relevant configuration files are set and the environment variables are configured. In addition, config_helper.py is also responsible for generating the corresponding configuration files, according to the user configuration, for the sandbox to be started; fig. 4 shows a small part of the work done by config_helper.py when configuring the spark environment variables;
versionhandler.py completes the configuration and startup of the sandbox according to the current plugin version, wherein the whole flow of sandbox startup is written in the start_cluster method;
run_script.py / starting_script.py implement the starting of the cluster, wherein the run_script.remote() method is used to connect remotely via ssh to a started virtual machine and execute the corresponding linux commands, thereby specifically controlling the starting of processes, nodes, clusters and the like.
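The remote startup described above (connecting to a started virtual machine over ssh and executing linux commands) can be sketched as below; the command builder, the default user and the key handling are illustrative assumptions, not the patent's run_script.remote() implementation:

```python
import subprocess

def build_ssh_command(host, remote_cmd, user="hadoop", key_path=None):
    """Compose the ssh invocation used to run a command on a started VM.
    The user name and key path are illustrative defaults."""
    cmd = ["ssh", "-o", "StrictHostKeyChecking=no"]
    if key_path:
        cmd += ["-i", key_path]
    cmd += [f"{user}@{host}", remote_cmd]
    return cmd

def run_remote(host, remote_cmd, **kw):
    """Execute the command on the remote VM and return its stdout."""
    result = subprocess.run(build_ssh_command(host, remote_cmd, **kw),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

In this style, starting an HDFS daemon on one node of the cluster would be e.g. `run_remote("10.0.0.5", "sbin/start-dfs.sh")` (address and command path are placeholders).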
(4) Cluster image packaging and creation tool implementation
OpenStack virtual machine image:
taking the CentOS operating system as an example, the process and principle of creating an OpenStack virtual machine image in this embodiment are as follows:
1) download the CentOS installation ISO image;
2) perform the installation with the virt-manager tool or the virt-install command; FIG. 6 is an example of an installation using the command line;
fig. 6 shows the creation of a virtual machine image in qcow2 format, 10G in size; if virt-manager is used instead, the installation can be carried out step by step through graphical prompts. Some additional configuration is needed during installation, such as changing the ethernet state, setting the host name, specifying the installation source and the storage device, partitioning the disk, setting the root password, and so on;
3) after step 2), log in to the newly installed virtual machine as the root user and perform the related configuration, such as installing the ACPI service, installing the cloud-init package, configuring support for partition resizing, disabling zeroconf routing, configuring console log output, and so on;
4) after the configuration is finished, shut down the virtual machine;
5) remove the MAC address information;
6) compress the image.
After the above steps are completed, an ordinary OpenStack virtual machine image has been created and can be uploaded to the OpenStack platform for use.
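Steps 2) and 6) above can be sketched as command builders; every flag value and default below (VM name, RAM, vCPUs, graphics backend) is an illustrative assumption rather than the exact invocation used in the embodiment:

```python
def virt_install_cmd(name, iso_path, disk_path, size_gb=10,
                     ram_mb=2048, vcpus=1):
    """Compose a virt-install invocation that creates a qcow2 disk of the
    given size and boots the installer from the CentOS ISO (step 2)."""
    return [
        "virt-install", "--name", name,
        "--ram", str(ram_mb), "--vcpus", str(vcpus),
        "--cdrom", iso_path,
        "--disk", f"path={disk_path},size={size_gb},format=qcow2",
        "--graphics", "vnc",
    ]

def compress_image_cmd(src, dst):
    """Compress the finished image with qemu-img convert -c (step 6)."""
    return ["qemu-img", "convert", "-c", "-O", "qcow2", src, dst]
```

Building the argument lists separately from executing them makes the image pipeline easy to log and dry-run before handing the commands to a shell.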
Cluster image:
for clusters of different types and versions, images of the corresponding type and version are required as support; when creating a cluster image, in addition to the basic OpenStack image creation steps, all related software packages in the image (such as Hadoop and Spark) must also be downloaded, installed and configured.
A Hadoop cluster automatic telescopic deployment experiment is carried out for the invention.
One group of deployments is selected as representative and tested several times; on this basis, index data before and after optimization are compared to analyze the deployment service.
TABLE 2 Cluster configuration

Cluster     vCPU     RAM      Disk     Node number
Cluster 1   4 core   10GB     5GB      8
Cluster 2   2 core   100GB    100GB    16
Cluster 3   1 core   5GB      80GB     8
Cluster 4   1 core   5GB      80GB     16
Cluster 5   1 core   5GB      80GB     24
Cluster 6   1 core   5GB      80GB     48
In the experiment, six different Hadoop clusters are deployed; as shown in fig. 6, six clusters of different scales and configurations are studied. There are 6 physical computing nodes in total, and Openstack can control the placement of the virtual machines to a certain extent according to resource usage, so that the virtual machines are evenly distributed across the nodes. In order to reduce uncertainty caused by other factors, in this embodiment the cluster deployment is performed directly on 6 identical machines; table 2 shows the specific node resource configurations of the clusters;
fig. 8 shows the web page of the cluster creation service, on which the relevant requirements for cluster creation are submitted, including selection of the Hadoop version, the node configurations shown in table 1, selection of images, and so on; after all the requirements have been specified, the cluster can enter the deployment stage. In this experiment, the results before and after optimization are compared in terms of deployment speed and success rate;
in the two groups of comparison tests, the six Hadoop clusters are each deployed several times; runs that were abnormal or failed are excluded, and the average value is taken as the deployment time of the cluster. Fig. 9 shows that deployment after optimization is considerably faster than before optimization. When the number of cluster nodes to be deployed is small, the optimization effect is not significant, but as the number of cluster nodes grows, the deployment time before optimization increases markedly, whereas for the optimized deployment service the deployment time grows only moderately with cluster scale: the 6 cluster deployment times are relatively close, remaining stable within the range of 10 to 20 minutes. The results show that the optimized cluster deployment time is clearly improved and the required time is more stable. As shown in fig. 10, compared with the deployment service before optimization, the optimized deployment service still exhibits its optimization effect even for smaller-scale cluster deployments. The results also show that as the cluster scale increases, the success rate of cluster deployment decreases to a certain extent due to various uncertainties; before optimization this decrease is obvious and the stability of the deployment service is poor, while after optimization, although the success rate also decreases, the decrease is small and tends to level off, remaining at a high level. The experimental results prove that, in terms of success rate, the effect after optimization is clearly improved and the optimization of the deployment service is successful.
The Hadoop cluster automatic telescopic deployment strategy can optimize automatic deployment of the Hadoop cluster, so that the deployment service is more stable and efficient.

Claims (2)

1. A Hadoop cluster automatic telescopic deployment method based on an OpenStack cloud, characterized by comprising: integrating the Sahara module in the OpenStack cloud with third-party management tools through a Plugin mechanism; with reference to the requirements and in combination with the automatic telescopic deployment method, allocating a suitable number of virtual machines for the required Hadoop cluster; and installing and configuring the Hadoop cluster in the pre-allocated virtual machines.
2. The method of claim 1, wherein the method performs automated scaling deployment of Hadoop clusters in a cloud environment based on prediction and real-time conditions, comprising:
(1) utilization-based automatic scaling strategy
introduce e_C, e_R and e_D, representing respectively the user's expected values for the three utilization rates of the cluster CPU, RAM and hard disk, and let l_C, l_R and l_D denote the corresponding actual utilization rates; according to the different degrees of attention users pay to different resources, λ_C, λ_R and λ_D are introduced into the resource model as the respective weights of the three items; the following are thus obtained:
η_X = |l_X − e_X|, for X ∈ {C, R, D}  (definition 1)
φ = λ_C·η_C + λ_R·η_R + λ_D·η_D  (definition 2)
wherein definition 1 represents, for each index, the difference between the actual utilization rate and the expected utilization rate; definition 2 represents the overall difference of the three items from expectation, with values in the range [0,1): the closer φ is to 0, the closer the cluster is to the expected values and the better its utilization meets expectations;
(2) automatic telescopic rapid deployment strategy based on task success rate
a variable p is introduced into the strategy: p is a percentage between [0,1], representing the proportion of tasks that can run successfully on a single node; when p = 1 for each node, all tasks can be executed successfully and the results output smoothly;
based on the fact that task execution failures on a node are unavoidable, a pre-estimated threshold value p0 is set for p; if p approaches 0, the task success rate of the node is too low, continued use of the node reduces the operating efficiency of the cluster, and the node replacement strategy is enabled.
CN201810682329.9A 2018-06-27 2018-06-27 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud Active CN110647379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810682329.9A CN110647379B (en) 2018-06-27 2018-06-27 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810682329.9A CN110647379B (en) 2018-06-27 2018-06-27 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud

Publications (2)

Publication Number Publication Date
CN110647379A true CN110647379A (en) 2020-01-03
CN110647379B CN110647379B (en) 2023-10-17

Family

ID=68988861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810682329.9A Active CN110647379B (en) 2018-06-27 2018-06-27 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud

Country Status (1)

Country Link
CN (1) CN110647379B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104320460A (en) * 2014-10-24 2015-01-28 西安未来国际信息股份有限公司 Big data processing method
US20150063166A1 (en) * 2013-08-27 2015-03-05 Futurewei Technologies, Inc. System and Method for Mobile Network Function Virtualization
CN104734892A (en) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack
CN106982137A (en) * 2017-03-08 2017-07-25 中国人民解放军国防科学技术大学 Hadoop cluster Automation arranging methods based on kylin cloud computing platform

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150063166A1 (en) * 2013-08-27 2015-03-05 Futurewei Technologies, Inc. System and Method for Mobile Network Function Virtualization
CN104320460A (en) * 2014-10-24 2015-01-28 西安未来国际信息股份有限公司 Big data processing method
CN104734892A (en) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack
CN106982137A (en) * 2017-03-08 2017-07-25 中国人民解放军国防科学技术大学 Hadoop cluster Automation arranging methods based on kylin cloud computing platform

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANTONIO CORRADI ET AL.: "Elastic provisioning of virtual Hadoop clusters in OpenStack-based Clouds", 《IEEE ICC 2015 - WORKSHOP ON CLOUD COMPUTING SYSTEMS, NETWORKS, AND APPLICATIONS (CCSNA)》 *
张新朝: "基于云平台虚拟集群的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
王炳旭: "基于IaaS云平台的Hadoop资源调度策略研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
CN110647379B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
US11398948B2 (en) Generation and deployment of inherited network topology models
US11570148B2 (en) Method and apparatus for deploying security access control policy
CN107566165B (en) Method and system for discovering and deploying available resources of power cloud data center
CN110147411A (en) Method of data synchronization, device, computer equipment and storage medium
US20110107299A1 (en) Systems and methods for integrated package development and machine configuration management
WO2022105440A1 (en) Hybrid quantum-classical cloud platform and task execution method
US9354920B2 (en) Managing virtual appliances supporting multiple profiles
CN105553741A (en) Automatic deployment method for application system based on cloud computing
US11886905B2 (en) Host upgrade method and device
JP6783850B2 (en) Methods and systems for limiting data traffic
CN103064717B (en) A kind of apparatus and method of parallel installation of software for cluster system
CN111522622B (en) K8S quick starting method based on cloud platform
US11509531B2 (en) Configuration sharing and validation for nodes in a grid network
CN112099917B (en) Regulation and control system containerized application operation management method, system, equipment and medium
CN110008005B (en) Cloud platform-based power grid communication resource virtual machine migration system and method
CN103200036A (en) Automated configuration method of electrical power system cloud computing platform
CN105743677A (en) Resource configuration method and apparatus
CN112637265B (en) Equipment management method, device and storage medium
CN106406980B (en) A kind of dispositions method and device of virtual machine
CN111880738A (en) Method for automatically creating and mounting LVM (logical volume manager) volume in K8s environment
CN110297713A (en) Configuration management system and method of cloud host
CN112527450B (en) Super-fusion self-adaptive method, terminal and system based on different resources
CN110502242A (en) Code automatic generation method, device, computer equipment and storage medium
CN107493200B (en) Optical disc image file creating method, virtual machine deploying method and device
CN110647379A (en) Hadoop cluster automatic telescopic deployment and Plugin deployment method based on OpenStack cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant