CN111651266A - Hadoop cluster resource management-based method, device, equipment and storage medium - Google Patents

Hadoop cluster resource management-based method, device, equipment and storage medium

Info

Publication number
CN111651266A
CN111651266A
Authority
CN
China
Prior art keywords
queue
resource
hadoop cluster
queues
resource use
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010356770.5A
Other languages
Chinese (zh)
Inventor
Deng Yu (邓煜)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010356770.5A priority Critical patent/CN111651266A/en
Publication of CN111651266A publication Critical patent/CN111651266A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5021Priority
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a Hadoop-cluster-based resource management method, device, equipment and storage medium, belonging to the technical field of big data. The method comprises: obtaining resource usage information data of all queues in the Hadoop cluster within a preset time period; inputting the resource usage information data into a pre-trained resource usage prediction model; judging the running state of each queue according to the prediction result; calculating the proportion of congestion state queues among all queues to obtain the congestion state queue ratio; and judging whether the congestion state queue ratio is greater than a preset threshold value, and if so, adjusting the queue resources of the Hadoop cluster according to the congestion state queue ratio. By comparing the congestion state queue ratio with the preset threshold value to determine the resource adjustment mode, the application makes the scheme for adjusting queue resources more flexible. The present disclosure also relates to blockchain technology, and the resource usage information data can be stored in blockchain nodes.

Description

Hadoop cluster resource management-based method, device, equipment and storage medium
Technical Field
The application belongs to the technical field of big data, and particularly relates to a Hadoop cluster resource management-based method, device, equipment and storage medium.
Background
With the rapid development of the internet and the exponential growth of network data, more and more enterprises now choose to build their own data warehouses on Hadoop (a distributed system infrastructure developed by the Apache Foundation) to manage their network data. To meet data processing requirements across various application scenarios, multiple queues often need to be built on the same Hadoop cluster, but the cluster's resources then easily fall into a disordered state, so a unified scheduling platform needs to be built to achieve unified management and scheduling of data resources.
In practical application, Hadoop generally uses Yarn to schedule resources. A well-performing Yarn scheduler such as the Fair Scheduler can achieve fair scheduling of resources between two or more queues in a Hadoop cluster. A problem that may arise with such a scheduler, however, is that resources are not utilized to the fullest. For example, suppose queue A and queue B initially hold equal resources and the task load of queue A far exceeds that of queue B. Under the Fair Scheduler, at most half of queue B's resources can be transferred to queue A. It may then happen that queue A remains congested even after acquiring half of queue B's resources, while queue B remains idle because its remaining resources are still sufficient. The existing Yarn scheduling mode therefore has certain shortcomings.
Disclosure of Invention
The embodiment of the application aims to provide a method, a device, computer equipment and a storage medium based on Hadoop cluster resource management, so as to solve the problem that the existing Yarn scheduling can only carry out fair scheduling when carrying out scheduling on Hadoop cluster queue resources, so that the queue resource adjustment flexibility of a Hadoop cluster is not high.
In order to solve the above technical problem, an embodiment of the present application provides a method for resource management based on a Hadoop cluster, which adopts the following technical solutions:
acquiring resource use information data of all queues in a Hadoop cluster in a preset time period;
inputting resource use information data into a pre-trained resource use condition prediction model, and predicting the resource use conditions of all queues;
judging the running states of all queues according to the prediction result of the resource usage, wherein the running states of the Hadoop cluster queues comprise an idle state, a normal state and a congestion state;
calculating the ratio of the queue with the running state being the congestion state in all Hadoop cluster queues to obtain the congestion state queue ratio;
judging whether the congestion state queue occupation ratio is greater than a preset threshold value or not;
if the congestion state queue occupation ratio is larger than or equal to a preset threshold, adjusting queue resources of the Hadoop cluster according to the congestion state queue occupation ratio;
and if the congestion state queue occupation ratio is smaller than the preset threshold value, adjusting the queue resources of the Hadoop cluster by adopting the yarn fair scheduling algorithm.
Optionally, before inputting the resource usage information data into a pre-trained resource usage prediction model to predict the resource usage of all queues, the method further includes:
and constructing a resource use condition prediction model according to a logistic regression algorithm.
Optionally, after the resource usage prediction model is constructed according to a logistic regression algorithm, the method further includes:
acquiring historical resource use information data of all queues in a Hadoop cluster;
marking the running states of all queues in the Hadoop cluster according to the historical resource use information data to obtain a training sample;
and importing the training samples into a resource use condition prediction model for training to obtain the trained resource use condition prediction model.
Optionally, the introducing the training sample into the resource usage prediction model for training, and obtaining the trained resource usage prediction model specifically includes:
importing the training sample into a resource use condition prediction model for training to obtain a training result;
fitting the training result and the labeling result of the training sample to obtain a prediction error;
comparing the prediction error with the standard error, and adjusting a resource use condition prediction model according to the comparison result until the prediction error is smaller than the standard error;
and acquiring the adjusted resource use condition prediction model.
Optionally, comparing the prediction error with the standard error, and adjusting the resource usage prediction model according to the comparison result until the prediction error is smaller than the standard error, specifically including:
comparing the prediction error with the standard error;
and if the prediction error is larger than the standard error, reducing the regularization coefficient of the resource use condition prediction model until the prediction error is smaller than the standard error.
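The adjustment loop just described (shrink the regularization coefficient until the prediction error falls below the standard error) can be sketched as follows. The function names, shrink factor, and the toy error function are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch of the tuning loop: reduce the model's regularization
# coefficient until the prediction error drops below the preset standard
# error (regularization that is too strong can cause underfitting).
def tune_regularization(train_fn, reg_coef=1.0, standard_error=0.05,
                        shrink=0.5, max_rounds=20):
    """train_fn(reg_coef) -> prediction error of the model trained with it."""
    error = train_fn(reg_coef)
    rounds = 0
    while error >= standard_error and rounds < max_rounds:
        reg_coef *= shrink         # weaken regularization
        error = train_fn(reg_coef)
        rounds += 1
    return reg_coef, error

# Toy stand-in where error shrinks with the coefficient.
coef, err = tune_regularization(lambda c: 0.1 * c, standard_error=0.05)
```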
Optionally, inputting the resource usage information data into a pre-trained resource usage prediction model and predicting the resource usage of all queues specifically includes:
sending the obtained resource usage information data to Kafka for Flink consumption;
serializing the resource usage information data after Flink consumption to obtain serialized data information, and inputting the serialized data information into the pre-trained resource usage prediction model;
deserializing and parsing the serialized data information to obtain the CPU occupation proportion and the average task waiting time in the resource usage information data;
and predicting the resource usage of all queues in the Hadoop cluster according to the CPU occupation proportion and the average task waiting time in the resource usage information data.
Optionally, adjusting the queue resources of the Hadoop cluster according to the congestion state queue ratio includes:
acquiring the congestion state queue occupation ratio, and determining the Hadoop cluster resource adjustment proportion according to the congestion state queue occupation ratio;
and adjusting the queue resources of the Hadoop cluster according to the Hadoop cluster resource adjustment proportion.
In order to solve the above technical problem, an embodiment of the present application further provides a device based on Hadoop cluster resource management, which adopts the following technical scheme:
the acquisition module is used for acquiring resource use information data of all queues in the Hadoop cluster in a preset time period;
the prediction module is used for inputting the resource use information data into a pre-trained resource use condition prediction model and predicting the resource use conditions of all queues;
the system comprises a first judgment module, a second judgment module and a third judgment module, wherein the first judgment module is used for judging the running states of all queues according to the prediction result of the resource use condition, and the running states of the Hadoop cluster queues comprise an idle state, a normal state and a crowded state;
the calculation module is used for calculating the ratio of the queue with the congestion state in all Hadoop cluster queues to obtain the congestion state queue ratio;
the second judgment module is used for judging whether the congestion state queue occupation ratio is greater than a preset threshold value or not;
the first adjusting module is used for adjusting the queue resources of the Hadoop cluster according to the congestion state queue occupation ratio when the congestion state queue occupation ratio is greater than or equal to the preset threshold value;
and the second adjusting module is used for adjusting the queue resources of the Hadoop cluster by adopting the yarn fair scheduling algorithm when the congestion state queue occupation ratio is smaller than the preset threshold value.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solution:
a computer device comprises a memory and a processor, wherein a computer program is stored in the memory, and the processor executes the computer program to realize the steps of the Hadoop cluster resource management-based method.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the above Hadoop-cluster-based resource management method.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the resource use information data of all queues in the Hadoop cluster in a preset time period are obtained, the obtained resource use information data are input into a pre-trained resource use condition prediction model to predict the resource use conditions of all the queues, and the resource use information data of the queues are input into the pre-trained resource use condition prediction model to quickly judge the running states of the queues; and further calculating the ratio of the congestion state queues in all queues by counting the number of queues with the running state being the congestion state, and adjusting queue resources of the Hadoop cluster according to the ratio of the congestion state queues by judging whether the occupation ratio of the congestion state queues is greater than or equal to a preset threshold value, if the occupation ratio of the congestion state queues is greater than or equal to the preset threshold value, and if the occupation ratio of the congestion state queues is less than or equal to the preset threshold value, adjusting by adopting a yarn fair scheduling algorithm.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a flow diagram of one embodiment of a Hadoop cluster resource management based method according to the present application;
FIG. 2 is a flow diagram of training a resource usage prediction model in a Hadoop cluster resource management-based method according to the present application;
FIG. 3 is a flow diagram of predicting Hadoop cluster queue resource usage in a Hadoop cluster resource management-based method according to the present application;
FIG. 4 is a schematic diagram illustrating an embodiment of an apparatus for Hadoop cluster resource management based on the present application;
FIG. 5 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
With continued reference to FIG. 1, a flow diagram of one embodiment of a Hadoop cluster resource management based method in accordance with the present application is shown. The Hadoop cluster resource management-based method comprises the following steps:
s101, acquiring resource use information data of all queues in the Hadoop cluster in a preset time period.
Hadoop is a distributed system infrastructure developed by the Apache Foundation. It makes full use of the power of a cluster for high-speed computation and storage, and it implements a distributed file system, so a user can develop distributed programs on Hadoop without knowing the details of the distributed underlying layer. Generally, a Hadoop cluster comprises a plurality of queues, each used for processing corresponding events or tasks, and Hadoop configures a corresponding amount of CPU resources for each queue. The predetermined time period can be set by the user as required, such as 30 minutes or 1 hour. The resource usage information data comprises the CPU occupation proportion of each queue in the Hadoop cluster and the average task waiting time of each queue.
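To make the per-queue record concrete, here is a minimal sketch of the resource usage information described above (CPU occupation proportion and average task waiting time over the preset window). The field names are illustrative assumptions, not identifiers from the patent.

```python
# Sketch of the per-queue resource usage record: CPU occupation proportion
# and average task waiting time, collected over a preset time window.
from dataclasses import dataclass

@dataclass
class QueueUsage:
    queue_id: int
    cpu_ratio: float         # CPU occupation proportion in the window, 0.0 to 1.0
    avg_wait_minutes: float  # average task waiting time in the window, in minutes

# One window's worth of data for two queues (values from the examples below).
window = [QueueUsage(1, 0.85, 8.0), QueueUsage(2, 0.15, 0.0)]
```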
And S102, inputting the resource use information data into a pre-trained resource use condition prediction model, and predicting the resource use conditions of all queues.
The pre-trained resource use condition prediction model can be directly used for predicting the resource use condition of the queue in the Hadoop cluster.
Specifically, after the resource usage information data of all queues in the Hadoop cluster within the predetermined time period is acquired in step S101, the acquired data is uploaded to Kafka and consumed there by Flink. The consumed resource usage information data is serialized to obtain serialized data information, which is input into the pre-trained resource usage prediction model. The serialized data information is then deserialized and parsed to obtain the CPU occupation proportion and the average task waiting time of each queue, and these two values are input queue by queue into the trained resource usage prediction model, which predicts the resource usage of each queue.
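The serialize-then-deserialize round trip above can be sketched as follows. The patent routes records through Kafka and Flink; this stand-in uses plain JSON to show only the round trip that recovers the two model features, and all names are illustrative assumptions.

```python
# Minimal sketch of the serialization step: a queue's usage record is
# serialized before entering the model and deserialized inside it to
# recover the CPU occupation proportion and average task waiting time.
import json

record = {"queue_id": 1, "cpu_ratio": 0.85, "avg_wait_minutes": 8.0}
serialized = json.dumps(record)       # serialization before model input
parsed = json.loads(serialized)       # deserialization and parsing
features = (parsed["cpu_ratio"], parsed["avg_wait_minutes"])
```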
S103, judging the running states of all queues according to the prediction result of the resource usage, wherein the running states of the Hadoop cluster queues comprise an idle state, a normal state and a congestion state.
Specifically, the running states of all queues are judged according to the prediction result of the resource usage. For example, in a specific embodiment of the present application, the result obtained by predicting a queue's resource usage through the trained resource usage prediction model is one of three levels, 0, 1 and 2, where 0 indicates that the queue state is the "idle state", 1 indicates the "normal state", and 2 indicates the "congestion state". A prediction result for the queues in a Hadoop cluster is shown in the following table:
TABLE 1 Resource usage prediction results for queues in the Hadoop cluster
(The table is rendered as images in the source document and is not reproduced here; representative rows are described below.)
The queue with serial number 1 had a CPU usage proportion of 0.85 in the last hour and an average task waiting time of 8 minutes; inputting these two parameters into the pre-trained resource usage prediction model yields a prediction result of "1", indicating that the running state of queue 1 in the coming preset time period will be the "normal state".
The queue with serial number 2 had a CPU usage proportion of 0.15 in the last hour and an average task waiting time of 0 minutes; the prediction result of "0" indicates that its running state in the coming preset time period will be the "idle state".
The queue with serial number 4 had a CPU usage proportion of 1 in the last hour and an average task waiting time of 20 minutes; the prediction result of "2" indicates that its running state in the coming preset time period will be the "congestion state".
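The label-to-state mapping used in the examples above can be sketched directly; the dictionary and function names are illustrative.

```python
# Sketch of the interpretation of the model's output labels: the trained
# model emits 0, 1 or 2, read as idle / normal / congested respectively.
STATES = {0: "idle", 1: "normal", 2: "congested"}

def queue_state(label: int) -> str:
    """Map a prediction label to the queue running state it denotes."""
    return STATES[label]

# Per the table: queue 1 -> label 1, queue 2 -> label 0, queue 4 -> label 2.
state_of_queue_1 = queue_state(1)
```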
And S104, calculating the ratio of the queue with the running state being the congestion state in all Hadoop cluster queues to obtain the congestion state queue occupation ratio.
Specifically, the queues whose running state is the congestion state are identified among all queues in the Hadoop cluster, their number is counted, and the congestion state queue ratio is calculated from the number of congestion state queues and the total number of queues in the cluster. For example, in one embodiment of the present application, if the total number of queues in the Hadoop cluster is 10 and the number of congestion state queues is 3, the ratio of congestion state queues among all cluster queues is 3/10.
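Step S104 reduces to a simple count and divide, sketched below with the 3-out-of-10 example from the text; function and variable names are illustrative.

```python
# Sketch of step S104: count queues predicted congested (label 2) and
# divide by the total queue count to get the congestion state queue ratio.
from fractions import Fraction

def congestion_ratio(labels):
    congested = sum(1 for state in labels if state == 2)
    return Fraction(congested, len(labels))

# The embodiment's example: 10 queues of which 3 are congested -> 3/10.
ratio = congestion_ratio([2, 2, 2, 0, 0, 1, 1, 1, 0, 1])
```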
S105, judging whether the congestion state queue occupation ratio is larger than a preset threshold value or not;
s106, if the congestion state queue occupation ratio is larger than or equal to a preset threshold value, adjusting queue resources of the Hadoop cluster according to the ratio of the congestion state queue occupation ratio;
and S107, if the congestion state queue occupation ratio is smaller than a preset threshold value, adjusting queue resources of the Hadoop cluster by adopting a yann algorithm.
Whether the congestion state queue occupation ratio is greater than the preset threshold value is judged. When the ratio is greater than or equal to the preset threshold value, the queue resources of the Hadoop cluster are adjusted according to the congestion state queue ratio; in this embodiment, the ratio of congestion state queues among all cluster queues is calculated as 3/10, so resources are redistributed among the queues in the proportion of 3/10. When the congestion state queue occupation ratio is smaller than the preset threshold value, the queue resources of the Hadoop cluster are adjusted according to the yarn scheduling algorithm. The yarn scheduling algorithm is a fair scheduling algorithm: resources between two or more queues can only be distributed fairly, that is, in the proportion of 1/2 between queues, and cannot be scheduled in other proportions.
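The threshold decision of steps S105 to S107 can be sketched as a single branch; the threshold value and return encoding are illustrative assumptions, not values from the patent.

```python
# Sketch of steps S105-S107: choose the adjustment mode by comparing the
# congestion state queue ratio with a preset threshold. At or above the
# threshold, reallocate by the congested ratio; below it, fall back to
# yarn-style fair scheduling (equal 1/2 split between queues).
def choose_adjustment(congested_ratio: float, threshold: float = 0.2):
    if congested_ratio >= threshold:
        return ("ratio", congested_ratio)  # adjust in proportion to congestion
    return ("fair", 0.5)                   # fair scheduling: 1/2 per queue pair

mode, share = choose_adjustment(0.3, threshold=0.2)
```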
In summary, the resource usage information data of all queues in the Hadoop cluster within a preset time period is obtained and input into the pre-trained resource usage prediction model, which quickly predicts the resource usage and thereby the running state of each queue. The number of queues whose predicted running state is the congestion state is counted to calculate the congestion state queue ratio among all queues; if this ratio is greater than or equal to the preset threshold value, the queue resources of the Hadoop cluster are adjusted according to the congestion state queue ratio, and if it is smaller than the preset threshold value, the yarn fair scheduling algorithm is used for the adjustment.
Optionally, before inputting the resource usage information data into a resource usage prediction model trained in advance to predict resource usage of all queues, the method further includes:
and constructing a resource use condition prediction model according to a logistic regression algorithm.
Specifically, in this embodiment of the present application, the resource usage prediction model is constructed based on a logistic regression multi-classification algorithm. The multi-classification algorithm can be obtained from the logistic regression binary classification algorithm; its principle is to combine N binary classification models.
In a specific embodiment of the present application, the running states of Hadoop cluster queues comprise an idle state, a normal state and a congestion state; that is, prediction is a 3-class classification of queue running states, and a logistic regression three-classification model can be used, namely a combination of 3 binary classification models: "0 vs. not-0", "1 vs. not-1" and "2 vs. not-2". During model training, a training sample is input into the three-classification model composed of the 3 binary models, runs once through each of them to obtain a training result, and the trained resource usage prediction model is finally obtained. For example, in a certain prediction, after the data is input into the three-classification model, the three classification results obtained are "0", "not 1" and "not 2" respectively; the "0" among them is then taken as the result of the three-classification model and output. The logistic regression binary classification model applies to two-class problems: given a set of data, the judgment result is 0 or 1. Its principle is as follows: each feature is multiplied by a regression coefficient, all the products are summed, and the sum is substituted into the sigmoid function to obtain a value between 0 and 1; a value greater than 0.5 is classified as class 1 and a value less than 0.5 as class 0. The optimal regression coefficients are sought by gradient descent or gradient ascent.
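The one-vs-rest scheme described above can be sketched as three sigmoid scorers with the most confident class winning. The weights below are illustrative stand-ins, not trained values from the patent; only the structure (sigmoid over a weighted feature sum, one binary model per class) follows the text.

```python
# Hedged sketch of one-vs-rest logistic regression over the two queue
# features (cpu_ratio, avg_wait_minutes): three binary models ("0 vs not-0",
# "1 vs not-1", "2 vs not-2"); the class with the highest sigmoid score wins.
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def binary_score(weights, bias, features) -> float:
    # Multiply each feature by its regression coefficient, sum, apply sigmoid.
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def predict_ovr(models, features) -> int:
    # models: one (weights, bias) pair per class label 0 / 1 / 2.
    scores = [binary_score(w, b, features) for w, b in models]
    return max(range(len(scores)), key=lambda k: scores[k])

# Toy, hand-picked coefficients for (cpu_ratio, avg_wait_minutes):
models = [((-8.0, -0.5), 3.0),   # class 0 "idle": low cpu, low wait
          ((2.0, -0.8), 0.0),    # class 1 "normal"
          ((6.0, 0.5), -7.0)]    # class 2 "congested": high cpu, long wait
label = predict_ovr(models, (1.0, 20.0))  # queue 4's features from Table 1
```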
Optionally, after the resource usage prediction model is constructed according to a logistic regression algorithm, the method further includes:
acquiring historical resource use information data of all queues in a Hadoop cluster;
the historical resource usage information data comprises the historical cpu occupation proportion of each queue and the historical average task waiting time of each queue.
Marking the running states of all queues in the Hadoop cluster according to the historical resource use information data to obtain a training sample;
specifically, the running states of all queues in the Hadoop cluster are labeled according to the obtained historical resource use information data, wherein the running states of the queues comprise idle, normal and crowded, and after the labeling is completed, a training sample is obtained.
And importing the training samples into a resource use condition prediction model for training to obtain the trained resource use condition prediction model.
Specifically, the training samples labeled earlier, together with their labeling results, are imported into the resource usage prediction model for training, whereby the model acquires the ability to predict queue resource usage. The trained resource usage prediction model is then used to predict the running states of the queues in the current cluster.
Optionally, with continued reference to fig. 2, a flow diagram for training a resource usage prediction model in a Hadoop cluster resource management based method according to the present application is shown. Importing a training sample into a resource use condition prediction model for training, and obtaining the trained resource use condition prediction model, wherein the method specifically comprises the following steps:
S201, importing the training samples into a resource use condition prediction model for training to obtain a training result.
And S202, fitting the training result and the labeling result of the training sample to obtain a prediction error.
Specifically, the training result can be fitted against the labeling result of the training sample using leave-one-out (LOO) cross-validation to obtain the prediction error. Leave-one-out is a classic cross-validation method: if there are N training samples, each sample in turn serves alone as the validation set while the remaining N-1 samples serve as the training set, cycling until every sample has been used once as the validation set. Leave-one-out validation makes each training set very close to the original data set, prevents random factors from influencing the experimental data, and, as a cross-validation method applied to the model, helps avoid overfitting. The labeling result is the one produced when the running states of all queues in the Hadoop cluster are labeled in step S12.
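The leave-one-out procedure described above can be sketched as follows; scikit-learn and the synthetic two-class data are assumptions of this sketch, not part of the original disclosure:

```python
# Leave-one-out (LOO) cross-validation as described above: with N samples,
# each sample in turn is held out as the validation set while the remaining
# N-1 samples train the model. Data here is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.05, (20, 2)),   # e.g. "not congested" queues
               rng.normal(0.8, 0.05, (20, 2))])  # e.g. "congested" queues
y = np.repeat([0, 1], 20)

errors = 0
for train_idx, val_idx in LeaveOneOut().split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    errors += int(model.predict(X[val_idx])[0] != y[val_idx][0])

prediction_error = errors / len(X)  # fraction of held-out samples mispredicted
print(prediction_error)
```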
S203, comparing the prediction error with the standard error, and adjusting the resource use condition prediction model according to the comparison result until the prediction error is smaller than the standard error;
The standard error is a parameter preset for the training of the resource use condition prediction model and may be set according to an empirical value. In the process of training the prediction model, an error may exist between the model's prediction on the training sample and the labeling result of the training sample; if that error is too large, the actual prediction results will be affected, so the resource use condition prediction model must be adjusted accordingly to obtain a model with a better recognition effect.
And S204, acquiring the adjusted resource use condition prediction model.
In the above embodiment, the training sample is imported into the resource use condition prediction model for training to obtain a training result; the training result is fitted against the labeling result of the training sample using LOO cross-validation to obtain the prediction error; the prediction error is compared with the standard error, the model is adjusted according to the comparison result, and the adjusted resource use condition prediction model is obtained. In this way the resource use condition prediction model can be obtained through training, its prediction error is reduced over multiple rounds of training, and the prediction accuracy is improved.
Optionally, comparing the prediction error with the standard error, and adjusting the resource usage prediction model according to the comparison result until the prediction error is smaller than the standard error, specifically including:
comparing the prediction error with the standard error;
and if the prediction error is larger than the standard error, reducing the regularization coefficient of the resource use condition prediction model until the prediction error is smaller than the standard error.
Specifically, the prediction error is compared with the standard error; if the prediction error is larger than the standard error, the regularization coefficient C can be appropriately reduced until the prediction error is smaller than the standard error. The value C is the regularization coefficient and is the inverse of the regularization strength: the smaller C is, the stronger the regularization and the greater the degree to which overfitting is prevented. Optimizing the value of C prevents overfitting and brings the regularization to a suitable level.
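A minimal sketch of this adjustment loop, assuming scikit-learn's `LogisticRegression` where `C` is the inverse of the regularization strength; the halving factor, the standard error value, and the stopping floor are invented for illustration:

```python
# Sketch of the adjustment loop: while the LOO prediction error exceeds the
# preset standard error, shrink the regularization coefficient C (smaller C
# means stronger regularization) and re-evaluate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

def loo_error(C: float, X, y) -> float:
    """Leave-one-out prediction error of a logistic regression with coefficient C."""
    errors = 0
    for tr, va in LeaveOneOut().split(X):
        m = LogisticRegression(C=C).fit(X[tr], y[tr])
        errors += int(m.predict(X[va])[0] != y[va][0])
    return errors / len(X)

def tune_C(X, y, standard_error: float = 0.05,
           C: float = 1.0, min_C: float = 1e-4) -> float:
    while loo_error(C, X, y) > standard_error and C > min_C:
        C *= 0.5  # reduce C -> stronger regularization
    return C

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.2, 0.1, (15, 2)), rng.normal(0.8, 0.1, (15, 2))])
y = np.repeat([0, 1], 15)
best_C = tune_C(X, y)
print(best_C)
```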
Optionally, with continued reference to fig. 3, a flow diagram of predicting Hadoop cluster queue resource usage in a Hadoop cluster resource management-based method according to the present application is shown. Inputting resource use information data into a pre-trained resource use condition prediction model, predicting the resource use conditions of all queues, specifically comprising:
S301, sending the obtained resource use information data to kafka for flink consumption;
Kafka is an open-source stream-processing platform developed by the Apache Software Foundation, written in Scala and Java; it is a high-throughput distributed publish-subscribe messaging system that can handle all the action-stream data of consumers in a web site.
Specifically, the obtained resource usage information data is uploaded to kafka, and the purpose of performing flink consumption at kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism and to provide real-time messages through the cluster.
S302, serializing the resource use information data after flink consumption to obtain serialized data information, and inputting the serialized data information into a pre-trained resource use condition prediction model;
Serialization is the process of converting the state information of an object into a form that can be stored or transmitted. During serialization, the object writes its current state to temporary or persistent storage, after which the state information of the object can be recovered by reading from storage and performing a deserialization operation.
Specifically, the resource use information data after flink consumption is serialized to obtain serialized data information, which is more favorable for transmission and storage.
S303, performing deserialization analysis on the serialized data information to obtain a cpu occupation ratio and an average task waiting time length in the resource use information data;
The deserialization analysis is the reverse of the serialization operation: through it, the serialized data information is parsed back into the state information of the object, thereby obtaining the cpu occupation ratio and the average task waiting duration in the resource use information data.
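A round trip of steps S302-S303 can be sketched as follows; JSON is an assumption here, since the actual wire format used with flink is not specified in the text:

```python
# Serialize the queue metrics for transport and storage (S302), then
# deserialize to recover the object state (S303). The field names are
# illustrative, not from the original disclosure.
import json

metrics = {"queue": "q4", "cpu_ratio": 0.92, "avg_wait_s": 300.0}

wire = json.dumps(metrics)      # S302: serialized data information
restored = json.loads(wire)     # S303: deserialization analysis

cpu_ratio = restored["cpu_ratio"]    # cpu occupation ratio
avg_wait_s = restored["avg_wait_s"]  # average task waiting duration
print(cpu_ratio, avg_wait_s)
```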
S304, predicting the resource use conditions of all queues in the Hadoop cluster according to the CPU occupation proportion and the average task waiting time length in the resource use information data.
Specifically, according to the cpu occupation ratio and the average task waiting duration obtained by analysis, the pre-trained resource usage prediction model predicts the resource usage of all queues in the Hadoop cluster, with the historical cpu occupancy of each queue and the historical average task waiting duration of each queue as independent variables and the resource usage of the queues as the dependent variable. The results of predicting queue resource usage fall into three levels, 0, 1 and 2, where 0 indicates that the queue is in the "idle state", 1 the "normal state", and 2 the "congestion state".
In the above embodiment, the obtained resource usage information data is sent to kafka for flink consumption, which unifies online and offline message processing through a parallel loading mechanism. The resource usage information data after flink consumption is serialized to obtain serialized data information, which is convenient to transmit and store, and is input into the pre-trained resource use condition prediction model. The serialized data information is then deserialized and analyzed to obtain the cpu occupation ratio and the average task waiting duration in the resource use information data, from which the resource usage of all queues in the Hadoop cluster can be predicted, so that the queue resources can be adjusted subsequently.
Optionally, adjusting queue resources of the Hadoop cluster according to the ratio of the congestion state queue includes:
acquiring the ratio of the occupation ratio of the congestion state queues, and determining the Hadoop cluster resource adjustment ratio according to the ratio of the occupation ratio of the congestion state queues;
and adjusting the queue resources of the Hadoop cluster according to the Hadoop cluster resource adjustment proportion.
Specifically, when the congestion state queue occupancy is greater than or equal to the preset threshold, the ratio of the congestion state queues is obtained, the Hadoop cluster resource adjustment ratio is determined from it, and the queue resources of the Hadoop cluster are adjusted according to that ratio. By comparing the calculated congestion state queue occupancy with the preset threshold, the way the queue resources of the Hadoop cluster are adjusted is determined, making the queue resource adjustment scheme more flexible. For example, in a certain prediction, if the congestion state queues account for 15% of all Hadoop cluster queues, resources are reallocated between queues in a 15% proportion: continuing the earlier example, queue 2 is an idle queue and queue 4 is a congested queue, and if the total CPU resources across all queues are 100, then 15 CPU resources need to be allocated to queue 4 during adjustment. In an implementation of the present application, the priority with which idle queues donate resources is determined by each queue's CPU usage proportion in the last hour: the smaller the proportion, the higher the priority. In the example above, queue 2 and queue 10 are both idle queues, the CPU usage proportion of queue 2 in the last hour is 0.15 and that of queue 10 is 0.03, so queue 10 (whose CPU usage proportion in the last hour is the smallest) preferentially allocates 15 CPU resources to queue 4. If the running state of queue 4 then becomes normal, queue 2 allocates 15 CPU resources to another congested queue; otherwise, queue 2 continues to allocate 15 CPU resources to queue 4.
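The allocation rule in this example can be sketched as follows; the helper names are hypothetical, and the figures (3 congested queues out of 20, 100 total CPU units, last-hour usages 0.15 and 0.03) mirror the worked example above:

```python
# Hypothetical helpers mirroring the worked example: the adjustment ratio
# equals the share of congested queues, and the idle queue with the
# smallest last-hour CPU usage donates resources first.
def transfer_amount(n_congested: int, n_total: int, total_cpu: int = 100) -> float:
    """CPU units to move, proportional to the congested-queue share."""
    return total_cpu * n_congested / n_total

def pick_donor(idle_usage: dict) -> str:
    """idle_usage: {queue: last-hour CPU usage ratio}; lowest usage donates first."""
    return min(idle_usage, key=idle_usage.get)

amount = transfer_amount(3, 20)  # 3 of 20 queues congested -> 15% of 100 CPU
donor = pick_donor({"queue2": 0.15, "queue10": 0.03})
print(amount, donor)
```

Under these assumptions, 15 CPU units would be moved, with queue 10 donating before queue 2.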
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on their execution order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and not necessarily sequentially: they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
With further reference to fig. 4, as an implementation of the method shown in fig. 1, the present application provides an embodiment of an apparatus based on Hadoop cluster resource management, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 1, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 4, the apparatus for resource management based on Hadoop cluster according to this embodiment includes:
an obtaining module 401, configured to obtain resource usage information data of all queues in a Hadoop cluster within a predetermined time period;
a prediction module 402, configured to input resource usage information data into a pre-trained resource usage prediction model, and predict resource usage of all queues;
the first judging module 403 is configured to judge the operating states of all queues according to the prediction result of the resource usage, where the operating states of the Hadoop cluster queue include an idle state, a normal state, and a congested state;
a calculating module 404, configured to calculate a ratio of the queue in the congestion state in all Hadoop cluster queues to obtain a congestion state queue ratio;
a second judging module 405, configured to judge whether the congestion status queue occupancy is greater than a preset threshold;
a first adjusting module 406, configured to adjust queue resources of the Hadoop cluster according to a ratio of the congestion status queue occupancy ratio when the congestion status queue occupancy ratio is greater than or equal to a preset threshold;
and a second adjusting module 407, configured to adjust queue resources of the Hadoop cluster by using a YARN algorithm when the congestion status queue occupancy is smaller than a preset threshold.
Optionally, the apparatus for resource management based on Hadoop cluster further includes:
and the model construction module is used for constructing a resource use condition prediction model according to a logistic regression algorithm.
Optionally, the apparatus for resource management based on Hadoop cluster further includes:
the training information acquisition module is used for acquiring historical resource use information data of all queues in the Hadoop cluster;
the marking module is used for marking the running states of all queues in the Hadoop cluster according to the historical resource use information data to obtain a training sample;
and the training sample prediction module is used for importing the training samples into the resource use condition prediction model for training to obtain the trained resource use condition prediction model.
Optionally, the training sample prediction module specifically includes:
the importing unit is used for importing the training samples into the resource use condition prediction model for training to obtain a training result;
the fitting unit is used for fitting the training result and the labeling result of the training sample to obtain a prediction error;
the comparison unit is used for comparing the prediction error with the standard error and adjusting the resource use condition prediction model according to the comparison result until the prediction error is smaller than the standard error;
and the adjusting unit is used for acquiring the adjusted resource use condition prediction model.
Optionally, the alignment unit specifically includes:
the comparison subunit is used for comparing the prediction error with the standard error;
and the adjusting subunit is used for reducing the regularization coefficient of the resource use condition prediction model until the prediction error is smaller than the standard error if the prediction error is larger than the standard error.
Optionally, the prediction module 402 specifically includes:
the consumption unit is used for sending the obtained resource use information data to the kafka for flink consumption;
the serialization unit is used for carrying out serialization operation on the resource use information data after flink consumption to obtain serialized data information, and inputting the serialized data information into a pre-trained resource use condition prediction model;
the deserializing unit is used for deserializing and analyzing the serialized data information to obtain a cpu occupation proportion and an average task waiting time length in the resource use information data;
and the prediction unit is used for predicting the resource use conditions of all queues in the Hadoop cluster according to the CPU occupation proportion and the average task waiting time in the resource use information data.
Optionally, the first adjusting module 406 specifically includes:
the adjustment proportion determining unit is used for acquiring the ratio of the congestion state queue occupation ratio and determining the Hadoop cluster resource adjustment proportion according to the ratio of the congestion state queue occupation ratio;
and the queue resource adjusting unit is used for adjusting the queue resources of the Hadoop cluster according to the Hadoop cluster resource adjusting proportion.
The application discloses an apparatus based on Hadoop cluster resource management, including: an obtaining module 401, configured to obtain resource usage information data of all queues in a Hadoop cluster within a predetermined time period; a prediction module 402, configured to input the resource usage information data into a pre-trained resource usage prediction model and predict the resource usage of all queues; a first judging module 403, configured to judge the operating states of all queues according to the prediction result of the resource usage, where the operating states of the Hadoop cluster queues include an idle state, a normal state and a congested state; a calculating module 404, configured to calculate the proportion of queues in the congested state among all Hadoop cluster queues to obtain the congestion state queue occupancy; a second judging module 405, configured to judge whether the congestion state queue occupancy is greater than a preset threshold; a first adjusting module 406, configured to adjust queue resources of the Hadoop cluster according to the ratio of the congestion state queue occupancy when the congestion state queue occupancy is greater than or equal to the preset threshold; and a second adjusting module 407, configured to adjust queue resources of the Hadoop cluster by using a YARN algorithm when the congestion state queue occupancy is smaller than the preset threshold. According to the present application, the resource usage information data of the queues is input into a pre-trained resource usage prediction model so that the running state of the queues is judged quickly, and the resource adjustment mode is determined by comparing the proportion of queues in the congested state with a preset threshold, which makes the queue resource adjustment scheme more flexible.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62 and a network interface 63 that are communicatively connected to one another via a system bus. It is noted that only a computer device 6 having components 61-63 is shown, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash Card (FlashCard), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system and various application software installed in the computer device 6, for example, a program code of a method based on Hadoop cluster resource management. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the program code stored in the memory 61 or process data, for example, execute the program code of the Hadoop cluster resource management-based method.
The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The present application provides yet another embodiment, which provides a computer readable storage medium storing a program of a Hadoop cluster resource management based method, which is executable by at least one processor to cause the at least one processor to perform the steps of the Hadoop cluster resource management based method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated by cryptographic methods, where each data block contains information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is to be understood that the above-described embodiments are merely some, but not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their features replaced by equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A method for cluster resource management based on Hadoop is characterized by comprising the following steps:
acquiring resource use information data of all queues in a Hadoop cluster in a preset time period;
inputting the resource use information data into a pre-trained resource use condition prediction model to predict the resource use conditions of all queues;
judging the running states of all queues according to the prediction result of the resource use condition, wherein the running states of the Hadoop cluster queue comprise an idle state, a normal state and a crowded state;
calculating the ratio of the queue with the running state being the congestion state in all Hadoop cluster queues to obtain the congestion state queue ratio;
judging whether the congestion state queue occupation ratio is greater than a preset threshold value;
if the congestion state queue occupation ratio is larger than or equal to the preset threshold, adjusting queue resources of the Hadoop cluster according to the congestion state queue occupation ratio;
and if the congestion state queue occupation ratio is smaller than the preset threshold value, adjusting queue resources of the Hadoop cluster by adopting a YARN algorithm.
2. The Hadoop-based cluster resource management method of claim 1, wherein before inputting the resource usage information data into a pre-trained resource usage prediction model to predict resource usage for all queues, the method further comprises:
and constructing the resource use condition prediction model according to a logistic regression algorithm.
3. The Hadoop cluster resource management-based method of claim 2, wherein after the constructing the resource usage prediction model according to a logistic regression algorithm, further comprising:
acquiring historical resource use information data of all queues in the Hadoop cluster;
marking the running states of all queues in the Hadoop cluster according to the historical resource use information data to obtain a training sample;
and importing the training sample into the resource use condition prediction model for training to obtain the trained resource use condition prediction model.
4. The method for resource management based on a Hadoop cluster as claimed in claim 3, wherein the introducing the training samples into the resource usage prediction model for training to obtain the trained resource usage prediction model specifically comprises:
importing the training sample into the resource use condition prediction model for training to obtain a training result;
fitting the training result and the labeling result of the training sample to obtain a prediction error;
comparing the prediction error with a standard error, and adjusting the resource use condition prediction model according to a comparison result until the prediction error is smaller than the standard error;
and acquiring the adjusted resource use condition prediction model.
5. The Hadoop cluster resource management-based method according to claim 4, wherein the comparing the prediction error with a standard error and adjusting the resource usage prediction model according to the comparison result until the prediction error is smaller than the standard error specifically comprises:
comparing the prediction error with a standard error;
and if the prediction error is larger than the standard error, reducing the regularization coefficient of the resource use condition prediction model until the prediction error is smaller than the standard error.
6. The Hadoop cluster resource management-based method according to any one of claims 1 to 5, wherein the inputting the resource usage information data into a pre-trained resource usage prediction model to predict the resource usage of all queues includes:
sending the obtained resource use information data to kafka for flink consumption;
serializing the resource use information data after flink consumption to obtain serialized data information, and inputting the serialized data information into a pre-trained resource use condition prediction model;
performing deserialization analysis on the serialized data information to obtain a cpu occupation ratio and an average task waiting duration in the resource use information data;
and predicting the resource use conditions of all queues in the Hadoop cluster according to the CPU occupation proportion and the average task waiting time in the resource use information data.
7. The method for Hadoop cluster-based resource management according to claim 6, wherein the adjusting queue resources of the Hadoop cluster according to the ratio of the congestion status queues comprises:
acquiring the ratio of the occupation ratio of the congestion state queue, and determining the Hadoop cluster resource adjustment ratio according to the ratio of the occupation ratio of the congestion state queue;
and adjusting the queue resources of the Hadoop cluster according to the Hadoop cluster resource adjustment proportion.
8. An apparatus based on Hadoop cluster resource management, comprising:
the acquisition module is used for acquiring resource use information data of all queues in the Hadoop cluster in a preset time period;
the prediction module is used for inputting the resource use information data into a pre-trained resource use condition prediction model to predict the resource use conditions of all the queues;
the first judgment module is used for judging the running states of all queues according to the prediction result of the resource use condition, wherein the running states of the Hadoop cluster queue comprise an idle state, a normal state and a crowded state;
the calculation module is used for calculating the ratio of the queue with the running state being the congestion state in all Hadoop cluster queues to obtain the congestion state queue ratio;
the second judgment module is used for judging whether the congestion state queue occupation ratio is greater than a preset threshold value or not;
the first adjusting module is used for adjusting queue resources of the Hadoop cluster according to the ratio of the congestion state queue occupation ratio when the congestion state queue occupation ratio is larger than or equal to the preset threshold;
and the second adjusting module is used for adjusting queue resources of the Hadoop cluster by adopting a YARN algorithm when the congestion state queue duty ratio is smaller than the preset threshold value.
9. A computer device comprising a memory having stored therein a computer program and a processor which, when executing the computer program, carries out the steps of the method based on Hadoop cluster resource management according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the method based on Hadoop cluster resource management according to any one of claims 1 to 7.
CN202010356770.5A 2020-04-29 2020-04-29 Hadoop cluster resource management-based method, device, equipment and storage medium Pending CN111651266A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010356770.5A CN111651266A (en) 2020-04-29 2020-04-29 Hadoop cluster resource management-based method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010356770.5A CN111651266A (en) 2020-04-29 2020-04-29 Hadoop cluster resource management-based method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111651266A true CN111651266A (en) 2020-09-11

Family

ID=72350681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010356770.5A Pending CN111651266A (en) 2020-04-29 2020-04-29 Hadoop cluster resource management-based method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111651266A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527503A (en) * 2020-12-16 2021-03-19 北京地平线信息技术有限公司 Management method and management device for deep learning training task
CN112927788A (en) * 2021-03-30 2021-06-08 善诊(上海)信息技术有限公司 Physical examination item recommendation method, device, equipment and storage medium
CN112927788B (en) * 2021-03-30 2024-02-27 善诊(上海)信息技术有限公司 Physical examination item recommendation method, device, equipment and storage medium
WO2024000859A1 (en) * 2022-06-28 2024-01-04 深圳前海微众银行股份有限公司 Job scheduling method, job scheduling apparatus, job scheduling system, and storage medium

Similar Documents

Publication Publication Date Title
WO2021139438A1 (en) Big data resource processing method and apparatus, and terminal and storage medium
CN111651266A (en) Hadoop cluster resource management-based method, device, equipment and storage medium
CN109992404B (en) Cluster computing resource scheduling method, device, equipment and medium
CN111813624B (en) Robot execution time length estimation method based on time length analysis and related equipment thereof
Huang et al. When backpressure meets predictive scheduling
CN112541745B (en) User behavior data analysis method and device, electronic equipment and readable storage medium
CN103701934A (en) Resource optimal scheduling method and virtual machine host machine optimal selection method
CN112562378B (en) Bus scheduling method and device, computer equipment and medium
CN115543577B (en) Covariate-based Kubernetes resource scheduling optimization method, storage medium and device
CN114895773A (en) Energy consumption optimization method, system and device of heterogeneous multi-core processor and storage medium
CN115730790A (en) Charging configuration method, device and equipment based on edge calculation and storage medium
CN113435998A (en) Loan overdue prediction method and device, electronic equipment and storage medium
CN113821330B (en) Task scheduling method and device, computer equipment and storage medium
WO2023040145A1 (en) Artificial intelligence-based text classification method and apparatus, electronic device, and medium
CN112887371B (en) Edge calculation method and device, computer equipment and storage medium
CN106406976A (en) Method and apparatus for identifying IO intensive application in cloud computing environment
US10503548B2 (en) Resource and latency estimation-based scheduling in a distributed computing environment
CN112541640A (en) Resource authority management method and device, electronic equipment and computer storage medium
CN116820714A (en) Scheduling method, device, equipment and storage medium of computing equipment
CN112764923B (en) Computing resource allocation method, computing resource allocation device, computer equipment and storage medium
CN114817408A (en) Scheduling resource identification method and device, electronic equipment and storage medium
CN114625512A (en) Task scheduling method and device, electronic equipment and storage medium
CN111625352A (en) Scheduling method, device and storage medium
CN116661962A (en) Data analysis method based on cloud computing technology
CN117453376B (en) Control method, device, equipment and storage medium for high-throughput calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination