CN107315636B - Resource availability early warning method and device - Google Patents


Info

Publication number
CN107315636B
Authority
CN
China
Prior art keywords
resource
resource usage
increment
time period
early warning
Prior art date
Legal status
Active
Application number
CN201610265261.5A
Other languages
Chinese (zh)
Other versions
CN107315636A (en)
Inventor
李湛
Current Assignee
China Mobile Group Hebei Co Ltd
Original Assignee
China Mobile Group Hebei Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Group Hebei Co Ltd
Priority to CN201610265261.5A
Publication of CN107315636A
Application granted
Publication of CN107315636B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/504 Resource capping

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a resource availability early warning method comprising the following steps: establishing a resource usage estimation model; estimating the resource usage of the next time period using the resource usage estimation model; and issuing a resource availability early warning when the estimated resource usage of the next period exceeds a preset threshold. The invention also provides a resource availability early warning device.

Description

Resource availability early warning method and device
Technical Field
The invention relates to the technical field of business support, and in particular to a method and device for early warning of Hadoop resource availability.
Background
Hadoop is one of the mainstream software platforms for big data. It provides a basic framework consisting of distributed mass data storage (HDFS), distributed large-scale computing (MapReduce) and a general-purpose resource management system (YARN), and offers high fault tolerance, availability and scalability. It is widely used for data mining, online analytical processing (OLAP) and business analysis, where it can discover potential customer groups, support market segmentation and customer relationship management, and forecast market trends, thereby providing decision support for enterprise management and adding value to data. Hadoop is currently widely applied in fields such as the Internet, communications and finance.
As shown in Fig. 1, the existing YARN-based Hadoop architecture mainly includes a global ResourceManager (RM), ApplicationMasters (AM), NodeManagers (NM) and Containers.
The overall Hadoop execution flow under the YARN framework is as follows:
Step 1: a user submits an application, such as a MapReduce job, through the client (JobClient) and requests resources from the RM;
Step 2: on receiving the request, the ApplicationsManager (ASM) and the Resource Scheduler (RS) in the RM allocate the first Container to the application, communicate with the corresponding NM, and instruct it to start the AM in that Container;
Step 3: the AM registers with the RM and then, by polling over the remote procedure call (RPC) protocol, requests resources for each task, mainly CPU and memory;
Step 4: once the AM has obtained the resources, it communicates with the NM, which starts the task;
Step 5: each task reports its current state to the AM over RPC; the AM monitors the running state of all tasks and, when it finds that a task has failed, restarts the task and re-requests resources;
Step 6: after the application finishes, the AM deregisters from the RM and shuts down, and the associated resources are reclaimed and released.
Although the existing YARN architecture of Hadoop supports resource management and task scheduling and monitoring well, it can currently only monitor task states. It cannot warn the user in advance when resources are about to run short, nor adjust resources ahead of time. As a result, a severe resource shortage may occur when tasks are close to completion, and the only remedy is to restart the tasks and reallocate resources, which wastes both time and resources.
Disclosure of Invention
In view of this, embodiments of the present invention aim to provide a resource availability early warning method and apparatus that effectively reduce resource waste and improve resource utilization.
To achieve this purpose, the technical solution of the invention is implemented as follows:
An embodiment of the invention provides a resource availability early warning method, comprising:
establishing a resource usage estimation model;
estimating the resource usage of the next time period using the resource usage estimation model;
and, when the estimated resource usage of the next period exceeds a preset threshold, issuing a resource availability early warning.
In the above solution, establishing the resource usage estimation model comprises:
determining the resource consumption increment corresponding to the task progress increment in the next time period, according to the progress of all parallel tasks in each time period, the resource consumption increment, and the patterns and associations within the time period;
repeatedly comparing the determined resource consumption increment for the next time period with the resource consumption increment corresponding to the actual task progress increment, and dynamically adjusting a weight to reduce the error;
and selecting the weight with the minimum error to establish the resource usage estimation model.
In the above solution, estimating the resource usage of the next time period using the resource usage estimation model includes, but is not limited to, estimating the CPU usage and memory usage of the next time period.
In the above solution, the estimated resource usage of the next period exceeding the preset threshold comprises: the estimated resource usage of the next period exceeding a threshold on the remaining available resources.
In the above solution, issuing the resource availability early warning when the estimated resource usage of the next period exceeds the preset threshold comprises: issuing resource availability early warnings of different levels according to the priority, importance and dependency relationships of the tasks.
An embodiment of the invention further provides a resource availability early warning device, comprising a model establishing module, a resource estimation module and a resource early warning module, wherein:
the model establishing module is configured to establish a resource usage estimation model;
the resource estimation module is configured to estimate the resource usage of the next time period using the resource usage estimation model;
and the resource early warning module is configured to issue a resource availability early warning when the estimated resource usage of the next period exceeds a preset threshold.
In the above solution, the model establishing module is specifically configured for:
determining the resource consumption increment corresponding to the task progress increment in the next time period, according to the progress of all parallel tasks in each time period, the resource consumption increment, and the patterns and associations within the time period;
repeatedly comparing the determined resource consumption increment for the next time period with the resource consumption increment corresponding to the actual task progress increment, and dynamically adjusting a weight to reduce the error;
and selecting the weight with the minimum error to establish the resource usage estimation model.
In the above solution, the resource estimation module is specifically configured to estimate the CPU usage and memory usage of the next time period using the resource usage estimation model.
In the above solution, the resource early warning module determining that the estimated resource usage of the next period exceeds the preset threshold comprises: the resource early warning module determining that the estimated resource usage of the next period exceeds the threshold on the remaining available resources.
In the above solution, the resource early warning module is specifically configured to issue, when the estimated resource usage of the next period exceeds the preset threshold, resource availability early warnings of different levels according to the priority, importance and dependency relationships of the tasks.
With the resource availability early warning method and device provided by embodiments of the invention, a resource usage estimation model is first established, the resource usage of the next time period is then estimated with the model, and a resource availability early warning is issued when the estimated resource usage of the next period exceeds a preset threshold. This avoids large numbers of task failures caused by insufficient available resources, remedies a shortcoming of the original operating mechanism, effectively reduces resource waste and improves resource utilization.
Drawings
FIG. 1 is a YARN-based Hadoop architecture diagram;
FIG. 2 is a flowchart illustrating a resource availability warning method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a resource availability early warning method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a resource availability warning apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a resource availability warning system according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a workflow of a resource availability early warning system according to an embodiment of the present invention.
Detailed Description
In embodiments of the invention, a resource usage estimation model is first established; the resource usage of the next time period is then estimated with the model; and when the estimated resource usage of the next period exceeds a preset threshold, a resource availability early warning is issued.
Normally, Hadoop requires sufficient resources such as CPU and memory to operate properly, and the availability of different resources affects Hadoop tasks differently. For example, insufficient CPU does not directly cause task failures, but it noticeably reduces overall running speed and processing efficiency, leading to a large backlog of queued tasks. Insufficient memory, by contrast, is often fatal: memory overflows occur, task execution is interrupted, and many tasks fail. It is therefore necessary to monitor the availability of Hadoop resources in real time and to issue a warning before availability becomes low.
An embodiment of the invention provides a Hadoop resource availability early warning method. First, a time-series-based dynamic feedback learning model is established: based on the progress of all parallel tasks and the dynamic increments of consumed resources within each time-series increment range, the model learns, according to preset rules, the latent patterns and associations within the time period and generates the resource consumption increment corresponding to the task progress increment in the next time period. Then, by comparing against the actual task progress and resource consumption increments, the model repeatedly and dynamically adjusts a weight to reduce the error, and the weight with the minimum error in the current time-series range is selected from a series of candidate weights to form the model. On this basis, the resources needed to complete the remaining tasks are estimated. If the required resources exceed the threshold on the remaining available resources, early warnings of different levels are issued according to the priority, importance and dependency relationships of the tasks, indicating that resource availability is low and that a large number of task failures may follow. The warnings provide a reference for later automatic adjustment of container resource values and a basis for users to configure resource values manually. This avoids large numbers of task failures caused by insufficient available resources, remedies a shortcoming of the original operating mechanism, effectively reduces resource waste and improves resource utilization.
The technical solution of the invention is described in further detail below with reference to the drawings and specific embodiments. Fig. 2 is a schematic flowchart of a resource availability early warning method according to an embodiment of the invention. As shown in Fig. 2, the method of this embodiment includes the following steps:
Step 201: establishing a resource usage estimation model;
In an embodiment of the invention, establishing the resource usage estimation model comprises: determining the resource consumption increment corresponding to the task progress increment in the next time period, according to the progress of all parallel tasks in each time period, the resource consumption increment, and the patterns and associations within the time period; repeatedly comparing the determined resource consumption increment for the next time period with the resource consumption increment corresponding to the actual task progress increment, and dynamically adjusting a weight to reduce the error; and selecting the weight with the minimum error to establish the resource usage estimation model.
Specifically, based on the progress and resource consumption increments of all parallel tasks within the time-series increment range, the latent patterns and associations within the time period are learned according to preset rules, and the resource consumption increment corresponding to the task progress increment in the next time period is generated. The weight is then repeatedly and dynamically adjusted, by comparing against the actual task progress and resource consumption increments, so as to reduce the error, and the weight with the minimum error in the current time-series range is selected from a series of candidate weights to form the model.
FIG. 3 is a schematic diagram of the resource availability early warning method according to an embodiment of the present invention. As shown in FIG. 3, the set of all tasks in Hadoop is denoted $J_{ALL} = \{j_1, j_2, \ldots, j_N\}$, where $N$ is the total number of tasks. A time series is taken as the period from time $t_i$ to time $t_{i+1}$, with length $\Delta t = t_{i+1} - t_i$. Within this time-series range, the set of all running tasks is denoted $J_{RUNNING} = \{j_1, j_2, \ldots, j_n\}$, where $n \le N$. The progress of every running task in Hadoop is monitored and expressed as a percentage, giving the set $JP_{RUNNING} = \{jp_1, jp_2, \ldots, jp_n\}$. The progress of any task $j_k$ at time $t_i$ is written $jp_{k,i}$, and its progress increment from $t_i$ to $t_{i+1}$ is
$$\Delta jp_k = jp_{k,i+1} - jp_{k,i}$$
The average progress increment of task $j_k$ per unit time is given by equation (1-1):
$$\overline{\Delta jp_k} = \frac{\Delta jp_k}{\Delta t} = \frac{jp_{k,i+1} - jp_{k,i}}{t_{i+1} - t_i} \tag{1-1}$$
Accordingly, the average progress increment of all running tasks per unit time is calculated as in equation (1-2):
$$\overline{\Delta jp} = \frac{1}{n}\sum_{k=1}^{n} \overline{\Delta jp_k} \tag{1-2}$$
At time $t_{i+1}$, the largest progress among all running tasks is $jp_{\max} = \max_{1 \le k \le n} jp_{k,i+1}$, so the fastest task still has to complete a remaining progress of $1 - jp_{\max}$ (that is, 100% minus $jp_{\max}$). The average progress increment of all tasks running concurrently with it is calculated by equation (1-3):
[Equation (1-3): formula image not recoverable]
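The quantities in equations (1-1) and (1-2) can be computed directly from two progress snapshots. The following Python sketch assumes that each task's progress is reported as a fraction in [0, 1] at $t_i$ and $t_{i+1}$, and reads (1-2) as the arithmetic mean of the per-task rates; the `TaskSample` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """Progress snapshot of one running task (hypothetical structure)."""
    name: str
    progress_i: float     # progress at t_i, as a fraction in [0, 1]
    progress_next: float  # progress at t_{i+1}


def progress_rates(samples: list[TaskSample], dt: float) -> dict[str, float]:
    """Eq. (1-1): average progress increment per unit time for each task."""
    return {s.name: (s.progress_next - s.progress_i) / dt for s in samples}


def mean_progress_rate(samples: list[TaskSample], dt: float) -> float:
    """Eq. (1-2), read as the arithmetic mean of the per-task rates."""
    rates = progress_rates(samples, dt)
    return sum(rates.values()) / len(rates)


def fastest_remaining(samples: list[TaskSample]) -> tuple[str, float]:
    """Task with the largest progress at t_{i+1} and the progress it still needs."""
    fastest = max(samples, key=lambda s: s.progress_next)
    return fastest.name, 1.0 - fastest.progress_next
```

For two tasks that move from 20% to 50% and from 10% to 20% over $\Delta t = 10$, the per-task rates are 0.03 and 0.01 per unit time, their mean is 0.02, and the fastest task has 0.5 of its progress left.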
In the resource availability early warning process, the resource consumption of the tasks is learned through dynamic feedback, mainly in terms of CPU, memory and similar resources. Suppose that, within the time-series range from $t_i$ to $t_{i+1}$, the increment of CPU consumption of all running tasks, expressed as a percentage, is $\Delta cp = cp_{i+1} - cp_i$. The CPU resources consumed by all running tasks per unit time are then calculated as in equation (1-4):
$$\overline{\Delta cp} = \frac{\Delta cp}{\Delta t} = \frac{cp_{i+1} - cp_i}{t_{i+1} - t_i} \tag{1-4}$$
The CPU resource increment that all running tasks will still occupy in the next time period, i.e. during the remaining progress of the fastest task in this time-series range, is calculated as in equation (1-5):
[Equation (1-5): formula image not recoverable]
The estimate is compared with the actual CPU consumption $cp_{i+2}$ of the next time period, and the weight $\sigma_c$ is adjusted to reduce the error, as in equation (1-6):
[Equation (1-6): formula image not recoverable]
This yields a series of weights $\{\sigma_c^{(1)}, \sigma_c^{(2)}, \ldots\}$ within the time-series range, from which the weight with the minimum error is taken as the current value of $\sigma_c$. As resource consumption keeps changing over the time series, $\sigma_c$ is repeatedly re-learned to obtain the optimal value.
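The weight-selection loop can be illustrated with a small grid search. Because this text does not give the exact form of equations (1-5) and (1-6), the Python sketch below assumes a hypothetical predictor in which the next-period increment is $\sigma_c$ times the per-unit-time rate of equation (1-4) times $\Delta t$, and it uses the mean absolute error over the observed periods as the error measure. `select_weight` and the candidate grid are illustrative, not the patent's exact formulas; the same function applies unchanged to memory samples and $\sigma_m$.

```python
def consumption_rate(prev: float, curr: float, dt: float) -> float:
    """Eq. (1-4): resource consumption increment per unit time."""
    return (curr - prev) / dt


def select_weight(history: list[float], dt: float, candidates: list[float]) -> float:
    """Return the candidate weight with the smallest mean absolute error.

    history holds utilisation samples cp_0, cp_1, ... taken every dt.
    Hypothetical predictor: the increment over period i+1 -> i+2 is
    sigma * rate(i -> i+1) * dt, compared against cp_{i+2} - cp_{i+1}.
    """
    if len(history) < 3:
        raise ValueError("need at least three samples to score a weight")

    def mean_abs_error(sigma: float) -> float:
        errors = []
        for i in range(len(history) - 2):
            rate = consumption_rate(history[i], history[i + 1], dt)
            predicted = sigma * rate * dt
            actual = history[i + 2] - history[i + 1]
            errors.append(abs(actual - predicted))
        return sum(errors) / len(errors)

    # min() keeps the first candidate on ties, so list candidates in preferred order.
    return min(candidates, key=mean_abs_error)
```

With CPU samples 10, 20, 35 and 57.5 (each increment 1.5 times the previous) at $\Delta t = 5$ and candidates 0.5, 1.0, 1.5 and 2.0, the sketch selects 1.5, whose error is zero. Re-running it as new samples arrive mirrors the repeated re-learning of $\sigma_c$ described above.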
Similarly, within the time series from $t_i$ to $t_{i+1}$, with memory consumption increment $\Delta mp = mp_{i+1} - mp_i$, the memory resources consumed by all running tasks per unit time are calculated as in equation (1-7):
$$\overline{\Delta mp} = \frac{\Delta mp}{\Delta t} = \frac{mp_{i+1} - mp_i}{t_{i+1} - t_i} \tag{1-7}$$
The memory resource increment that all running tasks will still occupy in the next time period, during the remaining progress of the fastest task in this period, is calculated as in equation (1-8):
[Equation (1-8): formula image not recoverable]
The estimate is compared with the actual memory consumption $mp_{i+2}$ of the next time period, and the weight $\sigma_m$ is adjusted to reduce the error, as in equation (1-9):
[Equation (1-9): formula image not recoverable]
A series of weights $\{\sigma_m^{(1)}, \sigma_m^{(2)}, \ldots\}$ is likewise formed over the time series, and through repeated dynamic learning and adjustment the weight with the minimum error is taken as the current value of $\sigma_m$.
Step 202: estimating the resource usage of the next time period using the resource usage estimation model;
In an embodiment of the invention, estimating the resource usage of the next time period using the resource usage estimation model includes, but is not limited to, estimating the CPU usage and memory usage of the next time period.
Specifically, based on the minimum-error resource usage estimation model thus formed, and taking the progress of the fastest task in the time-series range as the reference, the resources required to complete the remaining tasks are estimated from the current resource consumption of all parallel tasks;
In an embodiment of the invention, ensuring that all running tasks complete depends critically on having enough resources during the peak period with the most parallel tasks. As shown in FIG. 3, the number of parallel tasks is largest within the time-series range from $t_i$ to $t_{i+1}$. Once the fastest task $j_k$ completes, it releases some resources, the number of parallel tasks falls, and the consumption of CPU, memory and related resources falls accordingly. Whether the parallel tasks in this time-series range succeed therefore depends mainly on whether enough CPU, memory and other resources are available while the fastest task is still running.
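The argument above relies on knowing when the most tasks run in parallel. One way to locate that peak from task start and end times is a sweep line over start and end events. The Python sketch below is an illustration only: it assumes each task occupies a half-open interval [start, end), and `peak_concurrency` is a hypothetical helper, not part of the described method.

```python
from typing import Optional


def peak_concurrency(intervals: list[tuple[float, float]]) -> tuple[int, Optional[float]]:
    """Return (max number of parallel tasks, earliest time that maximum is reached).

    Each task is a half-open interval [start, end).
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # task starts: one more running task
        events.append((end, -1))    # task ends: one fewer running task
    # Tuples sort by time, then delta; -1 before +1 means a task ending at t
    # is not counted as overlapping a task that starts at t.
    events.sort()
    running = best = 0
    best_time: Optional[float] = None
    for t, delta in events:
        running += delta
        if running > best:
            best, best_time = running, t
    return best, best_time
```

For tasks spanning [0, 10), [2, 6), [4, 8) and [9, 12), the sketch reports a peak of 3 parallel tasks, first reached at time 4.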
The resource estimation model established in step 201 can be used to estimate the resources needed to complete the remaining tasks. If the current time is $t_T$, the required CPU resource $cp_T$ is estimated as in equation (1-10):
[Equation (1-10): formula image not recoverable]
and the required memory resource $mp_T$ is estimated as in equation (1-11):
[Equation (1-11): formula image not recoverable]
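Since this text does not reproduce the bodies of equations (1-10) and (1-11), the following Python sketch shows one plausible way to turn a learned weight into a requirement estimate, under explicitly hypothetical assumptions: the horizon is the time the fastest task needs to finish its remaining progress at its current rate, and the requirement at $t_T$ is the current usage plus the weighted consumption rate over that horizon. Both function names are illustrative.

```python
def fastest_task_horizon(fastest_progress: float, fastest_rate: float) -> float:
    """Time the fastest task still needs, assuming its progress rate stays constant."""
    if fastest_rate <= 0:
        raise ValueError("fastest task is not progressing; horizon is unbounded")
    return (1.0 - fastest_progress) / fastest_rate


def estimate_requirement(current_usage: float, rate: float, sigma: float,
                         fastest_progress: float, fastest_rate: float) -> float:
    """Hypothetical stand-in for eqs. (1-10)/(1-11): usage expected at the end of the horizon."""
    horizon = fastest_task_horizon(fastest_progress, fastest_rate)
    return current_usage + sigma * rate * horizon
```

For example, with the fastest task at 80% and progressing 0.02 per unit time, the horizon is 10 time units; with current usage 40, a consumption rate of 1.5 per unit time and a weight of 1.2, the estimate is 40 + 1.2 × 1.5 × 10 = 58.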
Step 203: when the estimated resource usage of the next period exceeds a preset threshold, issuing a resource availability early warning.
In this embodiment of the invention, the estimated resource usage of the next period exceeding the preset threshold comprises: the estimated resource usage of the next period exceeding the threshold on the remaining available resources.
Issuing the resource availability early warning when the estimated resource usage of the next period exceeds the preset threshold comprises: issuing resource availability early warnings of different levels according to the priority, importance and dependency relationships of the tasks.
Specifically, if the estimated resource usage required to complete the remaining tasks exceeds the threshold on the remaining available resources, higher- or lower-level early warnings are issued according to the priority, importance and dependency relationships of the tasks, indicating that low resource availability is likely to cause a large number of task failures;
At time $t_T$, the actual remaining available CPU resource of Hadoop is denoted here $cp_T^{\mathrm{avail}}$. With the CPU resource availability warning threshold set to $\mu_c$, the warning rule is given by equation (1-12):
[Equation (1-12): formula image not recoverable]
At time $t_T$, the actual remaining available memory resource is denoted here $mp_T^{\mathrm{avail}}$. With the memory resource availability warning threshold set to $\mu_m$, the warning rule is given by equation (1-13):
[Equation (1-13): formula image not recoverable]
The warning thresholds $\mu_c$ and $\mu_m$ can be divided into alarms of different levels, such as serious, important and general alarms, according to actual requirements. A serious alarm has a fatal impact on the system, platform or application and requires immediate intervention; it is the highest level. An important alarm partially affects the system, platform or application; it is the intermediate level. A general alarm is essentially a warning that may have no direct impact on the system, platform or application; it is the lowest level. Finer-grained alarm levels can be defined according to actual conditions. If the currently running task has a high priority, is important, and has many subsequent tasks depending on it, a higher-level alarm is generated. In practice, priority, importance and dependency relationships can be quantified and corresponding thresholds set, so that alarms of different levels are generated in complex and changing environments.
In the same way, early warning can be implemented for the availability of other resources, so that all available Hadoop resources are monitored in real time and maintenance staff are warned in advance to expand capacity when resources run short, avoiding adverse effects of insufficient resources such as reduced operating efficiency, task failures and data loss.
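A threshold rule of the kind in equations (1-12) and (1-13) and the three-level scheme can be combined into a single decision function. The Python sketch below is hypothetical throughout: it assumes a warning is raised when the estimated requirement exceeds $\mu$ times the remaining available amount, quantifies priority and importance on a 0 to 3 scale and dependency as the number of dependent tasks (capped at 3), and maps the summed score to the serious, important and general levels named in the text. The scales and cut-offs are illustrative choices, not values from the patent.

```python
from enum import Enum
from typing import Optional


class AlarmLevel(Enum):
    GENERAL = 1    # warning category, lowest level
    IMPORTANT = 2  # partial impact, intermediate level
    SERIOUS = 3    # fatal impact, needs immediate intervention


def alarm_level(required: float, available: float, mu: float,
                priority: int, importance: int, dependents: int) -> Optional[AlarmLevel]:
    """Return None if no warning is needed, otherwise the alarm level.

    Hypothetical rule: warn when required > mu * available.
    priority and importance are on a 0-3 scale; dependents is capped at 3.
    """
    if required <= mu * available:
        return None
    score = priority + importance + min(dependents, 3)
    if score >= 7:
        return AlarmLevel.SERIOUS
    if score >= 4:
        return AlarmLevel.IMPORTANT
    return AlarmLevel.GENERAL
```

With 100 units available and $\mu = 0.8$, a requirement of 50 raises nothing, while a requirement of 90 raises a serious, important or general alarm depending on how the task scores.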
An embodiment of the present invention further provides a resource availability early warning apparatus. Fig. 4 is a schematic structural diagram of the resource availability early warning apparatus in the embodiment of the present invention. As shown in Fig. 4, the apparatus includes a model establishing module 41, a resource estimation module 42 and a resource early warning module 43, wherein:
the model establishing module 41 is configured to establish a resource usage estimation model;
in an embodiment of the invention, the model establishing module is specifically configured for:
determining the resource consumption increment corresponding to the task progress increment in the next time period, according to the progress of all parallel tasks in each time period, the resource consumption increment, and the patterns and associations within the time period; repeatedly comparing the determined resource consumption increment for the next time period with the resource consumption increment corresponding to the actual task progress increment, and dynamically adjusting a weight to reduce the error; and selecting the weight with the minimum error to establish the resource usage estimation model.
Specifically, based on the progress and resource consumption increments of all parallel tasks within the time-series increment range, the latent patterns and associations within the time period are learned according to preset rules, and the resource consumption increment corresponding to the task progress increment in the next time period is generated. The weight is then repeatedly and dynamically adjusted, by comparing against the actual task progress and resource consumption increments, so as to reduce the error, and the weight with the minimum error in the current time-series range is selected from a series of candidate weights to form the model.
The resource estimation module 42 is configured to estimate the resource usage of the next time period using the resource usage estimation model;
in an embodiment of the invention, the resource estimation module is specifically configured to estimate the CPU usage and memory usage of the next time period using the resource usage estimation model.
Specifically, based on the minimum-error resource usage estimation model thus formed, and taking the progress of the fastest task in the time-series range as the reference, the resources required to complete the remaining tasks are estimated from the current resource consumption of all parallel tasks.
The resource early warning module 43 is configured to issue a resource availability early warning when the estimated resource usage of the next period exceeds a preset threshold.
In an embodiment of the invention, the resource early warning module determining that the estimated resource usage of the next period exceeds the preset threshold comprises determining that it exceeds the threshold on the remaining available resources; and when the estimated resource usage of the next period exceeds the preset threshold, the module issues resource availability early warnings of different levels according to the priority, importance and dependency relationships of the tasks.
Fig. 5 is a schematic structural diagram of a resource availability early warning system according to an embodiment of the present invention, and as shown in fig. 5, a model building module 41 and a resource estimation module 42 in the resource availability early warning device according to the embodiment of the present invention are located in an application task early warning device 51 (AW, ApplicationWarner) in fig. 5, and the resource early warning module 43 is located in a global resource early warning device 52(RW, ResourceWarner) in fig. 5;
in terms of an overall architecture, the resource availability early warning system provided by the embodiment of the invention adds functional modules RW and AW with a resource availability monitoring early warning mechanism on the basis of an original YARN framework module architecture of Hadoop, respectively implements the resource availability monitoring early warning of global resources and the resource availability monitoring early warning of local Container containers, and timely pushes warning information with too low resource availability to notify a user according to a preset rule, so that the user can process the warning information in advance, and decision support is provided for operation and maintenance.
The resource availability early warning system establishes a resource availability early warning mechanism based on a time-series dynamic feedback learning model. The progress and the resource consumption increment of all parallel tasks within each time-series increment range are fed back dynamically to the AW in real time. According to preset rules, the AW learns the underlying patterns and associations within the time period and generates the resource consumption increment corresponding to the task progress increment for the next time period. It then compares this with the actual task progress and resource consumption increment, repeatedly adjusting the weights to reduce the error and optimize the model, and selects from a series of candidate weights the optimal weight, that is, the one with the minimum error within the current time-series range, to form the model. On this basis it predicts the amount of resources required to complete the remaining tasks. If the required amount exceeds the threshold of the remaining available resource amount, the AW notifies the RW, and the RW issues higher-level or lower-level early warnings according to the priority, importance, and dependency relationships of the tasks.
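Formulas (1-1) to (1-13) are not reproduced in this section, so the following is only a sketch of the feedback-learning idea with exponential smoothing swapped in as the model. For each candidate weight, a resource-per-progress rate is replayed over the history window; the predicted increment for each period is compared with the actual one, the weight with the smallest squared error is kept, and the resulting rate is used to estimate the resources needed for the remaining progress. The function names, the candidate weights, and the smoothing rule are all assumptions.

```python
from typing import Sequence


def replay(weight: float, progress_inc: Sequence[float],
           resource_inc: Sequence[float]) -> tuple[float, float]:
    """Replay one history window with a smoothed resource-per-progress rate.

    Returns (sum of squared one-step prediction errors, final smoothed rate).
    """
    rate = resource_inc[0] / progress_inc[0]             # seed the rate from the first period
    sq_err = 0.0
    for dp, dr in zip(progress_inc[1:], resource_inc[1:]):
        predicted = rate * dp                            # predicted increment for this progress increment
        sq_err += (dr - predicted) ** 2                  # compare with the actual increment
        rate = weight * (dr / dp) + (1.0 - weight) * rate  # feed the actual rate back
    return sq_err, rate


def fit_model(progress_inc, resource_inc, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (best weight, its error, its final rate) over the candidate weights."""
    if len(progress_inc) != len(resource_inc) or len(progress_inc) < 2:
        raise ValueError("need two equally long histories with at least two periods")
    if any(dp <= 0 for dp in progress_inc):
        raise ValueError("progress increments must be positive")
    best = min(candidates, key=lambda w: replay(w, progress_inc, resource_inc)[0])
    err, rate = replay(best, progress_inc, resource_inc)
    return best, err, rate


def shortage_expected(rate, remaining_progress, remaining_available):
    """Estimate demand for the remaining progress and flag a shortfall."""
    needed = rate * remaining_progress
    return needed > remaining_available, needed
```

With five periods of 10 progress units each consuming 1, 2, 3, 4 and 5 resource units, the steadily rising rate favours the largest candidate weight (0.9); for 20 remaining progress units the estimated need is about 9.78 units, so 9 remaining units would trigger a warning. A grid of fixed candidates keeps the "pick the minimum-error weight" step explicit; a real implementation might search the weight continuously instead.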
In the resource availability early warning system of the embodiment of the invention, the user first submits an application such as a MapReduce job, applies to the RM for resources, and applies to the RW for resource availability monitoring and early warning. Once the requests are granted, the NM issues a command to a Container to start the AM and the AW, which register for resources and for early warning respectively. The NM then starts each task, and the tasks enter the running state. The AW performs resource monitoring and early warning according to the rules defined by formulas (1-1) to (1-13); if resources are found to be insufficient, the RW is notified to handle the situation and issue early warning information. When the tasks complete normally, the resources are released and the alarm is closed, and finally the process ends and exits.
Fig. 6 is a schematic diagram of a workflow of a resource availability early warning system according to an embodiment of the present invention, and as shown in fig. 6, the workflow of the resource availability early warning system according to the embodiment of the present invention includes the following steps:
Step 601: The user submits an application, applies to the RM for resources, and requests the RW to start resource availability monitoring and early warning;
In this step, the user submits the application through JobClient, applies to the RM for resources, and requests the RW to start resource availability monitoring and early warning.
Step 602: The RM allocates a Container and communicates with the NM to have the AM started, and the RW communicates with the NM to have the AW started;
In this step, after receiving the request, the RM allocates the first Container to the application, communicates with the corresponding NM, and issues a command to start the AM in that Container. At the same time, the RW accepts the request, communicates with the NM, and requires that the application task monitor AW be started in the Containers so that resource availability monitoring and early warning can be performed for each Container.
Step 603: The AM registers with the RM and obtains resources; the AW registers with the RW and reports resources;
In this step, the AM registers itself with the RM and applies for resources such as CPU and memory for each task. At the same time, the AW registers itself with the RW and reports information such as the type and size of the resources each task has obtained, so that the RW can classify the tasks accordingly and adopt different processing strategies.
Step 604: The AM requests the NM to start the tasks, and the NM notifies the AW to start resource availability monitoring and early warning;
In this step, the AM requests the NM to start executing each task, and the NM notifies the AW to monitor resource availability and issue early warnings. The NM then starts each task, and all tasks enter the running state.
Step 605: The AM manages and monitors the task states, and the AW monitors resource availability and issues early warnings;
In this step, the AM manages all tasks and monitors their state while they run, and the AW monitors resource availability and issues early warnings.
Step 606: The AW determines whether to issue an early warning; if so, step 607 is executed, otherwise step 608 is executed;
In this step, the task progress and the resource availability in the Container are recorded over a specified time-series range, and feedback learning is performed repeatedly according to formulas (1-1) to (1-13) to obtain the optimal model with the minimum error and estimate the amount of resources required in the next period. An alarm is raised when the remaining available resources fall below the threshold given by the estimated required resources. When the AM finds that a task in a fatal early warning state or in a failed state cannot be recovered, the AM reapplies for resources and restarts the task.
Step 607: Issue a severe early warning and return to step 602;
Step 608: The RW performs global resource availability monitoring and early warning and notifies the user of the warning information;
In this step, the RW integrates the resource availability and warning information reported by the AWs, performs availability monitoring and early warning for global resources, and pushes the warning information to the user.
Step 609: The application ends; the AM deregisters from the RM and releases its resources, the AW closes its alarm with the RW, and the process ends.
In this step, after the application has executed normally, the AM deregisters itself from the RM to release its resources, the corresponding AW closes its alarm with the RW, and finally the process ends.
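The workflow of steps 601 to 609 can be sketched as a small state machine. The source does not say explicitly what follows step 608; the sketch assumes it proceeds to step 609. `severe_flags` is a hypothetical input standing in for the outcome of each pass through step 606.

```python
from enum import Enum


class Step(Enum):
    SUBMIT = 601          # user submits the application, requests RM resources and RW monitoring
    ALLOCATE = 602        # RM and RW ask the NM to start the AM and the AW
    REGISTER = 603        # AM registers with the RM, AW registers with the RW
    START_TASKS = 604     # NM starts the tasks and tells the AW to monitor
    MONITOR = 605         # AM tracks task state, AW tracks resource availability
    DECIDE = 606          # AW decides whether a severe warning is needed
    SEVERE_WARNING = 607  # severe warning, then back to 602
    GLOBAL_WARNING = 608  # RW aggregates AW reports and notifies the user
    FINISH = 609          # AM deregisters, AW closes its alarm, process ends


# Unconditional transitions; step 606 branches and is handled in run_workflow.
NEXT = {
    Step.SUBMIT: Step.ALLOCATE,
    Step.ALLOCATE: Step.REGISTER,
    Step.REGISTER: Step.START_TASKS,
    Step.START_TASKS: Step.MONITOR,
    Step.MONITOR: Step.DECIDE,
    Step.SEVERE_WARNING: Step.ALLOCATE,
    Step.GLOBAL_WARNING: Step.FINISH,   # assumption: 608 leads to 609
}


def run_workflow(severe_flags, max_steps=100):
    """Return the sequence of step numbers visited.

    severe_flags[i] is the hypothetical outcome of the i-th pass through step 606;
    once the flags run out, no further severe warnings are raised.
    """
    decisions = iter(severe_flags)
    step, trace = Step.SUBMIT, []
    while len(trace) < max_steps:
        trace.append(step.value)
        if step is Step.FINISH:
            return trace
        if step is Step.DECIDE:
            step = Step.SEVERE_WARNING if next(decisions, False) else Step.GLOBAL_WARNING
        else:
            step = NEXT[step]
    raise RuntimeError("workflow did not finish within max_steps")
```

A run with one severe warning followed by a normal check visits 601 to 606, then 607, loops back through 602 to 606, and ends with 608 and 609.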
The resource availability early warning method, device, and system provided by the embodiments of the invention use a resource availability early warning mechanism based on a time-series dynamic feedback learning model as the core method. They add functional modules with a resource availability monitoring and early warning mechanism, namely RW and AW, to the original Hadoop framework; these perform availability monitoring and early warning for global resources and for local Containers respectively, and notify the user promptly, according to preset rules, of alarms about low resource availability, so that resources can be handled and allocated in advance. In a concrete implementation, a time-series dynamic feedback learning model is first established. The progress and consumed-resource increment of all parallel tasks within each time-series increment range are fed back dynamically to the AW in real time; the AW learns the underlying patterns and associations within the time period according to preset rules and generates the consumed-resource increment corresponding to the task progress increment for the next time period. By comparing this with the actual task progress and consumed-resource increment, the weights are adjusted repeatedly to reduce the error and optimize the model, and the optimal weight with the minimum error within the current time-series range is selected from a series of weights to form the model. The resources required to complete the remaining tasks are then estimated with this model; if the required amount exceeds the threshold of the remaining available resource amount, the AW notifies the RW, and the RW issues early warnings of different levels according to the priority, importance, and dependency relationships of the tasks.
The resource availability early warning method, device, and system address two shortcomings of existing Hadoop: it has no module that gives users a data-driven basis for allocating resources, and monitoring resources and task states in real time alone cannot predict resource shortages in advance. The invention thereby avoids resource waste, reduces capital cost, relieves the workload of maintenance staff, provides decision support for operation and maintenance, and is highly practical.
The implementation functions of the processing modules in the resource availability warning device shown in fig. 4 can be understood by referring to the related description of the resource availability warning method. Those skilled in the art will understand that the functions of the processing modules in the resource availability warning apparatus shown in fig. 4 can be implemented by a program running on a processor, and can also be implemented by specific logic circuits, such as: may be implemented by a Central Processing Unit (CPU), Microprocessor (MPU), Digital Signal Processor (DSP), or Field Programmable Gate Array (FPGA).
In the embodiments provided in the present invention, it should be understood that the disclosed method and apparatus can be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the modules is only one logical functional division, and other division manners may be implemented in practice, such as: multiple modules or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the communication connections between the components shown or discussed may be through interfaces, indirect couplings or communication connections of devices or modules, and may be electrical, mechanical or other.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed on a plurality of network modules; some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may be separately used as one module, or two or more modules may be integrated into one module; the integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated module according to the embodiment of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The resource availability warning method and apparatus described in the embodiments of the present invention are illustrated by the above embodiments only as examples and are not limited thereto. Those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (8)

1. A resource availability early warning method is characterized by comprising the following steps:
establishing a resource usage amount estimation model;
estimating the resource usage for the next time period through the resource usage estimation model;
when the estimated resource usage amount of the next period exceeds a preset threshold value, sending out a resource availability early warning;
wherein establishing the resource usage estimation model comprises:
determining, according to the progress of all parallel tasks in each time period, the consumed resource increment, and the patterns and association relationships within the time period, the consumed resource increment corresponding to the task progress increment in the next time period;
comparing, multiple times, the determined consumed resource increment corresponding to the task progress increment in the next time period with the consumed resource increment corresponding to the actual task progress increment, and dynamically adjusting the weights to reduce the error;
and selecting the weight corresponding to the minimum error to establish a resource usage amount estimation model.
2. The method of claim 1, wherein estimating the resource usage for the next time period through the resource usage estimation model includes, but is not limited to: estimating the CPU resource usage and the memory resource usage for the next time period through the resource usage estimation model.
3. The method of claim 1, wherein the estimated resource usage for the next cycle exceeding the first threshold comprises: the estimated resource usage for the next cycle exceeding a threshold of the remaining available resource amount.
4. The method according to claim 1 or 3, wherein issuing the resource availability early warning when the estimated resource usage for the next cycle exceeds the preset threshold comprises: when the estimated resource usage for the next cycle exceeds the preset threshold, issuing resource availability early warnings of different levels according to the priority, importance, and dependency relationships of the task.
5. A resource availability early warning apparatus, comprising a model establishing module, a resource estimation module, and a resource early warning module, wherein:
the model establishing module is configured to establish a resource usage estimation model;
the resource estimation module is configured to estimate the resource usage for the next time period through the resource usage estimation model;
the resource early warning module is configured to issue a resource availability early warning when the estimated resource usage for the next period exceeds a preset threshold;
the model establishing module is specifically configured to determine, according to the progress of all parallel tasks in each time period, the consumed resource increment, and the patterns and association relationships within the time period, the consumed resource increment corresponding to the task progress increment in the next time period;
compare, multiple times, the determined consumed resource increment corresponding to the task progress increment in the next time period with the consumed resource increment corresponding to the actual task progress increment, and dynamically adjust the weights to reduce the error;
and select the weight corresponding to the minimum error to establish the resource usage estimation model.
6. The apparatus of claim 5, wherein the resource estimation module is specifically configured to estimate the CPU resource usage and the memory resource usage for the next time period through the resource usage estimation model.
7. The apparatus of claim 5, wherein the resource early warning module determining that the estimated resource usage for the next cycle exceeds the first threshold comprises: the resource early warning module determining that the estimated resource usage for the next cycle exceeds a threshold of the remaining available resource amount.
8. The apparatus according to claim 5 or 7, wherein the resource early warning module is specifically configured to: when the estimated resource usage for the next cycle exceeds the preset threshold, issue resource availability early warnings of different levels according to the priority, importance, and dependency relationships of the task.
CN201610265261.5A 2016-04-26 2016-04-26 Resource availability early warning method and device Active CN107315636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610265261.5A CN107315636B (en) 2016-04-26 2016-04-26 Resource availability early warning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610265261.5A CN107315636B (en) 2016-04-26 2016-04-26 Resource availability early warning method and device

Publications (2)

Publication Number Publication Date
CN107315636A CN107315636A (en) 2017-11-03
CN107315636B true CN107315636B (en) 2020-06-05

Family

ID=60184366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610265261.5A Active CN107315636B (en) 2016-04-26 2016-04-26 Resource availability early warning method and device

Country Status (1)

Country Link
CN (1) CN107315636B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888733B (en) * 2018-09-11 2023-12-26 三六零科技集团有限公司 Cluster resource use condition processing method and device and electronic equipment
CN109684059A (en) * 2018-12-20 2019-04-26 北京百度网讯科技有限公司 Method and device for monitoring data
CN111858015B (en) * 2019-04-25 2024-01-12 中国移动通信集团河北有限公司 Method, device and gateway for configuring running resources of application program
CN110597634B (en) * 2019-09-12 2021-05-07 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN112328393A (en) * 2020-11-02 2021-02-05 京东数字科技控股股份有限公司 Job processing method, device and system based on big data environment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103024762A (en) * 2012-12-26 2013-04-03 北京邮电大学 Service feature based communication service forecasting method
CN103581339A (en) * 2013-11-25 2014-02-12 广东电网公司汕头供电局 Storage resource allocation monitoring and processing method based on cloud computing
CN103812911A (en) * 2012-11-14 2014-05-21 中兴通讯股份有限公司 Method and system for controlling and utilizing service resources of PaaS (platform as a service) cloud computing platform

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9588821B2 (en) * 2007-06-22 2017-03-07 Red Hat, Inc. Automatic determination of required resource allocation of virtual machines
US8850450B2 (en) * 2012-01-18 2014-09-30 International Business Machines Corporation Warning track interruption facility

Also Published As

Publication number Publication date
CN107315636A (en) 2017-11-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant