CN117667606A - High-performance computing cluster energy consumption prediction method and system based on user behaviors - Google Patents


Info

Publication number
CN117667606A
Authority
CN
China
Prior art keywords
energy consumption
data
sequence
user
user behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410146277.9A
Other languages
Chinese (zh)
Inventor
王继彬
娄燕涛
郭莹
吴晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Computer Science Center National Super Computing Center in Jinan


Abstract

The disclosure provides a high-performance computing cluster energy consumption prediction method and system based on user behaviors, relating to the technical field of high-performance computing and cloud computing. The method comprises: acquiring all active user sessions monitored in real time and the energy consumption data of all nodes and cabinets; extracting a user behavior sequence from each user session, classifying and encoding the user behavior sequence, and converting it into a graph data structure; inputting the graph data structure into a user behavior prediction model, predicting the behavior sequence within a set future time, and taking the predicted behavior sequence as a covariate; and merging the covariates with the energy consumption data and expanding the feature sequences to obtain high-dimensional time-series energy consumption data containing user behavior information, which is input into an energy consumption prediction model to obtain the predicted energy consumption of each cabinet and each node of the cluster within the set future time. Because the method and system take the influence of user behavior on energy consumption into account, more accurate prediction is achieved.

Description

High-performance computing cluster energy consumption prediction method and system based on user behaviors
Technical Field
The disclosure relates to the technical field of high-performance computing and cloud computing, in particular to a high-performance computing cluster energy consumption prediction method and system based on user behaviors.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
High performance computing (HPC) clusters can complete complex scientific computing tasks rapidly and efficiently, and have gradually become indispensable infrastructure in scientific research fields such as meteorology, pharmacological analysis, and artificial intelligence. As the performance and scale of HPC clusters grow rapidly, their energy consumption has become an increasingly prominent problem: it affects server service life, raises operating costs, and restricts further growth of cluster scale. How to improve the energy efficiency of HPC clusters is a major problem currently faced.
Energy consumption prediction is the basis for improving the energy efficiency of an HPC cluster and for energy-saving scheduling. Existing prediction methods generally treat energy consumption data as time series data and predict with classical time series models such as the autoregressive integrated moving average model (ARIMA) and long short-term memory neural networks (LSTM). The core idea is to learn latent patterns of change from the time series data and to predict future data according to those patterns. However, when such methods are applied directly to HPC cluster energy consumption prediction, user behaviors such as submitting and terminating jobs can strongly affect cluster energy consumption. Unless user behavior is clearly regular in time, a time series prediction model alone cannot predict these sudden fluctuations, so the prediction curve lags noticeably at those points. The phenomenon is especially pronounced when the cluster runs a large number of small-scale jobs with short running times. Existing energy consumption prediction therefore cannot accurately capture the influence of user behavior or detect sudden fluctuations in time, which affects the energy efficiency of the cluster.
Disclosure of Invention
To solve the above problems, the disclosure provides a high-performance computing cluster energy consumption prediction method and system based on user behaviors. A user behavior prediction model is established to predict the user behavior sequence accurately; the predicted user behavior sequence is then combined, as a covariate, with the cluster energy consumption data, so that the energy consumption prediction model can perceive the influence of user behavior on energy consumption and thereby predict cluster energy consumption more accurately.
According to some embodiments, the present disclosure employs the following technical solutions:
a high-performance computing cluster energy consumption prediction method based on user behaviors comprises the following steps:
acquiring all active user sessions monitored in real time, and the energy consumption data of each node and cabinet within a set time window;
extracting a user behavior sequence in a user session, classifying and encoding the user behavior sequence and converting the user behavior sequence into a graph data structure; inputting the graph data structure into a user behavior prediction model, predicting a behavior sequence in a future set time and taking the behavior sequence as a covariate;
and carrying out data combination and feature sequence expansion on the covariates and the energy consumption data, obtaining high-dimensional time sequence energy consumption data containing user behavior information, and inputting the high-dimensional time sequence energy consumption data into an energy consumption prediction model to obtain an energy consumption prediction value in a future set time of each cabinet and each node of the cluster.
According to some embodiments, the present disclosure employs the following technical solutions:
a high performance computing cluster energy consumption prediction system based on user behavior, comprising:
the data acquisition module is used for acquiring all active user sessions monitored in real time and energy consumption data of each node and each cabinet;
the data processing module is used for extracting a user behavior sequence in a user session, classifying and encoding the user behavior sequence and converting the user behavior sequence into a graph data structure;
the user behavior prediction module is used for inputting the graph data structure into a user behavior prediction model, predicting a behavior sequence in a future set time and taking the behavior sequence as a covariate;
and the energy consumption prediction module is used for carrying out data combination on the covariates and the energy consumption data and expanding a characteristic sequence, obtaining high-dimensional time sequence energy consumption data containing user behavior information, and inputting the high-dimensional time sequence energy consumption data into an energy consumption prediction model to obtain energy consumption prediction values of all cabinets and nodes of the cluster within a future set time.
Compared with the prior art, the beneficial effects of the present disclosure are:
The present disclosure provides a high-performance computing cluster energy consumption prediction method based on user behavior. First, a prediction model is established to predict the behavior sequence that a user may execute in the future; the predicted behavior sequence is then combined, as a covariate, with the cluster energy consumption data to obtain high-dimensional time-series data containing user behavior information; finally, the high-dimensional time-series data is input into an energy consumption prediction model, which generates energy consumption predictions for all cabinets and nodes of the cluster.
The method encodes user behaviors through an embedding layer, learns the local associations of user behaviors with a graph learning layer and a graph convolution module, reorganizes the output of the graph convolution module into a sequence structure with a serialization module, and establishes a residual connection with the original data; finally, an LSTM module learns the global features of user behaviors and generates the prediction of the user behavior sequence. Compared with existing energy consumption prediction methods, the disclosure emphasizes the influence of user behavior on energy consumption: the user behavior information contained in the energy consumption time-series data allows the energy consumption prediction model to detect, in time, user behavior that may affect energy consumption, so that more accurate prediction is achieved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flowchart of a high-performance computing cluster energy consumption prediction method based on user behavior prediction provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the training process of the user behavior prediction model and the energy consumption prediction model in the energy consumption prediction method according to an embodiment of the disclosure;
FIG. 3 is an internal structure diagram of a user behavior prediction model provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the process of merging energy consumption time-series data with user behavior data and expanding the feature sequences according to an embodiment of the disclosure;
FIG. 5 is a schematic diagram of a prediction effect evaluation and model update flow provided in an embodiment of the disclosure;
FIG. 6 is an overall architecture diagram of a high-performance computing cluster energy consumption prediction system based on user behavior prediction provided in an embodiment of the present disclosure.
Detailed Description
The disclosure is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
An embodiment of the present disclosure provides a high performance computing cluster energy consumption prediction method based on user behavior, including:
Step one: acquiring all active user sessions monitored in real time and the energy consumption data of each node and cabinet;
Step two: extracting a user behavior sequence from each user session, classifying and encoding the user behavior sequence, and converting it into a graph data structure; inputting the graph data structure into a user behavior prediction model, predicting the behavior sequence within a set future time, and taking the predicted behavior sequence as a covariate;
Step three: merging the covariates with the energy consumption data and expanding the feature sequences to obtain high-dimensional time-series energy consumption data containing user behavior information, and inputting the high-dimensional time-series energy consumption data into an energy consumption prediction model to obtain the predicted energy consumption of each cabinet and each node of the cluster within the set future time.
As an embodiment, as shown in fig. 1, a training method of a model in a high-performance computing cluster energy consumption prediction method based on user behavior in the disclosure is as follows:
Step S101: Training the prediction models. The prediction models comprise a user behavior prediction model and a cluster energy consumption prediction model.
The user behavior prediction model takes a current behavior sequence of a user as input and takes user behavior prediction data as output; the cluster energy consumption prediction model takes multidimensional energy consumption time sequence data in a certain time window as input and takes energy consumption prediction data of each node and cabinet in the cluster as output.
Referring to FIG. 2, the user behavior prediction model and the cluster energy consumption prediction model are trained based on the following steps:
Step S201: Data acquisition and cleaning. The energy consumption data of all cabinets and computing nodes in the HPC cluster is collected and recorded; all user behavior information is recorded and organized into sessions.
The energy consumption data includes: CPU power, node power, and cabinet power.
The user behavior information includes all operations performed by the user in one server connection, including logging in, viewing resources, executing commands, submitting jobs, suspending jobs, logging out, and the like. Since the impact of a user submitting different types of jobs on energy consumption is different, the act of submitting jobs may be divided into a number of different types of operations according to job type.
Step S202: a user behavior data set is generated. In this embodiment, the user behavior data needs to be classified and encoded and organized into graph data structures in order to be processed using the graph neural network. The specific processing procedure is described in step S301.
Step S203: Training the user behavior prediction model. The user behavior prediction model is trained using the user behavior data set generated in step S202. The present embodiment provides a reference implementation of a user behavior prediction model; the details of the model are introduced in step S103.
Step S204: a multi-dimensional energy consumption dataset is generated. Creating the energy consumption dataset requires a series of processes of merging the user behavior sequence with the cluster energy consumption sequence, expanding the covariate sequence for the energy consumption sequence, dividing the sequence categories, and the like. The process is substantially the same as step S104, but the user behavior data used in this step is real data acquired in advance, and the user behavior data used in step S104 is predicted data obtained by step S103.
Step S205: Training the cluster energy consumption prediction model. The energy consumption data set generated in step S204 is used to train the energy consumption prediction model; this embodiment adopts a TFT (Temporal Fusion Transformer) model as the energy consumption prediction model.
Note that various choices exist for the user behavior prediction model and the cluster energy consumption prediction model. The present embodiment serves only as an implementation reference and should not be construed as limited to the examples set forth herein; rather, the described features, structures, or characteristics of the embodiments may be combined in any suitable manner in any implementation.
Step S102: Real-time data collection and processing. The energy consumption data of the nodes and cabinets within a certain time window is collected and recorded in real time, and all user session data in an active state is recorded in real time; this data serves as the input of the subsequent prediction steps.
The energy consumption data includes: CPU power, node power, and cabinet power.
The user behavior information includes all operations performed by the user in one server connection, including logging in, viewing resources, executing commands, submitting jobs, suspending jobs, logging out, and the like.
The type of data collected in this step is substantially the same as that of step S201, but since the data collected in this step is only used for single prediction in the subsequent step and possibly on-line training, only session data in an active state and energy consumption data in a limited time window need to be recorded.
Step S103: User behavior prediction. The user's current behavior data is input into the user behavior prediction model generated in step S101 to predict the user's sequence of operations over the coming period. The user behavior prediction model in this embodiment combines a graph convolution network with an LSTM network, which learn the local correlations and the global sequence of user behaviors, respectively.
Referring to FIG. 3, user behavior prediction mainly includes the following steps:
Step S301: Data preprocessing. The main process of this step is as follows:
Assume that the set of user operation types is $V = \{v_1, v_2, \ldots, v_m\}$. When user session data is organized, the operations in a session must be sorted by timestamp; the sorted behavior sequence of a session $s$ can be expressed as $s = [v_{s,1}, v_{s,2}, \ldots, v_{s,n}]$, where any element $v_{s,i} \in V$ represents the type of the $i$-th operation performed by the user in session $s$.
At the same time, each session may be organized as a directed graph $G_s = (V_s, E_s)$. In this directed graph, each node represents an operation type $v_{s,i} \in V$, and each edge $(v_{s,i-1}, v_{s,i}) \in E_s$ represents that the user performed operation $v_{s,i}$ after operation $v_{s,i-1}$ in the session.
In addition, to model the dwell time between successive operations of a user, a time-based weight needs to be added to each edge of the directed graph. Given the set of dwell times in a session, and to avoid negative effects on the model caused by an excessively wide range of dwell times, each dwell time must be processed as follows:
[formula not recoverable from the source text]
All sessions are processed in turn using the above steps to obtain the user session data set.
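The session-to-graph conversion of step S301 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `session_to_graph`, the operation labels, and the input layout are hypothetical, and because the dwell-time transformation is not available in this text, `log1p` is used here purely as an assumed stand-in that compresses a wide range of dwell times.

```python
import math
from collections import defaultdict

def session_to_graph(events):
    """Convert one session into a weighted directed graph.

    events: list of (timestamp_seconds, operation_type) tuples, in any order.
    Returns (nodes, edges), where edges maps (src_op, dst_op) -> weight.
    """
    # Sort the operations by timestamp, as step S301 requires.
    ordered = sorted(events, key=lambda e: e[0])

    # One node per distinct operation type, in order of first appearance.
    nodes = []
    for _, op in ordered:
        if op not in nodes:
            nodes.append(op)

    dwell = defaultdict(list)
    for (t_prev, op_prev), (t_next, op_next) in zip(ordered, ordered[1:]):
        # ASSUMPTION: log1p stands in for the unspecified dwell-time transform.
        dwell[(op_prev, op_next)].append(math.log1p(t_next - t_prev))

    # ASSUMPTION: repeated transitions are averaged into a single edge weight.
    edges = {edge: sum(v) / len(v) for edge, v in dwell.items()}
    return nodes, edges

# Hypothetical session: the labels are illustrative, not taken from the source.
session = [
    (0, "login"),
    (5, "view_resources"),
    (65, "submit_job"),
    (70, "view_resources"),
    (3670, "logout"),
]
nodes, edges = session_to_graph(session)
```

Averaging repeated transitions keeps one weight per edge; summing or keeping the most recent dwell time would be equally plausible choices.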
Step S302: Graph convolution module. The graph convolution module processes the input graph data structure, learns the relationships between adjacent nodes in the graph, and generates a latent vector for each node in the graph. In this embodiment, the graph convolution module is used to learn the local preferences of user operations.
Step S303: Serialization module. The data output by the graph convolution module is reorganized into a linear sequence structure according to the original order of the user behaviors, so that subsequent modules can further learn the sequence characteristics of user behavior at the global level.
Step S304: Residual connection module. In this step, a residual connection is established between the output of the graph convolution module and the original data, which speeds up model training and yields a better prediction effect.
Step S305: LSTM module. This step takes the user operation sequence, encoded as latent vectors, as the input of the LSTM model in order to learn the global sequence features of user behavior.
Step S306: Data decoding module. A linear layer first linearly maps the latent vector output by the LSTM module; a normalized exponential function (Softmax) layer then computes the probability of each behavior; finally, the behavior with the highest probability is output as the prediction result.
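The serialization, residual connection, and decoding steps (S303, S304, S306) can be illustrated in plain Python. The sketch below is only an illustration under assumptions: the function names, the two-dimensional vectors, and the weight values are hypothetical, and the graph convolution and LSTM modules are not implemented, so the last sequence vector is used as an assumed stand-in for the LSTM output.

```python
import math

def serialize_with_residual(sequence, node_latent, node_embedding):
    """Steps S303/S304: reorder the per-node latent vectors into the original
    operation order and add the original embedding (residual connection)."""
    return [
        [g + e for g, e in zip(node_latent[op], node_embedding[op])]
        for op in sequence
    ]

def softmax(logits):
    """Normalized exponential function over a list of logits."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def decode(hidden, weight, bias, labels):
    """Step S306: linear layer, Softmax, then pick the most probable behavior."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weight, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

# Hypothetical values standing in for the graph convolution output
# and the embedding layer output.
sequence = ["login", "submit_job", "login"]
node_latent = {"login": [0.1, 0.2], "submit_job": [0.3, -0.1]}
node_embedding = {"login": [1.0, 0.0], "submit_job": [0.0, 1.0]}
seq_vectors = serialize_with_residual(sequence, node_latent, node_embedding)

# ASSUMPTION: the last sequence vector stands in for the LSTM output.
labels = ["login", "submit_job", "logout"]
weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]
predicted, probs = decode(seq_vectors[-1], weight, bias, labels)
```

Note that an operation type appearing twice in the session (here `login`) maps to the same node vector at both positions, which is how the sequence order is restored from the graph.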
Step S104: data sequences were combined and amplified. The key point of the step is to add the user behavior sequence obtained by prediction in the step S103 into cluster energy consumption data as covariates, and expand more characteristic sequences for the energy consumption time sequence on the basis of the covariates, so that the energy consumption time sequence is called as multidimensional time sequence data with more characteristics.
Referring to FIG. 4, data sequence merging and expansion comprises the following steps:
Step S401: Merging the data sequences. First, several feature columns are added to the energy consumption data, each column corresponding to one type of user behavior; the user behaviors predicted in step S103 are then matched by timestamp and added to the corresponding feature columns.
Assuming that the set of timestamps corresponding to the energy consumption data in the current input window is T, the specific steps of merging the data sequences are as follows:
1. Extract the latest timestamp t from the set T;
2. Traverse the user behavior sequence output in step S103 to obtain the operation set H corresponding to t;
3. Traverse the set H, and for each operation h obtain the list N of affected nodes;
4. Traverse the node list N, adding h to the corresponding feature column of every node in N;
5. Repeat steps 3 and 4 until the traversal of the set H is complete;
6. Repeat steps 1 to 5 until the set T is empty.
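The merging procedure above can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory layout keyed by `(timestamp, node)`; the function name `merge_behaviors`, the column names, and the choice to record each behavior as an occurrence count are assumptions rather than details given in the disclosure.

```python
def merge_behaviors(energy_rows, predicted_ops, behavior_types):
    """Merge predicted user behaviors into per-node energy consumption rows.

    energy_rows:   {(timestamp, node): {"node_power": float, ...}}
    predicted_ops: list of (timestamp, operation_type, affected_nodes)
    behavior_types: operation types that each get their own feature column
    """
    # Add one feature column per behavior type, initialised to 0.
    merged = {
        key: {**row, **{b: 0 for b in behavior_types}}
        for key, row in energy_rows.items()
    }

    # Set T, processed from the latest timestamp backwards.
    timestamps = sorted({ts for ts, _ in energy_rows}, reverse=True)
    for t in timestamps:
        # Operation set H for timestamp t.
        ops_at_t = [(op, nodes) for ts, op, nodes in predicted_ops if ts == t]
        for op, nodes in ops_at_t:
            # Node list N affected by operation h.
            for node in nodes:
                if (t, node) in merged:
                    # ASSUMPTION: a behavior is recorded as an occurrence count.
                    merged[(t, node)][op] += 1
    return merged

# Hypothetical input data for two nodes and two timestamps.
energy_rows = {
    (100, "n1"): {"node_power": 310.0},
    (100, "n2"): {"node_power": 295.0},
    (160, "n1"): {"node_power": 420.0},
    (160, "n2"): {"node_power": 300.0},
}
predicted_ops = [
    (160, "submit_job", ["n1", "n2"]),
    (160, "cancel_job", ["n2"]),
    (100, "submit_job", ["n1"]),
]
merged = merge_behaviors(energy_rows, predicted_ops, ["submit_job", "cancel_job"])
```

A one-hot flag instead of a count would work equally well when at most one operation of each type can occur per timestamp.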
Step S402: Feature sequence expansion. Further feature sequences are expanded for the energy consumption time series generated in step S401, including the user, queue, resource pool, workday, holiday, month, season, and similar information, so that the series covers more factors that may influence energy consumption and becomes multidimensional time-series data.
Step S403: Sequence classification. The multidimensional time-series data generated in the above steps is classified into three types, namely static variables, past dynamic variables, and known future dynamic variables, as required by the energy consumption prediction model.
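As an illustration of the calendar-related part of the feature expansion in step S402, the sketch below derives workday, holiday, month, and season features from a date. The function name `calendar_features`, the feature names, the Monday-to-Friday workday rule, and the northern-hemisphere meteorological season mapping are all assumptions; the holiday set would have to be supplied by the operator.

```python
from datetime import date

def calendar_features(day, holidays=frozenset()):
    """Derive calendar covariates for one date.

    ASSUMPTION: seasons follow the northern-hemisphere meteorological
    convention (December to February is winter, and so on).
    """
    season = ("winter", "spring", "summer", "autumn")[(day.month % 12) // 3]
    return {
        # ASSUMPTION: Monday to Friday are workdays unless listed as holidays.
        "is_workday": day.weekday() < 5 and day not in holidays,
        "is_holiday": day in holidays,
        "month": day.month,
        "season": season,
    }

# Hypothetical holiday calendar containing a single date.
holidays = frozenset({date(2024, 2, 12)})
holiday_row = calendar_features(date(2024, 2, 12), holidays)
weekday_row = calendar_features(date(2024, 7, 8), holidays)
weekend_row = calendar_features(date(2024, 7, 6), holidays)
```

Because these values are fully determined by the calendar, they are known for future timestamps as well as past ones.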
Step S105: cluster energy consumption prediction. And (3) inputting the multidimensional time sequence data obtained in the step (S104) into the energy consumption prediction model generated in the step (S101) to obtain the energy consumption prediction value of each cabinet and each node of the cluster for a period of time in the future.
The energy consumption prediction model in this step is Temporal Fusion Transformer (TFT) model, which requires dividing input data into three types of static variables, past dynamic variables and future known dynamic variables. In this embodiment, the static variables include user, queue, resource pool information; the past dynamic variables comprise energy consumption information related to clusters such as nodes, cabinet power and the like; the future known dynamic variables contain information on user behavior, workday, holiday, season, etc.
Note that the user behavior information predicted and obtained by step S103 is divided into dynamic variables known in the future, which is a key to the energy consumption prediction model of the present invention being able to perceive the user behavior.
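The three-way division described above can be expressed as a simple column partition. The sketch below is only an illustration: the column names and the function name `split_for_tft` are hypothetical, and it does not call any particular TFT library, whose own input conventions would need to be followed in practice.

```python
# Hypothetical column names grouped as described in this embodiment.
STATIC = {"user", "queue", "resource_pool"}
PAST_DYNAMIC = {"cpu_power", "node_power", "cabinet_power"}
FUTURE_KNOWN = {"submit_job", "cancel_job", "is_workday", "is_holiday", "season"}

def split_for_tft(columns):
    """Partition column names into the three variable groups a TFT expects."""
    groups = {"static": [], "past_dynamic": [], "future_known": []}
    for name in columns:
        if name in STATIC:
            groups["static"].append(name)
        elif name in PAST_DYNAMIC:
            groups["past_dynamic"].append(name)
        elif name in FUTURE_KNOWN:
            # Predicted user behaviors are treated as known future inputs.
            groups["future_known"].append(name)
        else:
            raise ValueError(f"unclassified column: {name}")
    return groups

columns = ["user", "queue", "node_power", "cabinet_power",
           "submit_job", "is_holiday", "season"]
groups = split_for_tft(columns)
```

Raising an error on an unclassified column is a deliberate choice here, so that a newly added feature cannot silently be dropped from the model input.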
Step S106: effect evaluation and model updating. And (3) evaluating the energy consumption prediction effect, if the prediction error exceeds a threshold value, generating a new data set by using the recently accumulated data, and performing online training on the model to adapt to the new data and improve the prediction effect.
The specific process is shown in fig. 5, and mainly comprises the following steps:
step S501: long queues D1, E1 and short queue E2 are created. The queues E1 and E2 are used for storing prediction errors of the model in time windows with different lengths, and the queue D1 is used for storing input data in the time window corresponding to the E1.
Step S502: the single predicted input data is obtained and added to queue D1. Each time the prediction model performs prediction, the input data of the current prediction is stored in the queue D1.
Step S503: error values for a single prediction are obtained and added to queues E1 and E2. Each time the prediction model performs prediction, the error value of the current prediction is stored in the queues E1 and E2.
Step S504: the JS divergence for queues E1 and E2 is calculated.
First, the probability distributions of the error values in queues E1 and E2 are estimated using kernel density estimation; the JS divergence (Jensen-Shannon Divergence, JSD) is then used to measure the difference between the probability distributions of the data in queues E1 and E2. The JS divergence is calculated as follows:
$$\mathrm{JSD}(P \,\|\, Q) = \frac{1}{2}\,\mathrm{KL}\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + \frac{1}{2}\,\mathrm{KL}\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right)$$
where $\mathrm{KL}$ denotes the KL divergence (Kullback-Leibler divergence, KLD), and $P$ and $Q$ denote the probability distribution functions of the two groups of error values.
Step S505: it is determined whether the difference in distribution of E1 and E2 exceeds a threshold. If the threshold is not exceeded, returning to step S502; if the threshold has been exceeded, the characteristics of the input data are considered to have changed, and the model is updated to accommodate the new data by executing step S506.
Step S506: the predictive model is updated with the data in queue D1 as the dataset. The specific procedure of updating the prediction model is the same as step S101.
Example 2
In one embodiment of the present disclosure, a high performance computing cluster energy consumption prediction system based on user behavior is provided, including:
the data acquisition module is used for acquiring all active user sessions monitored in real time and energy consumption data of each node and each cabinet;
In this embodiment, the energy consumption information of the HPC cluster cabinets and computing nodes is collected through sensors installed on the computing nodes and performance analysis tools in the node operating system; the collected metrics include CPU power, node power, cabinet power, and the like. The user session data is obtained through a user service system deployed on the login nodes and a Slurm scheduling system deployed on the management and control nodes; the collected content comprises all operations performed by the user, such as logging in, viewing resources, executing commands, submitting jobs, suspending jobs, and logging out, as well as information on the jobs submitted by the user, such as the working directory, job name, and requested resource scale.
The data processing module is used for extracting a user behavior sequence in a user session, classifying and encoding the user behavior sequence and converting the user behavior sequence into a graph data structure;
the user behavior prediction module is used for inputting the graph data structure into a user behavior prediction model, predicting a behavior sequence in a future set time and taking the behavior sequence as a covariate;
In this embodiment, the session is a user session currently in an active state. The operations already performed by the user are extracted from the session to form a behavior sequence, which is organized into the graph form required by the model and can then be input into the user behavior prediction model for prediction. The user behavior prediction model first encodes user behaviors through an embedding layer and learns the local associations of user behaviors with a graph convolution module, then converts the graph-structured data back into sequence data through a serialization module and a residual connection layer, and finally generates the prediction of the user behavior sequence with an LSTM module and outputs the category of each predicted behavior through a normalized exponential function (Softmax) layer.
The cluster energy consumption prediction module is used for carrying out data combination on the covariates and the energy consumption data and expanding a characteristic sequence, obtaining high-dimensional time sequence energy consumption data containing user behavior information, inputting the high-dimensional time sequence energy consumption data into the energy consumption prediction model, and obtaining energy consumption prediction values of each cabinet and each node of the cluster within a future set time.
In this embodiment, the energy consumption prediction module integrates the cluster energy consumption data, cluster resource usage information and behavior prediction information output by the user behavior prediction module; then, introducing data such as users, resource pools, holidays, seasons and the like as covariates, and expanding the time sequence into high-dimensional time sequence data with more characteristic dimensions; and finally, inputting the time sequence data into an energy consumption prediction model and generating a prediction result. Because the energy consumption prediction model in this embodiment adopts the TFT model, the sequence needs to be divided into a static variable, a past dynamic variable and a known future dynamic variable before data is input into the prediction model, where the predicted operation sequence output by the user behavior prediction module is classified into the known future dynamic variable, which is the key point that the energy consumption prediction model can sense the user behavior.
The system also comprises a model training and evaluation module, which trains an initial prediction model with pre-collected data before the energy consumption prediction system goes online, continuously evaluates the prediction effect after the system goes online, and, when the prediction effect degrades significantly, reorganizes a data set and trains the model online so that the model adapts to new data and the prediction effect is maintained.
In this embodiment, the model training and evaluating module stores the prediction error of the prediction model by maintaining two queues with different lengths, and accumulates the input data corresponding to the short queues at the same time; the two queues are respectively used for measuring the error value distribution conditions of the long term and the short term of the prediction model, when the probability distribution difference of the error values in the two queues is too large, the input data is considered to be changed to cause the reduction of the model effect, and the accumulated input data is used for generating a data set and updating the model on line.
As shown in Fig. 6, in one embodiment, the specific functions of the data acquisition module 601, the user behavior prediction module 602, the cluster energy consumption prediction module 603, and the model training and evaluation module 604 of the present disclosure are as follows:
Module 600: high-performance computing cluster. The cluster includes a small number of login nodes and management nodes and a large number of computing nodes. A logging system records user behavior logs and job scheduling logs, and the computing nodes are equipped with sensors that record the system's energy consumption and resource usage. The data acquisition module 601 collects this data and extracts the useful parts.
Module 601: data acquisition module. It is responsible for collecting energy consumption data from the computing nodes in the high-performance computing cluster 600 and for extracting user session data from the login nodes and the management nodes.
The energy consumption data is collected by sensors installed on the computing nodes and by performance analysis tools in the node operating system; the collected metrics include CPU power, node power, cabinet power, and the like.
The user session data is obtained from the user service system deployed on the login nodes and the Slurm scheduling system deployed on the management nodes. The collected content includes all operations performed by the user, such as logging in, viewing resources, executing commands, submitting jobs, suspending jobs, and exiting, as well as information about the jobs the user submits, such as the working directory, the job name, and the requested resource scale.
Module 602: user behavior prediction module. It obtains user session data from the data acquisition module 601 and predicts the user's future behavior sequence.
The sessions are the user sessions that are currently active. The sequence of operations executed by the user is extracted from each session and organized into the graph form required by the model, after which it can be input into the user behavior prediction model for prediction. The predicted user behavior sequence is sent to the energy consumption prediction module 603 for further processing.
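One possible way to organize an operation sequence into graph form is sketched below. The representation is an assumption, since this passage does not define the graph: each distinct operation becomes a node, and each pair of consecutive operations becomes a directed edge weighted by how often it occurs. The integer codes and the function name `session_to_graph` are hypothetical, and the subdivision of job operations by job type (claim 3) is not modeled.

```python
from collections import Counter

# Operation categories named in the source; the integer codes are an assumption.
OPERATION_CODES = {
    "login": 0,
    "view_resources": 1,
    "execute_command": 2,
    "submit_job": 3,
    "suspend_job": 4,
    "exit": 5,
}


def session_to_graph(operations):
    """Turn an ordered list of operation names into (nodes, weighted directed edges)."""
    codes = [OPERATION_CODES[op] for op in operations]
    nodes = sorted(set(codes))
    # Each consecutive pair becomes a directed edge; repeated pairs raise its weight.
    edges = Counter(zip(codes, codes[1:]))
    return nodes, dict(edges)


session = ["login", "view_resources", "submit_job",
           "view_resources", "submit_job", "exit"]
nodes, edges = session_to_graph(session)
```

For this example session, `nodes` is `[0, 1, 3, 5]` and the edge from viewing resources to submitting a job, `(1, 3)`, has weight 2.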
Module 603: energy consumption prediction module. It obtains cluster energy consumption data from the data acquisition module 601 and the user's future behavior sequence from the user behavior prediction module 602, combines them into multi-dimensional time series data that serves as the model input, and finally outputs a prediction of the cluster's future energy consumption.
In this embodiment, the TFT model is used as the energy consumption prediction model, so the series must be divided into static variables, past dynamic variables, and known future dynamic variables before it is input into the prediction model. The predicted operation sequence output by the user behavior prediction module 602 is classified as a known future dynamic variable, which is what allows the energy consumption prediction model to perceive user behavior.
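The combination of the predicted behavior sequence with the energy data, which claim 7 lists as a timestamp-driven traversal, might look like the following sketch. The data layouts, the `op_` column prefix, and the function name are hypothetical; the "total set" of claim 7 is interpreted here as the set of timestamps still to be processed, which is an assumption.

```python
def merge_behavior_into_energy(energy_rows, behavior_by_time, operation_types):
    """Add per-operation feature columns to per-node energy samples.

    energy_rows:      {(timestamp, node): {"node_power": ...}}
    behavior_by_time: {timestamp: [(operation, [affected nodes]), ...]}
    """
    # Add one zero-initialized feature column per operation type to every row.
    merged = {
        key: {**row, **{f"op_{op}": 0 for op in operation_types}}
        for key, row in energy_rows.items()
    }
    pending = set(behavior_by_time)  # timestamps still to be processed
    while pending:                   # step 5: repeat until nothing is left
        ts = max(pending)            # step 1: take the latest timestamp
        pending.remove(ts)
        for operation, nodes in behavior_by_time[ts]:  # steps 2 and 3
            for node in nodes:                         # step 4
                row = merged.get((ts, node))
                if row is not None:  # skip nodes that have no energy sample
                    row[f"op_{operation}"] += 1
    return merged


energy = {
    (1, "n1"): {"node_power": 200.0},
    (1, "n2"): {"node_power": 210.0},
    (2, "n1"): {"node_power": 350.0},
}
behavior = {
    2: [("submit_job", ["n1", "n3"])],
    1: [("login", ["n1"]), ("login", ["n1"])],
}
merged = merge_behavior_into_energy(energy, behavior, ["login", "submit_job"])
```

Counting operations per node and timestamp is one design choice; a binary indicator per operation type would fit the same traversal.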
Module 604: model training and evaluation module. The work of this module is divided into two phases:
Before the system goes online, an initial prediction model is trained using data collected in advance by the data acquisition module 601.
After the system goes online, the module continuously obtains prediction results from the energy consumption prediction module 603 and evaluates them, and it trains and updates the model online when the model's performance degrades.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims (10)

1. A high-performance computing cluster energy consumption prediction method based on user behavior, characterized by comprising the following steps:
acquiring all active user sessions monitored in real time and the energy consumption data of each node and cabinet;
extracting a user behavior sequence from each user session, classifying and encoding the user behavior sequence, and converting it into a graph data structure; inputting the graph data structure into a user behavior prediction model, predicting the behavior sequence within a future set time, and using that behavior sequence as a covariate;
merging the covariates with the energy consumption data and expanding the feature sequences to obtain high-dimensional time series energy consumption data containing user behavior information, and inputting the high-dimensional time series energy consumption data into a cluster energy consumption prediction model to obtain predicted energy consumption values within the future set time for each cabinet and node of the cluster.
2. The method for predicting energy consumption of a high performance computing cluster based on user behavior according to claim 1, wherein the user session includes all operations performed by a user from log-in to log-out, and all data contained in the user session is time stamped; the energy consumption data includes power data of all cabinets and computing nodes in the machine room.
3. The method of claim 1, wherein the user behavior sequence is classified by operation type, the operation types comprising logging in, viewing resources, executing commands, submitting jobs, suspending jobs, and exiting, and wherein the job submission and job suspension operations are further divided into a plurality of different operations according to the type of job.
4. The method for predicting the energy consumption of a high-performance computing cluster based on user behavior according to claim 1, wherein merging the covariates with the energy consumption data and expanding the feature sequences comprises:
adding a plurality of feature sequences to the energy consumption data, each sequence corresponding to one type of user behavior; introducing user, resource pool, holiday, and season data into the covariates; matching by timestamp and adding the matched data to the corresponding feature sequences; and thereby merging the data and expanding the feature sequences to obtain high-dimensional time series energy consumption data comprising multiple feature sequences.
5. The high performance computing cluster energy consumption prediction method based on user behavior according to claim 2, wherein the power data comprises CPU power, node power, and cabinet power.
6. The method for predicting the energy consumption of a high-performance computing cluster based on user behavior according to claim 1, wherein inputting the graph data structure into a user behavior prediction model, predicting the behavior sequence within a future set time, and using that behavior sequence as a covariate comprises: first encoding the user behaviors through an embedding layer and learning local associations among the user behaviors using a graph learning layer and graph convolution; then reorganizing the graph convolution output into a sequence structure through serialization and establishing a residual connection with the original input data; and finally learning global features of the user behaviors using an LSTM, generating a prediction of the user behavior sequence, and converting the output of the normalized exponential function (softmax) into the original category information for output.
7. The method for predicting the energy consumption of a high-performance computing cluster based on user behavior according to claim 4, wherein the data merging specifically comprises:
1) extracting the latest timestamp from the total set;
2) traversing the obtained user behavior sequence and obtaining from it the operation set corresponding to that timestamp;
3) traversing the operation set corresponding to the timestamp and obtaining each operation and the list of nodes affected by it;
4) traversing the node list and adding each operation to the corresponding feature columns of all nodes in the node list;
5) repeating the traversal until the operation set has been fully traversed and the total set is empty.
8. The method of claim 1, wherein the cluster energy consumption prediction model is a Temporal Fusion Transformer model, which divides the input high-dimensional time series energy consumption data into three types: static variables, past dynamic variables, and known future dynamic variables, wherein the static variables comprise user, queue, and resource pool information, the past dynamic variables comprise node power, cabinet power, and cluster-related energy consumption data, and the known future dynamic variables comprise time information on workdays, holidays, and seasons.
9. The method for predicting the energy consumption of a high-performance computing cluster based on user behavior according to claim 1, further comprising evaluating the predicted energy consumption values, the evaluation comprising: maintaining two error value queues of different lengths and accumulating the input data corresponding to the short queue, the two queues being used to compare the long-term and short-term error value distributions of the prediction model; first estimating a probability distribution from each numerical sequence using kernel density estimation, then measuring the difference between the two probability distributions using JS divergence; and, when the difference between the two distributions exceeds a preset threshold, considering the input data to have changed and degraded the model, generating a data set from the accumulated input data, and updating the model online.
10. A high-performance computing cluster energy consumption prediction system based on user behavior, comprising:
a data acquisition module, configured to acquire all active user sessions monitored in real time and the energy consumption data of each node and cabinet;
a data processing module, configured to extract a user behavior sequence from each user session, classify and encode the user behavior sequence, and convert it into a graph data structure;
a user behavior prediction module, configured to input the graph data structure into a user behavior prediction model, predict the behavior sequence within a future set time, and use that behavior sequence as a covariate;
an energy consumption prediction module, configured to merge the covariates with the energy consumption data and expand the feature sequences to obtain high-dimensional time series energy consumption data containing user behavior information, and to input the high-dimensional time series energy consumption data into the cluster energy consumption prediction model to obtain predicted energy consumption values within the future set time for each cabinet and node of the cluster.
CN202410146277.9A 2024-02-02 2024-02-02 High-performance computing cluster energy consumption prediction method and system based on user behaviors Pending CN117667606A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410146277.9A CN117667606A (en) 2024-02-02 2024-02-02 High-performance computing cluster energy consumption prediction method and system based on user behaviors


Publications (1)

Publication Number Publication Date
CN117667606A true CN117667606A (en) 2024-03-08

Family

ID=90073566


Country Status (1)

Country Link
CN (1) CN117667606A (en)





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination