CN113434034B - Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning - Google Patents

Info

Publication number
CN113434034B
Authority
CN
China
Prior art keywords
cpu frequency
computing node
computing
energy consumption
critical value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110774208.9A
Other languages
Chinese (zh)
Other versions
CN113434034A (en)
Inventor
苏斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huaheng Shengshi Technology Co ltd
Original Assignee
Beijing Huaheng Shengshi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-07-08
Filing date
2021-07-08
Publication date
2023-04-18
Application filed by Beijing Huaheng Shengshi Technology Co ltd
Priority to CN202110774208.9A
Publication of CN113434034A
Application granted
Publication of CN113434034B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3243Power saving in microcontroller unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/324Power saving characterised by the action undertaken by lowering clock frequency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Power Sources (AREA)

Abstract

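The invention discloses a large-scale cluster energy-saving method that uses deep learning to adjust the CPU frequency at which a computing task runs. A training set is formed from the energy consumption of the machine running the computing task at different frequencies, and a deep learning algorithm is used to find the critical value in the relation between CPU frequency and machine energy consumption. Once the critical value is obtained, the CPU frequency of the computing node running the task is adjusted, so that the running efficiency of the computing task and the energy consumption of the machine reach a balanced state.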

Description

Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning
Technical Field
The invention relates to the technical field of deep learning, and in particular to a large-scale cluster energy-saving method that uses deep learning to adjust the CPU frequency of computing tasks.
Background
At present, the CPU frequency of the computing nodes in a large cluster is fixed, and different computing tasks all run at the same CPU frequency, so the power consumption of a supercomputing center stays at a high level. For some computing tasks the CPU frequency is set from experience, which does not effectively improve task performance and wastes resources.
Running different computing tasks at the same CPU frequency helps neither the performance of the tasks nor the power savings of a large cluster: a single frequency may make jobs run inefficiently or raise the machine's energy consumption. In the prior art it is difficult to balance frequency against energy consumption. Even if the critical CPU frequency for a machine running a given computing task can be calculated by running a large number of computing jobs, manually adjusting the machine's frequency is very cumbersome.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a large-scale cluster energy-saving method that uses deep learning to adjust the CPU frequency of computing tasks.
To achieve this purpose, the invention adopts the following technical scheme:
The large-scale cluster energy-saving method that uses deep learning to adjust the CPU frequency of computing tasks proceeds as follows:
after job information submitted by a user is received, the job is dispatched to the computing node best suited to run its computing task, according to the collected load of each computing node;
when the computing node runs the computing task for the first time, the CPU frequency of the computing node is set to the node's current CPU frequency; while the computing task runs, job running data and computing node running data are collected once every set interval, where the job running data comprise the job running time and the computing node running data comprise the computing node energy consumption and CPU frequency; a CPU frequency critical value is obtained by analyzing the job running data and the computing node running data with a deep learning algorithm, and the CPU frequency of the computing node is then adjusted to that critical value, which lowers the CPU frequency of the computing node and saves energy;
the specific process of obtaining the CPU frequency critical value with the deep learning algorithm is as follows:
a neural network model is constructed whose input variables are the job running time, the computing node energy consumption and the computing node CPU frequency; the weight values of the three input variables are determined; the collected job running time, computing node energy consumption and CPU frequency are used as the training data set; the model outputs the CPU frequency critical value H of the computing node; the CPU frequency of the computing node is adjusted to the critical value, the critical value is repeatedly verified, and if it changes the CPU frequency of the computing node is adjusted again.
Furthermore, each computing node has its own neural network, and the weight values of the job running time, the computing node energy consumption and the computing node CPU frequency in each node's neural network are determined from actual running conditions.
The beneficial effects of the invention are as follows: a training set is formed by collecting the energy consumption of the machine running the computing task at different frequencies, and a deep learning algorithm is used to find the critical value in the relation between CPU frequency and machine energy consumption for that task. Once the critical value is obtained, the CPU frequency of the computing node running the task is adjusted, so that the running efficiency of the computing task and the energy consumption of the machine reach a balanced state.
Drawings
FIG. 1 is a schematic flow chart of the method of Example 1 of the present invention;
FIG. 2 is a plot of energy consumption against CPU frequency drawn in Example 1 of the present invention;
FIG. 3 is a schematic diagram of the neural network model in Example 1 of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings. Note that this embodiment is based on the above technical scheme and gives a detailed implementation and a specific operating procedure, but the protection scope of the present invention is not limited to this embodiment.
Example 1
This embodiment provides a large-scale cluster energy-saving method that uses deep learning to adjust the CPU frequency of computing tasks. As shown in FIG. 1, the process is as follows:
after job information submitted by a user is received, the job is dispatched to the computing node (server) best suited to run its computing task, according to the collected load of each computing node;
when the computing node runs the computing task for the first time, the CPU frequency of the computing node is set to the node's current frequency value; while the computing task runs, job running data and computing node running data are collected once every set interval, where the job running data comprise the job running time and the computing node running data comprise the computing node energy consumption and CPU frequency; a CPU frequency critical value is obtained by analyzing the job running data and the computing node running data with a deep learning algorithm, and the CPU frequency of the computing node is adjusted to that critical value, which lowers the CPU frequency of the computing node and saves energy.
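The dispatch and periodic-collection steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say how "most suitable" is decided, so `pick_node` assumes it means the lowest reported load, and the names `Sample`, `pick_node`, `collect_samples` and the injected reader callables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One periodic observation of a running job on a computing node."""
    job_runtime_s: float   # job running time so far
    energy_w: float        # computing node energy consumption reading
    cpu_freq_mhz: float    # computing node CPU frequency


def pick_node(loads: Dict[str, float]) -> str:
    """Return the node with the lowest reported load (assumed criterion)."""
    if not loads:
        raise ValueError("no computing nodes reported a load")
    return min(loads, key=loads.get)


def collect_samples(
    read_runtime: Callable[[], float],
    read_energy: Callable[[], float],
    read_freq: Callable[[], float],
    n_samples: int,
    wait: Callable[[], None],
) -> List[Sample]:
    """Take n_samples readings, calling wait() after each one.

    The readers are injected so the sketch does not assume any particular
    monitoring interface; wait() stands in for the "set interval".
    """
    samples = []
    for _ in range(n_samples):
        samples.append(Sample(read_runtime(), read_energy(), read_freq()))
        wait()
    return samples
```

In practice `wait` could be something like `lambda: time.sleep(interval_s)`, and the readers would wrap whatever the cluster's monitoring layer exposes.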
The principles and processes for deriving the CPU frequency threshold using deep learning algorithm analysis are further described below.
The relation between a computing node's energy consumption and its CPU frequency is:
$P = C V^{2} f$;
P represents energy consumption; C is a constant determined by factors such as the manufacturing process and design of the computing node; V represents the voltage; f represents the CPU frequency.
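As a small numeric illustration of this formula, the sketch below evaluates it directly. The function name `dynamic_power` and the input values in the usage note are hypothetical and chosen only to show how P scales with f and with V.

```python
def dynamic_power(c: float, voltage: float, freq: float) -> float:
    """Evaluate P = C * V^2 * f in whatever consistent units the caller uses."""
    return c * voltage ** 2 * freq
```

With the voltage held fixed, doubling `freq` doubles the result, which is the linear regime. Raising `voltage` by 10% at a fixed frequency multiplies the result by 1.21, which is why a voltage that has to climb with frequency makes the curve bend upward.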
The energy consumed by the same computing task differs across computing node CPU frequencies, and deep learning on the training data can identify the CPU frequency at which the task's energy consumption is lowest. A training data set is formed from the CPU frequencies and corresponding energy consumption values recorded when CPUs with different core counts process the same computing task; the data are reduced to the median energy consumption at each CPU frequency, and the resulting curve of energy consumption against CPU frequency is plotted in FIG. 2.
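The reduction of raw measurements to one median energy value per CPU frequency can be sketched as below. The function name and the `(frequency, energy)` tuple layout are assumptions made for illustration; the source does not describe its data format.

```python
from collections import defaultdict
from statistics import median


def median_energy_by_freq(samples):
    """Group (cpu_freq, energy) pairs by frequency and take the median energy.

    Returns a list of (cpu_freq, median_energy) sorted by frequency, which is
    the shape needed to plot energy consumption against CPU frequency.
    """
    grouped = defaultdict(list)
    for freq, energy in samples:
        grouped[freq].append(energy)
    return sorted((freq, median(values)) for freq, values in grouped.items())
```

The median is used rather than the mean because that is what the text specifies; it also keeps a few outlier runs from shifting the curve.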
In FIG. 2, the ordinate is the CPU energy consumption P in W and the abscissa is the CPU frequency f in MHz. As FIG. 2 shows, P and f reach a critical point when the CPU frequency f reaches the critical value (red line).
(1) When f is less than or equal to the critical value, P and f are linearly related: as the CPU frequency f rises, the energy consumption P rises in proportion and the execution efficiency of the computing task improves;
(2) When f is greater than the critical value, P and f lose their linear relation and follow an exponential one, so each increase of Δf in CPU frequency raises the CPU energy consumption sharply. This is because, in the formula $P = C V^{2} f$, the CPU voltage V must rise as f rises, so the contribution of $V^{2}$ grows steadily. At this point:
Figure BDA0003153836140000051
for the same increase Δf in the CPU frequency f, a larger increase ΔP is required, so the curve takes on the shape of an exponential function and the CPU energy consumption climbs rapidly with each further Δf.
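The two regimes can be made concrete by differentiating the formula $P = C V^{2} f$. The derivation below is a sketch under stated assumptions that are not in the source: the symbol $f_c$ for the critical frequency, a constant voltage $V_0$ at or below $f_c$, and a voltage $V(f)$ that rises with f above $f_c$. It is not a reconstruction of the missing equation image.

```latex
% Assumptions (not from the source): V = V_0 for f <= f_c,
% and V = V(f) with V'(f) > 0 for f > f_c.
\begin{aligned}
f \le f_c:&\quad P = C V_0^{2} f,
  &\frac{dP}{df} &= C V_0^{2} \quad \text{(constant slope)} \\
f > f_c:&\quad P = C\,V(f)^{2} f,
  &\frac{dP}{df} &= C\,V(f)^{2} + 2C\,V(f)\,V'(f)\,f \;>\; C V_0^{2}
\end{aligned}
```

Under these assumptions the slope above $f_c$ exceeds the linear-regime slope, and for a voltage that is linear in f it keeps growing with f, so equal steps Δf cost progressively larger ΔP. Strictly, this model gives polynomial rather than exponential growth; it agrees with the text on the qualitative behaviour.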
Therefore, in this embodiment, the specific process of obtaining the CPU frequency threshold value by using the deep learning algorithm includes:
Raising the CPU frequency speeds up execution of the computing task, but it can also push the computing node into an overclocked state in which the system is unstable and energy consumption rises rapidly. To meet the energy-saving requirements of green computing, the most suitable CPU frequency must be found, one at which the computing node's energy consumption is relatively low and the computing task still executes quickly.
After the same computing task has been run repeatedly on the computing nodes, the CPU frequency and the electrical energy consumed by the computing nodes during those runs form a training set. The training set is used to calculate the critical point of P and f, beyond which the energy consumption of the computing node increases sharply.
Meanwhile, energy saving in the server cluster must not conflict with execution of the computing task: considering energy saving alone would reduce execution efficiency, so the execution time of the computing task must also be taken into account when the CPU frequency critical value is obtained. The method of this embodiment therefore builds the neural network model shown in FIG. 3.
The input variables of the neural network are the job running time X1, the computing node energy consumption X2 and the computing node CPU frequency X3. The weights of the three inputs differ between computing nodes and computing tasks, so the weight values for each computing node are determined from actual running conditions. The network outputs the CPU frequency critical value H of the computing node; the node's CPU frequency is adjusted to that value, the value is repeatedly verified, and the CPU frequency is adjusted again whenever the critical value changes.
Because the critical CPU frequency of the computing node is found by weighing these factors together, keeping the CPU frequency at the critical value while the computing task runs balances running efficiency against energy consumption.
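The adjust, verify and re-adjust cycle can be sketched as a simple reconciliation loop. This is an illustration only: the source does not say how a "change" in the critical value is detected, so the `tolerance_mhz` dead band and the function names are assumptions, and the predicted thresholds are supplied as plain numbers rather than produced by a trained network.

```python
def next_frequency(current_mhz, predicted_threshold_mhz, tolerance_mhz=10.0):
    """Return the frequency the node should run at after one verification pass.

    If the newly predicted critical value is within the (assumed) tolerance of
    the current setting, the setting is kept; otherwise it is replaced.
    """
    if abs(predicted_threshold_mhz - current_mhz) <= tolerance_mhz:
        return current_mhz
    return predicted_threshold_mhz


def track_threshold(start_mhz, predictions, tolerance_mhz=10.0):
    """Apply successive predicted critical values and record each setting."""
    freq = start_mhz
    history = [freq]
    for predicted in predictions:
        freq = next_frequency(freq, predicted, tolerance_mhz)
        history.append(freq)
    return history
```

For example, `track_threshold(1750, [1650, 1645, 1600])` returns `[1750, 1650, 1650, 1600]`: the second prediction falls inside the dead band and is ignored, the third does not. A dead band of this kind avoids re-tuning the node on every small fluctuation in the model output.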
It should be noted that the two-dimensional convolutional neural network of the present embodiment is formulated as
Figure BDA0003153836140000061
where $x_{i,j}$ are the input variables, $w_{u-i,v-j}$ are the weights, and $h(u,v)$ is the output of the h-th layer. The input matrix is
Figure BDA0003153836140000062
and the weight value matrix is
Figure BDA0003153836140000063
Affine transformation is carried out according to a forward propagation formula of the convolutional neural network to obtain
Figure BDA0003153836140000064
where the feature vector of each layer before activation is z and the feature vector after activation is y, that is, $y = f(z)$; the input x of each layer can be regarded as the activated feature vector y of the previous layer; and the loss function is denoted by j.
The convolution kernel has size n × n, so the valid convolution is defined as
Figure BDA0003153836140000065
Figure BDA0003153836140000071
where $w_{rot}$ is the matrix w rotated by 180°.
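Since the equation images are not reproduced in this text, the sketch below shows the standard textbook form of a valid two-dimensional convolution with an n × n kernel, computed by sliding the 180°-rotated kernel over the input. It is offered as an illustration of the operation named above, under the assumption that the patent uses this standard definition; the function names are hypothetical.

```python
def rotate_180(kernel):
    """Rotate a square kernel by 180 degrees (reverse rows, then columns)."""
    return [row[::-1] for row in kernel[::-1]]


def valid_convolution(x, w):
    """Valid 2-D convolution of input matrix x with an n x n kernel w.

    The output has (rows - n + 1) x (cols - n + 1) entries; each entry is the
    element-wise product of an n x n input window with the rotated kernel.
    """
    n = len(w)
    rows, cols = len(x), len(x[0])
    w_rot = rotate_180(w)
    out = []
    for u in range(rows - n + 1):
        out_row = []
        for v in range(cols - n + 1):
            acc = 0.0
            for i in range(n):
                for j in range(n):
                    acc += x[u + i][v + j] * w_rot[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

For a 3 × 3 input and a 2 × 2 kernel the result is 2 × 2, since only positions where the kernel fits entirely inside the input are evaluated.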
Therefore, the h-th layer output formula of the convolutional neural network is as follows:
Figure BDA0003153836140000072
Figure BDA0003153836140000073
$b_h$ is the error loss term; it is negligible in the calculation.
In this embodiment, the input matrix and the output matrix obtained when a certain computation task runs are substituted into a formula to obtain
Figure BDA0003153836140000074
The resulting coefficients vary with the type and duration of each job and with the machine load. The coefficients calculated while jobs of the same type run are collected into a data set, and their median is used to adjust the system's CPU frequency.
Note that each computing node has its own deep learning framework, so a different CPU frequency critical value can be calculated for each machine model and each pattern of computing task usage. Adjusting the computing node's CPU frequency to the critical value saves as much of the node's energy as possible while maintaining the running efficiency of the computing task.
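The source does not say how the CPU frequency is actually applied on a node. One possible mechanism, assumed here purely for illustration, is the Linux cpufreq sysfs interface, where each CPU exposes a `scaling_max_freq` file that takes a value in kHz. The helper name `cap_cpu_frequency` and its parameters are hypothetical; writing to the real sysfs tree requires root privileges, and the active governor decides the actual frequency below the cap.

```python
from pathlib import Path


def cap_cpu_frequency(mhz, cpu_ids, sysfs_root="/sys/devices/system/cpu"):
    """Write a maximum frequency (given in MHz) for each listed CPU.

    cpufreq expects kHz, so the value is converted before writing.
    sysfs_root is a parameter so the function can be pointed at a
    scratch directory when experimenting without root access.
    """
    khz = str(int(round(mhz * 1000)))
    written = []
    for cpu in cpu_ids:
        target = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_max_freq"
        target.write_text(khz)
        written.append(target)
    return written
```

Capping the maximum rather than pinning an exact frequency leaves the governor free to drop lower when the node is idle, which seems consistent with the energy-saving goal, although the patent itself does not state this choice.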
Example 2
The method of Example 1 can be combined with a computing task management system. The system can collect energy consumption data for the relevant computing nodes, and real-time data can be retrieved with a command, as shown in Table 1.
TABLE 1
host         cpuf  P     job_name  ave_job_time
quickpool-1  1300  0.45  test1     105
In Table 1, the displayed information comprises the node name, the node CPU frequency, the node energy consumption P, the name of the job currently running in batch on that computing node, and the average job execution time. The job's computing task starts running at the current CPU frequency, and the CPU frequency is then raised step by step at a fixed time interval, which increases both the node energy consumption and the job execution efficiency and shortens the average job execution time. The changes during this process are shown in Tables 2, 3 and 4.
TABLE 2
host         cpuf  P     job_name  ave_job_time
quickpool-1  1500  0.51  test1     99
TABLE 3
host         cpuf  P     job_name  ave_job_time
quickpool-1  1650  0.59  test1     89
TABLE 4
host         cpuf  P     job_name  ave_job_time
quickpool-1  1750  0.85  test1     84
When the CPU frequency reaches 1750, the growth in energy consumption changes abruptly: over the preceding step (Table 2 to Table 3) it rose slowly, by 0.08 W, whereas over the 100 MHz step from Table 3 to Table 4 it rises by 0.26 W. This matches the turning point of the energy consumption curve described above, so the CPU frequency is then lowered step by step, as shown in Tables 6 and 7.
TABLE 6
Figure BDA0003153836140000081
Figure BDA0003153836140000091
TABLE 7
host         cpuf  P     job_name  ave_job_time
quickpool-1  1640  0.59  test1     89
When the CPU frequency stabilizes at about 1600-1650 MHz, the machine's energy consumption and the job execution efficiency are in dynamic balance. The method of Example 1 can be integrated with the computing task management system to provide unified control of cluster management and energy saving.
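The turning point in this example can also be located programmatically from the reported rows. The sketch below parses rows in the layout of Tables 1 to 4 and returns the frequency at which the slope of P against cpuf jumps. It is a plain slope-ratio heuristic standing in for the neural network of Example 1, and the names `parse_rows`, `find_knee` and the `jump_ratio` threshold are assumptions.

```python
REPORT = """
host cpuf P job_name ave_job_time
quickpool-1 1300 0.45 test1 105
quickpool-1 1500 0.51 test1 99
quickpool-1 1650 0.59 test1 89
quickpool-1 1750 0.85 test1 84
"""


def parse_rows(text):
    """Split whitespace-separated report text into one dict per data row."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def find_knee(points, jump_ratio=3.0):
    """Return the frequency where the energy slope jumps by jump_ratio or more.

    points is a list of (cpu_freq, energy) sorted by frequency. The value
    returned is the frequency at the start of the first steep segment, or
    None if no such jump is found.
    """
    slopes = [
        (p1 - p0) / (f1 - f0)
        for (f0, p0), (f1, p1) in zip(points, points[1:])
    ]
    for k in range(1, len(slopes)):
        if slopes[k - 1] > 0 and slopes[k] / slopes[k - 1] >= jump_ratio:
            return points[k][0]
    return None


rows = parse_rows(REPORT)
points = [(float(r["cpuf"]), float(r["P"])) for r in rows]
knee = find_knee(points)
```

On the four rows above, `knee` is `1650.0`: the slope from 1650 to 1750 is nearly five times the slope from 1500 to 1650, which agrees with the 1600-1650 range at which the frequency settles in this example.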
Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims (1)

1. A large-scale cluster energy-saving method based on a deep learning algorithm is characterized by comprising the following steps:
after receiving job information submitted by a user, dispatching the job to a computing node which is most suitable for running a computing task of the job according to the collected load condition of each computing node;
when the computing node runs the computing task for the first time, adjusting the CPU frequency of the computing node to be the current CPU frequency of the computing node;
in the running process of the computing task, collecting operation data and computing node operation data at set intervals, wherein the operation data comprises operation running time, and the computing node operation data comprises computing node energy consumption and computing node CPU frequency;
constructing a neural network model based on a deep learning algorithm, taking operation running time, computing node energy consumption and computing node CPU frequency as input variables of the neural network model, wherein the three input variables have corresponding input weight values, taking the operation running time, the computing node energy consumption and the computing node CPU frequency of different nodes in the same computing task as a data training set, and training the neural network model to obtain a trained neural network;
inputting the operation running time, the energy consumption of the computing nodes, the CPU frequency of the computing nodes and the corresponding weight values of the three obtained in the actual computing task into a trained neural network, and outputting the CPU frequency critical value of the computing nodes by the trained neural network; when the CPU frequency of the computing node is less than or equal to the CPU frequency critical value, the energy consumption of the computing node and the CPU frequency of the computing node are in a linear relation; when the CPU frequency of the computing node is greater than the CPU frequency critical value, the energy consumption of the computing node and the CPU frequency of the computing node are in an exponential relation;
adjusting the CPU frequency of the computing node to the CPU frequency critical value, repeatedly verifying whether the CPU frequency critical value is correct, and if the CPU frequency critical value changes, adjusting the CPU frequency of the computing node again;
wherein each of the compute nodes has its own neural network.
CN202110774208.9A 2021-07-08 2021-07-08 Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning Active CN113434034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110774208.9A CN113434034B (en) 2021-07-08 2021-07-08 Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning

Publications (2)

Publication Number Publication Date
CN113434034A CN113434034A (en) 2021-09-24
CN113434034B true CN113434034B (en) 2023-04-18

Family

ID=77759692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110774208.9A Active CN113434034B (en) 2021-07-08 2021-07-08 Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning

Country Status (1)

Country Link
CN (1) CN113434034B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11778045B2 (en) 2021-07-12 2023-10-03 Red Hat, Inc. Communication system for micro-frontends of a web application
US12067429B2 (en) 2022-03-18 2024-08-20 Red Hat, Inc. Synchronizing variable values between an application shell and micro-frontends of a web application

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111158974A (en) * 2019-12-06 2020-05-15 华南理工大学 Cloud server-oriented hardware-aware CPU energy consumption measuring and calculating method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7343505B2 (en) * 2004-10-28 2008-03-11 International Business Machines Corporation Method and apparatus for thermal control of electronic components
US9921633B2 (en) * 2014-08-22 2018-03-20 Intel Corporation Power aware job scheduler and manager for a data processing system
CN107943555B (en) * 2017-10-17 2021-11-23 华南理工大学 Big data storage and processing platform and big data processing method in cloud computing environment
CN107861606A (en) * 2017-11-21 2018-03-30 北京工业大学 A kind of heterogeneous polynuclear power cap method by coordinating DVFS and duty mapping
CN111245950B (en) * 2020-01-20 2023-03-10 南京邮电大学 Intelligent scheduling system and method for industrial Internet of things edge resources based on deep learning
CN112631415B (en) * 2020-12-31 2022-09-02 Oppo(重庆)智能科技有限公司 CPU frequency adjusting method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113434034A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN113434034B (en) Large-scale cluster energy-saving method for adjusting CPU frequency of calculation task by utilizing deep learning
TWI794157B (en) Automatic multi-threshold feature filtering method and device
CN111026548A (en) Power communication equipment test resource scheduling method for reverse deep reinforcement learning
Wu et al. A deadline-aware estimation of distribution algorithm for resource scheduling in fog computing systems
CN111461507A (en) Multi-server configuration profit maximization method based on client perception value and risk awareness
US20220243347A1 (en) Determination method and determination apparatus for conversion efficiency of hydrogen production by wind-solar hybrid electrolysis of water
CN117555683A (en) Cloud cluster resource scheduling method based on deep reinforcement learning
CN113762591A (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM counterstudy
CN117744877A (en) Virtual power plant optimization method, device, computer equipment and storage medium
CN113034343B (en) Parameter-adaptive hyperspectral image classification GPU parallel method
CN115828769A (en) Method for predicting working condition of cooling tower and reducing consumption based on intelligent calculation
CN109934394A (en) A kind of Demand Side Response prediction technique based on grey and Markov theory
CN115994653A (en) Method and device for constructing load model with equal value from top to bottom and terminal equipment
CN110826909B (en) Workflow execution method based on rule set
CN110737969B (en) Discrete manufacturing system energy-saving method based on maximum algebra
CN111310644A (en) Intelligent identification method and device for types and working states of electrical appliances
CN116542504B (en) Parameter-adaptive semiconductor workpiece production scheduling method, equipment and storage medium
CN106708499B (en) Analysis method and analysis system of drawing processing program
CN115174566B (en) Edge computing task unloading method based on deep reinforcement learning
CN114186627B (en) QR-GRU-based power plant statistical data prediction and verification method
CN114841366B (en) Learning model training method based on wireless federal learning
CN110851230B (en) Virtual machine placement method based on reinforcement learning in cloud computing
CN115511047B (en) Quantification method, device, equipment and medium of Softmax model
CN117648163A (en) Application migration CPU estimation method
CN107222540B (en) Negative feedback-based server cluster grouping scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant