CN117435451A - Method for establishing power consumption and performance model of virtual computing unit in mobile edge computing

Info

Publication number: CN117435451A
Application number: CN202311591555.3A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 何臻力, 陈新德, 阎国军, 苏凯, 张皓光
Applicant/Assignee: Yunnan University (YNU)
Legal status: Pending

Classifications

    • G06F11/3409: Recording or statistical evaluation of computer activity (e.g. down time, input/output operation, user activity) for performance assessment
    • G06F11/3419: Recording or statistical evaluation of computer activity for performance assessment by assessing time
    • G06N20/20: Machine learning; ensemble learning
    • G06N5/01: Dynamic search techniques; heuristics; dynamic trees; branch-and-bound

Abstract

The invention discloses a method for establishing power consumption and performance models of virtual computing units in mobile edge computing. The method comprises: pre-configuring a virtual computing unit; collecting parameter data of CPU-intensive, IO-intensive and GPU-intensive tasks, constructing a sample data set, and dividing it into a training set and a test set; identifying the key features of the CPU-intensive, GPU-intensive and IO-intensive tasks from the sampled parameters through different benchmarks; establishing performance and power consumption models suited to the different task types based on the key features, and training the models with the training set; and evaluating and verifying the models on the test set. The models can accurately analyze the relationships among energy consumption, execution time and virtual resource allocation for different types of tasks, providing substantial support for trading off task performance against energy consumption and for exploring optimal resource allocation strategies in edge computing environments.

Description

Method for establishing power consumption and performance model of virtual computing unit in mobile edge computing
Technical Field
The invention belongs to the technical field of mobile edge computing, and particularly relates to a method for building power consumption and performance models of virtual computing units in mobile edge computing.
Background
In recent years, the proliferation of smart devices has weakened the effectiveness of traditional resource allocation optimization in improving resource utilization. Virtualization technology now occupies a central position: virtual computing units such as containers and virtual network functions play a vital role in reducing resource redundancy and improving utilization by reusing resources or occupying otherwise idle capacity. Thus, mobile edge computing (MEC) platforms built on virtualization technology have become the solution of choice.
MEC is a typical distributed computing environment with multiple computing nodes and various resources, including computing power, storage capacity and network bandwidth, and it carries a heavy energy burden. Optimizing the utilization of computing resources is of fundamental importance, but minimizing energy consumption matters just as much. Therefore, how to strike a proper balance between computing requirements and virtual computing unit configuration while optimizing energy consumption has become a key area requiring further in-depth research. To make substantial progress in this area, the fundamental challenge of accurately predicting the impact of virtual computing unit configurations on performance and power consumption must first be addressed.
In recent years, many researchers have studied the power consumption and performance of specific types of tasks in depth and achieved significant results. Three main classes of methods have been used to address this challenge: a) analytical models; b) machine-learning-based models; c) simulation- and optimization-based models. However, these methods have the following disadvantages:
1. Limitations of analytical models: analytical models use techniques including linear regression, nonlinear regression, statistical regression and support vector machines to build power consumption and performance models. Their problem is that they easily produce inaccurate predictions because of the complexity of system behavior and the interdependence between resources. The complexity of system behavior refers to the fact that, in practical applications, the performance and power consumption of a system are affected by many factors, including hardware configuration, software operation and load characteristics. These factors interleave with each other, producing a complex relationship between the inputs and outputs of the model and thereby increasing the uncertainty of the prediction. A further consideration is the interdependence between resources. In a computing system, different resources (e.g., CPU, memory, storage) interact: changing the configuration or usage pattern of one resource may have unexpected effects on other resources, further increasing the difficulty of model building and prediction.
2. Defects of machine-learning-based models: machine-learning-based models rely heavily on the quality and representativeness of training data, and they have complex structures with large numbers of parameters, which makes understanding the drivers of performance and energy consumption challenging. The performance and predictive power of such a model is directly determined by its training data: if the data is not sufficiently diverse and representative, the model may fail to capture the diversity and complexity of the system. Collecting and preparing high-quality, diverse data is therefore critical to successfully applying machine learning. In addition, these models typically have complex structures, such as deep neural networks containing large numbers of parameters. The interactions and trade-offs between these parameters make the resulting performance and energy predictions hard to understand intuitively, and they increase the difficulty of optimizing and tuning the model: tuning can require significant computational resources and time, and changing one parameter can affect the behavior of the entire model. Therefore, to use machine learning models effectively for performance and energy consumption prediction, one must ensure the quality and representativeness of the training data and invest substantial effort in understanding the relationship between the model's structure and its parameters.
3. Drawbacks of simulation- and optimization-based models: simulation- and optimization-based models are difficult to configure and lack flexibility, resulting in an incomplete picture of real-world systems. This approach typically requires a large number of parameters to describe the behavior and performance of the system. Selecting and adjusting these parameters can become very complex because they are affected by interactions between the different system components, so determining an appropriate parameter configuration may require a large amount of experimentation and optimization, which increases the development and maintenance cost of the model. These models may also be limited in describing real-world systems because they are built on a set of assumptions and simplifications that may not hold in practice. Their rigid structure makes it difficult to adapt them to system changes or new problem domains, leaving them insufficiently flexible. Simulation- and optimization-based models likewise struggle to capture all the complexity and dynamics of a system; they may ignore key factors or interactions, producing incomplete or inaccurate representations of the real-world system, and hence inaccurate performance predictions or unreliable optimization results.
Disclosure of Invention
The embodiment of the invention aims to provide a power consumption and performance model building method of a virtual computing unit in mobile edge computing, which can accurately analyze the relation among the energy consumption, execution time and virtual resource allocation of different types of tasks, and provides substantial support for the trade-off between task performance and energy consumption and the exploration of an optimal resource allocation strategy in an edge computing environment.
In order to solve the technical problems, the technical scheme adopted by the invention is that a power consumption and performance model building method of a virtual computing unit in mobile edge computing comprises the following steps:
step 1, pre-configuring a virtual computing unit;
step 2, collecting parameter data of CPU intensive tasks, IO intensive tasks and GPU intensive tasks, constructing a sample data set, and dividing a training set and a testing set;
step 3, identifying key features of the CPU intensive tasks, the GPU intensive tasks and the IO intensive tasks from the sampling parameters through different benchmarks; wherein the CPU is a central processing unit, the GPU is a graphic processor, and the IO refers to input and output;
step 4, building performance and power consumption models suitable for different task types according to the key characteristics, and training the models by using a training set;
Step 5, evaluating and verifying the models on the test set.
Further, in the step 1, the virtual computing unit is configured using a Kubernetes platform.
Further, in the step 2, the method for collecting the sampling parameters includes:
collecting information related to the resource utilization rate of the virtual computing units through a key component kubelet responsible for managing and monitoring the virtual computing units on the nodes;
acquiring power consumption data of a server through an intelligent ammeter;
measuring the execution time and performance index of the CPU intensive task and the IO intensive task through a Perf tool;
measuring the execution time and performance indices of GPU-intensive tasks through the NVIDIA Nsight Compute tool.
Further, in the step 2:
the power consumption model of the CPU-intensive task includes a power consumption model based on the CPU frequency, expressed as:

P_cpu = c_0 + c_1·f^3

where c_0 is a constant representing the static power consumption of the server, c_1 is a constant related to the voltage and capacitance of the CPU, and f represents the processor frequency;
and further includes a power consumption model based on CPU utilization, expressed as:

P_cpu = P_max·u_cpu + (1 - u_cpu)·P_idle

where P_max and P_idle respectively represent the power consumption of the CPU when fully utilized and when idle, and u_cpu denotes the CPU utilization;
the performance model of the CPU-intensive task is expressed as:

T_pred = T_avg_mem + T_CPU_ops

where T_pred represents the overall predicted execution time of a program on the CPU, T_avg_mem represents the average memory access time, and T_CPU_ops represents the average time of CPU operations;
T_avg_mem is calculated as:

T_avg_mem = (total_mem / b)·(λ_avg + β_avg·b)

where λ_avg represents the average latency, β_avg represents the inverse of the average throughput, b represents the block size, and total_mem corresponds to the total memory required by the program;
the time required for a CPU operation is measured by the instruction delay and number of operations of the hardware.
Further, in the step 2:
the power consumption model of the GPU-intensive tasks is expressed as:

P_gpu = P_leakage + P_idleSM + Σ_{i=1..N} α_i·p_max,i

comprising the leakage power P_leakage of all components, the idle streaming multiprocessor power consumption P_idleSM, and the dynamic power consumption, where N represents the number of micro-architecture components and the dynamic power consumption is modeled as the activity factor α_i of each micro-architecture component multiplied by its peak power consumption p_max,i;
the performance model of the GPU-intensive tasks is expressed as:

T_pred = t·(Comp + Comm_GM + Comm_SM) / (R·P·λ)

where Comm_GM = (ld_1 + st_1 - L1 - L2)·g_GM + L1·g_L1 + L2·g_L2 and Comm_SM = (ld_0 + st_0)·g_SM;

here t represents the number of threads started; Comp represents the number of processing cycles the threads spend; Comm_GM represents the communication overhead associated with threads accessing global memory, with ld_1 and st_1 representing loads from and stores to global memory; L1 and L2 respectively represent the hit counts of the first-level and second-level caches, determined from the execution profile of the application; the values g_GM, g_L1 and g_L2 represent the communication latencies of global memory, the L1 cache and the L2 cache, respectively; Comm_SM represents the communication overhead generated when threads access the shared memory; ld_0 represents the total number of load transactions and st_0 the total number of store transactions executed by all threads in shared memory, and g_SM represents the communication latency of the shared memory; R represents the clock frequency and P the number of CUDA cores of the specific GPU model; the parameter λ is used to model the application's inherent optimization utility;
Further, in the step 2:
the power consumption model of the IO intensive task comprises:
where θ is equal to 0 or 1, θ=1 represents sequential IO, and θ=0 represents random IO; respectively representing the power consumption of the magnetic disk during sequential reading, sequential writing, random reading and random writing;
the power consumption model of the IO intensive task further includes:
P disk =C+W T ·P
wherein, C is the static power consumption of the disk, W is the weight vector corresponding to the power consumption vector P, and T represents the transposition of the vector;
wherein,
P=[P R P MR P RS P RT P W P MW P WS P WT P IOT ] T
W=[w R w MR w RS w RT w W w MW w WS w WT w IOT ] T
wherein P is R 、P MR 、P RS 、P RT 、P W 、P MW 、P WS 、P WT 、P IOT The power consumption of the read times per second, the power consumption of the combined read times per second, the power consumption of the read sector times per second, the power consumption of the read time consuming per second, the power consumption of the write times per second, the power consumption of the combined write times per second, the power consumption of the write sector times per second, the power consumption of the write time consuming per second and the power consumption of the I/O time consuming per second are respectively represented; w (w) R 、w MR 、w RS 、w RT 、w W 、w MW 、w WS 、w WT 、w IOT Each component P in the corresponding vector P R 、P MR 、P RS 、P RT 、P W 、P MW 、P WS 、P WT 、P IOT Weights of (2);
the parameter data of the IO intensive task is extracted from the system log.
Further, in the step 3:
the CPU intensive task adopts NAS parallel reference and parsec3.0 reference;
the GPU intensive task adopts Hashcat as a benchmark;
the IO intensive computing task adopts IOzone as a benchmark.
Further, in the step 3:
key features of CPU-intensive tasks include: the key features of the CPU intensive computing task, the current memory use condition, the interrupt times of the host per second, the number of processes running in the host, the size of a memory mapping file, the current memory working set, the limited memory quantity requested by the virtual computing unit, the limited CPU quantity requested by the virtual computing unit, the blocked process quantity, the average load in 5 minutes, the loading operation miss percentage in the last-level cache, the loading miss percentage in the TLB, the source code line number and the assembly code line number contained in the task;
key features of GPU-intensive tasks include: the method comprises the steps of determining the percentage of a task using CPU in a virtual computing unit, the current memory usage, the memory size of a virtual computing unit cache, the percentage of the virtual computing unit using GPU, the GPU temperature, the SM clock frequency in the GPU, the GPU memory clock frequency, the average load in 5 minutes, the number of blocked processes, the number of completed readings, the percentage of the GPU memory usage, the number of used frame buffers, the number of processes running in a host, the interrupt times per second of the host, the number of source code lines and the number of assembly code lines contained in the task;
Key features of the IO-intensive task include: the CPU percentage used by the task in the virtual computing unit, the number of processes running in the host, the number of writes completed, the current memory usage, the size of the virtual computing unit cache memory, the number of processes blocked, the total number of virtual computing unit memory failures, the maximum IOPS requested by the virtual computing unit, the size of the memory mapped file, the total number of written bytes, the maximum megapage usage recorded, the load operation miss percentage in the last level cache, the load miss percentage in the TLB, the size of the input file in the task.
Further, in the step 4, a model is constructed and trained by using an XGBoost or LightGBM method, and an optimal parameter configuration of the XGBoost or LightGBM model is searched by using a differential evolution method.
The beneficial effects of the invention are as follows:
(1) The method provided by the invention can model the performance and power consumption characteristics of different tasks. In contrast to conventional methods, which generally treat tasks as homogeneous entities with identical performance and power consumption requirements, the method of the invention accurately predicts task execution time by accounting for the heterogeneity of performance and power consumption across tasks.
(2) The method of the invention considers the influence of virtualization technology on task execution time and power. Virtualization adds complexity to resource allocation, and the method reduces this difficulty by improving the estimation of task execution time and power consumption in a virtualized environment.
(3) The method of the invention analyzes the complex relationship among task execution performance, energy consumption and virtual resource allocation in detail, and provides substantial support for the trade-off between task performance and energy consumption and the exploration of the optimal resource allocation strategy in the edge computing environment.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a general flow chart of a modeling method of an embodiment of the present invention.
FIG. 2 is a two-stage process diagram of identifying key features according to an embodiment of the invention.
FIG. 3 is a schematic diagram of a histogram method in accordance with an embodiment of the present invention.
FIG. 4 is a diagram illustrating a Leaf-wise policy according to an embodiment of the present invention.
FIG. 5 is a graph comparing the key features of a CPU power consumption model with the effects of the original features according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of feature importance scores in a CPU power consumption model of an embodiment of the invention.
FIG. 7 is a graph comparing the impact of key features and original features on the results of a CPU performance model according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of feature importance scores in a CPU performance model according to an embodiment of the invention.
FIG. 9 is a graph of the power consumption predicted for CPU-intensive tasks by the CPU power consumption model trained with XGBoost according to an embodiment of the invention.
FIG. 10 is a graph of the power consumption predicted for CPU-intensive tasks by the CPU power consumption model trained with LightGBM according to an embodiment of the invention.
FIG. 11 is a comparison of different power consumption models for CPU-intensive tasks according to an embodiment of the present invention.
FIG. 12 is a comparison of different performance models for CPU intensive tasks according to an embodiment of the invention.
FIG. 13 is a graph comparing the key features and the original features of the GPU power consumption model with the effect of the result according to the embodiment of the present invention.
FIG. 14 is a feature importance score schematic in a GPU power consumption model in accordance with an embodiment of the invention.
FIG. 15 is a graph comparing the impact of key features and original features on the results of a GPU performance model according to an embodiment of the present invention.
FIG. 16 is a feature importance score schematic in a GPU performance model according to an embodiment of the present invention.
FIG. 17 is a graph of power consumption predictions for GPU-intensive tasks by a GPU power consumption model trained from XGBoost, according to an embodiment of the present invention.
FIG. 18 is a graph of power consumption predictions for GPU-intensive tasks by a GPU power consumption model trained from LightGBM according to an embodiment of the present invention.
FIG. 19 is a comparison of different power consumption models for GPU-intensive tasks, according to an embodiment of the present invention.
FIG. 20 is a comparison of different performance models for GPU-intensive tasks, according to an embodiment of the present invention.
FIG. 21 is a graph comparing the impact of key features and original features on the results of an IO power consumption model in accordance with an embodiment of the present invention.
FIG. 22 is a schematic diagram of feature importance scores in an IO power consumption model in accordance with an embodiment of the present invention.
FIG. 23 is a graph comparing the impact of key features and original features on the results of an IO performance model in accordance with an embodiment of the present invention.
FIG. 24 is a schematic diagram of feature importance scores in an IO performance model in accordance with an embodiment of the present invention.
FIG. 25 is a graph of the power consumption predicted for IO-intensive tasks by the IO power consumption model trained with XGBoost according to an embodiment of the invention.
FIG. 26 is a graph of the power consumption predicted for IO-intensive tasks by the IO power consumption model trained with LightGBM according to an embodiment of the invention.
FIG. 27 is a comparison of different power consumption models for IO-intensive tasks in accordance with an embodiment of the present invention.
FIG. 28 is a comparison of different performance models for IO intensive tasks in accordance with an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, in the environment of a virtual computing unit under mobile edge computing, the power consumption and performance model establishment procedure proposed by the embodiment of the present invention follows the following 8 steps:
(1) Hardware virtualization support: the underlying hardware needs to support virtualization.
(2) The virtual computing unit is preconfigured: the pre-configuration of the virtual computing units covers a number of critical parameters, including the central processing unit (CPU), input/output (IO), graphics processing unit (GPU), memory, etc.
This embodiment configures the container (virtual computing unit) using Kubernetes, which has become the standard container orchestration platform. Kubernetes, originally developed by Google, provides a comprehensive framework for managing distributed systems and a new way to scale container deployments, supporting the execution and management of large numbers of containers. It has a rich ecosystem, including support for Container Network Interface (CNI) plugins, the Container Storage Interface (CSI), and integrated logging and monitoring tools. To facilitate dynamic configuration of containers, this embodiment creates a configuration file in YAML format; the CPU, GPU, IO and other resource allocation parameters are recorded in the container's configuration file, and the container also contains the environment variables necessary for the experiments.
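For concreteness, the following sketch (not the patent's actual configuration file; the image name and limit values are illustrative placeholders) shows how such a resource-limited container could be created programmatically with the official Kubernetes Python client, equivalent to applying a YAML manifest with the same fields:

```python
# A minimal sketch, assuming a reachable cluster with the NVIDIA device plugin
# installed; "benchmark-image:latest" is a hypothetical workload image.
from kubernetes import client, config

def create_preconfigured_unit(cpu="2", memory="4Gi", gpus="1"):
    config.load_kube_config()  # assumes a valid kubeconfig on this machine
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="vcu-benchmark"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="workload",
                image="benchmark-image:latest",
                env=[client.V1EnvVar(name="OMP_NUM_THREADS", value=cpu)],
                resources=client.V1ResourceRequirements(
                    requests={"cpu": cpu, "memory": memory},
                    limits={"cpu": cpu, "memory": memory,
                            "nvidia.com/gpu": gpus},  # GPU via device plugin
                ),
            )],
        ),
    )
    return client.CoreV1Api().create_namespaced_pod("default", pod)
```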
(3) Power consumption and performance analysis of different types of tasks: the present embodiment carefully considers a variety of factors that may affect power consumption and performance, including task characteristics, resource allocation, and system configuration. The method comprises the following steps:
principal power consumption component analysis for different types of tasks
The execution time and energy consumption of a task may be affected by a number of parameters. The present embodiment systematically captures parameter fluctuations associated with basic power consuming components such as CPU, memory, disk, GPU, etc. The variation of these parameters helps to select the appropriate inputs for the model to accurately reflect the power consumption conditions of the system.
For this reason, the present embodiment first determines a key component of the power consumption of the server, and then researches a general calculation formula of the power consumption during execution of different types of workloads. The purpose of this is to fully understand the power consumption dynamics and correlation, and to be able to determine the relevant parameters, thereby accurately capturing the power consumption behavior of the system.
By in-depth analysis of the information obtained from examining the critical power-consuming components, the power consumption characteristics of the system can be fully understood. This knowledge forms the basis for selecting appropriate parameters that capture fine-grained power consumption behavior and facilitate accurate modeling and prediction of power consumption during task execution. In general, server power consumption mainly involves several basic components: the processor, the graphics processing unit, memory, disk and network card, namely:
P = P_cpu + P_gpu + P_memory + P_disk + P_net + P_const

where P_cpu, P_gpu, P_memory, P_disk and P_net respectively represent the power consumption of the processor, graphics processing unit, memory, disk and network card, and P_const represents the static power consumption of the server.
The power consumption of the whole server is regarded as the sum of static power consumption and dynamic power consumption. Like CMOS circuits, static power consumption is mainly caused by current leakage, while dynamic power consumption is mainly caused by capacitor charging and discharging:
P = P_dynamic + P_static
P_dynamic = a·C·V^2·f
P_static = V·N·K_design·Î_leak

where P_dynamic is the dynamic power consumption and P_static is the static power consumption; a is the average utilization (activity factor), C is the total capacitance, V is the supply voltage, f is the operating frequency, N is the number of transistors in the chip design, K_design is a constant related to the technology characteristics, and Î_leak is the normalized leakage current of a single transistor, which depends on the threshold voltage.
CPU intensive task power consumption component analysis
The CPU is the main power consumer of a server, especially for CPU-intensive tasks, and the power consumption of a server is largely determined by the power consumption characteristics of its CPU. Following the power consumption characteristics of CMOS circuits, the power consumption model based on processor frequency can be expressed as:

P_cpu = c_0 + c_1·f^3

where c_0 is a constant representing the static power consumption of the server, c_1 is a constant related to the voltage and capacitance of the CPU, and f represents the processor frequency.
In addition, CPU utilization, the most common measure of the proportion of time the CPU is not idle, effectively reflects the intensity of the CPU's workload. Because CPU utilization is readily available, many studies use it to estimate CPU power consumption. For example, based on linear regression, CPU power consumption can be expressed as a function of CPU utilization:

P_cpu = P_max·u_cpu + (1 - u_cpu)·P_idle

where P_max and P_idle respectively represent the power consumption of the CPU when fully utilized and when idle, and u_cpu denotes the CPU utilization.
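As a minimal illustration of the two CPU power models above (the coefficient values are placeholders; in practice c_0, c_1, P_max and P_idle would be fitted to smart-meter measurements):

```python
# Hedged sketch of the frequency-based and utilization-based CPU power models;
# all constants below are illustrative, not measured values.
def cpu_power_from_frequency(f_ghz, c0=60.0, c1=4.0):
    """P_cpu = c0 + c1 * f^3 (static term plus cubic frequency term)."""
    return c0 + c1 * f_ghz ** 3

def cpu_power_from_utilization(u_cpu, p_max=150.0, p_idle=60.0):
    """Linear interpolation between idle and fully-utilized power."""
    return p_max * u_cpu + (1.0 - u_cpu) * p_idle

print(cpu_power_from_utilization(0.75))  # 127.5 W with these placeholders
```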
GPU intensive task power consumption component analysis
The power consumption of the GPU can be decomposed into the sum of the power consumption of multiple architecture components. This embodiment employs a modeling approach that captures the power consumption of GPU architectures similar to the NVIDIA Fermi(TM) architecture. The modeling process takes into account the main components, including the streaming multiprocessors (SM), memory controllers, interconnect network and DRAM. The SM in particular differs from a conventional CPU core in that it integrates SIMD execution units as well as additional functions such as the texture cache, constant cache and shared memory. Furthermore, GPUs use GDDR memory rather than DDR memory to achieve higher bandwidth:

P_gpu = P_leakage + P_idleSM + Σ_{i=1..N} α_i·p_max,i

The above formula describes the modeling of GPU power consumption, comprising the leakage power P_leakage of all components, the idle streaming multiprocessor power consumption P_idleSM, and the dynamic power consumption. Here N represents the number of micro-architecture components, and the dynamic power consumption is modeled as the activity factor α_i of each micro-architecture component multiplied by its peak power consumption p_max,i. Table 1 shows the main GPU components, built from the limited public information available.
TABLE 1 Primary GPU component of Fermi architecture
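A small sketch of this GPU power decomposition; the component list, activity factors and peak powers below are placeholders rather than measured Fermi values:

```python
# Illustrative sketch: leakage plus idle-SM power plus per-component dynamic
# power (activity factor times peak power). All numbers are assumptions.
def gpu_power(p_leakage, p_idle_sm, activity, peak):
    """P_gpu = P_leakage + P_idleSM + sum_i alpha_i * p_max_i."""
    assert len(activity) == len(peak)
    return p_leakage + p_idle_sm + sum(a * p for a, p in zip(activity, peak))

# Assumed component order: SM, memory controller, interconnect, DRAM
alpha = [0.8, 0.5, 0.3, 0.6]      # activity factors from profiling counters
p_max = [90.0, 25.0, 10.0, 40.0]  # peak power per component (placeholder W)
print(gpu_power(p_leakage=20.0, p_idle_sm=15.0, activity=alpha, peak=p_max))
```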
IO intensive task power consumption component analysis
For servers that frequently perform IO intensive computing tasks, disk power consumption accounts for a significant proportion of the total power consumption of the server. According to the experimental study results, the following conclusions are drawn: in sequential I/O mode, disk power consumption increases significantly as the request rate increases, while the correlation between disk power consumption and I/O speed remains weak. Conversely, in a random I/O mode with low request rates, disk power consumption increases significantly with increased read or write speeds. The power consumption model of the I/O mode aware disk proposed in this embodiment is as follows:
P_disk = θ·(P_sr + P_sw) + (1 - θ)·(P_rr + P_rw)

where θ is equal to 0 or 1 (θ=1 represents sequential I/O, θ=0 represents random I/O), and P_sr, P_sw, P_rr and P_rw respectively represent the power consumption of the disk during sequential reading, sequential writing, random reading and random writing.
The disk power mathematical model of an AMD Opteron server includes a number of parameters reflecting different disk operations and workloads. Specifically, these parameters include the disk static power consumption C, reads per second R, merged reads per second MR, sectors read per second RS, read time per second RT, writes per second W, merged writes per second MW, sectors written per second WS, write time per second WT, and I/O time per second IOT.
To simplify the expression, the present embodiment organizes these parameters into one power consumption vector P, in which:
P = [P_R, P_MR, P_RS, P_RT, P_W, P_MW, P_WS, P_WT, P_IOT]^T

where P_R, P_MR, P_RS, P_RT, P_W, P_MW, P_WS, P_WT and P_IOT respectively represent the power consumption attributable to reads per second, merged reads per second, sectors read per second, read time per second, writes per second, merged writes per second, sectors written per second, write time per second, and I/O time per second, and T denotes the vector transpose.
The corresponding weight vector W can be expressed as:

W = [w_R, w_MR, w_RS, w_RT, w_W, w_MW, w_WS, w_WT, w_IOT]^T

where each component w_R, w_MR, w_RS, w_RT, w_W, w_MW, w_WS, w_WT, w_IOT of W is the weight of the corresponding component P_R, P_MR, P_RS, P_RT, P_W, P_MW, P_WS, P_WT, P_IOT of P.
The overall disk power consumption P_disk can then be expressed as a dot product:

P_disk = C + W^T·P
by evaluating and analyzing these parameters, the disk power consumption of the server can be estimated more accurately, thereby optimizing performance and energy efficiency.
Performance analysis of different task types
Deep analysis of CPU intensive task performance
This embodiment provides a performance prediction model, mainly used to predict the performance of native code running on a CPU. The overall predicted execution time of a program is expressed by the following formula:

T_pred = T_avg_mem + T_CPU_ops

where T_avg_mem represents the average memory access time and T_CPU_ops represents the average time of CPU operations.
The average memory access time is calculated as:

T_avg_mem = (total_mem / b)·(λ_avg + β_avg·b)

where λ_avg represents the average latency, β_avg represents the inverse of the average throughput, b represents the block size, and total_mem corresponds to the total memory required by the program.
The time required for a CPU operation can be measured by the hardware specific instruction delay and the number of operations.
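A hedged sketch of this CPU performance model, with the memory term computed as above (total_mem/b block transfers, each costing λ_avg + β_avg·b) and the CPU term accumulated from per-instruction latencies and operation counts:

```python
# Minimal sketch, assuming per-operation latencies are known for the target
# hardware; all inputs are illustrative.
def predict_cpu_time(lambda_avg, beta_avg, block_size, total_mem,
                     op_counts, op_latencies):
    # Memory term: number of blocks times per-block access cost
    t_avg_mem = (total_mem / block_size) * (lambda_avg + beta_avg * block_size)
    # CPU term: count of each operation kind times its instruction latency
    t_cpu_ops = sum(n * lat for n, lat in zip(op_counts, op_latencies))
    return t_avg_mem + t_cpu_ops
```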
Deep analysis of performance of GPU-intensive tasks
This embodiment introduces a simple and intuitive method based on the Bulk Synchronous Parallel (BSP) model for predicting the execution time of CUDA applications running on a GPU. The specific formulas are as follows:

T_pred = t·(Comp + Comm_GM + Comm_SM) / (R·P·λ)
Comm_GM = (ld_1 + st_1 - L1 - L2)·g_GM + L1·g_L1 + L2·g_L2
Comm_SM = (ld_0 + st_0)·g_SM

where t represents the number of threads launched and Comp represents the number of processing cycles a thread spends. Comm_GM represents the communication overhead associated with threads accessing global memory, with ld_1 and st_1 respectively representing loads from and stores to global memory; L1 and L2 respectively represent the hit counts of the first-level and second-level caches, determined from the execution profile of the application; the values g_GM, g_L1 and g_L2 represent the communication latencies of global memory, the L1 cache and the L2 cache, respectively. Comm_SM represents the communication overhead incurred when threads access shared memory, with ld_0 representing the total number of load transactions and st_0 the total number of store transactions executed by all threads in shared memory, and g_SM the communication latency of the shared memory. R represents the clock frequency, and P represents the number of CUDA cores of the specific GPU model. The parameter λ is used to model the application's inherent optimization utility.
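A sketch of this BSP-based prediction; the counter values (ld_1, st_1, L1/L2 hits, ld_0, st_0) would come from a profiler such as Nsight Compute, and all parameters here are illustrative:

```python
# Hedged sketch of T_pred = t*(Comp + Comm_GM + Comm_SM) / (R*P*lambda).
def predict_gpu_time(t, comp, ld1, st1, l1, l2, g_gm, g_l1, g_l2,
                     ld0, st0, g_sm, clock_hz, cuda_cores, lam=1.0):
    comm_gm = (ld1 + st1 - l1 - l2) * g_gm + l1 * g_l1 + l2 * g_l2
    comm_sm = (ld0 + st0) * g_sm
    # t threads, each costing Comp + communication cycles, spread over P
    # cores at R Hz, scaled by the application's optimization factor lambda
    return t * (comp + comm_gm + comm_sm) / (clock_hz * cuda_cores * lam)
```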
Deep analysis of performance of IO intensive tasks
The present embodiment analyzes various system logs to confirm the correlation between application behavior, file system behavior, and overall I/O performance. Then, the relevant features are selected from the most recent log entries and the best regression method is selected for the predicted task. Table 2 lists the basic features extracted from the log information:
TABLE 2 List of information extracted from the Slurm and Darshan logs
(4) Reference selection: the present embodiment selects a representative set of benchmarks including CPU-intensive tasks, GPU-intensive tasks, and IO-intensive tasks to cover a variety of scenarios and workload variations.
In this embodiment, two benchmarks for CPU-intensive computing tasks are selected first: the NAS Parallel Benchmarks and PARSEC 3.0. The NAS Parallel Benchmarks, derived from CFD codes and designed to evaluate parallel computer performance, are widely recognized as one of the standards for measuring computer performance. PARSEC 3.0 is a multi-core benchmark suite created to accelerate chip multiprocessor (CMP) development and research; it provides a comprehensive set of workloads that can be used to evaluate the performance of multi-core processor systems under a variety of performance metrics.
Secondly, the embodiment selects Hashcat as a benchmark for GPU intensive tasks. Hashcat is currently the fastest and most advanced password recovery tool worldwide, supporting five unique attack modes for more than 300 highly optimized hash methods.
For IO intensive computing tasks, IOzone is selected as a benchmark in this embodiment. The IOzone benchmark is a file system performance testing tool, can test different size requests and access modes of an ext4 file system installed on a flash memory solid-state device or a mechanical disk, and can also be used for widely analyzing the file system of a computer platform of a cloud computing service provider.
(5) Determining sampling parameters and constructing a sample data set: determining sampling parameters is a precondition for data collection and is also a key step in establishing accurate performance and energy consumption models. Performance and power consumption of a task may be related to a number of parameters. Based on the existing hardware model parameters, the present embodiment determines parameters that may be related to each component, constructs a sampling parameter list containing more than 60 parameters, and the specific name of each parameter is shown in the left half of fig. 2.
Typically, in a containerized environment, the basic parameters related to the central processing unit (CPU), memory, input/output (I/O) and graphics processor (GPU) are acquired using specialized software tools. In the present invention, this embodiment chooses to run tasks in containers on a dedicated Kubernetes cluster. Through kubelet, the key component responsible for managing and monitoring the containers on each node, this embodiment effectively collects, aggregates, processes and generates information related to container resource utilization.
This embodiment uses a smart meter to acquire the power consumption data of the server, and then uses dedicated software to store and manage the data. During collection of the energy consumption data, the smart meter is connected to the server's power supply and the sampling frequency is set to once per second. Such high-frequency sampling enables accurate power measurements that faithfully reflect the power consumption of the server. After sampling, the collected data is statistically processed to ensure that requirements such as homogeneity of variance, normal distribution and independence are met, guaranteeing the accuracy and reliability of the subsequent analysis.
Perf is a tool specifically used for performance analysis and can help measure the execution time and performance index of CPU-intensive and IO-intensive tasks. In contrast, NVIDIA Nsight Compute is a tool designed specifically for measuring the execution time and performance metrics of GPU-intensive tasks. The two tools provide powerful functions for academia and developers, and can be used for deeply exploring and enhancing performance characteristics of various tasks. They enable comprehensive analysis of various aspects of a task, including generating detailed analysis reports about task execution time, identifying memory access patterns (including memory bandwidth utilization and memory latency information), displaying function level instruction execution details, monitoring hardware performance counters, and providing various performance metrics such as throughput, latency, and resource utilization. These functions give them important value in the field of performance analysis and optimization, providing powerful tool support for in-depth inspection and improvement of task performance.
Next, this embodiment runs different types of tasks in the containers, including CPU-intensive, IO-intensive and GPU-intensive tasks. Meanwhile, four key components are used: monitoring software deployed on the Kubernetes platform, the Perf tool, the NVIDIA Nsight Compute tool and the smart meter, to measure the resource utilization of the four main resources on the server (CPU, memory, disk and GPU), the task performance indices, the power consumption and the voltage level. The data obtained is then used to construct the sample data set of this embodiment.
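A hedged sketch of one sampling step is given below; the kubelet Summary API endpoint and field names follow common Kubernetes deployments, and read_power_watts() stands in for whatever interface the smart meter actually exposes:

```python
# Illustrative sketch only: authentication, endpoint details and the smart
# meter interface are assumptions, not the patent's concrete tooling.
import time, requests

def sample_once(node, token, read_power_watts):
    hdrs = {"Authorization": f"Bearer {token}"}
    stats = requests.get(f"https://{node}:10250/stats/summary",
                         headers=hdrs, verify=False, timeout=5).json()
    ts = time.time()
    watts = read_power_watts()  # smart-meter reading, sampled at 1 Hz
    rows = []
    for pod in stats.get("pods", []):
        for c in pod.get("containers", []):
            rows.append({
                "time": ts,
                "container": c["name"],
                "cpu_nanocores": c["cpu"]["usageNanoCores"],
                "memory_bytes": c["memory"]["workingSetBytes"],
                "power_w": watts,
            })
    return rows
```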
After preliminary data processing, the data set of the present embodiment includes 4106 valid data points for representing CPU-intensive tasks; 656 valid data points for representing GPU-intensive tasks; and 736 valid data points for representing IO intensive tasks. The data set provides an important experimental basis for analyzing task execution time and power consumption of the embodiment, and provides reliable data support for further research work.
(6) Identifying key features: the key features that have a significant impact on power consumption and performance are determined by an XGBoost-based feature selection technique.
In this embodiment, this step will describe in detail the process of identifying key features. Fig. 2 illustrates a two-stage process of identifying the most critical features. In the first stage, the present embodiment performs a preliminary analysis and screening to identify a set of potentially relevant features from a large number of parameters. In the second stage, the present embodiment finally determines the features with the most influence and explanatory power by further analysis and evaluation. This process may ensure that the selected feature set accurately captures the relationship between task execution time and energy consumption.
The present embodiment employs the popular machine learning method XGBoost to help identify the most relevant, influential features in the data. This approach is widely accepted because of its ability to handle complex features and effectively capture potential patterns in the dataset. By utilizing XGBoost, it is intended to find key features that are able to adequately capture the relationship between task execution time and energy consumption.
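A minimal sketch of this XGBoost-based screening (the column names, model settings and top-k cutoff are illustrative):

```python
# Rank the 60+ sampled parameters by XGBoost feature importance and keep the
# strongest ones; X is the parameter matrix, y is execution time or power.
import pandas as pd
import xgboost as xgb

def select_key_features(X: pd.DataFrame, y, top_k=15):
    model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(X, y)
    scores = pd.Series(model.feature_importances_, index=X.columns)
    return scores.sort_values(ascending=False).head(top_k).index.tolist()
```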
Key features of CPU intensive computing tasks
The present embodiment creates a sample data set by running CPU-intensive computing tasks on a server. Based on the first stage preliminary selection of features, the present embodiment further selects some more important features based on an array of important features obtained from the sample dataset. Key feature selections and descriptions are shown in table 3:
TABLE 3 feature selection for CPU intensive tasks
Key features of GPU intensive computing tasks
When running GPU-intensive computing tasks, this embodiment uses the same method to screen out the key features that affect them. Although some of the selected features do not belong to GPU components, they remain very important for GPU-intensive workloads because some programs execute cooperatively between the CPU and the GPU. This embodiment therefore takes these metrics into account to capture the performance and power consumption of the tasks more accurately.
Table 4 lists the final selected features of this example and their description. These features cover various aspects including other functions that cooperate in the performance of tasks. By considering these features, the performance and power consumption characteristics of GPU-intensive tasks may be more fully evaluated, thereby improving the accuracy and predictive capabilities of the performance and power consumption models. By selecting and evaluating these features, the execution of GPU-intensive tasks can be more fully understood, and guidance is provided for optimizing the performance and power consumption of the tasks.
TABLE 4 feature selection for GPU-intensive tasks
Key features of IO intensive computing tasks
In performing I/O-intensive computing tasks, certain features that may have an impact on the I/O-intensive computing tasks are systematically identified. Table 5 lists selected features of this embodiment and their description.
TABLE 5 parameter selection for IO intensive tasks
(7) Model training: model training is performed on the sample dataset using XGBoost and LightGBM to learn the relationship between input key features and power consumption, performance.
XGBoost is an optimized distributed gradient boosting library designed to be efficient, flexible and portable. It can quickly and accurately solve many data science problems. Its main goal is to optimize an objective function consisting of two parts: a loss function (measuring the error of the model's predictions) and a regularization term (controlling model complexity).
1) Predictive value of integrated model
ŷ_i = Σ_{k=1..K} f_k(x_i)

where ŷ_i is the model's predicted value for the i-th sample, K is the number of trees, and f_k is the k-th tree.
2) Loss function
Obj = Σ_{i=1..n} l(ŷ_i, y_i) + Σ_{k=1..K} Ω(f_k)

where l is a differentiable convex loss function measuring the difference between the predicted value ŷ_i and the true value y_i; Ω(f_k) is the model-complexity penalty, used to control the complexity of the model and avoid overfitting; and n is the total number of samples.
3) Penalty term for model complexity
Ω(f) = γ·T + (1/2)·λ·Σ_{j=1..T} ω_j^2 + α·Σ_{j=1..T} |ω_j|

where T is the number of leaf nodes of the tree, ω_j is the weight of the j-th leaf node, and α, λ and γ are hyperparameters controlling the regularization strength.
XGBoost trains the model by minimizing the objective function, updating model parameters using gradient descent.
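A minimal training sketch for such a regressor; reg_alpha, reg_lambda and gamma correspond to the α, λ and γ of the complexity penalty above, and the values shown are placeholders that the differential evolution step would tune:

```python
# Hedged sketch: train an XGBoost regressor on the key features with a 50/50
# train/test split, as described in the evaluation section.
import xgboost as xgb
from sklearn.model_selection import train_test_split

def train_xgb(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5)
    model = xgb.XGBRegressor(
        n_estimators=500, learning_rate=0.05, max_depth=6,
        reg_alpha=0.1, reg_lambda=1.0, gamma=0.0,  # regularization strength
        objective="reg:squarederror",
    )
    model.fit(X_tr, y_tr, eval_set=[(X_te, y_te)], verbose=False)
    return model
```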
LightGBM is a gradient boosting framework that uses tree-based learning; this embodiment uses both XGBoost and LightGBM for model training. LightGBM offers faster training, higher efficiency, better accuracy, and support for distributed parallel learning, among other advantages. Its basic principles consist of the following four parts:
1) Histogram method
Specifically, for each feature, the histogram method sorts its feature values by size. The sorted eigenvalues are divided into a plurality of discrete bins (bins), each containing a certain number of eigenvalues. For each interval, the sum of the gradient and the Hessian matrix of all samples within that interval is calculated. These statistics can be used to calculate the information gain for the bin to determine the optimal split point until a feature histogram is constructed for each feature. And selecting the optimal splitting point to construct a decision tree according to the characteristic histogram. The process of the histogram method is shown in fig. 3.
2) Leaf-wise method with depth limitation
When selecting a split point, the Leaf-wise strategy always splits the leaf whose split most reduces the loss function, which makes each tree relatively deep and able to fit the training data well. Because it selects the gain-maximizing split at every step, the Leaf-wise growth strategy can achieve the same fitting quality with fewer splits than the Level-wise strategy; and since the resulting trees can grow deep, a maximum-depth limit is imposed to control the risk of overfitting. The process of Leaf-wise split-point selection is shown in fig. 4.
3) One-side gradient sampling (GOSS)
Only a fraction of the samples is used in each iteration, namely those whose gradients play a key role in updating the current model parameters. Specifically, the one-side gradient sampling method retains the samples with large gradient magnitudes and down-samples the remaining small-gradient samples. This reduces the computational cost to some extent while still retaining the samples that have an important impact on the parameter updates.
4) Exclusive Feature Bundling (EFB) method
The core idea of the EFB method is to bundle mutually exclusive features together, thereby reducing the number of active features. It represents the mutual exclusion relationship between features by constructing a graph with weighted edges and groups features into bundles using a greedy approach. During the bundling process, a small portion of conflicts may be allowed to further reduce the number of feature bundles, thereby increasing computational efficiency. In addition, the EFB method ensures that the value of the original feature can be identified from the feature bundle by adding an offset to the feature.
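The four mechanisms map onto LightGBM's configuration surface; a hedged sketch with placeholder values drawn from the search ranges in Table 8:

```python
# Illustrative LightGBM training sketch: max_bin controls the histogram bins,
# num_leaves/max_depth the depth-limited leaf-wise growth, and enable_bundle
# toggles exclusive feature bundling (EFB). Values are placeholders.
import lightgbm as lgb

def train_lgbm(X_train, y_train, X_val, y_val):
    params = {
        "objective": "regression",
        "metric": "rmse",
        "max_bin": 255,         # histogram method: number of discrete bins
        "num_leaves": 31,       # leaf-wise growth ...
        "max_depth": 8,         # ... with a depth limit
        "enable_bundle": True,  # exclusive feature bundling (EFB)
        "learning_rate": 0.05,
    }
    train_set = lgb.Dataset(X_train, y_train)
    val_set = lgb.Dataset(X_val, y_val, reference=train_set)
    return lgb.train(params, train_set, num_boost_round=500,
                     valid_sets=[val_set],
                     callbacks=[lgb.early_stopping(50)])
```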
The present embodiment employs a differential evolution method (DE) to search for optimal parameter configurations for XGBoost and LightGBM models. The differential evolution method is an optimization method that runs in a parameter space, and finds the best parameter combination by performing a series of operations. Specific steps include initializing populations, evaluating individual fitness, performing mutation, crossover, and selection operations.
In the differential evolution method, the fitness value is defined as the root mean square error (RMSE) obtained during cross-validation of the XGBoost and LightGBM training models. The fitness value quantifies the performance of each individual in the parameter space. The goal is to iteratively refine the parameter configurations so as to minimize the RMSE, thereby determining the best parameter combination and enhancing model performance.
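A sketch of this search using SciPy's differential evolution implementation in place of a hand-rolled DE loop; the tuned subset of hyperparameters and their bounds are illustrative:

```python
# Hedged sketch: minimize cross-validated RMSE over a box of XGBoost
# hyperparameters with scipy's differential evolution.
import xgboost as xgb
from scipy.optimize import differential_evolution
from sklearn.model_selection import cross_val_score

def tune_xgb(X, y):
    bounds = [(50, 1000),    # n_estimators
              (1, 8),        # max_depth
              (0.01, 0.10)]  # learning_rate

    def fitness(v):
        model = xgb.XGBRegressor(n_estimators=int(v[0]), max_depth=int(v[1]),
                                 learning_rate=v[2])
        scores = cross_val_score(model, X, y, cv=10,
                                 scoring="neg_root_mean_squared_error")
        return -scores.mean()  # mean CV RMSE, to be minimized

    result = differential_evolution(fitness, bounds, maxiter=1000,
                                    popsize=500, tol=0.01,
                                    mutation=(0.5, 1.0), recombination=0.7)
    return result.x, result.fun
```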
(8) Evaluation and verification: the performance and accuracy of the models are evaluated on a separate test data set, ensuring their reliability and effectiveness in predicting power consumption and performance.
The server parameters used in this experiment are shown in table 6.
Table 6 server parameters
This embodiment divides the sample data set into two parts: a training set and a test set, with 50% of the total data used for training and the other 50% for testing. The data is then processed, including data cleaning and feature scaling (normalization), in preparation for model training; the input features have been listed above. In addition to XGBoost and LightGBM, this embodiment introduces several comparison methods: random forest, MLP regression, ElasticNet and Lasso regression, and support vector regression (SVR). The purpose is to conduct a comparative analysis of the performance indices of the various methods to determine which performs better. This embodiment uses K-fold cross-validation to address the reliability of model performance: K-fold cross-validation reduces the randomness of the evaluation through multiple train-test splits, and it divides the data set into K parts for K rounds of model training and testing, thereby making fuller use of the information in the data set. In this embodiment, K is set to 10.
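A sketch of the comparative evaluation with 10-fold cross-validated RMSE for the baseline regressors named above (default settings shown for brevity):

```python
# Hedged sketch: compare the baseline regressors under 10-fold CV after
# standardizing the key features.
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def compare_models(X, y):
    models = {
        "RandomForest": RandomForestRegressor(),
        "MLP": MLPRegressor(max_iter=2000),
        "ElasticNet": ElasticNet(),
        "Lasso": Lasso(),
        "SVR": SVR(),
    }
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)
        rmse = -cross_val_score(pipe, X, y, cv=cv,
                                scoring="neg_root_mean_squared_error").mean()
        print(f"{name}: RMSE={rmse:.3f}")
```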
This embodiment sets the key parameters of the differential evolution (DE) method to achieve efficient optimization. First, the maximum number of generations (max_iter) is set to 1000; this parameter serves as the termination criterion, limiting the total number of evolutionary iterations. Next, the population size (popsize) is set to 500, which determines the number of individuals per generation.
To evaluate the convergence of the method, a relative convergence tolerance (tol) of 0.01 was used. In addition, the mutation constant is set in the range of 0.5 to 1. The mutation constant controls the degree of mutation applied to an individual. The recombination constant of 0.7 is also specified in this example. By properly configuring these parameters, the DE method can effectively converge to a near-globally optimal solution in a shorter time. The selection of these parameters is based on multiple experiments and empirical knowledge, aimed at achieving optimal process performance and high quality optimization results.
The hyperparameters and constraints involved in the DE method are described below, including the specification of fitness values and constraints for the XGBoost and LightGBM models. For details see Tables 7 and 8; a sketch of decoding a DE candidate vector into the Table 8 search space follows the table.
Table 7 Parameter settings of the differential evolution method in XGBoost hyperparameter tuning
Table 8 Parameter settings of the differential evolution method in LightGBM hyperparameter tuning
num_boost_round [200, 1000]
bagging_freq [0, 100]
max_depth [1, 8]
feature_fraction [0.95, 1.0]
bagging_fraction [0.95, 1.0]
learning_rate [0.01, 0.10]
early_stopping_rounds [25, 100]
n_important_features [5, 35]
num_leaves [4, 32]
max_bin [128, 512]
min_data_in_leaf [1, 120]
min_split_gain [0, 1]
lambda_l1 [0, 1]
lambda_l2 [0, 1]
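As referenced above, a hedged sketch of decoding one DE candidate vector into this LightGBM search space might look as follows; integer-valued parameters are rounded, and the function name is ours:

```python
def decode_lightgbm_params(x):
    """Map a DE candidate vector onto the Table 8 LightGBM search space."""
    return {
        "num_leaves": int(round(x[0])),          # [4, 32]
        "max_depth": int(round(x[1])),           # [1, 8]
        "learning_rate": float(x[2]),            # [0.01, 0.10]
        "feature_fraction": float(x[3]),         # [0.95, 1.0]
        "bagging_fraction": float(x[4]),         # [0.95, 1.0]
        "bagging_freq": int(round(x[5])),        # [0, 100]
        "min_data_in_leaf": int(round(x[6])),    # [1, 120]
        "min_split_gain": float(x[7]),           # [0, 1]
        "lambda_l1": float(x[8]),                # [0, 1]
        "lambda_l2": float(x[9]),                # [0, 1]
        "max_bin": int(round(x[10])),            # [128, 512]
    }
```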
To better compare the methods, while considering practical application scenarios and requirements, the present embodiment selects Root Mean Square Error (RMSE) and R² as evaluation indices, defined by the following equations. The power consumption model is evaluated with RMSE, and the performance model with R²:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}

R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}

In these formulas, y_i denotes the i-th true value, \hat{y}_i the i-th predicted value, \bar{y} the mean of the true values, and n the number of samples. RMSE measures the deviation between predicted and true values; a smaller RMSE indicates better model performance. R², also known as the coefficient of determination, measures the proportion of the variation in the dependent variable (the true values) that is explained by the predictions; the higher the R² value, the better the model fits the data.
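With scikit-learn, the two indices can be computed as follows, assuming arrays y_test and y_pred from a fitted model:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))  # power-consumption index
r2 = r2_score(y_test, y_pred)                               # performance index
```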
To analyze the importance and impact of key features in a model, the present embodiment combines the feature-importance scores obtained from the model with an analysis of how the original features, compared with the key features, affect the relevant metrics. This proceeds in two steps. First, the feature-importance scores obtained from the model quantify the influence of each feature on the model output: the higher a feature's importance score, the greater its weight in predicting or explaining the result, and the greater the impact of its variation on the output. Second, the embodiment verifies the importance of the key features by analyzing the influence of the original features and the key features on the prediction result: if the key features exhibit a higher correlation with the predicted outcome, their importance and significant impact in the model are further confirmed.
By integrating these two analyses, the importance of the key features can be summarized and shown to have a greater impact on the prediction than the original features, laying a solid foundation for further study and interpretation of the model results. A minimal sketch of the score-extraction step is given below, followed by the analysis itself:
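The sketch assumes a fitted scikit-learn-style model (e.g., XGBRegressor or LGBMRegressor) exposing feature_importances_, and a list of feature names; both names are assumptions:

```python
import pandas as pd

def rank_importance(model, feature_names):
    """Return feature-importance scores sorted from most to least important."""
    scores = pd.Series(model.feature_importances_, index=feature_names)
    return scores.sort_values(ascending=False)

# e.g. rank_importance(xgb_model, feature_names) after fitting.
```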
As shown in Fig. 5 and Fig. 6, for CPU-intensive tasks, a task that requires a large amount of computation needs more processor resources, so power consumption increases. Likewise, if the task requires a great deal of memory, its power consumption may also increase. If the task needs to read and write the disk frequently, more storage resources may be used, again increasing power consumption. The feature Container CPU Usage (%) is the most important feature in the CPU-intensive task data set and has a strong influence on the prediction target variable.
As can be seen from Fig. 7 and Fig. 8, the CPU utilization of a CPU-intensive task plays a key role in its execution time. A task's CPU utilization is related to the number of other tasks in the container, and execution time increases as that number grows. Memory operations can also greatly affect task execution time, making memory access one of the most important influencing parameters. If a task needs to read data from or write data to disk, its execution time may be significantly affected. In general, the execution time of a process is affected by a combination of many factors, which must be analyzed and evaluated together to estimate it.
Fig. 9 and Fig. 10 show a comparison of actual and predicted values for CPU-intensive tasks. As can be seen from the figures, LightGBM achieves a better prediction result than XGBoost for actual CPU power consumption.
Fig. 11 compares six regression methods (XGBoost, LightGBM, FSDL, DSBF, CMP, and Cubic). Comparing the RMSE values of the power consumption models, XGBoost performs best, LightGBM second, DSBF third, CMP fourth, Cubic fifth, and FSDL sixth. LightGBM and XGBoost outperform the other methods because both select parameters related to energy consumption when building the model. The Cubic and CMP methods, as cubic polynomial regression and linear regression techniques respectively, perform poorly; DSBF performs better than both, while FSDL performs worst. Fig. 12 compares six regression methods (XGBoost, LightGBM, MLP, Instructions/MIPS, ElasticNet, and Lasso). XGBoost and LightGBM also perform well in the performance model. Most edge computing research papers use the ratio of instructions to MIPS to calculate the response time of a task; in practice, however, this method proves inaccurate compared with the other methods.
For each feature in the array, importance is computed as a score: the higher the score, the greater the feature's impact on predicting GPU-intensive tasks. As can be seen from Fig. 13 and Fig. 14, in the power consumption model, Container Memory Usage (MB) is the most important feature of GPU-intensive tasks, because hash methods often require loading and transferring large amounts of data, including cryptographic dictionaries, hash tables, and transient data sets. The features Container CPU Usage (%) and Container GPU Utilization (%) both have a significant influence on the power consumption of GPU tasks, and their importance is roughly equivalent. This is likely due to the collaborative nature of GPU-intensive tasks, which require cooperation between the CPU and the GPU: such cooperation involves data transfer and task-flow control, placing an additional computational load on the CPU that affects overall power consumption, especially during frequent data transfers. Furthermore, higher GPU utilization indicates that the GPU is performing a large amount of computation, increasing the GPU's inherent power consumption. The operating temperature of the GPU may also affect its power consumption: a GPU with good heat dissipation can shed heat more effectively and thereby reduce power consumption.
As is clear from Fig. 15 and Fig. 16, the two features Container CPU Usage (%) and Container GPU Utilization (%) are very important; their importance is several times that of the other features. Because the GPU must communicate with the host and coordinate tasks through the CPU, CPU performance also affects the performance and efficiency of GPU tasks. The feature Kernel Lines also has a large effect: different GPU code bases differ inherently in complexity, and more complex code necessarily requires longer execution times.
Fig. 17 and Fig. 18 show a comparison of actual and predicted values for GPU-intensive tasks, clearly showing the predicted power consumption trend based on the key features. When a task arrives at the server, XGBoost's predicted power consumption is slightly more accurate than LightGBM's.
Fig. 19 compares six regression methods (XGBoost, LightGBM, BP-ANN, Novel GPU Power Model, linear regression, and SVR regression) on GPU-intensive tasks. Compared with CPU-intensive tasks, GPU programs expose more metrics and are more complex to run. The Novel GPU Power Model, linear regression, and SVR regression methods all produce poor power-model metrics, while BP-ANN performs well in the prediction task. XGBoost and LightGBM are superior to all other regression methods in predicting task power consumption. Fig. 20 compares five regression methods (XGBoost, LightGBM, linear regression with feature extraction, random forest with feature extraction, and SVR regression with feature extraction) on the GPU performance model. XGBoost and LightGBM perform better than the other three regression methods, while SVR regression performs worst.
Comparing the feature-importance arrays in Fig. 21 and Fig. 22 shows that Node Procs Blocked is the most important feature of IO-intensive tasks. The arrays also show that Container Memory Cached (MB) and Container Memory (MB) have the same importance, and therefore the same degree of impact on predicting the power consumption of IO-intensive tasks.
From Fig. 23 and Fig. 24 it can be seen that the two features Node Load5 and Node Procs Blocked are the most important. IO-intensive tasks issue many concurrent IO operations to improve IO efficiency and response time, so more processes and threads compete for IO resources and the system load increases sharply. As the system load rises, the execution latency of IO-intensive tasks is also affected. In virtualized environments, the IO transport mechanism between the host and the container may likewise influence IO execution latency.
Fig. 25 and Fig. 26 show a comparison of actual and predicted values for IO-intensive tasks. Both XGBoost and LightGBM are less effective at predicting IO power consumption. IO operations are typically more complex than CPU operations because they are affected by many factors, such as the physical characteristics of the disk, system load, and cache size; variations in these factors lead to fluctuation and instability in IO power consumption.
Fig. 27 compares six regression methods (XGBoost, LightGBM, Cubic, CMP, DSBF, and FSDL) on IO-intensive tasks, which are relatively complex at run time. In the power consumption model, all methods except FSDL perform well, and XGBoost and LightGBM still perform better than the other regression methods. Fig. 28 compares six regression methods (XGBoost, LightGBM, MLP with feature extraction, BP-ANN with feature extraction, linear regression, and SVR) on the IO performance model. As the figure shows, feature extraction plays a key role in improving the prediction accuracy of the IO performance model.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (9)

1. A method for building a power consumption and performance model of a virtual computing unit in mobile edge computing, comprising the steps of:
step 1, pre-configuring a virtual computing unit;
step 2, collecting parameter data of CPU intensive tasks, IO intensive tasks and GPU intensive tasks, constructing a sample data set, and dividing a training set and a testing set;
step 3, identifying key features of the CPU intensive tasks, the GPU intensive tasks and the IO intensive tasks from the sampling parameters through different benchmarks; wherein the CPU is a central processing unit, the GPU is a graphic processor, and the IO refers to input and output;
step 4, building performance and power consumption models suitable for different task types according to the key characteristics, and training the models by using a training set;
step 5, evaluating and verifying the model through the test set.
2. The method for modeling power consumption and performance of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 1, the virtual computing unit is configured using a Kubernetes platform.
3. The method for modeling power consumption and performance of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 2, the method for collecting sampling parameters comprises:
collecting information related to the resource utilization of the virtual computing units through kubelet, the key component responsible for managing and monitoring the virtual computing units on the nodes;
acquiring the power consumption data of the server through a smart power meter;
measuring the execution time and performance indices of CPU-intensive and IO-intensive tasks through the Perf tool;
measuring the execution time and performance indices of GPU-intensive tasks through the NVIDIA Nsight Compute tool.
4. The method for modeling power consumption and performance of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 2:
the power consumption model of the CPU intensive task includes: the power consumption model based on the CPU frequency is expressed as:
P_cpu = c_0 + c_1 · f^3
where c_0 is a constant representing the static power consumption of the server, c_1 is a constant related to the voltage and capacitance of the CPU, and f denotes the processor frequency;
further comprises: the power consumption model based on CPU utilization is expressed as:
P_cpu = P_max · u_cpu + (1 − u_cpu) · P_idle
where P_max and P_idle denote the power consumption of the CPU when fully utilized and when idle, respectively, and u_cpu denotes the CPU utilization;
the performance model of the CPU-intensive task is expressed as:
T_pred = T_avg_mem + T_CPU_ops
where T_pred denotes the overall predicted execution time of a program on the CPU, T_avg_mem denotes the average memory access time, and T_CPU_ops denotes the average time of CPU operations;
T_avg_mem is calculated as:
T_avg_mem = (total_mem / b) · (λ_avg + β_avg · b)
where λ_avg denotes the average access latency, β_avg denotes the inverse of the average throughput, b denotes the block size, and total_mem corresponds to the total memory required by the program;
the time required for a CPU operation is measured by the instruction delay and number of operations of the hardware.
5. The method for modeling power consumption and performance of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 2:
the power consumption model of the GPU-intensive tasks includes:
including leakage power P of all components leakage Idle stream multiprocessor power consumption P idleSM And dynamic power consumptionIn the dynamic power consumption, N represents the number of micro-architecture components, and the dynamic power consumption is modeled as an activity factor alpha of each micro-architecture component i Multiplied by its peak power consumption p maxi
The performance model of the GPU-intensive tasks includes:
in Comm GM =(ld 1 +st 1 -L1-L2)·g GM +L1·g L1 +L2·g L2 ,Comm SM =(ld 0 +st 0 )·g SM
Where t represents the number of threads started, comp represents the number of processing cycles the threads spend, comm GM Representing communication overhead, ld, associated with thread accessing global memory 1 、st 1 Representing the loading and storing of the global memory; l1 and L2 respectively represent hit times of the first-level cache and the second-level cache, and are determined by an execution configuration file of an execution application program; g GM 、g L1 、g L2 The values of (1) represent the communication delays of the global memory, the L1 cache and the L2 cache, respectively; comm (Comm) SM Representing communication overhead generated when threads access the shared memory; ld (ld) 0 Representing total number of load transactions, st, executed by all threads in shared memory 0 Representing the total number of memory transactions executed by all threads in a shared memory g SM Representing communication delay on a shared memory, wherein R represents clock frequency, and P represents the number of CUDA kernels specific to the GPU model; the parameter lambda is used to simulate the inherent optimization utility of the application.
6. The method for modeling power consumption and performance of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 2:
the power consumption model of the IO intensive task comprises:
where θ equals 0 or 1: θ = 1 denotes sequential IO and θ = 0 denotes random IO; the remaining terms denote the power consumption of the disk during sequential read, sequential write, random read, and random write, respectively;
the power consumption model of the IO intensive task further includes:
P_disk = C + W^T · P
where C is the static power consumption of the disk, W is the weight vector corresponding to the power consumption vector P, and the superscript T denotes vector transposition;
wherein
P = [P_R, P_MR, P_RS, P_RT, P_W, P_MW, P_WS, P_WT, P_IOT]^T
W = [w_R, w_MR, w_RS, w_RT, w_W, w_MW, w_WS, w_WT, w_IOT]^T
where P_R, P_MR, P_RS, P_RT, P_W, P_MW, P_WS, P_WT, and P_IOT denote, respectively, the power consumption associated with reads per second, merged reads per second, sectors read per second, read time per second, writes per second, merged writes per second, sectors written per second, write time per second, and I/O time per second; and w_R, w_MR, w_RS, w_RT, w_W, w_MW, w_WS, w_WT, and w_IOT are the weights of the corresponding components P_R, P_MR, P_RS, P_RT, P_W, P_MW, P_WS, P_WT, and P_IOT of the vector P;
the parameter data of the IO intensive task is extracted from the system log.
7. The method for modeling power consumption and performance of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 3:
the CPU-intensive tasks adopt the NAS Parallel Benchmarks and the PARSEC 3.0 benchmark;
the GPU intensive task adopts Hashcat as a benchmark;
the IO intensive computing task adopts IOzone as a benchmark.
8. The method for modeling power consumption and performance of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 3:
key features of CPU-intensive tasks include: the percentage of CPU used by the task in the virtual computing unit, the current memory usage, the number of host interrupts per second, the number of processes running in the host, the size of the memory-mapped file, the current memory working set, the memory limit requested by the virtual computing unit, the CPU limit requested by the virtual computing unit, the number of blocked processes, the 5-minute load average, the load-operation miss percentage in the last-level cache, the load miss percentage in the TLB, and the number of source code lines and assembly code lines contained in the task;
key features of GPU-intensive tasks include: the percentage of CPU used by the task in the virtual computing unit, the current memory usage, the memory size of the virtual computing unit cache, the percentage of GPU used by the virtual computing unit, the GPU temperature, the SM clock frequency in the GPU, the GPU memory clock frequency, the 5-minute load average, the number of blocked processes, the number of completed reads, the percentage of GPU memory used, the amount of frame buffer used, the number of processes running in the host, the number of host interrupts per second, and the number of source code lines and assembly code lines contained in the task;
key features of the IO-intensive task include: the percentage of CPU used by the task in the virtual computing unit, the number of processes running in the host, the number of completed writes, the current memory usage, the memory size of the virtual computing unit cache, the number of blocked processes, the total number of virtual computing unit memory faults, the maximum IOPS requested by the virtual computing unit, the size of the memory-mapped file, the total number of bytes written, the maximum recorded huge-page usage, the load-operation miss percentage in the last-level cache, the load miss percentage in the TLB, and the size of the input file in the task.
9. The method for building a power consumption and performance model of a virtual computing unit in mobile edge computing according to claim 1, wherein in step 4, the model is built and trained by XGBoost or LightGBM method, and the optimal parameter configuration of XGBoost or LightGBM model is searched by differential evolution method.