WO2023199482A1 - Power consumption reduction control device, power consumption reduction control method, power consumption reduction control system, and program - Google Patents

Power consumption reduction control device, power consumption reduction control method, power consumption reduction control system, and program

Info

Publication number
WO2023199482A1
Authority
WO
WIPO (PCT)
Prior art keywords
power consumption
server
air conditioning
accelerator
gpu
Prior art date
Application number
PCT/JP2022/017844
Other languages
English (en)
Japanese (ja)
Inventor
彦俊 中里
誠亮 新井
雅志 金子
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2022/017844 priority Critical patent/WO2023199482A1/fr
Publication of WO2023199482A1 publication Critical patent/WO2023199482A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16 Constructional details or arrangements
    • G06F1/20 Cooling means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/329 Power saving characterised by the action undertaken by task scheduling
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to a power consumption reduction control device, a power consumption reduction control method, a power consumption reduction control system, and a program that reduce power consumption in a data center (hereinafter sometimes referred to as "DC").
  • DC data center
  • DCs data centers
  • Air conditioning accounts for a large proportion of the power consumed in data centers (DCs), and as the number and scale of DCs increase, there is a need to reduce the power consumption of air conditioning. Furthermore, the amount of data processed in a DC tends to increase year by year, and it is necessary to improve the power consumption efficiency of the entire DC (the amount of power consumed by the entire DC for processing a certain amount of data).
  • The technique described in Non-Patent Document 1 has been disclosed as a technique for optimizing the overall power consumption of a DC by considering both the power consumption of air conditioning and the power consumption of servers (IT devices).
  • In the air-conditioning-linked IT load placement optimization method for data centers described in Non-Patent Document 1, operating information and monitoring information of the IT equipment in the data center are collected to predict future changes in the loads of the IT equipment, and the increase in power for the air conditioning equipment is calculated according to the increase in IT equipment power. Then, an optimization problem is solved that minimizes an objective function, namely the power amount of the data center, such that the load aggregation rate on IT equipment increases over time, that is, the number of operating IT devices is reduced. This yields the placement of IT loads (virtual machines) on IT equipment that minimizes the power consumption of the data center.
  • However, in Non-Patent Document 1, the air conditioning power model used to calculate the power of the air conditioning equipment adopts a general-purpose rule-based standard that does not depend on the equipment conditions that differ for each DC. It was therefore difficult to optimize the total amount of power consumed by the DC while taking into account individual equipment conditions such as the location of air conditioning equipment, airflow, server arrangement within the DC, and thermal cooling efficiency.
  • the accelerator is, for example, an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or a TPU (Tensor Processing Unit).
  • The CPU, GPU, and accelerator differ in the amount of power consumption required for server cooling during load processing. Specifically, it is assumed that CPUs show almost no fluctuation in power consumption within the normal temperature range, whereas GPU servers and accelerators experience fluctuations in power consumption even within the normal temperature range.
  • The inventor of this application set the inlet temperature of the GPU server to 20 degrees Celsius and 33 degrees Celsius, both of which are within the normal temperature range, and measured the temperature of the GPU card in the GPU server (GPU temperature), the power consumption, and the fan rotation rate.
  • Comparing inlet temperatures of 20 degrees Celsius and 33 degrees Celsius, the GPU temperature is approximately 10 degrees higher at 33 degrees Celsius, as shown in FIG.
  • The power consumption of the GPU card is also approximately 20 W higher at 33 degrees Celsius.
  • The fan rotation rate of the GPU server is also approximately 10% higher at 33 degrees Celsius, as shown in FIG. Note that the horizontal axis in FIGS. 1 to 3 represents time [minutes: seconds] from the start of measurement.
  • The present invention was made in view of these points, and its object is to reduce the total power consumption, consisting of server power consumption and air conditioning power consumption, in an environment where CPU servers, GPU servers, accelerators, and the like coexist.
  • A power consumption reduction control device according to the present invention is a power consumption reduction control device that controls a CPU server, a GPU server, an accelerator, and a plurality of air conditioners, wherein a plurality of placement control areas, in each of which any one of the CPU server, the GPU server, and the accelerator is arranged, and an air conditioning control area, which is an area for measuring the effect of air conditioning control by the plurality of air conditioners, are set. The power consumption reduction control device comprises the following units.
  • An air conditioning control value generation unit that generates an air conditioning control value that includes at least a target temperature.
  • An air conditioning control execution unit that executes control of the plurality of air conditioners using the air conditioning control value.
  • A reward calculation unit that, in a plurality of placement patterns in which processing loads are placed on the GPU server and the accelerator, calculates a reward for evaluating the results of the air conditioning control execution unit controlling the plurality of air conditioners using the air conditioning control value, with the target temperature as an index, and determines whether or not the reward satisfies a predetermined condition.
  • An operation history creation unit that acquires temperature distribution information and the air conditioning power consumption of the plurality of air conditioners as a control result based on the air conditioning control value determined to satisfy the predetermined condition, and creates operation history information associated with the predicted heat generation amount of each placement control area in each of the plurality of placement patterns.
  • A placement pattern calculation unit that calculates a plurality of placement patterns in which new processing loads are placed, using information on the new processing loads.
  • An area heat generation estimation unit that, for each of the calculated placement patterns, estimates the predicted heat generation amount of each placement control area by summing the heat generation amounts when processing loads are placed on the CPU servers, GPU servers, and accelerators that belong to that placement control area.
  • An operation history information extraction unit that, with reference to the operation history information, extracts the temperature distribution information and the air conditioning power consumption for the case of controlling with the air conditioning control value in each placement pattern.
  • A server power consumption prediction unit that, using the extracted temperature distribution information and the information regarding the new processing load, calculates the server power consumption of each placement control area in each placement pattern by totaling the power consumption of each CPU server, each GPU server, and each accelerator.
  • A placement pattern determining unit that calculates the total of the server power consumption of each placement control area in each placement pattern and the air conditioning power consumption, and determines the placement pattern that minimizes the calculated total as the placement pattern in which the processing load is placed.
  • According to the present invention, the total power consumption consisting of server power consumption and air conditioning power consumption can be reduced.
  • FIG. 1 is a diagram for comparing GPU temperatures when the inlet temperature is 20° C. and 33° C.
  • FIG. 2 is a diagram for comparing the power consumption of the GPU card when the inlet temperature is 20° C. and 33° C.
  • FIG. 3 is a diagram for comparing the fan rotation rate of the GPU server when the inlet temperature is 20° C. and 33° C.
  • FIG. 4 is a diagram showing the overall configuration of a power amount reduction control system including a power amount reduction control device according to the present embodiment.
  • FIG. 5 is a functional block diagram showing a configuration example of a power amount reduction control device according to the present embodiment.
  • FIG. 6 is a diagram for explaining Situation classification according to the present embodiment.
  • FIG. 3 is a diagram for explaining temperature distribution information according to the present embodiment.
  • FIG. 2 is a hardware configuration diagram showing an example of a computer that implements the functions of the power reduction control device according to the present embodiment.
  • FIG. 4 is a diagram showing the overall configuration of the power amount reduction control system 1 including the power amount reduction control device 100 according to the present embodiment.
  • The power consumption reduction control system 1 is configured to include a data center (DC 10) having a plurality of CPU servers 3, GPU servers 4, accelerators 5, and a plurality of air conditioners 2, and a power consumption reduction control device 100.
  • A plurality of CPU servers 3, GPU servers 4, and accelerators 5 accommodated in a predetermined control area (placement control area) may hereinafter be referred to as a "server group" or collectively as a "server." In FIG. 4 and FIG. 7, which will be described later, the CPU server 3 is represented by an unfilled hexagon, the GPU server 4 by a hexagon with a plurality of diagonal lines, and the accelerator 5 by a hexagon filled with dots.
  • This server group includes at least one of the CPU server 3, the GPU server 4, and the accelerator 5. For example, this also includes a case where there is no accelerator 5 and the server group is configured with the CPU server 3 and the GPU server 4.
  • the power amount reduction control device 100 may be provided inside the DC 10 or may be provided in a location different from the DC 10 and may control a plurality of DCs 10.
  • This power consumption reduction control device 100 may acquire status information of the air conditioners 2 (air conditioners "1", "2", and "3" in FIG. 4) installed in the DC 10 and transmit air conditioning control information via an air conditioning management device (not shown), or it may be directly communicatively connected to each air conditioner 2 without going through an air conditioning management device.
  • Likewise, the power consumption reduction control device 100 may acquire status information of, and transmit control information to, the CPU servers 3, GPU servers 4, and accelerators 5 accommodated as server groups in the DC 10 via a server management device (not shown), or it may be directly communicatively connected to each CPU server 3, GPU server 4, and accelerator 5.
  • the CPU server 3, the GPU server 4, and the accelerator 5 are arranged as shown in FIG.
  • Each area is divided and controlled as a "placement control area.”
  • This placement control area 30 is an area that accommodates a group of servers on which processing loads (virtual resources, processing of GPUs, accelerators, etc.) are placed.
  • FIG. 4 shows an example in which placement control areas "1" to "6" are provided.
  • a virtualization infrastructure is constructed in the CPU server 3, and the description will be made assuming that it is operated using containers and VMs.
  • OpenStack registered trademark
  • Kubernetes registered trademark
  • OpenStack is primarily used for managing and operating physical machines and virtual machines (VMs).
  • Kubernetes is mainly used for managing and operating containers.
  • In this specification, an application (consisting of one or more containers, one or more VMs, etc.) that is virtualized on a virtualization platform is referred to as a virtual resource.
  • In Kubernetes, the minimum execution unit of an application is a Pod, which is made up of one or more containers.
  • an "air conditioning control area” is provided as shown in FIG. 4 in association with the placement control area 30 of the server group.
  • The air conditioning control area 20 is a grouped area in which the room temperature effect of air conditioning control is measured, and it faces either the suction port side or the discharge port side of each server (CPU server 3, GPU server 4, accelerator 5).
  • The air blown from the air conditioner 2 is sent to the air conditioning control areas 20 on the suction side (in FIG. 4, air conditioning control areas "3", "4", "7", and "8") via, for example, piping installed under the floor of the DC 10.
  • a plurality of sensors are installed in each of the air conditioning control areas 20. Furthermore, temperature sensors are also installed at the suction ports of the GPU servers 4 and accelerators 5 in each placement control area 30. Furthermore, a sensor (temperature sensor, etc.) is installed outside the DC 10 as well. Information obtained from these sensors (sensor information) can be acquired by the power consumption reduction control device 100 via a communication line or the like.
  • The power consumption reduction control device 100 predicts the amount of heat generated for each placement control area 30 ("predicted heat generation amount of the placement control area," described later) in each placement pattern in which loads are placed on the server resources (in this embodiment, the CPU server 3/GPU server 4/accelerator 5). Note that the above "/" means "and/or".
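  • The per-area heat prediction described above can be illustrated with a minimal sketch. It assumes that a per-server heat estimate (in watts) for a given placement pattern is already available; the names `server_area`, `server_heat_w`, and `predicted_heat_by_area`, and all numeric values, are hypothetical and do not come from the specification.

```python
from collections import defaultdict


def predicted_heat_by_area(server_area, server_heat_w):
    """Sum per-server heat estimates into one value per placement control area.

    server_area:   maps a server id to the placement control area it belongs to.
    server_heat_w: maps a server id to its estimated heat (W) for one placement
                   pattern (hypothetical input; how it is estimated is not shown).
    """
    area_heat = defaultdict(float)
    for server_id, heat in server_heat_w.items():
        area_heat[server_area[server_id]] += heat
    return dict(area_heat)


# Hypothetical example: a CPU server and a GPU server in area 1, an accelerator in area 2.
server_area = {"cpu-1": 1, "gpu-1": 1, "acc-1": 2}
server_heat_w = {"cpu-1": 40.0, "gpu-1": 90.0, "acc-1": 35.0}
print(predicted_heat_by_area(server_area, server_heat_w))
```

  • Running the same function once per candidate placement pattern would give the set of per-area predictions that the later steps compare.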
  • The power consumption reduction control device 100 sets air conditioning control values for the air conditioner 2 in multiple stages for each situation ("Situation," described later) defined by the external temperature of the DC 10, the floor temperature, and the predicted amount of heat generation in each placement control area 30, and holds the temperature distribution information and air conditioning power consumption information obtained when control is performed at each stage.
  • The power consumption reduction control device 100 calculates the server power consumption ("total server power consumption," described later) of each placement control area 30 based on the temperature distribution information and the like obtained when controlling at each stage, and determines the placement pattern that minimizes the total of the server power consumption and the air conditioning power consumption (details will be described later).
  • the power amount reduction control device 100 will be described in detail below.
  • FIG. 5 is a functional block diagram showing a configuration example of the power amount reduction control device 100 according to the present embodiment.
  • The power consumption reduction control device 100 predicts the amount of heat generated (predicted heat generation amount) for each placement control area 30 in each load placement pattern of server resources (CPU server 3, GPU server 4, accelerator 5), determines the situation (Situation), and acquires the temperature distribution information 64 and air conditioning power consumption information 65 for the case where the air conditioner 2 is controlled. Then, the power consumption reduction control device 100 calculates the server power consumption of each of the CPU servers 3, GPU servers 4, and accelerators 5 using the learning models, and sums these values to obtain the total power consumption of the CPU servers 3, GPU servers 4, and accelerators 5.
  • The power consumption reduction control device 100 then calculates the total of the total server power consumption (described later), which is the sum of the total power consumption of the CPU servers 3, GPU servers 4, and accelerators 5 in each placement control area 30, and the air conditioning power consumption, determines the arrangement pattern that minimizes this total, and executes load placement and air conditioning control based on that arrangement pattern.
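  • As a rough illustration of this selection step, the following sketch picks the pattern with the smallest sum of per-area server power and air conditioning power. The function name `choose_placement_pattern`, the pattern labels, and every number are assumptions made for the example; the predicted values themselves would come from the learning models and operation history described in the text.

```python
def choose_placement_pattern(candidates):
    """Return the pattern id with the smallest total power, plus all totals.

    candidates maps a pattern id to a tuple of
      (server power per placement control area, air conditioning power),
    all in the same unit (hypothetical inputs).
    """
    totals = {
        pattern_id: sum(area_power.values()) + aircon_power
        for pattern_id, (area_power, aircon_power) in candidates.items()
    }
    best = min(totals, key=totals.get)
    return best, totals


# Hypothetical predicted values for three candidate patterns.
candidates = {
    "A": ({1: 1200, 2: 900}, 800),
    "B": ({1: 1000, 2: 1000}, 950),
    "C": ({1: 1500, 2: 500}, 850),
}
best, totals = choose_placement_pattern(candidates)
print(best, totals)
```

  • Note that pattern "C" wins here even though pattern "A" has the lowest air conditioning power on its own, which is the point of minimizing the combined total.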
  • This power consumption reduction control device 100 is constituted by a computer including a control section, an input/output section, and a storage section (all not shown).
  • The input/output unit inputs and outputs information to and from each device in the DC 10 (each air conditioner 2 and each server (CPU server 3, GPU server 4, accelerator 5)) and the like.
  • This input/output unit is composed of a communication interface that sends and receives information via a communication line, and an input/output interface that inputs and outputs information between an input device such as a keyboard (not shown) and an output device such as a monitor (not shown).
  • the storage unit includes a hard disk, flash memory, RAM (Random Access Memory), and the like.
  • This storage section temporarily stores programs for executing each function of the control section and information necessary for processing of the control section.
  • This storage unit also stores operation history information 201, which indicates, for each Situation in each arrangement pattern, the control values for the air conditioner 2 (air conditioning control value information 63), the temperature distribution information 64 as the control result, the air conditioning power consumption information 65, and the like.
  • Furthermore, the storage unit stores basic power consumption information 301 for calculating the predicted heat generation amount of each server (CPU server 3, GPU server 4, accelerator 5), a CPU server power amount learning model 302 for predicting the power consumption of the CPU server 3, a GPU server power amount learning model 303 for predicting the power consumption of the GPU server 4, and an accelerator power amount learning model 304 for predicting the power consumption of the accelerator 5 (details below).
  • the control unit is in charge of overall processing executed by the power consumption reduction control device 100, and is configured to include an air conditioning control unit 200 and a server control unit 300, as shown in FIG.
  • The air conditioning control unit 200 uses the average temperature of the floor in the DC 10 before control (floor average temperature), the outside temperature (outside air temperature), and the predicted amount of heat generation for each placement control area 30 as Situation components, and executes air conditioning control for each Situation. In each control turn, it acquires the temperature distribution information 64 and calculates the air conditioning power consumption information 65, thereby generating the operation history information 201.
  • the phase in which this operation history information 201 is generated is referred to as a learning phase.
  • When the air conditioning control unit 200 acquires information on the predicted heat generation amount of each placement control area 30 from the server control unit 300 during the operation phase, in which load placement and air conditioning control are actually executed, it identifies the corresponding Situation (Situation classification 62) and outputs the temperature distribution information 64 and air conditioning power consumption information 65 to the server control unit 300. Then, the air conditioning control unit 200 causes each air conditioner 2 to perform air conditioning control in the arrangement pattern determined by the server control unit 300.
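  • One way to picture the relationship between the learning phase and the operation phase is a store keyed by Situation classification: records are appended during learning and looked up during operation. The sketch below is only an assumed data layout; `OperationRecord`, `store_record`, `extract_records`, and the sample values are hypothetical and are not the structure defined in the specification.

```python
from dataclasses import dataclass


@dataclass
class OperationRecord:
    control_value: dict             # e.g. target temperature, air volume
    temperature_distribution: dict  # area id -> temperature after control
    aircon_power: float             # air conditioning power consumption


def store_record(history, situation, record):
    """Learning phase: append a control result under its Situation classification."""
    history.setdefault(situation, []).append(record)


def extract_records(history, situation):
    """Operation phase: return the stored results for the matching Situation."""
    return history.get(situation, [])


history = {}
store_record(
    history,
    "factor1-4_factor2-4",  # shortened, hypothetical classification string
    OperationRecord({"target_temperature": 24.0}, {"area3": 24.5}, 820.0),
)
print(extract_records(history, "factor1-4_factor2-4"))
print(extract_records(history, "factor1-1_factor2-1"))
```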
  • the air conditioning control unit 200 includes a situation recognition unit 210, an operation history information generation unit 220, an operation history information extraction unit 230, and an air conditioning control execution unit 240.
  • the situation recognition unit 210 acquires information on external factors that are parameter elements that constitute the Situation. Then, the situation recognition unit 210 divides each external world factor into a plurality of ranges, defines a combination of each range area as one situation, and determines a situation classification 62 indicating which situation it belongs to based on the acquired information on the external world factor. .
  • the situation recognition section 210 includes an external world factor acquisition section 211 and a situation determination section 212.
  • the external world factor acquisition unit 211 acquires information on the measurement results of external world factors.
  • the external factor is an element that affects an increase or decrease in air conditioning power consumption, and means a parameter element that constitutes the Situation classification 62.
  • the external factors are (1) the average floor temperature in the DC 10 before control, (2) the outside temperature (outside air temperature), and (3) the predicted amount of heat generation for each placement control area 30.
  • the external factor acquisition unit 211 calculates the floor average temperature in the DC 10 before control as follows.
  • the external factor acquisition unit 211 calculates the average value of the temperatures acquired from the temperature sensors of the air conditioning control area 20, and calculates the average temperature for each air conditioning control area 20. Then, the external factor acquisition unit 211 averages the calculated average temperature for each air conditioning control area 20 over the entire floor, and sets the obtained temperature as the floor average temperature.
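  • The two-level averaging described above (sensors within an area first, then areas across the floor) can be sketched as follows. The function name `floor_average_temperature` and the sensor readings are hypothetical.

```python
from statistics import mean


def floor_average_temperature(area_readings):
    """Average sensor readings per air conditioning control area, then across areas.

    area_readings maps an area id to the list of temperatures (degrees Celsius)
    read from that area's sensors.
    """
    area_means = {area: mean(temps) for area, temps in area_readings.items()}
    return mean(area_means.values()), area_means


# Hypothetical readings: two sensors in one area, three in another.
readings = {"area1": [22.0, 24.0], "area2": [26.0, 28.0, 30.0]}
floor_avg, per_area = floor_average_temperature(readings)
print(per_area, floor_avg)
```

  • Averaging the area averages weights each area equally, regardless of how many sensors it contains, which differs from a flat average over all sensors.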
  • the external temperature is information obtained from a temperature sensor installed outside the DC 10.
  • the predicted amount of heat generation for each placement control area 30 is information calculated by the server control unit 300 (details will be described later).
  • the external world factor acquisition unit 211 acquires information on this external world factor, it outputs it to the Situation determination unit 212 .
  • the Situation determining unit 212 determines to which Situation classification 62 the information acquired by the external world factor acquiring unit 211 belongs. Each external factor is divided into a plurality of ranges between the minimum value and the maximum value depending on the characteristics of the external factor. Then, a combination of ranges obtained by dividing each external factor is defined as one Situation. This will be explained below with reference to FIG.
  • each of the external factors is defined as a "factor”, and a range for division is defined (hereinafter referred to as "division definition").
  • the external factor of "factor 1” shown in the Situation classification 62 in FIG. 6 is "floor average temperature”, and the division definition is "0-48 degrees divided into 6".
  • the external factor of "factor 2" is “external temperature”, and the division definition is "0-48 degrees divided into 6”.
  • the external factor of "factor 3” is "predicted amount of heat generation in placement control area 1", and the division definition is "0-200W divided into 20".
  • the external factor of "factor 8” is “predicted heat generation amount of placement control area 6", and the division definition is "0-200W divided into 20".
  • the external world factor information acquired by the Situation determination unit 212 is the external world factor information 61 shown in FIG.
  • For example, since the value of "factor 1" (floor average temperature) is "25", the Situation determination unit 212 determines that it falls in the "24-32" range (24 degrees or more and less than 32 degrees) and sets the "factor range identifier" to "factor1-4".
  • This "factor range identifier" is information that identifies the range to which a value belongs. For example, with 0-48 degrees divided into 6, it is "factor1-1" for 0 degrees or more and less than 8 degrees, "factor1-2" for 8 degrees or more and less than 16 degrees, and "factor1-3" for 16 degrees or more and less than 24 degrees. The same applies to the other "factors".
  • the Situation determining unit 212 combines the information on the "factor range identifiers" of the external factors to form a "Situation classification" and determines that it is "factor1-4_factor2-4_factor3-4_factor4-4_factor5-5_factor6-5_factor7-4_factor8-4". In this way, the Situation determining unit 212 determines the "Situation classification" based on the acquired information on external factors.
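  • The binning and concatenation described above can be sketched as follows, using the division definitions given in the text (0-48 degrees in 6 ranges for factors 1 and 2, 0-200 W in 20 ranges for factors 3 to 8). Only the value 25 for factor 1 comes from the text; the other input values, the clamping of out-of-range values, and the names `range_index` and `situation_classification` are assumptions for illustration.

```python
def range_index(value, lower, upper, divisions):
    """Return the 1-based index of the equal-width range containing value.

    Values outside [lower, upper) are clamped to the first or last range
    (an assumption; the text does not say how such values are handled).
    """
    width = (upper - lower) / divisions
    index = int((value - lower) // width) + 1
    return min(max(index, 1), divisions)


# Division definitions from the text: (minimum, maximum, number of ranges).
DIVISIONS = {"factor1": (0, 48, 6), "factor2": (0, 48, 6)}
DIVISIONS.update({f"factor{i}": (0, 200, 20) for i in range(3, 9)})


def situation_classification(values):
    """Join the factor range identifiers of all factors into one string."""
    parts = [
        f"{name}-{range_index(values[name], lower, upper, n)}"
        for name, (lower, upper, n) in DIVISIONS.items()
    ]
    return "_".join(parts)


# factor1 = 25 is from the text; the remaining values are hypothetical.
values = {
    "factor1": 25, "factor2": 27,
    "factor3": 35, "factor4": 35, "factor5": 45,
    "factor6": 45, "factor7": 35, "factor8": 35,
}
print(situation_classification(values))
```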
  • the operation history information generation unit 220 generates temperature distribution information 64 as a result of controlling the air conditioner 2 and air conditioning power consumption based on air conditioning control values that divide the control of the air conditioner 2 into multiple stages in each Situation. Operation history information 201 is generated based on the amount information 65.
  • The operation history information generation unit 220 includes an air conditioning control value generation unit 221, a reward calculation unit 222, and an operation history creation unit 223.
  • The air conditioning control value generation unit 221 generates air conditioning control values that divide the control of the air conditioner 2 into multiple stages in each Situation. Specifically, for the control parameters that can be changed in each air conditioner 2 (for example, set temperature (target temperature), air volume, etc.), the air conditioning control value generation unit 221 divides each parameter into M stages between its lower limit value and its upper limit value. Then, the air conditioning control value information 63 for controlling (one) air conditioner 2 is generated by combining the parameters of each stage.
  • The air conditioning control execution unit 240 executes air conditioning control in multiple patterns. For example, it may control the air conditioners one at a time in the order "1" → "2" → "3", control them in combinations such as air conditioners "1" and "2", air conditioners "2" and "3", or air conditioners "1" and "3", or control air conditioners "1", "2", and "3" simultaneously.
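  • The single, pairwise, and simultaneous control patterns listed above correspond to the non-empty subsets of the air conditioners. The sketch below enumerates them; the function name `control_patterns` is hypothetical, and the ordering of the result is simply the one `itertools.combinations` produces.

```python
from itertools import combinations


def control_patterns(air_conditioner_ids):
    """List every non-empty group of air conditioners that can be controlled together."""
    patterns = []
    for size in range(1, len(air_conditioner_ids) + 1):
        patterns.extend(combinations(air_conditioner_ids, size))
    return patterns


patterns = control_patterns(["1", "2", "3"])
for pattern in patterns:
    print(pattern)
```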
  • The reward calculation unit 222 calculates a reward (temperature reward) as an index for evaluating the result of executing control using the air conditioning control value generated by the air conditioning control value generation unit 221. Then, the reward calculation unit 222 determines whether the control result achieves a predetermined reward, that is, whether the air conditioning control value satisfies a predetermined condition.
  • the reward calculation unit 222 defines two types of rewards, a high temperature warning reward and a low temperature warning reward, for each air conditioning control area 20, and calculates the reward for the control result for each turn.
  • the high temperature warning reward is applied when the temperature before control is higher than the target temperature, that is, when the room temperature is high and the temperature is controlled to be lowered.
  • the low temperature warning reward is applied when the temperature before control is lower than the target temperature, that is, when the room temperature is too low and the temperature is controlled to increase.
  • The reward calculation unit 222 uses the difference between the "target temperature of the turn" and the "temperature after turn control", that is, the deviation between the target temperature and the current temperature, as an index. For example, in the case of a high temperature warning reward, if the "temperature after turn control" is less than or equal to the "turn target temperature", the reward is "100%". The reward is then reduced by 10% for every 1 degree that the "temperature after turn control" exceeds the "turn target temperature". Note that this reward is not limited to the above value and can be set arbitrarily.
  • Similarly, in the case of a low temperature warning reward, if the "temperature after turn control" is greater than or equal to the "turn target temperature", the reward is "100%". The reward is then reduced by 10% for every 1 degree that the "temperature after turn control" falls below the "turn target temperature". Note that this reward is not limited to the above value and can be set arbitrarily.
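  • The two rewards can be sketched as follows, using the example figures from the text (100% at or past the target, minus 10% per degree of deviation). The linear interpolation between whole degrees, the floor at 0%, the passing threshold of 80%, and the function names are assumptions for illustration only.

```python
def high_temperature_reward(target, after, penalty_per_degree=10.0):
    """Reward (%) when cooling: 100 at or below target, minus 10 per degree above it."""
    excess = max(0.0, after - target)
    return max(0.0, 100.0 - penalty_per_degree * excess)


def low_temperature_reward(target, after, penalty_per_degree=10.0):
    """Reward (%) when warming: 100 at or above target, minus 10 per degree below it."""
    shortfall = max(0.0, target - after)
    return max(0.0, 100.0 - penalty_per_degree * shortfall)


def passes_both(target, after, threshold=80.0):
    """Pass only if both rewards reach the (hypothetical) passing threshold."""
    return (
        high_temperature_reward(target, after) >= threshold
        and low_temperature_reward(target, after) >= threshold
    )


print(high_temperature_reward(25.0, 27.0))
print(low_temperature_reward(25.0, 22.0))
print(passes_both(25.0, 26.0), passes_both(25.0, 22.0))
```

  • Requiring both rewards to pass rejects control values that overshoot the target in either direction, which is how overcooling (and the extra air conditioning power it costs) would be filtered out.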
  • the reward calculation unit 222 may judge an air conditioning control value as finally passing only when both the high temperature warning reward and the low temperature warning reward are passed. For example, if the temperature before control by the air conditioner 2 is higher than the target temperature and control is performed using an air conditioning control value evaluated with the high temperature warning reward, the predetermined passing threshold may be exceeded while the temperature overshoots the target and becomes too low, that is, the control may be excessive. In this case, air conditioning power is consumed unnecessarily. Therefore, if the temperature before control is lower than the target temperature, control is performed based on the low temperature warning reward until it is judged as passing. By requiring both the high temperature warning reward and the low temperature warning reward to be passed in this manner, the reward calculation unit 222 can select an air conditioning control value that keeps the temperature within an appropriate range.
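The two reward types and the "both must pass" rule can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the linear (rather than stepwise) penalty, and the passing threshold of 80% are assumptions introduced here; only the 100% baseline and the 10% per degree penalty come from the description.

```python
def high_temperature_warning_reward(target_temp, temp_after, penalty_per_degree=10.0):
    """Reward in percent: 100 at or below the target, reduced per degree above it."""
    overshoot = max(0.0, temp_after - target_temp)
    return 100.0 - penalty_per_degree * overshoot


def low_temperature_warning_reward(target_temp, temp_after, penalty_per_degree=10.0):
    """Reward in percent: 100 at or above the target, reduced per degree below it."""
    undershoot = max(0.0, target_temp - temp_after)
    return 100.0 - penalty_per_degree * undershoot


def passes_both_rewards(target_temp, temp_after, pass_threshold=80.0):
    """pass_threshold is a hypothetical value; the description leaves it configurable."""
    high = high_temperature_warning_reward(target_temp, temp_after)
    low = low_temperature_warning_reward(target_temp, temp_after)
    return high >= pass_threshold and low >= pass_threshold
```

Checking both rewards rejects a control value that cools the room far below the target, which the high temperature warning reward alone would still score at 100%.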
  • even when the calculated reward passes, the reward calculation unit 222 rejects the air conditioning control value if any of the following conditions applies.
  • for the GPU server 4, if the GPU temperature does not fall within a predetermined range determined from the relationship between the GPU temperature (GPU card temperature) and the power consumption of the GPU server, the control value is judged as failing.
  • for the accelerator 5, if the accelerator temperature does not fall within a predetermined range determined from the relationship between the accelerator temperature (temperature inside the accelerator) and its power consumption, the control value is judged as failing. This avoids needlessly storing operation history that is unlikely to be adopted when the DC 10 actually processes a load, because it results in temperatures inappropriate for managing the devices within the DC 10.
  • the operation history creation unit 223 acquires temperature distribution information 64 and air conditioning power consumption information 65 as the result of the air conditioning control execution unit 240 controlling each air conditioner 2 based on the air conditioning control value information 63 generated by the air conditioning control value generation unit 221 in each Situation. That is, for each pattern of combinations in which each parameter is divided into M stages, generated by the air conditioning control value generation unit 221 in each Situation, the operation history creation unit 223 causes the air conditioning control execution unit 240 to execute control with that air conditioning control value and acquires the temperature distribution information 64 and air conditioning power consumption information 65 obtained when the value is executed.
  • the operation history creation unit 223 excludes from the creation of the operation history information 201 air conditioning control values and control results that the reward calculation unit 222 judges as failing because they do not satisfy a predetermined condition.
  • FIG. 7 is a diagram for explaining the temperature distribution information 64 according to this embodiment.
  • the temperature distribution information 64 is temperature information measured over a period of time by a temperature sensor 44 provided on the suction port side of the GPU server 4 and a temperature sensor 55 provided on the suction port side of the accelerator 5, as shown in the figure.
  • the temperature sensors 44 and 55 measure temperature after their identification information has been associated in advance with the identification information of the GPU server 4 and the accelerator 5. From the start to the end of the turn, all the temperature sensors 44 and 55 on the floor measure the temperature at predetermined time intervals, and the resulting information is generated as temperature distribution information 64.
  • the temperature measured by the temperature sensor 44 provided on the suction port side of the GPU server 4 will be referred to as the "GPU suction port temperature".
  • the temperature measured by the temperature sensor 55 provided on the suction port side of the accelerator 5 will be referred to as the "accelerator suction port temperature".
  • the air conditioning power consumption information 65 is the total power consumption of the air conditioners 2, measured by power consumption measuring means (not shown) that monitors the air conditioners 2, in the turn in which the air conditioning control execution unit 240 controls each air conditioner 2 based on the air conditioning control value information 63. For example, the power consumption of each air conditioner 2 is measured at predetermined time intervals, and the sum of the power consumption measured in that turn is calculated as the air conditioning power consumption information 65.
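As a small illustration of the summation described above, the following sketch totals per-interval readings for every air conditioner in one turn. The function name and the input layout (a dictionary of readings per air conditioner) are assumptions made for this example.

```python
def air_conditioning_power_consumption(readings_by_air_conditioner):
    """Sum the per-interval power readings of all air conditioners in one turn.

    readings_by_air_conditioner: hypothetical layout, mapping an air conditioner
    identifier to the list of readings taken at each predetermined interval.
    """
    return sum(sum(readings) for readings in readings_by_air_conditioner.values())
```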
  • the operation history creation unit 223 creates operation history information 201, in which the Situation (Situation classification 62) and the air conditioning control value information 63 at the time the air conditioning control execution unit 240 executed the air conditioning control are associated with the temperature distribution information 64 and air conditioning power consumption information 65 obtained as a result of that control, and stores it in a storage unit (not shown).
  • the operation history information extraction unit 230 acquires information on the predicted heat generation amount of each placement control area 30 from the server control unit 300 (area heat generation amount estimating unit 320) in the operation phase. Then, the operation history information extraction unit 230 determines the Situation classification 62 at the start of the control turn via the situation recognition unit 210. Then, the operation history information extraction unit 230 extracts, from the operation history information 201, the temperature distribution information 64 and air conditioning power consumption information 65 that are the results of control using each air conditioning control value information 63 in the determined Situation classification 62. The operation history information extraction unit 230 outputs the extracted temperature distribution information 64 and air conditioning power consumption information 65 to the server control unit 300 (server power consumption prediction unit 330, arrangement pattern determination unit 340).
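One way to picture the operation history information 201 and its extraction by Situation classification is the sketch below. The record layout, field names, and the use of plain strings for the Situation classification are assumptions for illustration; the description does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    situation: str                   # Situation classification 62 (hypothetical string key)
    control_value: tuple             # air conditioning control value information 63
    temperature_distribution: tuple  # temperature distribution information 64
    air_conditioning_power: float    # air conditioning power consumption information 65


def extract_records(history, situation):
    """Return every stored control result recorded under the given Situation."""
    return [record for record in history if record.situation == situation]
```

Each extracted record pairs one candidate air conditioning control value with the temperatures and air conditioning power it produced, which is what the server control unit 300 needs for its predictions.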
  • the air conditioning control execution unit 240 controls the air conditioner 2 in the plurality of patterns described above in each Situation using the air conditioning control value generated by the air conditioning control value generation unit 221. Furthermore, the air conditioning control execution unit 240 executes air conditioning control for each air conditioner 2 in the optimal arrangement pattern determined by the server control unit 300 in the operation phase.
  • the server control unit 300 calculates placement patterns that allocate the load to each server resource (CPU server 3, GPU server 4, accelerator 5) based on the generation/deletion schedule information of the virtual resources that become a processing load on the CPU and of the loads on the GPU/accelerator. It then calculates the amount of heat generated for each placement control area 30 in each placement pattern (predicted amount of heat generation for the placement control area 30).
  • based on the temperature distribution information 64 etc. obtained from the air conditioning control unit 200 for the air conditioning control at each stage in each Situation, the server control unit 300 calculates the total server power consumption, which is the sum of the server power consumption of each placement control area 30.
  • the server control unit 300 calculates the total value of the total server power consumption and the air conditioning power consumption, and determines an arrangement pattern that minimizes the total amount.
  • the server control unit 300 includes an arrangement pattern calculation unit 310, an area heat generation estimation unit 320, a server power consumption prediction unit 330, and an arrangement pattern determination unit 340.
  • the placement pattern calculation unit 310 acquires a generation/deletion schedule (hereinafter referred to as "load processing schedule information") for the virtual resources that become a processing load on the CPU server 3 and for the loads on the GPUs/accelerators. Then, the placement pattern calculation unit 310 calculates placement patterns in which the new load is placed on each server resource (CPU server 3, GPU server 4, accelerator 5) based on the latest resource usage status (for example, usage rate of CPU, GPU, accelerator, etc.). Note that the placement pattern calculation unit 310 allocates the load so that, after allocation, the resource occupation amount of each server resource (CPU server 3, GPU server 4, accelerator 5) is equal to or less than the load capacity (upper limit) x a predetermined threshold.
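The capacity constraint on each server resource can be sketched as a simple filter. The function names, the tuple layout (current usage, capacity), and the threshold value 0.8 are assumptions for this example; the description only states that occupation must not exceed capacity x a predetermined threshold.

```python
def fits_capacity(current_usage, new_load, capacity, threshold=0.8):
    """True if occupation after allocation stays within capacity x threshold."""
    return current_usage + new_load <= capacity * threshold


def feasible_servers(servers, new_load, threshold=0.8):
    """servers: hypothetical mapping of server id -> (current usage, load capacity)."""
    return [
        server_id
        for server_id, (usage, capacity) in servers.items()
        if fits_capacity(usage, new_load, capacity, threshold)
    ]
```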
  • the area heat generation estimation unit 320 predicts the power consumption of each server resource (CPU server 3, GPU server 4, accelerator 5) for each layout pattern calculated by the layout pattern calculation unit 310, with reference to the basic power consumption information 301. Then, the area heat generation amount estimating unit 320 calculates the total predicted heat generation amount for each placement control area 30 in each placement pattern based on the server arrangement configuration for each placement control area 30.
  • This basic power consumption information 301 does not take into account changes in server power consumption caused by changes in the inlet temperature of each server resource; it is, for example, the power consumption that serves as the standard in a normal state at a predetermined temperature (18 degrees Celsius).
  • for example, suppose the load processing schedule information is to execute 12 Pods of CPU processing "a", 5 Pods of GPU processing "b", and 10 Pods of FPGA processing "c" in placement control area "1".
  • suppose also that the basic power consumption information 301 is "100 W" for one Pod per server for CPU, "5 kW" for one Pod per server for GPU, and "20 W" for one Pod per server for FPGA.
  • in this case, the power consumption of the CPU server 3 in the placement control area "1" is "1.2 kW",
  • the power consumption of the GPU server 4 is "5 kW",
  • and the power consumption of the FPGA is "0.2 kW".
  • the area calorific value estimating unit 320 calculates the calorific value W of the corresponding placement control area 30 using the following equation (1).
  • Calorific value W of placement control area = CPU server power consumption of applicable placement control area x kc + GPU server power consumption of applicable placement control area x kg + Accelerator power consumption of applicable placement control area x ka ... Formula (1)
  • kc, kg, and ka are coefficients obtained by measuring in advance the amount of load on air conditioning cooling in each room in the DC 10 when the CPU server 3, GPU server 4, and accelerator 5 are in operation.
  • the area calorific value estimating unit 320 calculates the calorific value of each placement control area 30 by summing the calorific values of the CPU server 3, GPU server 4, and accelerator 5 using the above equation (1), and uses the result as the predicted amount of heat generation of that placement control area 30. Then, the area heat generation amount estimating unit 320 outputs the calculated predicted heat generation amount of each placement control area 30 to the air conditioning control unit 200 (operation history information extraction unit 230).
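Formula (1) is a weighted sum, which the following sketch evaluates for one placement control area. The function name is hypothetical, and the coefficient values used in the usage line are placeholders only: the description says kc, kg, and ka are obtained by measurement in advance and gives no values.

```python
def area_heat_generation(cpu_power, gpu_power, accelerator_power, kc, kg, ka):
    """Formula (1): W = CPU power x kc + GPU power x kg + accelerator power x ka."""
    return cpu_power * kc + gpu_power * kg + accelerator_power * ka


# Area "1" from the example above (powers in kW); kc = kg = ka = 1.0 are placeholders.
predicted_heat = area_heat_generation(1.2, 5.0, 0.2, kc=1.0, kg=1.0, ka=1.0)
```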
  • the server power consumption prediction unit 330 calculates the total power consumption of each type of server resource (CPU server total power consumption, GPU server total power consumption, and accelerator total power consumption) using the load processing schedule information (information regarding new processing loads) for the CPU server 3, GPU server 4, and accelerator 5, and the temperature distribution information 64 etc. acquired from the air conditioning control unit 200 (operation history information extraction unit 230).
  • the server power consumption prediction unit 330 adds up the CPU server total power consumption, GPU server total power consumption, and accelerator total power consumption in each placement control area 30 to calculate the server power consumption of each placement control area 30.
  • the server power consumption prediction section 330 includes a CPU power consumption prediction section 331 , a GPU power consumption prediction section 332 , and an accelerator power consumption prediction section 333 .
  • the CPU power amount prediction unit 331 acquires information on the amount of virtual resources to be newly allocated (for example, the number of CPU cores) based on the load processing schedule information, and the resource usage status of the CPU server 3 at that time (for example, CPU usage rate). Then, the CPU power amount prediction unit 331 predicts the power consumption of each CPU server 3 using the CPU server power amount learning model 302. More specifically, when switching control turns, the CPU power amount prediction unit 331 deletes virtual resources (for example, Pods) whose processing was completed in the previous control turn.
  • the CPU power amount prediction unit 331 obtains the resource usage rate (for example, CPU usage rate, memory usage rate, etc.) of each CPU server 3 excluding the deleted Pods.
  • the CPU power amount prediction unit 331 calculates a predicted value of the resource usage rate when a new Pod is placed in each CPU server 3 based on the load processing schedule information.
  • the CPU power amount prediction unit 331 calculates the power consumption of each CPU server 3 by inputting this predicted value of the resource usage rate into the CPU server power amount learning model 302.
  • the CPU power amount prediction unit 331 calculates the CPU server total power consumption by summing up the power consumption calculated for each CPU server 3 within each placement control area 30, based on the placement configuration of the CPU servers 3 in each placement control area 30.
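The CPU-side steps above (delete finished Pods, predict the usage rate with the new Pods, feed it to the model, sum per area) can be sketched as follows. Everything named here is an assumption for illustration: the data layout, the function names, and in particular `linear_power_model`, which is only a stand-in for the CPU server power amount learning model 302 with made-up idle and full-load wattages.

```python
from collections import defaultdict


def linear_power_model(cpu_usage_rate):
    # Hypothetical stand-in for learning model 302: usage rate in [0, 1] -> watts.
    idle_w, full_w = 100.0, 300.0
    return idle_w + (full_w - idle_w) * cpu_usage_rate


def predict_cpu_area_totals(servers, finished_pods, new_pods, model=linear_power_model):
    """servers: server id -> {"area": str, "pods": {pod id: usage-rate share}}."""
    totals = defaultdict(float)
    for server_id, info in servers.items():
        # 1. Drop Pods whose processing completed in the previous control turn.
        remaining = {p: u for p, u in info["pods"].items() if p not in finished_pods}
        # 2. Predicted usage rate once the newly scheduled Pods are placed.
        usage = sum(remaining.values()) + sum(new_pods.get(server_id, []))
        # 3. Model prediction for this server, accumulated per placement control area.
        totals[info["area"]] += model(min(usage, 1.0))
    return dict(totals)
```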
  • the CPU server power consumption learning model 302 is a learning model that uses the resource usage status of the CPU server 3 (for example, CPU usage rate, memory usage rate, etc.) as input information, and uses the power consumption amount of the CPU server 3 as output information.
  • This CPU server power amount learning model 302 is created in advance using the resource usage status of the CPU server 3 and the server power consumption, which is result information at that time, as learning data.
  • the GPU power amount prediction unit 332 predicts the GPU server power consumption of each GPU server 4 using the GPU server power learning model 303, based on the type of processing load that is newly scheduled to be processed (hereinafter referred to as "load type") obtained from the load processing schedule information, the GPU inlet temperature, the number of GPU cards, etc. Furthermore, the GPU power amount prediction unit 332 calculates the total GPU server power consumption, which is the sum of the power consumption of each GPU server 4 in each placement control area 30, based on the placement configuration of the GPU servers 4 in each placement control area 30.
  • the load type is the type of load according to the purpose for which the GPU server 4 is used, such as image processing, machine learning processing, network processing, or virtual space processing, and it is assumed that the load type of each load can be identified from the load processing schedule information. Furthermore, it is assumed that each GPU server 4 executes a single type of application based on the load processing schedule information.
  • For this GPU server power consumption learning model 303, there are two methods: a method of directly predicting GPU server power consumption (one-stage method), and a method of predicting GPU server power consumption in two stages via the GPU temperature (GPU card temperature) (two-stage method).
  • in the one-stage method, one learning model is used as the GPU server power amount learning model 303.
  • This GPU server power consumption learning model 303 is a learning model that uses the GPU inlet temperature, load type, and number of GPU cards as input information, and uses the power consumption of the GPU server 4 as output information.
  • This GPU server power amount learning model 303 is created in advance using the GPU inlet temperature, load type, number of GPU cards, and information on the power consumption of the GPU server 4 at that time as learning data.
  • the GPU power amount prediction unit 332 predicts the GPU server power consumption of each GPU server 4 that is not assigned processing at the beginning of the turn, using the GPU server power amount learning model 303 based on the GPU inlet temperature, load type, and number of GPU cards.
  • as the GPU inlet temperature, the current inlet temperature of each GPU server 4 is used at the beginning of the turn, and thereafter the GPU inlet temperature of each GPU server 4 indicated by the temperature distribution information 64 acquired from the air conditioning control unit 200 (temperature distribution information 64 starting from the same temperature as the current GPU inlet temperature) is used (the same applies to the two-stage method).
  • the first GPU learning model 303a is a learning model that uses GPU inlet temperature, load type, and number of GPU cards as input information, and uses GPU temperature as output information.
  • This first GPU learning model 303a is created in advance using the GPU inlet temperature, load type, number of GPU cards, and current GPU temperature as learning data.
  • the second GPU learning model 303b is a learning model that uses the GPU temperature as input information and uses the power consumption of the GPU server 4 as output information. This second GPU learning model 303b is created in advance using the GPU temperature and the power consumption of the GPU server 4 at that time as learning data.
  • When adopting the two-stage method, the GPU power amount prediction unit 332 predicts the GPU temperature of each GPU server 4 that is not assigned processing at the beginning of the turn, using the first GPU learning model 303a based on the GPU inlet temperature, load type, and number of GPU cards. Then, the GPU power amount prediction unit 332 predicts the GPU server power consumption of each GPU server 4 based on the predicted GPU temperature using the second GPU learning model 303b.
  • the GPU power amount prediction unit 332 adds up the predicted power consumption of each GPU server 4 within each placement control area 30, based on the placement configuration of the GPU servers 4 in each placement control area 30, to calculate the GPU server total power consumption of each placement control area 30.
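The two-stage method chains two models: inlet temperature, load type, and card count give a GPU temperature, and that temperature gives a power figure. The sketch below shows only this chaining and the per-area summation. Both model functions are hypothetical stand-ins with invented coefficients, not the trained learning models 303a and 303b, and the dictionary layout for servers is likewise assumed.

```python
from collections import defaultdict


def predict_gpu_temperature(inlet_temp, load_type, num_cards):
    # Hypothetical stand-in for the first GPU learning model 303a.
    load_rise = {"image_processing": 30.0, "machine_learning": 40.0}
    return inlet_temp + load_rise[load_type] + 1.0 * num_cards


def predict_power_from_gpu_temperature(gpu_temp):
    # Hypothetical stand-in for the second GPU learning model 303b (watts).
    return 500.0 + 10.0 * gpu_temp


def predict_gpu_server_power(inlet_temp, load_type, num_cards):
    """Two-stage prediction: stage 1 output (GPU temperature) feeds stage 2."""
    gpu_temp = predict_gpu_temperature(inlet_temp, load_type, num_cards)
    return predict_power_from_gpu_temperature(gpu_temp)


def gpu_total_power_by_area(gpu_servers):
    """Sum the predicted power of the GPU servers in each placement control area."""
    totals = defaultdict(float)
    for server in gpu_servers:
        totals[server["area"]] += predict_gpu_server_power(
            server["inlet_temp"], server["load_type"], server["num_cards"]
        )
    return dict(totals)
```

The same chaining applies to the accelerator models 304a and 304b, with the number of accelerator processing circuits in place of the number of GPU cards.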
  • the accelerator power amount prediction unit 333 predicts the accelerator power consumption of each accelerator 5 using the accelerator power amount learning model 304, based on the type of accelerator processing load (load type) that is newly scheduled to be processed, obtained from the load processing schedule information, the accelerator inlet temperature, the number of accelerator processing circuits, etc. Further, the accelerator power amount prediction unit 333 calculates the total accelerator power consumption amount, which is the sum of the power consumption amounts of each accelerator 5 in each placement control area 30, based on the arrangement configuration of the accelerators 5 in each placement control area 30.
  • the load type is the type of load according to the purpose for which the accelerator 5 is used, such as image processing, machine learning processing, internet processing, or encryption processing, and it is assumed that the load type can be identified from the load processing schedule information. Further, it is assumed that each accelerator 5 executes a single type of application based on the load processing schedule.
  • For this accelerator power consumption learning model 304, as with the GPU server power consumption learning model 303, there are two methods: a method of directly predicting accelerator power consumption (one-stage method), and a method of predicting accelerator power consumption in two stages via the accelerator temperature (temperature inside the accelerator) (two-stage method).
  • in the one-stage method, one learning model is used as the accelerator power amount learning model 304.
  • This accelerator power consumption learning model 304 is a learning model that uses the accelerator inlet temperature, load type, and number of accelerator processing circuits as input information, and uses the power consumption of the accelerator 5 as output information.
  • This accelerator power amount learning model 304 is created in advance using the accelerator inlet temperature, load type, number of accelerator processing circuits, and information on the power consumption of the accelerator 5 at that time as learning data.
  • the accelerator power amount prediction unit 333 predicts the accelerator power consumption of each accelerator 5 that is not assigned processing at the beginning of a turn, using the accelerator power amount learning model 304 based on the accelerator inlet temperature, load type, and number of accelerator processing circuits. Note that, as the accelerator suction port temperature, the current suction port temperature of each accelerator 5 is used at the beginning of the turn, and thereafter the accelerator suction port temperature of each accelerator 5 indicated by the temperature distribution information 64 acquired from the air conditioning control unit 200 (temperature distribution information 64 starting from the same temperature as the current accelerator suction port temperature) is used (the same applies to the two-stage method).
  • the first accelerator learning model 304a is a learning model that uses the accelerator inlet temperature, the load type, and the number of accelerator processing circuits as input information, and uses the accelerator temperature as output information.
  • This first accelerator learning model 304a is created in advance using the accelerator inlet temperature, load type, number of accelerator processing circuits, and accelerator temperature at that time as learning data.
  • the second accelerator learning model 304b is a learning model that uses the accelerator temperature as input information and uses the power consumption of the accelerator 5 as output information. This second accelerator learning model 304b is created in advance using the accelerator temperature and information on the power consumption of the accelerator 5 at that time as learning data.
  • When adopting the two-stage method, the accelerator power amount prediction unit 333 predicts the accelerator temperature of each accelerator 5 that is not assigned processing at the beginning of the turn, using the first accelerator learning model 304a based on the accelerator inlet temperature, load type, and number of accelerator processing circuits. Then, the accelerator power amount prediction unit 333 predicts the accelerator power consumption of each accelerator 5 based on the predicted accelerator temperature using the second accelerator learning model 304b.
  • the accelerator power amount prediction unit 333 adds up the predicted power consumption of each accelerator 5 within each placement control area 30, based on the arrangement configuration of the accelerators 5 in each placement control area 30, to calculate the total accelerator power consumption.
  • the server power consumption prediction unit 330 totals the CPU server total power consumption, the GPU server total power consumption, and the accelerator total power consumption in each placement control area 30 to calculate the server power consumption of each placement control area 30.
  • the placement pattern determination unit 340 sums up the server power consumption of each placement control area 30 in each placement pattern, and calculates the total server power consumption (total server power consumption).
  • the arrangement pattern determination unit 340 calculates the sum of the calculated total server power consumption and the air conditioning power consumption for that arrangement pattern obtained from the air conditioning control unit 200, and determines the arrangement pattern that minimizes this sum.
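The final decision is a minimization over candidate arrangement patterns, which can be sketched as below. The function name and the dictionary layout for a candidate are assumptions made for this example.

```python
def select_arrangement_pattern(candidates):
    """Pick the candidate whose server power plus air conditioning power is smallest.

    candidates: hypothetical mapping of pattern id ->
        {"server_power_by_area": {area id: power}, "air_conditioning_power": power}
    """
    def total_power(candidate):
        total_server_power = sum(candidate["server_power_by_area"].values())
        return total_server_power + candidate["air_conditioning_power"]

    return min(candidates, key=lambda pattern_id: total_power(candidates[pattern_id]))
```

Evaluating the sum rather than either term alone matters because a pattern with the lowest server power may need more cooling, and vice versa.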
  • FIG. 8 is a flowchart showing the flow of operation history information generation processing executed by the power consumption reduction control device 100 according to the present embodiment.
  • the air conditioning control value generation unit 221 of the air conditioning control unit 200 (operation history information generation unit 220) of the power consumption reduction control device 100 generates air conditioning control values in which the control parameters that can be changed in each air conditioner 2 (for example, set temperature (target temperature), air volume, etc.) are divided into multiple stages (step S1). Specifically, the air conditioning control value generation unit 221 divides each parameter into M stages from an upper limit value to a lower limit value, and combines the parameters of each stage to generate air conditioning control value information 63 for each air conditioner 2.
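Step S1 amounts to building a grid: each parameter is split into M stages and the stages are combined. The sketch below assumes evenly spaced stages that include both the upper and lower limits, which the description does not state; the function names and parameter names are also hypothetical.

```python
from itertools import product


def stages(lower, upper, m):
    """Return m values from the upper limit down to the lower limit (assumed evenly spaced)."""
    if m < 2:
        raise ValueError("m must be at least 2 to include both limits")
    step = (upper - lower) / (m - 1)
    return [upper - i * step for i in range(m)]


def generate_control_values(parameter_ranges, m):
    """parameter_ranges: parameter name -> (lower limit, upper limit)."""
    names = list(parameter_ranges)
    grids = [stages(*parameter_ranges[name], m) for name in names]
    # Every combination of one stage per parameter is one candidate control value.
    return [dict(zip(names, combo)) for combo in product(*grids)]
```

With P parameters this yields M to the power P candidates per air conditioner, so M is best kept small when many parameters are adjustable.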
  • the air conditioning control execution unit 240 executes air conditioning control in a plurality of patterns (step S2). For example, the air conditioning control execution unit 240 applies each air conditioning control value in the order of air conditioner "1", then "2", then "3", controls a combination of air conditioners such as "1" and "2" or "1" and "3", or controls air conditioners "1", "2", and "3" simultaneously.
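One possible reading of the "plurality of patterns" in step S2 is a sequential pass plus every simultaneous group of two or more air conditioners. The sketch below enumerates patterns under that assumption; the labels "sequential" and "simultaneous" and the function name are introduced here for illustration only.

```python
from itertools import combinations


def control_patterns(air_conditioners):
    """Enumerate one sequential pattern and all simultaneous groups of size >= 2."""
    patterns = [("sequential", tuple(air_conditioners))]
    for size in range(2, len(air_conditioners) + 1):
        for group in combinations(air_conditioners, size):
            patterns.append(("simultaneous", group))
    return patterns
```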
  • the operation history information generation unit 220 (reward calculation unit 222) calculates a reward (temperature reward) as an index for evaluating the result of performing air conditioning control using the air conditioning control value generated by the air conditioning control value generation unit 221.
  • the reward calculation unit 222 determines whether the control result satisfies a predetermined reward level, that is, whether the air conditioning control value satisfies a predetermined condition.
  • the reward calculation unit 222 determines that the control value passes if the calculated reward is equal to or higher than a predetermined threshold and predetermined conditions are satisfied, such as the average floor temperature after the control turn, the GPU temperature, and the accelerator temperature being within specified ranges.
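The pass decision combines a reward threshold with range checks on the measured temperatures. A minimal sketch, assuming the measurements and their allowed ranges are supplied as dictionaries with matching keys (the key names and range values used in practice are not given in the description):

```python
def control_value_passes(reward, reward_threshold, measured, allowed_ranges):
    """Pass only if the reward meets the threshold and every measurement is in range.

    measured: hypothetical mapping such as {"gpu_temperature": 70.0}
    allowed_ranges: mapping of the same keys -> (lower bound, upper bound)
    """
    if reward < reward_threshold:
        return False
    return all(
        low <= measured[name] <= high
        for name, (low, high) in allowed_ranges.items()
    )
```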
  • the operation history information generation unit 220 acquires temperature distribution information 64 and air conditioning power consumption information 65 as the results of control using the air conditioning control value information 63 that was generated by the air conditioning control value generation unit 221 in each Situation and judged as passing by the reward calculation unit 222 (step S4).
  • This temperature distribution information 64 is information obtained by measuring, at predetermined time intervals, the GPU suction port temperature with the temperature sensor 44 provided on the suction port side of the GPU server 4 and the accelerator suction port temperature with the temperature sensor 55 provided on the suction port side of the accelerator 5.
  • the air conditioning power consumption information 65 is the total power consumption of each air conditioner 2 measured in a predetermined control turn.
  • the operation history creation unit 223 creates operation history information 201 by associating the Situation (Situation classification 62) and the air conditioning control value information 63 at the time the air conditioning control execution unit 240 executed the air conditioning control with the temperature distribution information 64 and air conditioning power consumption information 65 obtained as a result of that control (step S5), and stores it in the storage unit.
  • the power consumption reduction control device 100 executes the generation process for the operation history information 201 in advance, in the learning phase before the operation phase.
  • FIG. 9 is a flowchart showing the flow of the arrangement pattern determination process executed by the power amount reduction control device 100 according to the present embodiment.
  • the server control unit 300 (arrangement pattern calculation unit 310) of the power reduction control device 100 acquires load processing schedule information and, at the start of each control turn, calculates a placement pattern for placing a new load on each server resource (CPU server 3, GPU server 4, accelerator 5) based on the latest resource usage status (e.g. usage rate) (step S10).
  • the area heat generation estimation unit 320 predicts the power consumption of each server resource (CPU server 3, GPU server 4, accelerator 5) for each layout pattern calculated by the layout pattern calculation unit 310, with reference to the basic power consumption information 301. Then, the area heat generation estimation unit 320 calculates the total predicted heat generation amount (predicted heat generation amount of the placement control area) for each placement control area 30 in each placement pattern based on the server placement configuration for each placement control area 30 (step S11). Then, the area heat generation amount estimating unit 320 outputs the calculated predicted heat generation amount of each placement control area 30 to the air conditioning control unit 200 (operation history information extraction unit 230).
  • the operation history information extraction unit 230 of the air conditioning control unit 200 acquires information on the predicted heat generation amount of each layout control area 30 from the server control unit 300 (area heat generation amount estimating unit 320). Then, the operation history information extraction unit 230 determines the Situation classification 62 at the start of the control turn via the situation recognition unit 210 (Step S12). The operation history information extraction unit 230 extracts temperature distribution information 64 and air conditioning power consumption information 65, which are the results of control using each air conditioning control value information 63 in the determined Situation classification 62, from the operation history information 201 (step S13). The operation history information extraction unit 230 outputs the extracted temperature distribution information 64 and air conditioning power consumption information 65 to the server control unit 300.
  • The server power consumption prediction unit 330 of the server control unit 300 uses the load processing schedule information for the CPU server 3, GPU server 4, and accelerator 5, together with the temperature distribution information 64 for each placement pattern obtained from the air conditioning control unit 200 (operation history information extraction unit 230), to calculate the total power consumption of each resource type (CPU server total power consumption, GPU server total power consumption, accelerator total power consumption) (step S14).
  • Specifically, the server power consumption prediction unit 330 (CPU power consumption prediction unit 331) obtains the amount of virtual resources (for example, the number of CPU cores) to be newly allocated based on the load processing schedule and the resource usage status (for example, CPU usage rate) of the CPU server 3 at that time, and predicts the power consumption of each CPU server 3 using the CPU server power consumption learning model 302. The CPU power consumption prediction unit 331 then calculates the CPU server total power consumption, which is the sum of the power consumption of the CPU servers 3 in each placement control area 30, based on the arrangement configuration of the CPU servers 3 in each placement control area 30.
  • The server power consumption prediction unit 330 (GPU power consumption prediction unit 332) predicts the power consumption of each GPU server 4 using the GPU server power consumption learning model 303, based on the load type of the new load obtained from the load processing schedule information, the GPU inlet temperature obtained from the temperature distribution information 64, and the number of GPU cards. The GPU power consumption prediction unit 332 then calculates the GPU server total power consumption, which is the sum of the power consumption of the GPU servers 4 in each placement control area 30, based on the arrangement configuration of the GPU servers 4 in each placement control area 30.
  • The server power consumption prediction unit 330 (accelerator power consumption prediction unit 333) predicts the power consumption of each accelerator 5 using the accelerator power consumption learning model 304, based on the load type of the new load obtained from the load processing schedule information, the accelerator inlet temperature obtained from the temperature distribution information 64, and the number of accelerator processing circuits. The accelerator power consumption prediction unit 333 then calculates the accelerator total power consumption, which is the sum of the power consumption of the accelerators 5 in each placement control area 30, based on the arrangement configuration of the accelerators 5 in each placement control area 30.
  • The server power consumption prediction unit 330 then sums the CPU server total power consumption, the GPU server total power consumption, and the accelerator total power consumption in each placement control area 30 to calculate the server power consumption of each placement control area 30 (step S15).
  • The arrangement pattern determination unit 340 sums the server power consumption of each placement control area 30 in each placement pattern to obtain the total server power consumption.
  • The arrangement pattern determination unit 340 then calculates, for each placement pattern, the sum of the total server power consumption and the air conditioning power consumption for that placement pattern obtained from the air conditioning control unit 200, and determines the placement pattern that minimizes this sum (step S16).
  • In this way, in a data center environment where CPU servers, GPU servers, accelerators, and the like coexist, the power consumption reduction control device 100 can determine the processing load placement pattern and the air conditioning control value that reduce the total power consumption of the data center, which consists of server power consumption and air conditioning power consumption.
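  • As a rough illustration of steps S15 and S16, the following Python sketch sums the per-area server power consumption for each candidate placement pattern, adds the air conditioning power consumption for that pattern, and picks the pattern with the smallest total. The names AreaPower, total_power, and choose_pattern, the pattern names, and all numeric values are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaPower:
    """Predicted totals for one placement control area (hypothetical unit: kWh)."""
    cpu_total: float
    gpu_total: float
    accelerator_total: float

    def server_power(self) -> float:
        # Step S15: server power consumption of the area is the sum of the
        # CPU server, GPU server, and accelerator totals.
        return self.cpu_total + self.gpu_total + self.accelerator_total


def total_power(areas: list[AreaPower], air_conditioning_power: float) -> float:
    # Total server power consumption over all areas, plus the air
    # conditioning power consumption extracted for this pattern.
    total_server_power = sum(area.server_power() for area in areas)
    return total_server_power + air_conditioning_power


def choose_pattern(candidates: dict[str, tuple[list[AreaPower], float]]) -> str:
    # Step S16: pick the placement pattern whose total is smallest.
    return min(candidates, key=lambda name: total_power(*candidates[name]))


# Made-up example values for two candidate placement patterns.
candidates = {
    "pattern_A": ([AreaPower(10.0, 30.0, 5.0), AreaPower(8.0, 0.0, 2.0)], 20.0),
    "pattern_B": ([AreaPower(12.0, 20.0, 6.0), AreaPower(9.0, 8.0, 2.0)], 15.0),
}
best = choose_pattern(candidates)
```

  • In this made-up example, pattern_B has the higher server power consumption but the lower overall total once air conditioning is included, which is the kind of trade-off the joint evaluation is meant to capture.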
  • FIG. 10 is a hardware configuration diagram showing an example of a computer 900 that implements the functions of the power consumption reduction control device 100 according to the present embodiment.
  • The computer 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, an HDD (Hard Disk Drive) 904, an input/output I/F (Interface) 905, a communication I/F 906, and a media I/F 907.
  • the CPU 901 operates based on a program stored in the ROM 902 or HDD 904, and performs control by the control unit.
  • the ROM 902 stores a boot program executed by the CPU 901 when the computer 900 is started, programs related to the hardware of the computer 900, and the like.
  • the CPU 901 controls an input device 910 such as a mouse or a keyboard, and an output device 911 such as a display or printer via an input/output I/F 905.
  • the CPU 901 acquires data from the input device 910 via the input/output I/F 905 and outputs the generated data to the output device 911.
  • Note that a GPU (Graphics Processing Unit) or the like may be used as the processor in addition to the CPU 901.
  • the HDD 904 stores programs executed by the CPU 901 and data used by the programs.
  • The communication I/F 906 receives data from other devices via a communication network (for example, NW (Network) 920) and outputs it to the CPU 901, and also sends data generated by the CPU 901 to other devices via the communication network.
  • the media I/F 907 reads the program or data stored in the recording medium 912 and outputs it to the CPU 901 via the RAM 903.
  • the CPU 901 loads a program related to target processing from the recording medium 912 onto the RAM 903 via the media I/F 907, and executes the loaded program.
  • the recording medium 912 is an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable disk), a magneto-optical recording medium such as an MO (Magneto Optical disk), a magnetic recording medium, a semiconductor memory, or the like.
  • the CPU 901 of the computer 900 realizes the functions of the power consumption reduction control device 100 by executing a program loaded onto the RAM 903. Furthermore, data in the RAM 903 is stored in the HDD 904 .
  • the CPU 901 reads a program related to target processing from the recording medium 912 and executes it. In addition, the CPU 901 may read a program related to target processing from another device via a communication network (NW 920).
  • The power consumption reduction control device 100 controls a CPU server 3, a GPU server 4, an accelerator 5, and a plurality of air conditioners 2. A plurality of placement control areas 30, in each of which any of the CPU server 3, the GPU server 4, and the accelerator 5 are arranged, and an air conditioning control area 20, which is an area for measuring the effect of air conditioning control by the plurality of air conditioners 2, are set. The device is characterized by comprising the following:
  • an air conditioning control value generation unit 221 that generates an air conditioning control value including at least a target temperature to be set in the plurality of air conditioners 2;
  • an air conditioning control execution unit 240 that executes control of the plurality of air conditioners 2 using the air conditioning control value;
  • a reward calculation unit 222 that calculates a reward evaluating, with the target temperature as an index, the result of the air conditioning control execution unit 240 controlling the plurality of air conditioners 2 using the air conditioning control value, and determines whether the reward satisfies a predetermined condition;
  • an operation history creation unit 223 that acquires, as a control result, the temperature distribution information 64 and the air conditioning power consumption of the plurality of air conditioners 2, and creates operation history information 201;
  • an arrangement pattern calculation unit 310 that calculates a plurality of placement patterns in which new processing loads are placed, using information on the processing loads on the CPU server 3, the GPU server 4, and the accelerator 5;
  • an area heat generation estimation unit 320 that estimates the predicted heat generation amount of each placement control area 30 in each of the plurality of placement patterns by summing the heat generation amounts when processing loads are placed on the CPU server 3, the GPU server 4, and the accelerator 5;
  • an operation history information extraction unit 230 that uses the predicted heat generation amount of each placement control area 30, refers to the operation history information 201, and extracts the temperature distribution information 64 and the air conditioning power consumption;
  • a server power consumption prediction unit 330 that calculates the server power consumption of each placement control area 30 by summing the power consumption of each CPU server 3, each GPU server 4, and each accelerator 5; and
  • an arrangement pattern determination unit 340 that sums the server power consumption of each placement control area 30 in each placement pattern to obtain the total server power consumption, calculates the total of the total server power consumption and the extracted air conditioning power consumption, and determines the placement pattern for which the calculated total is the minimum as the placement pattern for allocating the processing load.
  • In this way, the power consumption reduction control device 100 can reduce the total power consumption, consisting of server power consumption and air conditioning power consumption, in an environment where the CPU server 3, GPU server 4, and accelerator 5 coexist.
  • The power consumption reduction control device 100 controls the CPU servers 3, GPU servers 4, accelerators 5, and plurality of air conditioners 2 included in a data center 10. In the data center 10, a plurality of placement control areas 30, in each of which any of the CPU servers 3, GPU servers 4, and accelerators 5 are placed as a group of servers for placing processing loads, and an air conditioning control area 20 for air conditioning by the plurality of air conditioners 2 are set. The device is characterized by comprising the following:
  • an external factor acquisition unit 211 that acquires external factor information;
  • a situation determination unit 212 that, with situation classifications 62 defined by dividing the value of each external factor into ranges of a predetermined width and combining the divided ranges of the external factors, determines which situation classification 62 the acquired external factor information belongs to;
  • an air conditioning control value generation unit 221 that generates, for each situation classification 62, an air conditioning control value including at least a target temperature to be set in the plurality of air conditioners 2;
  • an air conditioning control execution unit 240 that executes control of the plurality of air conditioners 2 using the air conditioning control value;
  • a reward calculation unit 222 that calculates a reward evaluating the control result with the target temperature as an index, and determines whether the reward satisfies a predetermined condition;
  • an operation history creation unit 223 that acquires, as the control result of an air conditioning control value determined to satisfy the predetermined condition, temperature distribution information 64 indicating the GPU inlet temperature (the temperature at the inlet of the GPU server 4) and the accelerator inlet temperature (the temperature at the inlet of the accelerator 5), together with the air conditioning power consumption of the plurality of air conditioners 2 under that control, and creates operation history information 201 that associates the acquired temperature distribution information 64 and air conditioning power consumption with the situation classification 62 and the air conditioning control value in effect when the air conditioning control was executed;
  • an operation history information extraction unit 230 that acquires the predicted heat generation amount of each placement control area 30, determines the current situation classification 62 via the situation determination unit 212, and, by referring to the operation history information 201, extracts for each placement pattern the temperature distribution information 64 and the air conditioning power consumption obtained under control with the air conditioning control values;
  • an arrangement pattern calculation unit 310 that acquires load processing schedule information indicating the schedule for generating and deleting processing loads for the CPU server 3, GPU server 4, and accelerator 5, and calculates a plurality of placement patterns for placing new processing loads;
  • an area heat generation estimation unit 320 that estimates the predicted heat generation amount of each placement control area 30 by summing the heat generation amounts when processing loads are placed on the CPU server 3, GPU server 4, and accelerator 5 belonging to each placement control area 30;
  • a server power consumption prediction unit 330 that, using the load processing schedule information and the extracted temperature distribution information 64, calculates for each placement pattern the CPU server total power consumption (the sum of the power consumption of the CPU servers 3 in the placement control area 30), the GPU server total power consumption (the sum of the power consumption of the GPU servers 4 in the placement control area 30), and the accelerator total power consumption (the sum of the power consumption of the accelerators 5 in the placement control area 30), and calculates the server power consumption of each placement control area 30 by summing these three totals; and
  • an arrangement pattern determination unit 340 that sums the server power consumption of each placement control area 30 in each placement pattern to obtain the total server power consumption, calculates the total of the total server power consumption and the extracted air conditioning power consumption, and determines the placement pattern that minimizes the calculated total as the placement pattern for placing the processing load.
  • In this way, the power consumption reduction control device 100 can reduce the total power consumption of the data center 10, consisting of server power consumption and air conditioning power consumption, in a data center 10 environment in which CPU servers 3, GPU servers 4, and accelerators 5 are mixed.
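  • The situation classification and operation history lookup described above can be sketched in Python as follows. Each external factor value is divided by a fixed range width, and the tuple of resulting range indices serves as the situation classification under which control results are stored and later extracted. The factor names (outside_temp_c, predicted_heat_kw), the range widths, and the record fields are hypothetical; the specification does not fix them in this section.

```python
import math

# Hypothetical range width for each external factor.
WIDTHS = {"outside_temp_c": 5.0, "predicted_heat_kw": 10.0}

# Operation history: situation classification -> list of control results.
history: dict[tuple[int, ...], list[dict]] = {}


def situation_class(factors: dict[str, float], widths: dict[str, float]) -> tuple[int, ...]:
    """Combine the range index of every external factor into one class key."""
    return tuple(math.floor(factors[name] / widths[name]) for name in sorted(widths))


def record(factors: dict[str, float], control_value: dict,
           temperature_distribution: dict, ac_power_kwh: float) -> None:
    """Store a control result under the situation class it was observed in."""
    key = situation_class(factors, WIDTHS)
    history.setdefault(key, []).append({
        "control_value": control_value,
        "temperature_distribution": temperature_distribution,
        "ac_power_kwh": ac_power_kwh,
    })


def extract(factors: dict[str, float]) -> list[dict]:
    """Return all recorded results for the situation class of `factors`."""
    return history.get(situation_class(factors, WIDTHS), [])


# Made-up example: one past control result, then two lookups.
record({"outside_temp_c": 27.3, "predicted_heat_kw": 42.0},
       {"target_temp_c": 24.0},
       {"gpu_inlet_c": 26.5, "accelerator_inlet_c": 25.8},
       18.2)
same_class = extract({"outside_temp_c": 29.9, "predicted_heat_kw": 49.9})
other_class = extract({"outside_temp_c": 30.0, "predicted_heat_kw": 49.9})
```

  • Because classification only keeps the range index, nearby conditions share a history entry, while a value that crosses a range boundary (30.0 in this example) falls into a different class with no recorded results yet.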
  • The power consumption reduction control device 100 further includes a CPU server power consumption learning model 302 that uses the resource usage status of the CPU server 3 as input information and the power consumption of the CPU server 3 as output information; a GPU server power consumption learning model 303 that uses the GPU inlet temperature, the type of processing load, and the number of GPU cards as input information and the power consumption of the GPU server 4 as output information; and an accelerator power consumption learning model 304 that uses the accelerator inlet temperature, the type of processing load, and the number of accelerator processing circuits of the accelerator 5 as input information and the power consumption of the accelerator 5 as output information. The device is characterized in that the server power consumption prediction unit 330 calculates the power consumption of the CPU server 3 using the CPU server power consumption learning model 302, the power consumption of the GPU server 4 using the GPU server power consumption learning model 303, and the power consumption of the accelerator 5 using the accelerator power consumption learning model 304.
  • In this way, by using the CPU server power consumption learning model 302, the GPU server power consumption learning model 303, and the accelerator power consumption learning model 304, the power consumption reduction control device 100 can suitably calculate the power consumption of each type of resource.
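  • As a minimal stand-in for such a learning model, the following Python sketch fits a straight line from CPU usage rate to CPU server power consumption by ordinary least squares and uses it for prediction. The choice of a linear model, the function name fit_line, and the sample measurements are assumptions made for illustration only; the specification does not state which kind of model is used.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: CPU usage rate (0 to 1) and measured power in watts.
usage_rate = [0.0, 0.25, 0.5, 0.75, 1.0]
power_w = [100.0, 150.0, 200.0, 250.0, 300.0]

slope, intercept = fit_line(usage_rate, power_w)


def predict_cpu_power_w(usage: float) -> float:
    """Predicted CPU server power for a given usage rate."""
    return intercept + slope * usage


predicted = predict_cpu_power_w(0.6)
```

  • The GPU server and accelerator models would take more inputs (inlet temperature, type of processing load, number of GPU cards or accelerator processing circuits), so a real implementation would need a multivariate model rather than a single line.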
  • The power consumption reduction control device 100 includes a first GPU learning model that uses the GPU temperature as output information and a second GPU learning model that uses the GPU temperature as input information and the power consumption of the GPU server 4 as output information. The device is characterized in that the server power consumption prediction unit 330 calculates the GPU temperature using the first GPU learning model and then calculates the power consumption of the GPU server 4 using the second GPU learning model.
  • In this way, the power consumption reduction control device 100 can suitably calculate the power consumption of the GPU server 4 by first calculating the GPU temperature using the first GPU learning model and then applying the second GPU learning model.
  • The power consumption reduction control device 100 includes a first accelerator learning model that uses the accelerator inlet temperature, the type of processing load, and the number of accelerator processing circuits of the accelerator 5 as input information and the accelerator temperature as output information, and a second accelerator learning model that uses the accelerator temperature as input information and the power consumption of the accelerator 5 as output information. The device is characterized in that the server power consumption prediction unit 330 calculates the accelerator temperature using the first accelerator learning model and then calculates the power consumption of the accelerator 5 using the second accelerator learning model.
  • In this way, the power consumption reduction control device 100 can suitably calculate the power consumption of the accelerator 5 by first calculating the accelerator temperature using the first accelerator learning model and then applying the second accelerator learning model.
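  • The two-stage calculation can be sketched in Python as a composition of two models: the first maps the inlet temperature, the type of processing load, and the number of processing circuits to a device temperature, and the second maps that device temperature to power consumption. The toy models below, with their coefficients and load type names, are placeholders invented for this sketch; in the device they would be the trained first and second learning models.

```python
from typing import Callable

# First model: (inlet temperature, load type, number of circuits) -> device temperature.
FirstModel = Callable[[float, str, int], float]
# Second model: device temperature -> power consumption.
SecondModel = Callable[[float], float]


def two_stage_power(first: FirstModel, second: SecondModel,
                    inlet_temp_c: float, load_type: str, circuits: int) -> float:
    """Predict the device temperature first, then the power consumption from it."""
    device_temp_c = first(inlet_temp_c, load_type, circuits)
    return second(device_temp_c)


# Placeholder models with made-up coefficients (not from the specification).
LOAD_RISE_C = {"inference": 20.0, "training": 35.0}


def toy_first_model(inlet_temp_c: float, load_type: str, circuits: int) -> float:
    return inlet_temp_c + LOAD_RISE_C[load_type] + 0.5 * circuits


def toy_second_model(device_temp_c: float) -> float:
    return 150.0 + 2.0 * device_temp_c


power_w = two_stage_power(toy_first_model, toy_second_model, 25.0, "training", 4)
```

  • Keeping the two stages as separate callables means either model can be retrained or replaced without touching the other, as long as the intermediate quantity stays the device temperature.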
  • The power consumption reduction control device 100 includes basic power consumption information 301 indicating the reference power consumption at a predetermined temperature for each of the CPU server 3, GPU server 4, and accelerator 5. The device is characterized in that the area heat generation estimation unit 320 calculates the amount of heat generated by each of the CPU server 3, GPU server 4, and accelerator 5 by calculating the power consumption of each using the basic power consumption information 301.
  • In this way, because the power consumption reduction control device 100 holds basic power consumption information 301 indicating the reference power consumption at a predetermined temperature, it can estimate the amount of heat generated by each of the CPU server 3, GPU server 4, and accelerator 5.
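  • A minimal Python sketch of this estimate follows, assuming that the heat generated by a device is taken to be equal to its reference power consumption and that the basic power consumption information is a lookup table keyed by device type and load state. The table layout, the state names, and the wattage values are all hypothetical.

```python
# Hypothetical basic power consumption information: reference power in watts
# at a predetermined temperature, per device type and load state.
BASIC_POWER_W = {
    ("cpu_server", "idle"): 120.0,
    ("cpu_server", "loaded"): 300.0,
    ("gpu_server", "idle"): 250.0,
    ("gpu_server", "loaded"): 1200.0,
    ("accelerator", "idle"): 40.0,
    ("accelerator", "loaded"): 150.0,
}


def area_heat_w(devices: list[tuple[str, str]]) -> float:
    """Predicted heat of one placement control area.

    Assumption for this sketch: the heat generated by each device equals
    its reference power consumption, so the area heat is the plain sum.
    """
    return sum(BASIC_POWER_W[(kind, state)] for kind, state in devices)


# Made-up area under one candidate placement pattern.
area_1 = [("cpu_server", "loaded"), ("gpu_server", "loaded"), ("accelerator", "idle")]
heat_w = area_heat_w(area_1)
```

  • Repeating this for every placement control area in every candidate placement pattern gives the predicted heat generation amounts that are passed to the air conditioning side.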
  • Air conditioning control system; 2 Air conditioner; 3 CPU server; 4 GPU server; 5 Accelerator; 10 Data center (DC); 20 Air conditioning control area; 30 Placement control area; 62 Situation classification; 63 Air conditioning control value information; 64 Temperature distribution information; 65 Air conditioning power consumption information; 100 Power consumption reduction control device; 200 Air conditioning control unit; 201 Operation history information; 210 Situation recognition unit; 211 External factor acquisition unit; 212 Situation determination unit; 220 Operation history information generation unit; 221 Air conditioning control value generation unit; 222 Reward calculation unit; 223 Operation history creation unit; 230 Operation history information extraction unit; 240 Air conditioning control execution unit; 300 Server control unit; 301 Basic power consumption information; 302 CPU server power consumption learning model; 303 GPU server power consumption learning model; 304 Accelerator power consumption learning model; 310 Arrangement pattern calculation unit; 320 Area heat generation estimation unit; 330 Server power consumption prediction unit; 331 CPU power consumption prediction unit; 332 GPU power consumption prediction unit; 333 Accelerator power consumption prediction unit; 340 Arrangement pattern determination unit

Abstract

The invention relates to a power consumption reduction control device (100) comprising: an air conditioning control value generation unit (221) that generates an air conditioning control value; an air conditioning control execution unit (240) that causes control of a plurality of air conditioners to be executed using the air conditioning control value; an operation history creation unit (223) that acquires temperature distribution information (64) and the air conditioning power consumption of the plurality of air conditioners as a control result and creates operation history information (201); an arrangement pattern calculation unit (310) that calculates a placement pattern for a processing load; an area heat generation estimation unit (320) that estimates a predicted heat generation amount for each placement control area; a server power consumption prediction unit (330) that calculates the server power consumption for each placement control area (30); and an arrangement pattern determination unit (340) that calculates the total of the server power consumption and the air conditioning power consumption in each placement pattern and determines the placement pattern for which the total is smallest.
PCT/JP2022/017844 2022-04-14 2022-04-14 Power consumption reduction control device, power consumption reduction control method, power consumption reduction control system, and program WO2023199482A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/017844 WO2023199482A1 (en) 2022-04-14 2022-04-14 Power consumption reduction control device, power consumption reduction control method, power consumption reduction control system, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/017844 WO2023199482A1 (en) 2022-04-14 2022-04-14 Power consumption reduction control device, power consumption reduction control method, power consumption reduction control system, and program

Publications (1)

Publication Number Publication Date
WO2023199482A1 (fr) 2023-10-19

Family

ID=88329400

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/017844 WO2023199482A1 (en) 2022-04-14 2022-04-14 Power consumption reduction control device, power consumption reduction control method, power consumption reduction control system, and program

Country Status (1)

Country Link
WO (1) WO2023199482A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013152552A (en) * 2012-01-24 2013-08-08 Hitachi Ltd Operation management method for information processing system
JP2015050378A (en) * 2013-09-03 2015-03-16 日本電信電話株式会社 Air conditioning control method and air conditioning control system
JP2018048750A (en) * 2016-09-20 2018-03-29 株式会社東芝 Air conditioning control device, air conditioning control method, and air conditioning control program
WO2019154739A1 (en) * 2018-02-07 2019-08-15 Abb Schweiz Ag Method and system for controlling power consumption of a data center based on load allocation and temperature measurements

Similar Documents

Publication Publication Date Title
Beloglazov et al. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers
Liu et al. Enhancing energy-efficient and QoS dynamic virtual machine consolidation method in cloud environment
Moore et al. Making Scheduling" Cool": Temperature-Aware Workload Placement in Data Centers.
US8904383B2 (en) Virtual machine migration according to environmental data
CN102096460B (en) Method and apparatus for dynamically allocating power in a data center
US8677365B2 (en) Performing zone-based workload scheduling according to environmental conditions
US9037880B2 (en) Method and system for automated application layer power management solution for serverside applications
US20120005505A1 (en) Determining Status Assignments That Optimize Entity Utilization And Resource Power Consumption
Lee et al. Proactive thermal-aware resource management in virtualized HPC cloud datacenters
JP2021512419A (en) Method for optimizing the efficiency and/or operating performance of a fan or a fan arrangement
JP5563986B2 (en) System integration to meet exergy loss target values
US20140238656A1 (en) Air-conditioning control apparatus for data center
Arroba et al. Heuristics and metaheuristics for dynamic management of computing and cooling energy in cloud data centers
JP6455937B2 (en) Simulation device, simulation method, and program
Ran et al. Optimizing data center energy efficiency via event-driven deep reinforcement learning
WO2023199482A1 (en) Power consumption reduction control device, power consumption reduction control method, power consumption reduction control system, and program
Issa et al. Using logistic regression to improve virtual machines management in cloud computing systems
Rahmani et al. Kullback-Leibler distance criterion consolidation in cloud
Marcel et al. Thermal aware workload consolidation in cloud data centers
CN111083201B (en) Energy-saving resource allocation method for data-driven manufacturing services in the industrial Internet of Things
Wolke et al. Evaluating dynamic resource allocation strategies in virtualized data centers
Lin et al. Allocating workload to minimize the power consumption of data centers
EP2575003B1 (en) Method for determining the allocation of data center loads and information processing system
Zhang et al. Real time thermal management controller for data center
CN114741160A (en) Dynamic virtual machine consolidation method and system based on balancing energy consumption and quality of service

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22937461

Country of ref document: EP

Kind code of ref document: A1