WO2024113906A1 - Server cluster temperature adjustment method and device - Google Patents

Server cluster temperature adjustment method and device

Info

Publication number
WO2024113906A1
Authority
WO
WIPO (PCT)
Prior art keywords
server cluster
servers
temperature
parameters
parameter
Prior art date
Application number
PCT/CN2023/109059
Other languages
English (en)
French (fr)
Inventor
石勇
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2024113906A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D23/00 Control of temperature
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 HEATING; RANGES; VENTILATING
    • F24F AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F2110/00 Control inputs relating to air properties
    • F24F2110/50 Air quality properties
    • F24F2110/64 Airborne particle content
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of energy-saving control, and in particular to a method and device for regulating the temperature of a server cluster.
  • Server clusters in the related art are often equipped with air conditioning systems to dissipate heat from the server cluster equipment, which helps ensure the stability of the server equipment.
  • The air conditioning system of a data center typically monitors the ambient temperature, runs at variable frequency according to the monitored temperature, and adjusts its cooling capacity to achieve temperature control.
  • This type of temperature control often lags: the temperature is lowered only after the server equipment temperature has risen, so timely and effective temperature control cannot be achieved.
  • The purpose of the embodiments of the present application is to provide a method and device for adjusting the temperature of a server cluster, so as to solve the problem that it is difficult to control the temperature of a server cluster effectively and in a timely manner.
  • A server cluster temperature adjustment method, comprising: monitoring load parameters of a server cluster in a historical period and ambient temperature parameters of the server cluster; predicting, based on the load parameters, the increase or decrease in the number of servers in the server cluster in a future period; determining a temperature adjustment strategy for the future period according to the increase or decrease in the number of servers and the ambient temperature parameters, wherein the temperature adjustment strategy includes a temperature increase or decrease parameter, and the changing trend of the increase or decrease in the number of servers is negatively correlated with the changing trend of the temperature increase or decrease parameter; and controlling the air-conditioning system to perform an action matching the temperature adjustment strategy in the future period to adjust the ambient temperature of the server cluster.
  • a server cluster temperature adjustment device comprising: a monitoring module, monitoring the load parameters of the server cluster in a historical period and the ambient temperature parameters of the server cluster; a prediction module, predicting the increase or decrease in the number of servers in the server cluster in a future period based on the load parameters; a determination module, determining a temperature adjustment strategy in the future period according to the increase or decrease in the number of servers and the ambient temperature parameters, wherein the temperature adjustment strategy includes a temperature increase or decrease parameter, and the changing trend of the increase or decrease in the number of servers is negatively correlated with the changing trend of the temperature increase or decrease parameter; a control module, controlling the air-conditioning system to execute an action matching the temperature adjustment strategy in the future period to adjust the ambient temperature of the server cluster.
  • An electronic device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the method according to the first aspect are implemented.
  • A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the method according to the first aspect are implemented.
  • FIG. 1 is a first flow chart of a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • FIG. 2 is a second flow chart of a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • FIG. 3a is a third flow chart of a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • FIG. 3b is a schematic diagram of a control flow of an air circulation strategy for executing external air circulation based on FIG. 3a.
  • FIG. 3c is a schematic diagram of a control flow of an air circulation strategy for executing internal air circulation based on FIG. 3a.
  • FIG. 4 is a fourth flow chart of a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • FIG. 5a is a fifth flow chart of a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • FIG. 5b is a flow chart of increasing the number of servers in a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • FIG. 5c is a flow chart of reducing the number of servers in a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of a temperature adjustment device for a server cluster according to an embodiment of the present application.
  • In the related art, the air conditioning system typically cools the servers only after the server temperature has risen. This solution lags behind the load, cannot achieve timely and effective temperature control, and can waste resources.
  • For example, as the load of the server cluster increases, the overall temperature of the server cluster continues to rise.
  • When the temperature reaches a warning value, the air conditioning system is triggered to cool down.
  • By then, the server cluster has already been at a high temperature for a period of time, and the air conditioning system needs a further period of cooling before the temperature returns to normal. This solution therefore lags and cannot control the temperature effectively in time.
  • Alternatively, the above warning value is lowered, so that the air conditioning system is triggered to cool down before the temperature of the server cluster becomes too high.
  • A lower warning value allows relatively timely temperature control.
  • However, the business volume of the server cluster is constantly changing. If the load of the server cluster first increases, causing the temperature to reach the warning value and triggering the air conditioning system to cool down, and the load then decreases, the overall temperature of the server cluster does not rise as fast as expected. The air conditioning system then lowers the temperature excessively, which wastes resources.
  • An embodiment of the present application provides a server cluster temperature adjustment method, as shown in FIG. 1, comprising the following steps.
  • S11 Monitor the load parameters of the server cluster in a historical period and the ambient temperature parameters of the server cluster.
  • the load parameters of the server cluster can be obtained by monitoring the server cluster, and the ambient temperature parameters of the server cluster can be obtained by a temperature collector.
  • the server business load monitoring module can obtain the load parameters of the server cluster, and the load parameters can be continuously obtained by real-time monitoring.
  • The monitored parameters can be collected remotely, for example over Hypertext Transfer Protocol Secure (HTTPS) or SSH File Transfer Protocol (SFTP), and then aggregated, cleaned, and persisted.
  • load parameters may refer to parameters related to the device temperature of the server cluster, such as central processing unit (CPU) utilization, memory utilization, port traffic, application programming interface (API) service access times, etc.
  • the above-mentioned ambient temperature parameters can be collected by the air conditioning management component, and may include, for example, outdoor temperature/humidity, cold air duct temperature/humidity, heat dissipation rear air duct, server rack unit temperature, etc. These ambient temperature parameters can characterize the temperature, cooling efficiency, cooling intensity, etc. of the server cluster. Based on the ambient temperature parameters, it is helpful to dynamically and flexibly adjust the control strategy to cope with the constantly changing actual temperature in the server cluster, and it is helpful to keep the temperature of the server cluster within a safe range.
  • As the load of the server cluster changes, its temperature changes accordingly; that is, the load parameters are closely related to the temperature of the server cluster.
  • The load parameters and ambient temperature parameters in the historical period are obtained to monitor the load changes and temperature changes of the server cluster, thereby providing a data basis for the subsequent prediction and control strategy determination steps.
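  • As a minimal sketch of the data basis described above, the monitored load parameters can be held as time-ordered samples and reduced to a simple trend. The class name `LoadSample`, its fields, and the half-window comparison in `load_trend` are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class LoadSample:
    """One monitoring sample; the field names are illustrative assumptions."""
    cpu_util: float           # CPU utilization, 0.0 to 1.0
    mem_util: float           # memory utilization, 0.0 to 1.0
    port_traffic_mbps: float  # port traffic
    api_calls: int            # API service access count


def load_trend(history: list[LoadSample]) -> float:
    """Mean CPU utilization of the newer half of the window minus the older half."""
    if len(history) < 2:
        return 0.0
    mid = len(history) // 2
    older = mean(s.cpu_util for s in history[:mid])
    newer = mean(s.cpu_util for s in history[mid:])
    return newer - older


# Hypothetical samples from a historical period, oldest first.
history = [
    LoadSample(0.30, 0.40, 120.0, 900),
    LoadSample(0.35, 0.42, 150.0, 1100),
    LoadSample(0.55, 0.50, 260.0, 2000),
    LoadSample(0.65, 0.55, 310.0, 2600),
]
trend = load_trend(history)
```

  • A positive `trend` suggests rising load over the historical window. A fuller treatment would likely feed all monitored parameters into the prediction model rather than CPU utilization alone.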
  • S12 Predicting the increase or decrease in the number of servers in the server cluster in a future period based on the load parameter.
  • the load parameters obtained in the above steps can show the load changes of the server cluster in the historical period.
  • the load of the server cluster has a certain continuity
  • the load parameters in the historical period can show the load change trend of the server cluster. If the load parameters show that the overall load of the server cluster has increased, the server cluster needs to increase the server reserve to provide
  • the steps may be performed using a server business load monitoring component or a server business indicator collection module.
  • the required feature data may be collected through an agent in the server operating system, the collected data may be cleaned and persisted, and then data analysis may be performed to obtain an analysis result of whether the server cluster can perform business migration/power-off/input power adjustment.
  • the server cluster may adjust the number of servers in the cluster by powering on or off the servers to cope with the changing business load.
  • the temperature adjustment strategy is determined based on the increase or decrease in the number of servers in the cluster and the ambient temperature parameters determined in the above steps.
  • the temperature adjustment strategy takes into account the overall temperature change of the server cluster caused by the change in the number of servers, and the influence of the temperature of the environment where the server cluster is located on the heat dissipation of the server cluster.
  • The temperature adjustment strategy includes temperature increase and decrease parameters. If servers are added to the server cluster, the overall load of the server cluster will increase and the heat generation will increase. The temperature increase and decrease parameters in the temperature adjustment strategy generated at this time are used to control the air-conditioning system to increase the cooling intensity for the server cluster, for example by increasing the cold air ventilation volume or lowering the cold air temperature, to cope with the upcoming increase in load.
  • Conversely, if servers are removed from the server cluster, the temperature increase and decrease parameters in the generated temperature adjustment strategy are used to control the air-conditioning system to reduce the cooling intensity for the server cluster, for example by reducing the cold air ventilation volume or raising the cold air temperature, so as to save energy and reduce emissions when the server cluster load is low.
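  • The negative correlation between the change in server count and the temperature increase or decrease parameter can be sketched as follows. The function name, the 0.5 °C step per server unit, and the 27 °C rack safety limit are assumptions chosen for illustration; the application does not specify them.

```python
def setpoint_change_c(server_delta: int, rack_temp_c: float,
                      step_c: float = 0.5,
                      safe_rack_temp_c: float = 27.0) -> float:
    """Return the change to the cold-air setpoint in degrees Celsius.

    A negative result means "cool harder". The sign is opposite to the
    sign of `server_delta`, which is the negative correlation described
    in the text. All numeric defaults are illustrative assumptions.
    """
    change = -step_c * server_delta
    # Assumed safeguard: never relax cooling while racks are above the limit.
    if rack_temp_c > safe_rack_temp_c and change > 0:
        change = 0.0
    return change


more_servers = setpoint_change_c(server_delta=2, rack_temp_c=24.0)
fewer_servers = setpoint_change_c(server_delta=-2, rack_temp_c=24.0)
fewer_but_hot = setpoint_change_c(server_delta=-2, rack_temp_c=30.0)
```

  • Here the ambient temperature parameter only gates the result; a fuller strategy could also scale the step by the measured cooling efficiency.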
  • S14 Controlling the air conditioning system to execute an action matching the temperature adjustment strategy within the future period, so as to adjust the ambient temperature of the server cluster.
  • the temperature adjustment strategy generated in the above steps can control the air conditioning system to perform control operations in the future period to adjust the ambient temperature of the server cluster.
  • A business scheduling component or a computing power scheduling module can be applied to dynamically schedule computing power services, perform business migration and scheduling for servers added to or removed from the server cluster, and shut itself down or enter energy-saving mode after scheduling is completed to save energy and reduce emissions. If business scheduling and migration are needed again after the module has shut down or entered energy-saving mode, it can interact with the server power-on and power-off management system, start up based on control instructions, and perform the required scheduling and migration, which helps to horizontally expand the business capabilities of the server cluster.
  • the temperature adjustment strategy can be coordinated with the server scheduling and migration of the server.
  • For server business scheduling, the entry point of the server cluster is pre-configured through load-sharing software.
  • The traffic or business requests are directed to the backend computing cluster, so that the number of servers that need to run can be predicted from the business request volume, thereby realizing business volume assessment. If the number of servers needs to be reduced, business can be concentrated in the above future period, and redundant running servers can be shut down or powered off.
  • When the business request volume increases and the business scheduling module determines, after artificial intelligence calculation and judgment, that the number of servers needs to be increased, redundant servers that are powered off and available for scheduling can be brought up in the above future period.
  • the air conditioning system performs matching actions according to the temperature adjustment strategy in the future period to achieve temperature control.
  • the air conditioning management component can be used as the executor of the precise heat dissipation and energy-saving control strategy, control the total cooling capacity according to the above temperature adjustment strategy, and accurately transmit the cooling capacity through the electronically controlled air duct, so as to perform timely and effective temperature control on the server cluster.
  • temperature control can be performed through an electrically controllable air conditioning duct device, which can specifically include a rack air cooling device, which is used to accurately manage the independent air intake and air outlet of each server, and the hot air from the rack outlet can be collected in the air outlet duct of the air conditioning system for unified processing.
  • the air conditioning duct device can also include a fan pressurizing device for flexibly adjusting the air pressure, and controlling the amount of cold air introduced into the equipment of the server cluster based on the above temperature adjustment strategy.
  • When a server cluster can reduce the number of running servers, the business on those servers can be migrated or concentrated onto the servers that remain in service, the redundant servers can be powered off (or have their power input lowered), the cooling capacity required from the air-conditioning system can be estimated in advance, the produced cooling capacity can be reduced, and the redundant air-conditioning equipment can be shut down or stopped.
  • When more servers are needed, the servers that were powered off are powered on (or restored to normal power input) so that they are automatically included in the business cluster. The cooling capacity required from the air-conditioning system can be estimated in advance to wake up the standby air-conditioning equipment, increase the produced cooling capacity, and complete precise heat dissipation management.
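  • Estimating the cooling capacity in advance can be sketched by treating the electrical power of the powered-on servers as the heat to be removed. The function name, the per-server wattage, the per-unit cooling capacity, and the 20% headroom factor are all assumptions for illustration.

```python
import math


def air_conditioners_needed(powered_servers: int, watts_per_server: float,
                            unit_capacity_w: float,
                            headroom: float = 1.2) -> int:
    """Estimate how many air-conditioning units must run.

    Assumes nearly all server input power ends up as heat, and adds a
    safety headroom factor. Every parameter value is hypothetical.
    """
    heat_w = powered_servers * watts_per_server
    return math.ceil(heat_w * headroom / unit_capacity_w)


# Before and after powering off 60 of 100 servers (hypothetical figures).
units_before = air_conditioners_needed(100, 500.0, 25000.0)
units_after = air_conditioners_needed(40, 500.0, 25000.0)
```

  • The difference between `units_before` and `units_after` is the number of air-conditioning units that could be put on standby once the redundant servers are powered off.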
  • the solution provided in the embodiment of the present application is used to effectively control the temperature of the server cluster, wherein the load changes are predicted by monitoring the load parameters of the server cluster, thereby effectively controlling the temperature in the future period.
  • this solution collects the working conditions of peripheral equipment, monitors the server business load and total operating power and other parameters, and obtains sample data of server business load data in different time periods.
  • the collected load parameters may include the port traffic of the switching device, the number of visits to the corresponding interface address on the firewall device, and other parameters, and then the business indicators of the server cluster can be modeled, and the data collected after modeling can be iteratively trained to form a prediction model, and the number of servers in the server cluster can be determined by comprehensively considering multiple factors, and then the temperature adjustment strategy can be determined according to the number of servers and the ambient temperature, so as to achieve effective and accurate temperature control of the server cluster.
  • the solution provided by the embodiment of the present application can realize the overall energy consumption management and automatic control of the data center, and can realize accurate and intelligent energy consumption management.
  • this solution can control the air conditioning system to perform temperature control accordingly with the business volume change, achieve accurate energy consumption management of the entire data center, achieve the goal of energy saving and consumption reduction, and help achieve maximum economic benefits.
  • the load parameters of the server cluster in the historical period and the ambient temperature parameters of the server cluster are monitored, and then, based on the load parameters, the number of servers in the server cluster in the future period is predicted, and then, according to the number of servers increased or decreased and the ambient temperature parameters, the temperature adjustment strategy in the future period is determined, wherein the temperature adjustment strategy includes temperature rise and fall parameters, and the change trend of the number of servers increased or decreased is negatively correlated with the change trend of the temperature rise and fall parameters, and then, the air conditioning system is controlled to perform actions matching the temperature adjustment strategy in the future period to adjust the ambient temperature of the server cluster.
  • This solution predicts load changes by monitoring the load parameters of the server cluster, thereby predetermining the temperature adjustment strategy to be executed before the actual temperature changes of the server cluster, and then, in the future period, the temperature adjustment action of the air conditioning system can be adjusted accordingly with the load changes of the server cluster, so as to achieve effective temperature control, which has the advantages of being timely and effective, and is conducive to energy saving and consumption reduction.
  • the load parameter of the server cluster includes at least one of the following: service business load parameter, network device traffic parameter, number of network device port requests, number of network link sessions.
  • the increase or decrease in the number of servers in the server cluster can be determined to more accurately predict the business changes of the server cluster and improve the accuracy of temperature control.
  • the load parameters of the server cluster include multiple items.
  • step S12 includes the following steps.
  • S21 Inputting a plurality of load parameters into a parameter prediction model based on a naive Bayes algorithm to obtain predicted rise and fall probabilities of the plurality of load parameters in the future period.
  • the Naive Bayesian algorithm is a classification method based on the Bayesian theorem and the assumption of conditional independence of features.
  • the Naive Bayesian Classifier (NBC) model can be applied, which requires fewer parameters to be estimated, is not very sensitive to missing data, and has a relatively simple algorithm, which can facilitate and efficiently realize parameter prediction.
  • The naive Bayes method used in the solution provided in the embodiment of the present application is a simplification of the Bayes algorithm: it assumes that the attributes are conditionally independent of each other given the target value. In other words, no attribute variable carries a larger or smaller weight in the decision result than any other.
  • $D = \{d_1, d_2, \ldots, d_n\}$
  • $D$ can be divided into $y_m$ categories, among which $x_1, x_2, \ldots, x_d$ are random and independent of each other.
  • the posterior probability can be calculated as:
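  • For reference, the textbook naive Bayes posterior under the conditional-independence assumption is given below, using the attributes $x_1, \ldots, x_d$ defined above. The category index $k$ is introduced here for illustration, and it is assumed, not confirmed, that this matches the expression in the original filing.

```latex
P(y_k \mid x_1, \ldots, x_d)
  = \frac{P(y_k) \prod_{i=1}^{d} P(x_i \mid y_k)}{P(x_1, \ldots, x_d)},
\qquad
\hat{y} = \arg\max_{k} \; P(y_k) \prod_{i=1}^{d} P(x_i \mid y_k)
```

  • Because the denominator is the same for every category, only the numerator needs to be compared when choosing the most probable category.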
  • The above-mentioned naive Bayesian model is applied to multiple load parameters to perform predictive calculations, thereby obtaining the probability of increasing or decreasing servers in the server cluster.
  • the solution provided in the embodiment of the present application can apply the load parameters shown in the "Condition Category” column in the following Table 1, and perform prediction calculations through the above-mentioned naive Bayesian model to obtain the probability of increasing or decreasing the computing power of the server cluster corresponding to each load parameter (as shown in the "Calculated Probability” column in Table 1).
  • increasing the computing power indicates increasing the number of servers
  • decreasing the computing power indicates reducing the number of servers.
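  • A small categorical naive Bayes model can illustrate how probabilities of increasing or decreasing computing power might be derived from load parameters. This is a sketch under stated assumptions: each load parameter is discretized to "up" or "down", Laplace smoothing is used, and the training rows are invented toy data, not values from Table 1.

```python
from collections import Counter


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing `alpha`."""
    classes = Counter(labels)
    n_features = len(rows[0])
    # counts[c][i][v] = number of class-c rows whose feature i equals v
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    values = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            counts[c][i][v] += 1
            values[i].add(v)
    return {"classes": classes, "counts": counts, "values": values,
            "alpha": alpha, "total": len(labels)}


def posterior(model, row):
    """Return the normalized P(class | row) for every class."""
    scores = {}
    for c, n_c in model["classes"].items():
        p = n_c / model["total"]  # prior P(y)
        for i, v in enumerate(row):
            num = model["counts"][c][i][v] + model["alpha"]
            den = n_c + model["alpha"] * len(model["values"][i])
            p *= num / den  # smoothed likelihood P(x_i | y)
        scores[c] = p
    z = sum(scores.values())
    return {c: p / z for c, p in scores.items()}


# Toy history: (CPU utilization trend, port traffic trend) -> observed action.
rows = [("up", "up"), ("up", "up"), ("up", "down"),
        ("down", "down"), ("down", "down"), ("down", "up")]
labels = ["increase", "increase", "increase",
          "decrease", "decrease", "decrease"]
model = train_nb(rows, labels)
probs = posterior(model, ("up", "up"))
```

  • With this toy data, rising CPU utilization and rising traffic make "increase" the more probable outcome, which is the comparison the following step relies on.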
  • S22 Determine the number of servers to be added or decreased in the server cluster in a future period according to the predicted increase or decrease probabilities of the multiple load parameters.
  • The server increase or decrease judgment is then executed. If P(increase computing power) > P(decrease computing power), horizontal expansion of the computing servers is triggered and servers are powered on. If P(decrease computing power) > P(increase computing power), reduction of computing server service is triggered and servers are powered off.
  • the scheduling decision of horizontal expansion of server computing power (powering on the server) or reduction of server computing power (powering off the server) is determined. Furthermore, after determining to increase or decrease computing power, the number of servers to be increased or decreased can be further determined based on the changing trends of multiple load parameters. In practical applications, the levels can be pre-set to perform step-by-step increase or decrease of servers to avoid frequent changes in the number of servers.
  • the increase or decrease of servers is performed in units. If it is determined that the computing power of the server cluster is increased, it is determined that the number of servers is increased by one unit. On the contrary, if it is determined that the computing power of the server cluster is reduced, it is determined that the number of servers is reduced by one unit.
  • the increase or decrease in the number of servers is related to the size of the server cluster and the actual load parameters, and can be flexibly adjusted according to actual applications.
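  • The judgment described above can be sketched as a comparison of the two probabilities, returning a signed number of server units. The `margin` parameter is an assumed way to avoid frequent changes in the number of servers; it and the function name are not taken from the application.

```python
def decide_server_delta(p_increase: float, p_decrease: float,
                        unit: int = 1, margin: float = 0.0) -> int:
    """Return +unit to power servers on, -unit to power off, 0 to hold.

    `margin` is a hypothetical dead band: the winning probability must
    exceed the other by more than `margin` before any change is made.
    """
    if p_increase > p_decrease + margin:
        return unit
    if p_decrease > p_increase + margin:
        return -unit
    return 0


scale_out = decide_server_delta(0.7, 0.3)
scale_in = decide_server_delta(0.2, 0.8)
hold = decide_server_delta(0.52, 0.48, margin=0.1)
```

  • The `unit` argument corresponds to the pre-set step size; a larger cluster might use a larger unit.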
  • the ambient temperature parameter of the server cluster includes at least one of the following.
  • the rack temperature includes the rack temperature of each unit, which can be used to calculate the average rack temperature of the server cluster.
  • the air duct pressure includes the inlet duct pressure and the outlet duct pressure, and the air duct pressure can also be divided into the indoor duct pressure and the outdoor duct pressure.
  • temperature parameters can be collected from multiple points near and far in the environment where the server cluster is located, so as to more accurately obtain the temperature and cooling efficiency of the server cluster.
  • If the air-conditioning system includes outdoor and indoor ventilation ducts, the outdoor ambient temperature can also be obtained. This helps determine the temperature control strategy comprehensively, so that the indoor-outdoor temperature difference can be used flexibly for temperature control, which saves energy and reduces emissions.
  • The ambient temperature parameters of the server cluster include multiple items, and the temperature adjustment strategy includes an air circulation strategy, which can be an indoor circulation strategy or an indoor-outdoor circulation strategy.
  • Outdoor air can be drawn in to cool the server cluster directly, thereby reducing the energy consumed by the air-conditioning system for temperature control.
  • step S13 includes the following steps.
  • S31 Inputting a plurality of ambient temperature parameters into a parameter prediction model based on a naive Bayes algorithm to obtain predicted circulation strategies corresponding to the plurality of ambient temperature parameters in the future period.
  • The naive Bayes parameter prediction model used in this step may be consistent with the model described in step S21 of the above embodiment.
  • Multiple ambient temperature parameters are respectively input into the above parameter prediction model to obtain the predicted circulation strategy corresponding to each ambient temperature parameter.
  • the various ambient temperature parameters shown in the "Condition Category” column below are applied, and the prediction calculation is performed through the above-mentioned naive Bayes model to obtain the probability of the server cluster switching to the external circulation or switching to the internal circulation corresponding to each ambient temperature parameter (as shown in the "Calculated Probability” column in Table 2).
  • Switching to the external circulation indicates that outdoor air is introduced into the server cluster to realize indoor and outdoor air circulation.
  • Switching to the internal circulation indicates that indoor air is circulated into the server cluster and outdoor air is not used for air circulation.
  • S32 Determine an air circulation strategy for the server cluster in a future period according to the predicted circulation strategies corresponding to the multiple ambient temperature parameters, wherein the air circulation strategy is used to instruct the air conditioning system to perform internal air circulation or external air circulation.
  • the determined air circulation strategy is used to control the air conditioning system to execute the internal and external circulation action in the fresh air mode.
  • the solution provided in the embodiment of the present application can be used to control the fresh air intake/hot air exhaust system of the data center based on the temperature of the outdoor air and the current server cluster operation status data to dynamically determine whether to use internal air circulation or switch to external fresh air intake.
  • This solution combines intelligent calculation results with automated switching methods to control the air conditioning system to operate with a more economical and flexible energy-saving strategy, thereby achieving energy conservation and emission reduction on the basis of achieving effective control of the server cluster temperature.
  • the air circulation strategy determined in this solution can be used to control the fresh air intake/hot air exhaust equipment in the air conditioning system to meet the needs of intelligent fresh air management.
  • This solution flexibly applies outdoor temperature changes. With the change of seasons and the temperature difference between day and night, this solution can flexibly call outdoor air to perform temperature control and achieve full utilization of energy.
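  • Step S32 combines the per-parameter predictions into one air circulation strategy. The application does not state how they are combined here, so the sketch below assumes a simple average of the predicted probabilities of external circulation; the function name, parameter keys, and 0.5 threshold are hypothetical.

```python
from statistics import mean


def choose_circulation(p_external_by_param: dict[str, float]) -> str:
    """Pick 'external' when the mean predicted P(external) exceeds 0.5.

    Averaging is an assumed aggregation rule, not the filing's method.
    With no predictions available, fall back to internal circulation.
    """
    if not p_external_by_param:
        return "internal"
    p_external = mean(p_external_by_param.values())
    return "external" if p_external > 0.5 else "internal"


# Hypothetical per-parameter predictions on a cool night and a hot afternoon.
cool_night = choose_circulation(
    {"outdoor_temp": 0.9, "cold_duct_temp": 0.6, "rack_temp": 0.4})
hot_afternoon = choose_circulation(
    {"outdoor_temp": 0.1, "rack_temp": 0.5})
```

  • The returned label would then drive the fresh air intake and hot air exhaust equipment described above.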
  • FIG. 3b shows a schematic diagram of the control flow of the air circulation strategy for executing the external air circulation based on FIG. 3a.
  • the internal circulation of the hot air after the rack heat dissipation can be turned off.
  • the system discharges hot air to the outdoor environment and draws in low-temperature outdoor air for refrigeration cycle.
  • the hot air after heat dissipation of the air-conditioning system is discharged to the outdoor environment, and the low-temperature air outside is sucked into the cold air outlet of the air-conditioning.
  • the number of working equipment of the air-conditioning cluster cooling is automatically adjusted, or the total power of the refrigeration equipment is reduced, and the total cooling capacity of the air-conditioning cluster is reduced, so as to save energy and reduce consumption with a more economical heat dissipation method and improve economic benefits.
  • FIG. 3c shows a schematic diagram of the control flow, based on FIG. 3a, of the air circulation strategy that performs internal air circulation.
  • When the temperature at the external air inlet is high, the entry of external fresh air is cut off, and the hot air that has passed the rack heat dissipation is used for internal cooling circulation to achieve economical and environmentally friendly temperature control.
  • After switching to internal circulation, the entry of external fresh air can be cut off and the air that has passed the rack heat dissipation is recirculated into the internal circulation duct, in coordination with the decision calculation for the cooling output of the air-conditioning cluster.
  • the redundant standby air-conditioning host is awakened or the power of the refrigeration system is increased to increase the cooling capacity and perform equipment cooling more accurately and timely.
  • the method further includes the following steps.
  • monitoring the operating efficiency of the air-conditioning system can help improve the effectiveness of temperature control, ensure the stable operation of the air-conditioning system, and help achieve energy conservation and emission reduction.
  • step S13 includes the following steps.
  • S42 predicting the total power of powered-on servers in the server cluster according to the increase or decrease in the number of servers.
  • the total power of the servers in the server cluster that are powered on after the number of servers in the server cluster changes is calculated according to the number of servers added or reduced.
  • the total power of the servers can reflect the overall energy consumption of the server cluster, and further indicate the overall temperature increase trend of the server cluster.
  • S43 Input the rack temperature of the server cluster, the air duct pressure parameters of the air conditioning system, the operating parameters of the air conditioning system, and the total power of the powered-on servers into a parameter prediction model based on the naive Bayes algorithm to obtain multiple cooling capacity parameters of the air conditioning system in the future period.
  • the parameter prediction model of the naive Bayes algorithm used in the steps of this application may be consistent with the model described in step S21 of the above embodiment.
  • the rack temperature of the server cluster, the duct pressure parameter of the air conditioning system, the operating parameters of the air conditioning system, and the total power of the powered-on servers are respectively input into the above parameter prediction model to obtain the cooling capacity parameters of the air conditioning system corresponding to each parameter.
  • S44 Determine the cooling capacity of the air conditioning system of the server cluster in a future period according to the multiple cooling capacity parameters of the air conditioning system.
  • The cooling capacity change trend of the air conditioning system of the server cluster is then determined: if P(increase cooling capacity) > P(decrease cooling capacity), the air conditioner is triggered to increase the cooling capacity; if P(decrease cooling capacity) > P(increase cooling capacity), the air conditioner is triggered to decrease the cooling capacity. The air conditioning units then increase or reduce the cooling capacity according to the calculated decision result.
  • the specific value of the cooling capacity change can be further determined based on the change trend of multiple parameters applied in this example.
  • the levels can be pre-set to perform the cooling capacity increase and decrease stepwise to avoid frequent changes in the cooling action of the air conditioning system.
  • the cooling capacity is increased or decreased by a unit parameter. If it is determined to increase the cooling capacity, the cooling capacity is increased by one unit parameter. On the contrary, if it is determined to decrease the cooling capacity, the cooling capacity is decreased by one unit parameter.
  • the increase or decrease of the cooling capacity parameter is related to the scale of the server cluster and the actual ambient temperature and other parameters, and can be flexibly adjusted according to actual applications.
  • Based on the solution provided in the above embodiment, optionally, as shown in FIG. 5a, the following steps are further included after step S12.
  • a server increase scheduling strategy is generated based on the first number, and the server increase scheduling strategy is used to power on a first number of unpowered servers in the server cluster within the future time period, and perform business migration on the first number of servers after power-on.
  • Figure 5b is a flow chart of increasing the number of servers in a server cluster temperature adjustment method according to an embodiment of the present application.
  • When the overall business volume and load of the server cluster increase, parameters such as the number of external business requests collected, the business volume perceived by the server business load monitoring module, the switching device traffic, the switching device port requests, the network connection sessions, and a comparison with the historical business volume at the same time are fed to the pre-built algorithm module for model calculation and judgment, and the business is then scaled out horizontally according to the result.
  • the redundant standby servers are controlled to power on and join the server cluster.
  • After the server has been powered on, the currently monitored total running power data is triggered, and the air conditioning system is then controlled to coordinate and execute actions that match the temperature adjustment strategy.
  • it can include the air conditioning system executing cooling capacity adjustment, adjusting the opening and closing angle of the rack's air duct, increasing or pressurizing the cold air input of the rack, etc., to meet the heat dissipation needs of the newly powered-on and expanded computing cluster, and achieve accurate and effective temperature control while achieving energy saving and consumption reduction.
  • a server reduction scheduling strategy is generated based on the second number, and the server reduction scheduling strategy is used to perform business migration on the second number of servers in the server cluster within the future time period, and to power off the second number of servers after the business migration is performed.
  • FIG5c is a flow chart of reducing the number of servers in a method for adjusting the temperature of a server cluster according to an embodiment of the present application.
  • After the server has been powered off, the currently monitored total running power data is triggered, and the air conditioning system is then controlled to coordinate the execution of actions matching the temperature adjustment strategy. Specifically, these can include the air conditioning system adjusting the cooling capacity, adjusting the opening angle of the rack's air duct, and reducing the volume or pressure of the cold air supplied to the rack, so as to save further energy while still meeting the overall heat dissipation requirements of the server cluster after the power-off.
  • the load parameters such as the number of ports of the switching device and the number of visits to the service interface monitored on the firewall device can be calculated in combination with the current service volume of the server cluster, and the future load changes can be estimated. In this way, it is determined whether it is necessary to implement the service migration of the service cluster, the power-down capacity reduction of the corresponding computing cluster, and the horizontal capacity expansion operations, so as to achieve more economical energy-saving and consumption-reduction goals and improve economic benefits.
  • The solution provided in the embodiment of the present application controls the air-conditioning system to execute actions matching the strategy, so it can dynamically adjust the cooling capacity and dynamically manage the running and standby equipment of the air-conditioning cluster. This produces cooling capacity precisely, saves energy in the air-conditioning system, and improves economic benefits.
  • the embodiment of the present application performs parameter prediction based on the pre-built model, flexibly applies the air resources formed by the temperature difference between indoor and outdoor, and controls the data center fresh air intake or hot air exhaust system to calculate and determine whether to use the air conditioner internal circulation or switch to external circulation.
  • This can achieve more economical air conditioner operation mode management of internal and external circulation, use more flexible heat dissipation strategies, complete energy saving and consumption reduction of the heat dissipation system, and achieve economic benefits.
  • a precise and intelligent energy management system can be built in the data center, and monitoring hardware and software for the indicator data to be collected can be deployed to monitor the port traffic of the switching equipment, the number of business interface accesses monitored by the firewall equipment, and other load parameters, and perform parameter prediction through the artificial intelligence model combined with the naive Bayes algorithm. Then, based on the predicted parameters of the future period, the server cluster can be powered on to increase computing power or powered off to reduce capacity, and the air conditioning system can be controlled to adjust the cooling capacity accordingly to meet the overall heat dissipation requirements of the server cluster, which is conducive to achieving precise energy management.
  • the embodiment of the present application further provides a server cluster temperature adjustment device 60, as shown in FIG6, comprising the following modules.
  • the monitoring module 61 monitors the load parameters of the server cluster in a historical period and the ambient temperature parameters of the server cluster.
  • the prediction module 62 predicts the increase or decrease in the number of servers in the server cluster in a future period based on the load parameters.
  • Determination module 63 determines the temperature adjustment strategy for the future period according to the increase or decrease number of servers and the ambient temperature parameters, wherein the temperature adjustment strategy includes temperature increase or decrease parameters, and the change trend of the increase or decrease number of servers is negatively correlated with the change trend of the temperature increase or decrease parameters.
  • the control module 64 controls the air conditioning system to execute an action matching the temperature adjustment strategy in the future period to adjust the ambient temperature of the server cluster.
  • The device provided in the embodiment of the present application can monitor the load parameters of the server cluster to predict the load change, thereby determining the temperature adjustment strategy before the actual temperature of the server cluster changes, and then adjusting the temperature adjustment actions of the air conditioning system in step with the load change of the server cluster in the future period. This achieves effective temperature control that is timely and is beneficial to energy saving and consumption reduction.
  • the above modules in the device provided by the embodiment of the present application can also implement the method steps provided by the above method embodiment.
  • the device provided by the embodiment of the present application can also include other modules in addition to the above modules to implement the method steps provided by the above method embodiment.
  • the device provided by the embodiment of the present application can achieve the technical effects that can be achieved by the above method embodiment.
  • an embodiment of the present application also provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor.
  • When the computer program is executed by the processor, the processes of the above server cluster temperature adjustment method embodiment are implemented and the same technical effect can be achieved. To avoid repetition, the details are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored.
  • the computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented in one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information.
  • Information can be computer readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information that can be accessed by a computing device.
  • computer readable media does not include temporary computer readable media (transitory media), such as modulated data signals and carrier waves.
  • the embodiments of the present application may be provided as methods, systems or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Automation & Control Theory (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

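This application discloses a server cluster temperature adjustment method and device. The solution provided by this application includes: monitoring load parameters of a server cluster over a historical period and ambient temperature parameters of the environment in which the server cluster is located; predicting, based on the load parameters, the number of servers to be added to or removed from the server cluster in a future period; determining a temperature adjustment strategy for the future period according to the number of servers to be added or removed and the ambient temperature parameters, wherein the temperature adjustment strategy includes a temperature increase/decrease parameter, and the trend of the number of servers added or removed is negatively correlated with the trend of the temperature increase/decrease parameter; and controlling an air conditioning system to perform, in the future period, actions matching the temperature adjustment strategy, so as to adjust the ambient temperature of the server cluster.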

Description

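A Server Cluster Temperature Adjustment Method and Device
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202211501797.4, entitled "Server Cluster Temperature Adjustment Method and Device" and filed with the China Patent Office on November 28, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of energy-saving control, and in particular to a server cluster temperature adjustment method and device.
Background
With the build-out of digital infrastructure and the centralized, intensive management of server equipment in data centers, implementing precise energy saving urgently requires a scientific and effective computing model. Server clusters in the related art are typically equipped with an air conditioning system that dissipates heat from the cluster's equipment, which helps keep the server equipment stable.
In current data centers, the air conditioning system usually relies on temperature monitoring: it runs at variable frequency according to the monitored peripheral temperature and regulates the cooling capacity to control temperature. Temperature control of this kind tends to lag. Cooling starts only after the server equipment has already heated up, so timely and effective temperature control cannot be achieved.
How to improve the effectiveness of server cluster temperature control is the technical problem this application aims to solve.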
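Summary
The purpose of the embodiments of this application is to provide a server cluster temperature adjustment method and device that solve the problem that the temperature of a server cluster is difficult to control in a timely and effective manner.
In a first aspect, a server cluster temperature adjustment method is provided, including: monitoring load parameters of a server cluster over a historical period and ambient temperature parameters of the environment in which the server cluster is located; predicting, based on the load parameters, the number of servers to be added to or removed from the server cluster in a future period; determining a temperature adjustment strategy for the future period according to the number of servers to be added or removed and the ambient temperature parameters, wherein the temperature adjustment strategy includes a temperature increase/decrease parameter, and the trend of the number of servers added or removed is negatively correlated with the trend of the temperature increase/decrease parameter; and controlling an air conditioning system to perform, in the future period, actions matching the temperature adjustment strategy, so as to adjust the ambient temperature of the server cluster.
In a second aspect, a server cluster temperature adjustment device is provided, including: a monitoring module that monitors load parameters of a server cluster over a historical period and ambient temperature parameters of the environment in which the server cluster is located; a prediction module that predicts, based on the load parameters, the number of servers to be added to or removed from the server cluster in a future period; a determination module that determines a temperature adjustment strategy for the future period according to the number of servers to be added or removed and the ambient temperature parameters, wherein the temperature adjustment strategy includes a temperature increase/decrease parameter, and the trend of the number of servers added or removed is negatively correlated with the trend of the temperature increase/decrease parameter; and a control module that controls an air conditioning system to perform, in the future period, actions matching the temperature adjustment strategy, so as to adjust the ambient temperature of the server cluster.
In a third aspect, an electronic device is provided, including a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the steps of the method of the first aspect.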
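Brief Description of the Drawings
The drawings described here provide a further understanding of this application and form part of it. The illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
FIG. 1 is the first schematic flowchart of a server cluster temperature adjustment method according to an embodiment of this application.
FIG. 2 is the second schematic flowchart of a server cluster temperature adjustment method according to an embodiment of this application.
FIG. 3a is the third schematic flowchart of a server cluster temperature adjustment method according to an embodiment of this application.
FIG. 3b is a schematic diagram of the control flow, based on FIG. 3a, of an air circulation strategy that performs external air circulation.
FIG. 3c is a schematic diagram of the control flow, based on FIG. 3a, of an air circulation strategy that performs internal air circulation.
FIG. 4 is the fourth schematic flowchart of a server cluster temperature adjustment method according to an embodiment of this application.
FIG. 5a is the fifth schematic flowchart of a server cluster temperature adjustment method according to an embodiment of this application.
FIG. 5b is a schematic flowchart of increasing the number of servers in a server cluster temperature adjustment method according to an embodiment of this application.
FIG. 5c is a schematic flowchart of reducing the number of servers in a server cluster temperature adjustment method according to an embodiment of this application.
FIG. 6 is a schematic structural diagram of a server cluster temperature adjustment device according to an embodiment of this application.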
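Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application. The reference numbers in this application only distinguish the steps of the solution and do not limit their execution order; the actual order is as described in the specification.
In server cluster temperature control, cooling by the air conditioning system usually begins only after the server temperature has already risen. This approach lags, cannot control temperature in a timely and effective manner, and may waste resources.
For example, in one case, rising service volume increases the load on the server cluster and the overall temperature of the cluster keeps rising. The air conditioning system is triggered to cool only after the temperature reaches the alarm threshold. By then the cluster has already been at a high temperature for some time, and it returns to normal temperature only after the air conditioning system has been cooling for a while. This approach therefore lags and cannot control temperature in a timely and effective manner.
In another case, to keep the server cluster from staying hot for long, the alarm threshold is lowered so that the air conditioning system is triggered before the cluster temperature becomes too high. In a scenario where the overall cluster temperature keeps rising, this lower threshold does control temperature relatively promptly. However, the service volume of a server cluster changes constantly. If the load first rises, pushing the temperature to the threshold and triggering cooling, and then falls again, the overall temperature does not rise as fast as anticipated, so the air conditioning system overcools and wastes resources.
It follows that, in server cluster temperature control, it is difficult to apply timely and effective temperature control to a cluster whose state keeps changing, and resources are wasted.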
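To solve the problems in the prior art, an embodiment of this application provides a server cluster temperature adjustment method which, as shown in FIG. 1, includes the following steps.
S11: Monitor load parameters of the server cluster over a historical period and ambient temperature parameters of the environment in which the server cluster is located.
In practice, the load parameters can be obtained by monitoring the server cluster, and the ambient temperature parameters can be obtained through temperature collectors.
For example, a server service-load monitoring module can obtain the load parameters of the server cluster, and can do so continuously through real-time monitoring. In practice, the monitored parameters can be obtained and aggregated remotely, for example by collecting, cleaning, and persisting the data over Hypertext Transfer Protocol Secure (HTTPS) or the SSH File Transfer Protocol (SFTP).
The load parameters are parameters related to the equipment temperature of the server cluster. They may include, for example, central processing unit (CPU) utilization, memory utilization, port traffic, and the number of application programming interface (API) service calls.
The ambient temperature parameters can be collected by an air conditioning management component and may include, for example, outdoor temperature/humidity, cold-air-duct temperature/humidity, the post-heat-dissipation air duct, and server rack unit temperature. These parameters characterize the temperature the cluster is exposed to, the cooling efficiency, the cooling intensity, and so on. Using them makes it easier to adjust the control strategy dynamically in response to the constantly changing actual temperature in the cluster and to keep the cluster temperature within a safe range.
Generally, when the load on a server cluster rises, the temperature of the cluster changes accordingly; the load parameters are closely related to the cluster temperature. In this step, obtaining the load parameters and ambient temperature parameters over a historical period makes it possible to monitor changes in the cluster's load and temperature, providing the data basis for predicting and determining the control strategy in the subsequent steps.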
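S12: Predict, based on the load parameters, the number of servers to be added to or removed from the server cluster in a future period.
The load parameters obtained in the above step reflect how the load of the server cluster changed over the historical period. The load of a server cluster generally has a degree of continuity, so the historical load parameters reveal the trend of the cluster's load. If the load parameters show that the overall load of the cluster is increasing, the cluster needs to add server capacity in order to provide
For example, a server service-load monitoring component or a server service-indicator collection module can perform this step. Specifically, an agent inside the server operating system collects the required feature data; the collected data is cleaned and persisted and then analyzed to determine whether the server cluster can perform service migration, power-off, or input-power adjustment. The cluster can then adjust the number of servers by powering servers on or off to cope with the changing service load.
S13: Determine a temperature adjustment strategy for the future period according to the number of servers to be added or removed and the ambient temperature parameters, wherein the temperature adjustment strategy includes a temperature increase/decrease parameter, and the trend of the number of servers added or removed is negatively correlated with the trend of the temperature increase/decrease parameter.
The temperature adjustment strategy is determined from the number of servers to be added or removed and the ambient temperature parameters obtained in the above steps. On the one hand, the strategy accounts for the change in overall cluster temperature caused by the change in server count; on the other hand, it accounts for the effect of the ambient temperature on the cluster's heat dissipation.
The temperature adjustment strategy includes a temperature increase/decrease parameter. If servers are added to the cluster, the overall load of the cluster increases and more heat is generated. The temperature increase/decrease parameter in the resulting strategy then directs the air conditioning system to cool the cluster more strongly, for example by increasing the cold-air flow or lowering the cold-air temperature, to cope with the coming load growth.
Conversely, if servers are removed from the cluster, the overall load of the cluster decreases and less heat is generated. The temperature increase/decrease parameter in the resulting strategy then directs the air conditioning system to cool the cluster less strongly, for example by reducing the cold-air flow or raising the cold-air temperature, to save energy and reduce emissions while the cluster load is low.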
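The negative correlation between the server count change and the temperature increase/decrease parameter can be sketched as follows. This is a minimal illustration, not the method defined in the application: the function name, the step size of 0.5 °C per server, and the clamp of ±3 °C are all hypothetical values chosen for the example.

```python
def temperature_adjustment(server_delta: int,
                           step_c: float = 0.5,
                           max_change_c: float = 3.0) -> float:
    """Return a supply-air temperature change in degrees Celsius.

    server_delta > 0 (servers added)   -> negative change (cool harder)
    server_delta < 0 (servers removed) -> positive change (cool less)
    step_c and max_change_c are hypothetical tuning values.
    """
    raw = -step_c * server_delta  # negative correlation with the server delta
    # Clamp so a large predicted change cannot swing the setpoint too far.
    return max(-max_change_c, min(max_change_c, raw))


if __name__ == "__main__":
    for delta in (4, -2, 100):
        print(delta, temperature_adjustment(delta))
```

The clamp is a design choice of this sketch; the source text only requires that the two trends move in opposite directions.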
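S14: Control the air conditioning system to perform, in the future period, actions matching the temperature adjustment strategy, so as to adjust the ambient temperature of the server cluster.
The temperature adjustment strategy generated in the above steps can direct the air conditioning system to perform control operations in the future period that adjust the ambient temperature of the server cluster. For example, a service scheduling component or computing-power scheduling module can schedule computing workloads dynamically, performing service migration and scheduling for the servers added to or removed from the cluster. After scheduling completes, the module can also shut itself down or enter an energy-saving mode to save energy and reduce emissions. If services need to be scheduled and migrated again after shutdown or entry into energy-saving mode, the module can interact with the server power-on/power-off management system, start up on a control instruction, and perform the required scheduling and migration, which facilitates horizontal scaling of the cluster's service capacity.
In practice, the temperature adjustment strategy and the scheduling and migration of services across servers can proceed in coordination. For service scheduling, load-balancing software first directs the traffic or service requests arriving at the cluster's entry point to the back-end computing cluster, so that the number of servers that need to run can be predicted from the request volume, thereby assessing the service volume. If the number of servers needs to be reduced, services can be consolidated within the future period and redundant running servers shut down or powered off. When the request volume rises and the service scheduling module, after its artificial-intelligence-based calculation, determines that more servers are needed, redundant powered-off servers that are available for scheduling can be brought up within the future period.
The air conditioning system performs actions matching the temperature adjustment strategy in the future period to control temperature. For example, the air conditioning management component can act as the executor of the precise heat-dissipation and energy-saving control strategy: it controls the production of total cooling capacity according to the temperature adjustment strategy and delivers cooling precisely through electrically controlled air ducts, giving the server cluster timely and effective temperature control.
In practice, temperature control can be performed through an electrically controllable air-conditioning duct device. This device may include rack air-cooling equipment that manages air intake and exhaust independently and precisely for each server; hot air from the rack outlets can be collected into the outlet duct of the air conditioning system for unified handling. The duct device may also include a fan pressurization device for flexibly adjusting air pressure and, based on the temperature adjustment strategy, controlling the volume of cold air supplied to the equipment in the server cluster.
For example, when it is determined that the server cluster can run fewer servers, the services on those servers are migrated or consolidated onto servers that remain in service, and the redundant servers are powered off (or their power input is lowered). The cooling capacity required of the air conditioning system is estimated in advance, the cooling capacity produced is reduced, and redundant air conditioning units are shut down or taken out of service.
When it is determined that the server cluster needs to run more servers, powered-off servers are powered on (or restored to normal power input) so that they automatically join the service cluster. The cooling capacity required of the air conditioning system is estimated in advance, standby air conditioning units can be woken, and the cooling capacity produced is increased, completing precise heat-dissipation management.
The solution provided by the embodiments of this application controls server cluster temperature effectively: it monitors the load parameters of the server cluster to predict load changes and then controls temperature effectively in the future period. This avoids the lag that arises when temperature control starts only after the temperature has already risen.
The solution collects the operating conditions of peripheral equipment and monitors parameters such as server service load and total running power to obtain sample data of server service load over different time periods. The collected load parameters may include the port traffic of switching equipment, the number of accesses to the corresponding interface addresses on firewall equipment, and similar parameters. The service indicators of the server cluster can then be modeled, and a prediction model trained iteratively on the data collected after modeling. Multiple factors are considered together to determine the number of servers to be added or removed, and the temperature adjustment strategy is then determined from that number and the ambient temperature, giving effective and precise temperature control of the server cluster.
The solution provided by the embodiments of this application enables overall energy-consumption management and automated control of the data center, and precise, intelligent energy management. When the service volume of the server cluster changes, the solution controls the air conditioning system to adjust temperature accordingly, achieving precise energy management across the data center, meeting energy-saving goals, and helping to maximize economic benefit.
In the embodiments of this application, the load parameters of the server cluster over a historical period and the ambient temperature parameters of the cluster are first monitored. The number of servers to be added to or removed from the cluster in a future period is then predicted from the load parameters. Next, the temperature adjustment strategy for the future period is determined from that number and the ambient temperature parameters, where the strategy includes a temperature increase/decrease parameter and the trend of the number of servers added or removed is negatively correlated with the trend of that parameter. Finally, the air conditioning system is controlled to perform, in the future period, actions matching the temperature adjustment strategy, so as to adjust the ambient temperature of the cluster. By monitoring load parameters to predict load changes, the solution determines the temperature adjustment strategy before the actual temperature of the cluster changes and then adjusts the air conditioning system's actions in step with the cluster's load in the future period. This achieves effective temperature control that is timely and helps save energy.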
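Based on the solution provided in the above embodiment, optionally, the load parameters of the server cluster include at least one of the following: service load parameters, network device traffic parameters, the number of network device port requests, and the number of network connection sessions.
With the solution provided by the embodiments of this application, the number of servers to be added or removed can be determined jointly from multiple parameters related to service load, so that changes in the cluster's workload are predicted more accurately and temperature control becomes more accurate.
Based on the solution provided in the above embodiment, optionally, the load parameters of the server cluster include multiple items.
As shown in FIG. 2, step S12 includes the following steps.
S21: Input the multiple load parameters into a parameter prediction model based on the naive Bayes algorithm to obtain the predicted rise/fall probabilities of the multiple load parameters in the future period.
The naive Bayes algorithm is a classification method based on Bayes' theorem and the assumption of conditional independence between features. A naive Bayes classifier (NBC) model can be applied here: it has few parameters to estimate, is not very sensitive to missing data, and is algorithmically simple, so parameter prediction can be implemented conveniently and efficiently.
The naive Bayes method used in the solution provided by the embodiments of this application simplifies the Bayesian algorithm by assuming that the attributes are conditionally independent of one another given the target value. In other words, no attribute variable is treated as carrying a larger or smaller weight than another in the decision result.
The naive Bayes method of this solution is described below.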
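Let the sample data set be $D=\{d_1,d_2,\dots,d_n\}$, the feature attribute set of the samples be $X=\{x_1,x_2,\dots,x_d\}$, and the class variable be $Y=\{y_1,y_2,\dots,y_m\}$, that is, $D$ can be divided into $m$ classes, where $x_1,x_2,\dots,x_d$ are mutually independent random variables.
The prior probability of $Y$ is $P_{prior}=P(Y)$, and the posterior probability of $Y$ is $P_{post}=P(Y\mid X)$.
By the naive Bayes algorithm, the posterior probability can be computed from the prior probability $P_{prior}=P(Y)$, the evidence $P(X)$, and the class-conditional probability $P(X\mid Y)$:
$$P(Y\mid X)=\frac{P(Y)\,P(X\mid Y)}{P(X)}$$
Because the features are mutually independent, given class $y$ the class-conditional probability can be further expressed as:
$$P(X\mid Y=y)=\prod_{i=1}^{d}P(x_i\mid Y=y)$$
From the two expressions above, the posterior probability is:
$$P_{post}=P(Y\mid X)=\frac{P(Y)\prod_{i=1}^{d}P(x_i\mid Y)}{P(X)}$$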
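In the solution provided by the embodiments of this application, the naive Bayes model above is applied to the multiple load parameters to perform the prediction, yielding the probability that the server cluster adds or removes servers.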
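To make the naive Bayes calculation concrete, the following is a minimal sketch of a categorical naive Bayes classifier written from scratch. It is not the model used in the application: the discretization of load parameters into "high"/"low", the class labels "up"/"down", the Laplace smoothing constant `alpha`, and the sample data are all assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Count class and per-feature value frequencies (alpha = Laplace smoothing)."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)      # feature index -> set of observed values
    for x, y in zip(samples, labels):
        for i, value in enumerate(x):
            feature_counts[(i, y)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values, alpha


def posterior(model, x):
    """Return P(Y | X = x) for every class, normalized by the evidence P(X)."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = n_y / total  # prior P(Y = y)
        for i, value in enumerate(x):
            k = len(feature_values[i])
            # Smoothed class-conditional probability P(x_i | Y = y)
            score *= (feature_counts[(i, y)][value] + alpha) / (n_y + alpha * k)
        scores[y] = score
    evidence = sum(scores.values())  # P(X), summed over all classes
    return {y: s / evidence for y, s in scores.items()}


# Hypothetical discretized history: (port traffic level, session count level)
samples = [("high", "high"), ("high", "low"), ("low", "low"), ("low", "low")]
labels = ["up", "up", "down", "down"]  # whether computing power was raised or lowered
model = train_naive_bayes(samples, labels)
result = posterior(model, ("high", "high"))
```

Normalizing by the summed scores plays the role of dividing by the evidence, so the returned probabilities add up to 1.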
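For example, the solution provided by the embodiments of this application can take the load parameters listed in the "Condition category" column of Table 1 below and run them through the naive Bayes model above to obtain, for each load parameter, the probability that the server cluster raises or lowers its computing power (shown in the "Computed probability" column of Table 1). Raising computing power means adding servers; lowering computing power means removing servers.
Table 1
S22: Determine the number of servers to be added to or removed from the server cluster in the future period according to the predicted rise/fall probabilities of the multiple load parameters.
After the above step obtains the predicted rise/fall probabilities for the multiple load parameters, these probabilities are aggregated. Specifically, P(raise computing power) = a*c*e*g*i and P(lower computing power) = b*d*f*h*j, where a to j are the values in the "Computed probability" column of Table 1.
The decision on adding or removing servers is then made: if P(raise computing power) > P(lower computing power), horizontal scale-out and power-on of computing servers is triggered; if P(lower computing power) > P(raise computing power), scale-in and power-off of computing servers is triggered.
According to the calculated decision result, the scheduling decision is made either to scale server computing power out horizontally (power servers on) or to scale it in (power servers off). Further, once it has been decided to raise or lower computing power, the number of servers to be added or removed can be determined from the trends of the multiple load parameters. In practice, levels can be preset so that servers are added or removed in steps, avoiding frequent changes in the number of servers.
For example, servers are added or removed in unit quantities: if it is determined that the server cluster should raise its computing power, one unit quantity of servers is added; conversely, if it is determined that the cluster should lower its computing power, one unit quantity of servers is removed.
It should be understood that the number of servers to be added or removed depends on the scale of the server cluster and the actual load parameters, and can be adjusted flexibly for the actual application.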
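The decision rule in step S22 can be sketched as below. The probability values a to j come from Table 1, whose contents are not reproduced here, so the numbers in the usage example are made up; the function name and the `unit` parameter are likewise assumptions of this sketch.

```python
from math import prod


def decide_scaling(p_up, p_down, unit=1):
    """Return the signed number of servers to add (+) or remove (-).

    p_up   -- per-parameter probabilities of raising computing power (a, c, e, g, i)
    p_down -- per-parameter probabilities of lowering computing power (b, d, f, h, j)
    unit   -- hypothetical step size: servers added or removed per decision
    """
    score_up = prod(p_up)      # P(raise computing power)
    score_down = prod(p_down)  # P(lower computing power)
    if score_up > score_down:
        return unit            # scale out: power servers on
    if score_down > score_up:
        return -unit           # scale in: power servers off
    return 0                   # tie: leave the cluster unchanged


# Made-up probabilities for five load parameters
delta = decide_scaling([0.6, 0.7, 0.8, 0.6, 0.9], [0.4, 0.3, 0.2, 0.4, 0.1])
```

Returning 0 on a tie is a choice made here; the source text does not say what happens when the two products are equal.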
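Based on the solution provided in the above embodiment, optionally, the ambient temperature parameters of the server cluster include at least one of the following.
The rack temperature of the server cluster, the cooling capacity parameter of the air conditioning system, the duct pressure parameter of the air conditioning system, the outdoor temperature parameter of the server cluster, the temperature difference between the hot-air duct of the air conditioning system and the outdoor fresh-air inlet, and the outdoor humidity parameter of the server cluster.
The rack temperature includes the rack temperature of each unit and can be used to calculate the average rack temperature of the server cluster. The duct pressure includes intake duct pressure and exhaust duct pressure, and can also be divided into indoor duct pressure and outdoor duct pressure.
With the solution provided by the embodiments of this application, temperature parameters can be collected from multiple points near and far in the cluster's environment, giving a more accurate picture of the temperature the cluster is exposed to and of the cooling efficiency. Where the air conditioning system includes both outdoor and indoor ventilation ducts, the outdoor ambient temperature can also be obtained, which helps determine the temperature control strategy comprehensively, exploit the indoor/outdoor temperature difference flexibly, and save energy.
Based on the solution provided in the above embodiment, optionally, the ambient temperature parameters of the server cluster include multiple items, and the temperature adjustment strategy includes an air circulation strategy.
The air circulation strategy may include an indoor circulation strategy and an indoor/outdoor circulation strategy. When the outdoor air meets the temperature control conditions, outdoor air can be used directly to control the temperature of the server cluster, reducing the energy the air conditioning system consumes for temperature control.
As shown in FIG. 3a, step S13 includes the following steps.
S31: Input the multiple ambient temperature parameters into a parameter prediction model based on the naive Bayes algorithm to obtain the predicted circulation strategy corresponding to each ambient temperature parameter in the future period.
The naive Bayes parameter prediction model applied in this step may be the same as the model described in step S21 of the above embodiment. In this step, the multiple ambient temperature parameters are each input into the parameter prediction model to obtain the predicted circulation strategy corresponding to each parameter.
For example, as shown in Table 2 below, the ambient temperature parameters listed in the "Condition category" column are run through the naive Bayes model above to obtain, for each parameter, the probability that the server cluster switches to external circulation or to internal circulation (shown in the "Computed probability" column of Table 2). Switching to external circulation means supplying outdoor air to the server cluster to circulate air between indoors and outdoors; switching to internal circulation means supplying indoor air to the server cluster without using outdoor air for circulation.
Table 2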

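S32: Determine the air circulation strategy of the server cluster in the future period according to the predicted circulation strategies corresponding to the multiple ambient temperature parameters, where the air circulation strategy instructs the air conditioning system to perform internal air circulation or external air circulation.
After the above step obtains the predicted circulation strategies for the multiple ambient temperature parameters, these predictions are aggregated. Specifically, P(switch to external circulation) = a*c*e*g*i*k and P(switch to internal circulation) = b*d*f*h*j*l, where a to l are the values in the "Computed probability" column of Table 2.
The air circulation strategy is then decided: if P(switch to external circulation) > P(switch to internal circulation), the air conditioner is triggered to switch to external circulation; if P(switch to internal circulation) > P(switch to external circulation), the air conditioner is triggered to switch to internal circulation. The resulting air circulation strategy is used to control the air conditioning system's internal/external circulation actions in fresh-air mode.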
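Multiplying many probabilities can underflow when the number of parameters grows, so an implementation might compare the two products in log space instead. The sketch below shows that variant; it is an implementation choice introduced here, not something the application describes, and the function name, the `eps` floor, and the tie-breaking toward internal circulation are assumptions.

```python
import math


def choose_circulation(p_external, p_internal, eps=1e-12):
    """Compare the products of two probability lists in log space.

    p_external -- per-parameter probabilities of switching to external circulation
    p_internal -- per-parameter probabilities of switching to internal circulation
    eps        -- hypothetical floor that keeps log() defined for zero probabilities
    """
    log_external = sum(math.log(max(p, eps)) for p in p_external)
    log_internal = sum(math.log(max(p, eps)) for p in p_internal)
    # On a tie this sketch stays on internal circulation (an assumption).
    return "external" if log_external > log_internal else "internal"


# Made-up probabilities for six ambient temperature parameters
mode = choose_circulation([0.7, 0.8, 0.6, 0.9, 0.7, 0.6],
                          [0.3, 0.2, 0.4, 0.1, 0.3, 0.4])
```

Because the logarithm is monotonic, comparing the sums of logs gives the same decision as comparing the products a*c*e*g*i*k and b*d*f*h*j*l directly.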
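The solution provided by the embodiments of this application makes its decision from the outdoor air temperature combined with the current operating state data of the server cluster. It can be used to control the data center's fresh-air intake/hot-air exhaust system and to decide dynamically whether to use internal air circulation or switch to external fresh-air intake. By combining intelligent calculation results with automated switching, the solution lets the air conditioning system run under a more economical and flexible energy-saving strategy, saving energy and reducing emissions while still controlling the server cluster temperature effectively.
The air circulation strategy determined in this solution can be used to control the fresh-air intake/hot-air exhaust equipment in the air conditioning system to meet the needs of intelligent fresh-air management. The solution makes flexible use of outdoor temperature changes: as the seasons change and the day/night temperature difference varies, it can draw on outdoor air for temperature control and make full use of available energy.
FIG. 3b shows a schematic diagram of the control flow, based on FIG. 3a, of the air circulation strategy that performs external air circulation. When the temperature at the external air inlet meets the algorithm's requirements, the internal circulation of hot air leaving the racks can be closed, the hot air discharged to the outdoor environment, and low-temperature outdoor air drawn in for the cooling cycle.
Specifically, after the switch to external circulation mode is decided, the hot air from the air conditioning system's heat dissipation is discharged to the outdoor environment and low-temperature outdoor air is drawn into the air conditioner's cold-air inlet. In coordination with the decision calculation for the cooling output of the air-conditioning cluster, when a reduction in cooling capacity is confirmed, the number of working cooling units in the cluster is adjusted automatically, or the total power of the cooling equipment is lowered, reducing the cluster's total cooling capacity. This more economical way of dissipating heat saves energy and improves economic benefit.
FIG. 3c shows a schematic diagram of the control flow, based on FIG. 3a, of the air circulation strategy that performs internal air circulation. When the temperature at the external air inlet is high, the entry of external fresh air is cut off and the hot air leaving the racks is used for the internal cooling cycle, giving economical and environmentally friendly temperature control.
Specifically, after the switch to internal circulation mode is decided, the entry of external fresh air can be cut off and the air leaving the racks recirculated into the internal circulation duct. In coordination with the decision calculation for the cooling output of the air-conditioning cluster, when an increase in cooling capacity is confirmed, redundant standby air conditioning hosts are woken or the power of the cooling system is raised, increasing the cooling capacity and cooling the equipment more precisely and promptly.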
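Based on the solution provided in the above embodiment, optionally, as shown in FIG. 4, the method further includes the following steps.
S41: Monitor the operating parameters of the air conditioning system.
In the solution provided by the embodiments of this application, monitoring the operating efficiency of the air conditioning system helps improve the effectiveness of temperature control, keeps the air conditioning system running stably, and helps save energy and reduce emissions.
Step S13 includes the following steps.
S42: Predict the total power of the powered-on servers in the server cluster according to the number of servers to be added or removed.
In this step, the total power of the servers that will be powered on after the server count changes is calculated from the number of servers to be added or removed. This total power reflects the overall energy consumption of the server cluster and therefore indicates the cluster's overall heating trend.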
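Step S42 does not spell out how the total power is computed. One simple way to estimate it, assuming every server draws roughly the same average power, is shown below; the function name, the uniform `avg_power_w` figure, and the sample numbers are hypothetical.

```python
def predict_total_power(powered_on: int, server_delta: int, avg_power_w: float) -> float:
    """Estimate total power (watts) of powered-on servers after the count changes.

    powered_on   -- servers currently powered on
    server_delta -- predicted change in server count (positive adds, negative removes)
    avg_power_w  -- assumed average draw per server; a real system would measure this
    """
    future_count = max(0, powered_on + server_delta)  # the count cannot go below zero
    return future_count * avg_power_w


# 40 running servers, 2 more predicted, 350 W each (made-up figures)
future_power = predict_total_power(40, 2, 350.0)
```

A per-model or per-rack power table would be more accurate than a single average, at the cost of more bookkeeping.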
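S43: Input the rack temperature of the server cluster, the duct pressure parameter of the air conditioning system, the operating parameters of the air conditioning system, and the total power of the powered-on servers each into a parameter prediction model based on the naive Bayes algorithm to obtain multiple air conditioning cooling capacity parameters for the future period.
The naive Bayes parameter prediction model applied in this step may be the same as the model described in step S21 of the above embodiment. In this step, the four parameters (the rack temperature of the server cluster, the duct pressure parameter of the air conditioning system, the operating parameters of the air conditioning system, and the total power of the powered-on servers) are each input into the parameter prediction model to obtain the air conditioning cooling capacity parameter corresponding to each of them.
For example, as shown in Table 3 below, the parameters listed in the "Condition category" column are run through the naive Bayes model above to obtain, for each parameter, the probability of increasing or decreasing the cooling capacity (shown in the "Computed probability" column of Table 3). Increasing the cooling capacity further lowers the temperature of the air supplied to the server cluster; decreasing the cooling capacity raises it.
Table 3
S44: Determine the cooling capacity of the air conditioning system for the server cluster in the future period according to the multiple air conditioning cooling capacity parameters.
After the above step obtains the cooling capacity parameters corresponding to the multiple parameters, they are aggregated. Specifically, P(increase cooling capacity) = a*c*e*g and P(decrease cooling capacity) = b*d*f*h, where a to h are the values in the "Computed probability" column of Table 3.
The trend of the air conditioning system's cooling capacity is then decided: if P(increase cooling capacity) > P(decrease cooling capacity), the air conditioner is triggered to increase the cooling capacity; if P(decrease cooling capacity) > P(increase cooling capacity), the air conditioner is triggered to decrease the cooling capacity. According to the calculated decision result, the air conditioning units carry out the operation of increasing or decreasing the cooling capacity.
Further, once it has been decided to increase or decrease the cooling capacity, the specific amount of the change can be determined from the trends of the multiple parameters used in this example. In practice, levels can be preset so that the cooling capacity is increased or decreased in steps, avoiding frequent changes in the air conditioning system's cooling actions.
For example, the cooling capacity is increased or decreased by a unit parameter: if it is decided to increase the cooling capacity, it is increased by one unit parameter; conversely, if it is decided to decrease the cooling capacity, it is decreased by one unit parameter.
It should be understood that the increase or decrease of the cooling capacity parameter depends on the scale of the server cluster and on parameters such as the actual ambient temperature, and can be adjusted flexibly for the actual application.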
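The stepwise adjustment described above can be sketched as a bounded level counter. The level range 0 to 5 and the optional `margin` dead band are assumptions added for this illustration; the source text only states that levels can be preset to avoid frequent changes.

```python
def next_cooling_level(level: int, p_increase: float, p_decrease: float,
                       min_level: int = 0, max_level: int = 5,
                       margin: float = 0.0) -> int:
    """Move the cooling level by at most one unit per decision.

    p_increase / p_decrease -- aggregated probabilities (a*c*e*g and b*d*f*h)
    margin -- hypothetical dead band: the winning probability must lead by more
              than this amount, which damps oscillation between adjacent levels
    """
    if p_increase > p_decrease + margin:
        level += 1
    elif p_decrease > p_increase + margin:
        level -= 1
    # Keep the result inside the preset range of levels.
    return max(min_level, min(max_level, level))


new_level = next_cooling_level(2, 0.30, 0.10)
```

With `margin=0.0` the function reproduces the plain comparison in the text; a positive margin is one possible way to reduce how often the level changes.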
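Based on the solution provided in the above embodiment, optionally, as shown in FIG. 5a, the following steps are further included after step S12.
S51: If the number of servers to be added or removed indicates adding a first number of servers, generate a server increase scheduling strategy according to the first number. The server increase scheduling strategy is used to power on the first number of unpowered servers in the server cluster within the future period and to perform service migration on the first number of servers after power-on.
FIG. 5b is a schematic flowchart of increasing the number of servers in a server cluster temperature adjustment method according to an embodiment of this application. When the overall service volume and load of the server cluster rise, parameters such as the number of external service requests collected, the service volume perceived by the server service-load monitoring module, switching device traffic, switching device port requests, network connection sessions, and a comparison with the historical service volume at the same time are fed to the pre-built algorithm module for model calculation and decision, and the service is then scaled out horizontally according to the result.
When servers are added, redundant standby servers are powered on and started so that they join the server cluster. Once a server has finished powering on, the currently monitored total running power data is triggered, and the air conditioning system is then controlled to coordinate actions matching the temperature adjustment strategy. Specifically, these may include the air conditioning system adjusting the cooling capacity, adjusting the opening angle of the rack's air duct, and increasing the volume or pressure of the cold air supplied to the rack, to meet the heat-dissipation needs of the newly expanded computing cluster and achieve precise, effective temperature control while saving energy.
S52: If the number of servers to be added or removed indicates removing a second number of servers, generate a server reduction scheduling strategy according to the second number. The server reduction scheduling strategy is used to perform service migration on the second number of servers in the server cluster within the future period and to power off the second number of servers after the service migration.
FIG. 5c is a schematic flowchart of reducing the number of servers in a server cluster temperature adjustment method according to an embodiment of this application. When servers are removed, parameters such as the number of external service requests collected, the service volume perceived by the server service-load monitoring module, switching device traffic, switching device port requests, network connection sessions, and a comparison with the historical service volume at the same time are fed to the pre-built algorithm module for model calculation and decision. According to the result, one or more servers are isolated from the service cluster, the services on the isolated servers are migrated to available servers, and the isolated servers are then powered off through the server power-on/power-off management system. Once a server has finished powering off, the currently monitored total running power data is triggered, and the air conditioning system is then controlled to coordinate actions matching the temperature adjustment strategy. Specifically, these may include the air conditioning system adjusting the cooling capacity, adjusting the opening angle of the rack's air duct, and reducing the volume or pressure of the cold air supplied to the rack, to save further energy while still meeting the overall heat-dissipation needs of the cluster after the power-off.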
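The ordering of operations in steps S51 and S52 (power on, then migrate in; migrate out, then power off) can be sketched as a plan builder. The server names, the action labels, and the choice to take servers from the front of each list are hypothetical; a real scheduler would pick servers by load, rack position, or other criteria.

```python
def build_schedule(server_delta, standby, active):
    """Return an ordered list of (action, server) steps for the future period.

    server_delta -- predicted change in server count (positive adds, negative removes)
    standby      -- names of powered-off servers available for scheduling
    active       -- names of servers currently in the service cluster
    """
    steps = []
    if server_delta > 0:
        # S51: power on first, then move services onto the new server.
        for name in standby[:server_delta]:
            steps.append(("power_on", name))
            steps.append(("migrate_in", name))
    elif server_delta < 0:
        # S52: move services off first, then power the server down.
        for name in active[:-server_delta]:
            steps.append(("migrate_out", name))
            steps.append(("power_off", name))
    return steps


plan = build_schedule(-2, standby=[], active=["s1", "s2", "s3"])
```

Migrating before powering off is what keeps services available during scale-in, which is why the two branches are not mirror images of each other.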
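With the solution provided by the embodiments of this application, load parameters such as the number of switching device ports and the number of accesses to service interfaces monitored on firewall equipment can be combined with the current service volume of the server cluster to estimate future load changes. This determines whether to migrate services in the service cluster, power off and scale in the corresponding computing cluster, or scale out horizontally, achieving more economical energy saving and improving economic benefit.
The solution provided by the embodiments of this application controls the air conditioning system to perform actions matching the strategy, so it can adjust the cooling capacity dynamically and manage the running and standby equipment of the air-conditioning cluster dynamically. This produces cooling capacity precisely, saves energy in the air conditioning system, and improves economic benefit.
In addition, the embodiments of this application perform parameter prediction based on a pre-built model, make flexible use of the air resources created by the indoor/outdoor temperature difference, and control the data center's fresh-air intake or hot-air exhaust system to calculate and decide whether to use internal circulation or switch to external circulation. This gives more economical management of the air conditioner's internal/external circulation modes and a more flexible heat-dissipation strategy, saving energy in the heat-dissipation system and improving economic benefit.
In practice, a precise, intelligent energy management system can be built in the data center. Monitoring hardware and software are deployed for the indicator data to be collected; load parameters such as the port traffic of switching equipment and the number of service interface accesses monitored by firewall equipment are monitored; and parameter prediction is performed by an artificial intelligence model combined with the naive Bayes algorithm. Based on the predicted parameters for the future period, the server cluster is then powered on to raise computing power or powered off to scale in, and the air conditioning system is controlled to adjust the cooling capacity accordingly to meet the cluster's overall heat-dissipation needs, which helps achieve precise energy management.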
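To solve the problems in the prior art, an embodiment of this application further provides a server cluster temperature adjustment device 60 which, as shown in FIG. 6, includes the following modules.
A monitoring module 61 monitors load parameters of the server cluster over a historical period and ambient temperature parameters of the environment in which the server cluster is located.
A prediction module 62 predicts, based on the load parameters, the number of servers to be added to or removed from the server cluster in a future period.
A determination module 63 determines a temperature adjustment strategy for the future period according to the number of servers to be added or removed and the ambient temperature parameters, wherein the temperature adjustment strategy includes a temperature increase/decrease parameter, and the trend of the number of servers added or removed is negatively correlated with the trend of the temperature increase/decrease parameter.
A control module 64 controls the air conditioning system to perform, in the future period, actions matching the temperature adjustment strategy, so as to adjust the ambient temperature of the server cluster.
The device provided by the embodiments of this application can monitor the load parameters of the server cluster to predict load changes, thereby determining the temperature adjustment strategy before the actual temperature of the cluster changes, and then adjust the air conditioning system's temperature adjustment actions in step with the cluster's load in the future period. This achieves effective temperature control that is timely and helps save energy.
The above modules in the device provided by the embodiments of this application can also implement the method steps provided by the above method embodiments. Alternatively, the device may include modules other than those above to implement the method steps provided by the above method embodiments. The device can also achieve the technical effects achievable by the above method embodiments.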
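The four modules of device 60 form a simple pipeline, which the following sketch wires together with plain callables. The class name, the field names, the return shapes, and the toy lambdas in the usage example are all hypothetical; the application does not specify module interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class TemperatureAdjuster:
    """Hypothetical wiring of modules 61 to 64 as interchangeable callables."""
    monitor: Callable[[], Tuple[Sequence[float], Sequence[float]]]  # module 61
    predict: Callable[[Sequence[float]], int]                       # module 62
    determine: Callable[[int, Sequence[float]], float]              # module 63
    control: Callable[[float], None]                                # module 64

    def run_once(self):
        load, ambient = self.monitor()            # historical load + ambient readings
        delta = self.predict(load)                # servers to add (+) or remove (-)
        strategy = self.determine(delta, ambient) # temperature increase/decrease parameter
        self.control(strategy)                    # hand the strategy to the air conditioner
        return delta, strategy


applied = []  # stands in for the air conditioning system in this sketch
adjuster = TemperatureAdjuster(
    monitor=lambda: ([0.9, 0.8], [24.0]),
    predict=lambda load: 1 if sum(load) / len(load) > 0.7 else -1,
    determine=lambda delta, ambient: -0.5 * delta,
    control=applied.append,
)
outcome = adjuster.run_once()
```

Passing the modules in as callables keeps each stage replaceable, for example swapping the toy `predict` for a trained naive Bayes model.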
可选的,本申请实施例还提供一种电子设备,包括处理器,存储器,存储在存储器上并可在所述处理器上运行的计算机程序,该计算机程序被处理器执行时实现上述一种服务器集群温度调节方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
本申请实施例还提供一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时实现上述一种服务器集群温度调节方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。其中,所述的计算机可读存储介质,如只读存储器(Read-Only Memory,简称ROM)、随机存取存储器(Random Access Memory,简称RAM)、磁碟或者光盘等。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流 程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
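In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.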
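The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.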

Claims (10)

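  1. A server cluster temperature regulation method, comprising:
    monitoring load parameters of a server cluster over a historical period and ambient temperature parameters of the environment in which the server cluster is located;
    predicting, based on the load parameters, the number of servers to add to or remove from the server cluster in a future period;
    determining a temperature regulation strategy for the future period according to the number of servers to add or remove and the ambient temperature parameters, wherein the temperature regulation strategy comprises a temperature rise/fall parameter, and the trend of the number of servers to add or remove is negatively correlated with the trend of the temperature rise/fall parameter;
    controlling an air conditioning system to perform, in the future period, actions matching the temperature regulation strategy, so as to regulate the ambient temperature of the environment in which the server cluster is located.
  2. The method according to claim 1, wherein the load parameters of the server cluster comprise at least one of the following:
    a service workload parameter, a network device traffic parameter, a number of network device port requests, and a number of network connection sessions.
  3. The method according to claim 2, wherein the load parameters of the server cluster comprise multiple items;
    wherein predicting, based on the load parameters, the number of servers to add to or remove from the server cluster in the future period comprises:
    inputting the multiple load parameters into a parameter prediction model based on the Naive Bayes algorithm to obtain predicted rise/fall probabilities of the multiple load parameters in the future period;
    determining the number of servers to add to or remove from the server cluster in the future period according to the predicted rise/fall probabilities of the multiple load parameters.
  4. The method according to claim 2, wherein the ambient temperature parameters of the environment in which the server cluster is located comprise at least one of the following:
    a rack temperature of the server cluster, a cooling capacity parameter of the air conditioning system, an air duct pressure parameter of the air conditioning system, an outdoor temperature parameter of the server cluster, a temperature difference between the hot aisle of the air conditioning system and the outdoor fresh air inlet, and an outdoor humidity parameter of the server cluster.
  5. The method according to claim 4, wherein the ambient temperature parameters of the environment in which the server cluster is located comprise multiple items, and the temperature regulation strategy comprises an air circulation strategy;
    wherein determining the temperature regulation strategy for the future period according to the number of servers to add or remove and the temperature of the environment comprises:
    inputting the multiple ambient temperature parameters into a parameter prediction model based on the Naive Bayes algorithm to obtain predicted circulation strategies respectively corresponding to the multiple ambient temperature parameters in the future period;
    determining the air circulation strategy of the server cluster in the future period according to the predicted circulation strategies respectively corresponding to the multiple ambient temperature parameters, the air circulation strategy being used to instruct the air conditioning system to perform internal air circulation or external air circulation.
  6. The method according to claim 4, further comprising: monitoring operating parameters of the air conditioning system;
    wherein determining the temperature regulation strategy for the future period according to the number of servers to add or remove and the ambient temperature parameters comprises:
    predicting the total power of powered-on servers of the server cluster according to the number of servers to add or remove;
    inputting the rack temperature of the server cluster, the air duct pressure parameter of the air conditioning system, the operating parameters of the air conditioning system, and the total power of powered-on servers respectively into a parameter prediction model based on the Naive Bayes algorithm to obtain multiple air conditioning system cooling capacity parameters for the future period;
    determining the air conditioning system cooling capacity of the server cluster in the future period according to the multiple air conditioning system cooling capacity parameters.
  7. The method according to any one of claims 1 to 6, further comprising, after predicting, based on the load parameters, the number of servers to add to or remove from the server cluster in the future period:
    if the number of servers to add or remove indicates adding a first number of servers, generating a server addition scheduling strategy according to the first number, the server addition scheduling strategy being used to power on, in the future period, the first number of unpowered servers in the server cluster and to perform service migration for the first number of servers after power-on;
    if the number of servers to add or remove indicates removing a second number of servers, generating a server reduction scheduling strategy according to the second number, the server reduction scheduling strategy being used to perform, in the future period, service migration for the second number of servers in the server cluster and to power off the second number of servers after the service migration.
  8. A server cluster temperature regulation apparatus, comprising:
    a monitoring module, which monitors load parameters of a server cluster over a historical period and ambient temperature parameters of the environment in which the server cluster is located;
    a prediction module, which predicts, based on the load parameters, the number of servers to add to or remove from the server cluster in a future period;
    a determination module, which determines a temperature regulation strategy for the future period according to the number of servers to add or remove and the ambient temperature parameters, wherein the temperature regulation strategy comprises a temperature rise/fall parameter, and the trend of the number of servers to add or remove is negatively correlated with the trend of the temperature rise/fall parameter;
    a control module, which controls an air conditioning system to perform, in the future period, actions matching the temperature regulation strategy, so as to regulate the ambient temperature of the environment in which the server cluster is located.
  9. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.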
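
As an illustration of claims 3 and 7 only, the following sketch uses a small categorical Naive Bayes classifier, written from scratch, to estimate the probability that each load parameter rises, and then turns the result into a server count and a scheduling plan. The feature encoding, the training rows, the averaging rule, and `max_step` are assumptions made for this example; the claims only state that a Naive Bayes based model yields predicted rise/fall probabilities and that the server count is determined from them.

```python
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Tiny categorical Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[(i, label)][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict_proba(self, row):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = count / total  # prior P(label)
            for i, value in enumerate(row):
                seen = self.feature_counts[(i, label)][value]
                score *= (seen + 1) / (count + len(self.feature_values[i]))
            scores[label] = score
        norm = sum(scores.values())
        return {label: s / norm for label, s in scores.items()}


def server_delta_from_probabilities(p_up_by_parameter, max_step=4):
    """Assumed rule: average the rise probabilities and scale to [-max_step, max_step]."""
    p_up = sum(p_up_by_parameter.values()) / len(p_up_by_parameter)
    return round((2 * p_up - 1) * max_step)


def scheduling_plan(server_delta):
    """Claim 7 ordering: power on before migrating in; migrate out before powering off."""
    if server_delta > 0:
        return [f"power on {server_delta} idle servers",
                f"migrate services onto the {server_delta} powered-on servers"]
    if server_delta < 0:
        n = -server_delta
        return [f"migrate services off {n} servers",
                f"power off the {n} drained servers"]
    return []


# Hypothetical history: (recent trend, time of day) -> did the parameter rise afterwards?
rows = [("up", "day"), ("up", "day"), ("down", "night"), ("down", "night")]
labels = ["up", "up", "down", "down"]
model = CategoricalNaiveBayes().fit(rows, labels)

p_up = model.predict_proba(("up", "day"))["up"]
delta = server_delta_from_probabilities({"traffic": p_up, "sessions": p_up})
print(round(p_up, 3), delta, scheduling_plan(delta))
```

A production system would more likely train one model per load parameter on real monitoring data and might use a library classifier; the hand-written class here only keeps the example self-contained.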
PCT/CN2023/109059 2022-11-28 2023-07-25 Server cluster temperature regulation method and apparatus WO2024113906A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211501797.4A CN118093301A (zh) 2022-11-28 2022-11-28 Server cluster temperature regulation method and apparatus
CN202211501797.4 2022-11-28

Publications (1)

Publication Number Publication Date
WO2024113906A1 (zh) 2024-06-06

Family

ID=91155977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109059 WO2024113906A1 (zh) 2022-11-28 2023-07-25 Server cluster temperature regulation method and apparatus

Country Status (2)

Country Link
CN (1) CN118093301A (zh)
WO (1) WO2024113906A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118382276A (zh) * 2024-06-25 2024-07-23 张北云联数据服务有限责任公司 Intelligent temperature management method and system for a high-density server room

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050273208A1 (en) * 2004-06-03 2005-12-08 Kazuaki Yazawa Electronic device cooling apparatus and method for cooling electronic device with temperature prediction
US20100218005A1 (en) * 2009-02-23 2010-08-26 Microsoft Corporation Energy-aware server management
US20120158206A1 (en) * 2010-12-20 2012-06-21 International Business Machines Corporation Regulating the temperature of a datacenter
US20150180719A1 (en) * 2013-12-20 2015-06-25 Facebook, Inc. Self-adaptive control system for dynamic capacity management of latency-sensitive application servers
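CN112539529A (zh) * 2020-11-27 2021-03-23 珠海格力电器股份有限公司 Control method and control device for an air conditioning system, and machine room air conditioning system
CN112788101A (zh) * 2020-12-22 2021-05-11 宇龙计算机通信科技(深圳)有限公司 Server cluster control method, apparatus, terminal, and storage medium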

Also Published As

Publication number Publication date
CN118093301A (zh) 2024-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23896069

Country of ref document: EP

Kind code of ref document: A1