CN117545228A - Air conditioner control system, method, equipment and medium based on reinforcement learning - Google Patents


Info

Publication number
CN117545228A
CN117545228A (application CN202311471208.7A)
Authority
CN
China
Prior art keywords
air conditioning
control
layer
conditioning equipment
instruction
Prior art date
Legal status
Pending
Application number
CN202311471208.7A
Other languages
Chinese (zh)
Inventor
刘博�
赵予风
田清森
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd

Classifications

    • HELECTRICITY
    • H05ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05KPRINTED CIRCUITS; CASINGS OR CONSTRUCTIONAL DETAILS OF ELECTRIC APPARATUS; MANUFACTURE OF ASSEMBLAGES OF ELECTRICAL COMPONENTS
    • H05K7/00Constructional details common to different types of electric apparatus
    • H05K7/20Modifications to facilitate cooling, ventilating, or heating
    • H05K7/20709Modifications to facilitate cooling, ventilating, or heating for server racks or cabinets; for data centers, e.g. 19-inch computer racks
    • H05K7/20836Thermal management, e.g. server temperature control

Abstract

The application discloses an air conditioner control system, method, equipment, and medium based on reinforcement learning, wherein the system comprises: a signal transmission layer for transmitting the state information of the air conditioning equipment, the first control instruction issued by the interaction layer, the second control instruction forwarded by the centralized control layer, and the execution result of the air conditioning equipment; an edge calculation layer for performing control adjustment processing on the execution result according to an offline control model to obtain a first adjustment instruction; a centralized control layer for performing first conversion processing on the first control instruction to obtain the second control instruction, and for updating and issuing the offline control model to the edge calculation layer according to the state information and the first control instruction; and an interaction layer for obtaining the first control instruction according to the state information of the air conditioning equipment. The system can improve the fineness of machine-room air conditioner control, reduce the machine room's PUE energy consumption and expenditure cost, save energy resources, and adapt well to differing energy-consumption conditions. The method, equipment, and medium can be widely applied in the technical field of air conditioning.

Description

Air conditioner control system, method, equipment and medium based on reinforcement learning
Technical Field
The present disclosure relates to the field of air conditioning technologies, and in particular, to an air conditioning control system, method, apparatus, and medium based on reinforcement learning.
Background
With the continued development of basic communication services, and in particular the start of 5G construction, the number of communication base stations and machine rooms of all kinds has multiplied, and the whole communication industry faces enormous energy-saving and emission-reduction pressure. The air conditioner of a machine room (base station) is its main power-consuming equipment, so reducing air-conditioner energy cost is the primary goal of energy saving and emission reduction. According to statistics, of the total power consumption of a base station, the main base-station equipment consumes 51%, the air conditioner 46%, and other supporting equipment 3%.
At present, the traditional machine-room energy-consumption control approach mainly sets a fixed air-conditioning temperature in advance and periodically shuts down the machine-room air conditioner in autumn and winter. Its degree of fine control is low, and unreasonable temperature settings, unreasonable working modes, and the like are common, so the machine-room air conditioner often runs ineffectively and the machine room suffers high PUE (Power Usage Effectiveness) energy consumption and high energy cost. In addition, the traditional approach usually sets one fixed air-conditioning temperature uniformly for a batch of machine rooms, yet the hardware environment of each machine room (base station), and hence the heat generated by its hardware equipment, differs greatly; to guarantee that no machine room's hardware overheats, the temperature set for some machine rooms ends up far too low. Such a control approach therefore adapts poorly to differing energy-consumption conditions.
Accordingly, the above problems of the prior art need to be solved and optimized.
Disclosure of Invention
To solve at least one of the above technical problems, the application provides an air conditioner control system, method, equipment, and medium based on reinforcement learning. The air conditioner control system can effectively improve the fineness of machine-room air conditioner control, effectively reduce the machine room's PUE energy consumption and energy expenditure cost, effectively save energy resources, and adapt well to differing energy-consumption conditions.
According to a first aspect of the present application, there is provided an air conditioner control system based on reinforcement learning, including: the system comprises a signal transmission layer, an edge calculation layer, a centralized control layer and an interaction layer;
the signal transmission layer is used for acquiring and transmitting state information of the air conditioning equipment, a first control instruction issued by the interaction layer, a second control instruction forwarded by the centralized control layer and an execution result of the air conditioning equipment, wherein the execution result comprises an execution action of the air conditioning equipment, and an inner environment temperature and an outer environment temperature of a machine room in which the air conditioning equipment is located;
the edge calculation layer is used for receiving an offline control model issued by the centralized control layer, performing control adjustment processing on the execution result according to the offline control model, and obtaining and sending a first adjustment instruction to the air conditioning equipment;
The centralized control layer is used for performing first conversion processing on the first control instruction, obtaining and sending the second control instruction to the air conditioning equipment, and updating and sending an offline control model to the edge calculation layer according to the state information and the first control instruction;
and the interaction layer is used for obtaining a first control instruction according to the state information of the air conditioning equipment.
Further, in the embodiment of the present application, the signal transmission layer includes a first communication module and a second communication module;
the first communication module is used for acquiring and transmitting the state information of the air conditioning equipment and transmitting the second control instruction forwarded by the centralized control layer;
the second communication module is used for transmitting the first control instruction issued by the interaction layer and the execution result of the air conditioning equipment.
Further, in the embodiment of the present application, the edge computing layer includes an edge computing module and an edge storage module;
the edge storage module is used for receiving an offline control model issued by the centralized control layer and receiving and sending the first adjustment instruction to the air conditioning equipment;
And the edge operation module is used for performing control adjustment processing on the execution result according to the offline control model to obtain and send the first adjustment instruction to the edge storage module.
Further, in the embodiment of the application, the centralized control layer comprises a reinforcement learning module and an instruction conversion module;
the instruction conversion module is used for performing first conversion processing on the first control instruction to obtain and send the second control instruction to the air conditioning equipment;
and the reinforcement learning module is used for acquiring historical data and updating parameters of the initialized offline control model according to the historical data and the first control instruction to obtain a trained offline control model, and sending the trained offline control model to the edge calculation layer.
Further, in the embodiment of the present application, the interaction layer includes an interface module, a policy setting module, and a statistics report module;
the interface module is used for carrying out visual processing on the state information of the air conditioning equipment to obtain a state visual interface of the air conditioning equipment;
the strategy setting module is used for responding to an air conditioning strategy instruction issued by a user and generating a first control instruction corresponding to the air conditioning strategy instruction;
And the statistical report module is used for carrying out statistical processing on the state information and the execution result to generate a statistical report corresponding to the air conditioning equipment.
According to a second aspect of the present application, an embodiment of the present application provides an air conditioning control method based on reinforcement learning, which performs air conditioning control by the air conditioning control system based on reinforcement learning described in the first aspect, the air conditioning control method based on reinforcement learning includes:
acquiring state information of the air conditioning equipment through the signal transmission layer and sending the state information to the interaction layer;
receiving, through the interaction layer, the state information of the air conditioning equipment, and obtaining a first control instruction issued by the interaction layer according to the state information;
performing first conversion processing on the first control instruction through the centralized control layer to obtain a second control instruction;
the second control instruction is sent to the air conditioning equipment through the centralized control layer, and an execution result of the air conditioning equipment is obtained, wherein the execution result comprises an execution action of the air conditioning equipment, and an inner environment temperature and an outer environment temperature of a machine room where the air conditioning equipment is located;
and performing control adjustment processing on the execution result through an offline control model in the edge calculation layer to obtain and send a first adjustment instruction to the air conditioning equipment.
Further, in this embodiment of the present application, the sending of the second control instruction to the air conditioning equipment to obtain an execution result of the air conditioning equipment, where the execution result includes an execution action of the air conditioning equipment, includes:
Sending the second control instruction to the air conditioning equipment to obtain an intermediate result;
and performing second conversion processing on the intermediate result to obtain an execution result of the air conditioning equipment.
Further, in the embodiment of the present application, the offline control model of the edge computing layer is obtained by:
acquiring, through the centralized control layer, historical data and updating parameters of the initialized offline control model according to the historical data and the first control instruction to obtain a trained offline control model;
and sending the trained offline control model to the edge calculation layer through the centralized control layer.
According to a third aspect of the present application, there is provided a computer device comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the reinforcement learning-based air conditioning control method as described in the above aspect.
According to a fourth aspect of the present application, there is provided a computer-readable storage medium in which a processor-executable program is stored, which when executed by the processor is for implementing the reinforcement learning-based air conditioner control method as described in the above aspect.
The beneficial effects of the technical solution provided by the embodiments of the present application are as follows:
the application provides an air conditioner control system, a method, equipment and a medium based on reinforcement learning, wherein the air conditioner control system comprises: the system comprises a signal transmission layer, an edge calculation layer, a centralized control layer and an interaction layer; the signal transmission layer is used for acquiring and transmitting state information of the air conditioning equipment, a first control instruction issued by the interaction layer, a second control instruction forwarded by the centralized control layer and an execution result of the air conditioning equipment, wherein the execution result comprises an execution action of the air conditioning equipment, and an inner environment temperature and an outer environment temperature of a machine room in which the air conditioning equipment is located; the edge calculation layer is used for receiving an offline control model issued by the centralized control layer, performing control adjustment processing on the execution result according to the offline control model, and obtaining and sending a first adjustment instruction to the air conditioning equipment; the centralized control layer is used for performing first conversion processing on the first control instruction, obtaining and sending the second control instruction to the air conditioning equipment, and updating and sending an offline control model to the edge calculation layer according to the state information and the first control instruction; and the interaction layer is used for acquiring and obtaining a first control instruction according to the state information of the air conditioning equipment. 
The air conditioner control system can effectively improve the fineness of machine-room air conditioner control, effectively reduce the machine room's PUE energy consumption and energy expenditure cost, effectively save energy resources, and adapt well to differing energy-consumption conditions.
Drawings
Fig. 1 is a structural framework diagram of an air conditioner control system based on reinforcement learning according to an embodiment of the present application;
fig. 2 is a structural frame diagram of a signal transmission layer according to an embodiment of the present application;
FIG. 3 is a structural framework diagram of an edge computation layer according to an embodiment of the present disclosure;
fig. 4 is a structural framework diagram of a centralized control layer according to an embodiment of the present application;
FIG. 5 is a structural framework diagram of an interaction layer according to an embodiment of the present disclosure;
fig. 6 is a schematic flow chart of an air conditioner control method based on reinforcement learning according to an embodiment of the present application;
fig. 7 is a timing chart of an air conditioner control method based on reinforcement learning according to an embodiment of the present application;
fig. 8 is a schematic flow chart of step 140 according to an embodiment of the present application;
fig. 9 is a schematic flow chart of obtaining an offline control model by the edge computing layer according to an embodiment of the present application;
fig. 10 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application is further described below with reference to the drawings and specific embodiments. The described embodiments should not be taken as limiting the present application; all other embodiments obtained by one of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
The following description is presented in terms of terms involved in this application:
PUE energy consumption: the PUE energy consumption is an index for evaluating the energy efficiency of the data center, and is the ratio of all the energy consumed by the data center to the energy consumed by the IT equipment load. Specifically, pue=total data center energy consumption/IT equipment energy consumption, where the total data center energy consumption includes IT equipment energy consumption and energy consumption of systems such as refrigeration, power distribution, and the like, and a value greater than 1, and a value closer to 1 indicates that the lower the non-IT equipment energy consumption is, the better the energy efficiency level is.
Reinforcement learning (Reinforcement Learning, RL): reinforcement learning, also known as evaluative learning, is used to describe and solve the problem of an agent learning strategies during its interaction with an environment so as to maximize return or achieve a specific goal.
At present, the traditional machine-room energy-consumption control approach mainly sets a fixed air-conditioning temperature in advance and periodically shuts down the machine-room air conditioner in autumn and winter. Its degree of fine control is low, and unreasonable temperature settings, unreasonable working modes, and the like are common, so the machine-room air conditioner often runs ineffectively and the machine room suffers high PUE (Power Usage Effectiveness) energy consumption and high energy cost. In addition, the traditional approach usually sets one fixed air-conditioning temperature uniformly for a batch of machine rooms, yet the hardware environment of each machine room (base station), and hence the heat generated by its hardware equipment, differs greatly; to guarantee that no machine room's hardware overheats, the temperature set for some machine rooms ends up far too low. Such a control approach therefore adapts poorly to differing energy-consumption conditions.
In view of this, the embodiment of the application provides an air conditioner control system, method, equipment, and medium based on reinforcement learning. The air conditioner control system can effectively improve the fineness of machine-room air conditioner control, effectively reduce the machine room's PUE energy consumption and energy expenditure cost, effectively save energy resources, and adapt well to differing energy-consumption conditions.
The embodiments of the present application provide an air conditioner control system, method, equipment, and medium based on reinforcement learning, which are described through the following embodiments. First, the air conditioner control system based on reinforcement learning in the embodiments of the present application is described.
The air conditioner control system based on reinforcement learning provided by the embodiment of the application can be applied to cloud computing (cloud service) application scenarios. In such a scenario, a cloud computing service provider can use the system to intelligently control the air conditioners of the machine rooms housing cloud computing equipment, thereby improving the fineness of machine-room air conditioner control, reducing machine-room PUE energy consumption, adaptively setting air conditioner control logic for each cloud computing machine room, effectively saving energy resources, and reducing the machine room's energy expenditure cost.
The air conditioner control system based on reinforcement learning, which is provided by the embodiment of the application, can be applied to the application scene of the Internet of things. In the application scene of the internet of things, the air conditioner control system based on reinforcement learning, which is provided by the embodiment of the application, can be used for realizing intelligent control of the air conditioner of the environment where the internet of things equipment is located, so that air conditioner control logic is adaptively set for the environment where each internet of things equipment is located, energy resources are saved, and energy consumption expenditure cost is reduced.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be noted that, in each specific embodiment of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of these data comply with related laws and regulations and standards. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the user is explicitly acquired, necessary user related data for enabling the embodiment of the application to normally operate is acquired.
Referring to fig. 1, fig. 1 is an optional structural framework diagram of an air conditioner control system based on reinforcement learning according to an embodiment of the present application.
The air conditioner control system based on reinforcement learning comprises a signal transmission layer, an edge calculation layer, a centralized control layer and an interaction layer;
the signal transmission layer is used for acquiring and transmitting state information of the air conditioning equipment, a first control instruction issued by the interaction layer, a second control instruction forwarded by the centralized control layer and an execution result of the air conditioning equipment, wherein the execution result comprises an execution action of the air conditioning equipment, and an inner environment temperature and an outer environment temperature of a machine room in which the air conditioning equipment is located;
referring to fig. 2, in some embodiments, the signal transmission layer includes a first communication module and a second communication module;
the first communication module is used for acquiring and transmitting the state information of the transmission air conditioning equipment and transmitting a second control instruction forwarded by the centralized control layer;
the second communication module is used for transmitting the first control instruction issued by the interaction layer and the execution result of the air conditioning equipment.
In the embodiment of the application, the signal transmission layer realizes signal transmission between the air conditioning equipment and the interaction layer, between the air conditioning equipment and the edge calculation layer, and between the edge calculation layer and the centralized control layer. In practice, because the air-conditioner manufacturers of different machine rooms are inconsistent, the control instructions accepted by different machine rooms' air conditioners differ in data format and the like. For signal transmission between the air conditioning equipment and the interaction layer, therefore, both the execution result sent by the air conditioning equipment to the interaction layer and the first control instruction sent by the interaction layer to the air conditioning equipment can be bridged through the centralized control layer.
It can be understood that the first communication module mentioned in the embodiment of the present application is configured to interact directly with the air conditioning equipment: it sends the second control instruction produced by the centralized control layer's conversion to the air conditioning equipment, and collects the status information, execution results, and so on of the air conditioning equipment. Specifically, the state information of the air conditioner includes the temperature set value, the air-outlet temperature value, the return-air temperature value, the air-outlet speed, the compressor running state, the air conditioning equipment running state, and the like; the second control instruction includes wind-power increase/decrease instructions, temperature raise/lower instructions, compressor on/off instructions, unit on/off instructions, and the like. The first communication module may be implemented through an RS485 communication module; this example is merely illustrative, does not limit the application, and may be set according to actual requirements.
It can also be understood that the second communication module mentioned in the embodiment of the present application is used for signal interaction among the edge calculation layer, the centralized control layer, and the interaction layer. It is responsible for transmitting the first control instruction issued by the interaction layer to the centralized control layer; receiving the trained offline control model issued by the centralized control layer and forwarding it to the edge calculation layer; and transmitting the state information and execution results acquired by the first communication module to the centralized control layer. Specifically, the second communication module may be implemented by constructing an NB network through an NB-IoT module; this example is merely illustrative, does not limit the application, and may be set according to actual requirements.
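To make the module descriptions above concrete, the status fields and instruction types listed for the first communication module can be sketched as plain data types. All names below are illustrative assumptions, not identifiers from the patent:

```python
from dataclasses import dataclass
from enum import Enum

class Command(Enum):
    """Second-control-instruction types forwarded to the air conditioner."""
    FAN_UP = "fan_up"
    FAN_DOWN = "fan_down"
    TEMP_UP = "temp_up"
    TEMP_DOWN = "temp_down"
    COMPRESSOR_ON = "compressor_on"
    COMPRESSOR_OFF = "compressor_off"
    UNIT_ON = "unit_on"
    UNIT_OFF = "unit_off"

@dataclass
class AirConditionerStatus:
    """State information collected over the RS485 link."""
    setpoint_c: float          # temperature set value
    outlet_temp_c: float       # air-outlet temperature
    return_air_temp_c: float   # return-air temperature
    outlet_speed: float        # air-outlet speed
    compressor_running: bool
    unit_running: bool

status = AirConditionerStatus(24.0, 18.0, 26.0, 1.5, True, True)
```

In a real deployment these records would be serialized into whatever frame format the RS485 and NB-IoT links require.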
The edge calculation layer is used for receiving an offline control model issued by the centralized control layer, performing control adjustment processing on the execution result according to the offline control model, and obtaining and sending a first adjustment instruction to the air conditioning equipment;
referring to fig. 3, the edge calculation layer includes an edge calculation module and an edge storage module;
the edge storage module is used for receiving an offline control model issued by the centralized control layer and receiving and sending the first adjustment instruction to the air conditioning equipment;
and the edge operation module is used for performing control adjustment processing on the execution result according to the offline control model to obtain and send the first adjustment instruction to the edge storage module.
In the embodiment of the application, the edge storage module can receive the offline control model issued by the centralized control layer, and receive and store the execution result transmitted by the signal transmission layer. The edge operation module then calls the offline control model, in either an offline or an online state, and adjusts the control logic of the current air conditioning equipment according to the execution action, the internal environment temperature, and the external environment temperature in the execution result, thereby achieving adaptive control of the corresponding air conditioning equipment's operation logic.
It may be understood that, after obtaining the first adjustment instruction, the edge calculation layer mentioned in the embodiment of the present application may send it directly to the corresponding air conditioning equipment for execution, thereby realizing adaptive control of the air conditioning equipment's operation logic. Alternatively, the edge calculation layer may send the first adjustment instruction indirectly: by way of example, the edge storage module in the edge calculation layer may transmit the first adjustment instruction to the centralized control layer, which converts its data format and then sends the converted first adjustment instruction to the corresponding air conditioning equipment for execution. It should be noted that, in the embodiment of the present application, the offline control model may also be used to control and adjust the execution result generated after the air conditioning equipment executes the first adjustment instruction, and this process may be repeated, thereby achieving long-term fine control of the air conditioning equipment.
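A minimal sketch of the edge-layer adjustment step described above, assuming the trained offline control model reduces to a policy function over (last action, indoor temperature, outdoor temperature). `demo_policy` and its thresholds are invented for illustration and do not come from the patent:

```python
def adjust(policy, last_action: int, indoor_c: float, outdoor_c: float) -> int:
    """Map the latest execution result to a first adjustment instruction
    (here simply a setpoint change in °C) using the offline control model."""
    state = (last_action, round(indoor_c), round(outdoor_c))
    return policy(state)

def demo_policy(state):
    # Stand-in for the trained offline control model.
    _, indoor, _ = state
    if indoor > 26:    # too warm: lower the setpoint
        return -1
    if indoor < 20:    # too cool: raise the setpoint
        return +1
    return 0           # hold

print(adjust(demo_policy, 0, 27.5, 35.0))  # → -1
```

Each returned adjustment would then be executed by the air conditioning equipment (directly or after format conversion by the centralized control layer), and the resulting execution result fed back into the same loop.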
The centralized control layer is used for performing first conversion processing on the first control instruction, obtaining and sending the second control instruction to the air conditioning equipment, and updating and sending an offline control model to the edge calculation layer according to the state information and the first control instruction;
referring to fig. 4, the centralized control layer includes a reinforcement learning module and an instruction conversion module;
the instruction conversion module is used for performing first conversion processing on the first control instruction to obtain and send the second control instruction to the air conditioning equipment;
and the reinforcement learning module is used for acquiring and updating parameters of the initialized offline control model according to the historical data and the first instruction to obtain a trained offline control model, and sending the trained offline control model to the edge calculation layer.
In this embodiment of the present application, the centralized control layer may run on a cloud server, which may be composed of multiple server clusters. A server cluster may be, for example, any one of a high availability cluster (High Availability Cluster, HA), a load balancing cluster (Load Balance Cluster, LB) and a high performance computing cluster (High Performance Computing Cluster, HPC); these are merely examples, and the cluster type may be set according to actual requirements.
It may be understood that the first conversion processing performed by the instruction conversion module may be a type conversion of the first control instruction issued by the interaction layer. Specifically, in this embodiment, after receiving the first control instruction, the instruction conversion module first queries the target air conditioning device to which the first control instruction points, then converts the first control instruction according to the data format required by the target air conditioning device to generate a second control instruction, and finally sends the generated second control instruction to the air conditioning device for execution.
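The query-then-convert behavior of the instruction conversion module can be sketched as follows. The device-format registry, device ids, payload encodings, and field names are invented for illustration; real air conditioning devices would define their own protocols.

```python
import json

# Hypothetical registry mapping each target device to its required data format.
DEVICE_FORMATS = {
    "ac-01": "json",    # accepts JSON payloads
    "ac-02": "modbus",  # accepts a register-style byte encoding
}


def first_conversion(first_instruction: dict):
    """Convert a first control instruction into the second control instruction
    in the data format required by the target air conditioning device."""
    target = first_instruction["device_id"]  # query the target device
    fmt = DEVICE_FORMATS[target]             # look up its required data format
    if fmt == "json":
        return json.dumps({"cmd": first_instruction["command"]})
    if fmt == "modbus":
        # Toy register encoding: 1 = on, 0 = off.
        return bytes([1 if first_instruction["command"] == "on" else 0])
    raise ValueError(f"unknown format: {fmt}")


second_instruction = first_conversion({"device_id": "ac-01", "command": "on"})
```

The generated second instruction would then be forwarded through the signal transmission layer to the air conditioning device for execution.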
In the embodiment of the present application, the signal transmission layer may further transmit an execution result of the air conditioning device to the centralized control layer. The signal transmission layer may also transmit the execution result of the air conditioning device to the instruction conversion module, and the instruction conversion module performs data format conversion on the execution result to obtain an execution result after the data format conversion, and then sends the execution result after the data format conversion to the interaction layer.
It should be noted that the reinforcement learning module mentioned in the embodiment of the present application may be implemented by a table-based Q-Learning algorithm, a value-based DQN (Deep Q Network) algorithm, or a policy-based PG (Policy Gradient) algorithm; the DQN algorithm is taken as an example here. Specifically, the historical data may include the current state information and execution result of the air conditioning device as well as its previous state information and execution results, and the first instruction may include the first control instruction currently issued by the interaction layer and the first adjustment instruction currently issued by the edge calculation layer, as well as first control instructions previously issued by the interaction layer and first adjustment instructions previously issued by the edge calculation layer. Then, based on the acquired historical data and first instruction, the offline control model is trained in combination with the reward and punishment mechanism preset by the DQN algorithm (that is, the air conditioning device takes an action, and is rewarded or punished according to the internal environment temperature and external environment temperature that the action causes). Further, for a machine learning model, the accuracy of model predictions may be measured by a loss function (Loss Function), which is defined on a single piece of training data and measures its prediction error; specifically, the loss value of the training data is determined from its label and the model's prediction for it.
In actual training, a training data set contains many pieces of training data, so a cost function (Cost Function) is generally adopted to measure the overall error of the training data set; the cost function is defined on the whole training data set and computes the average of the prediction errors of all the training data, and can therefore better measure the prediction effect of the model. For a general machine learning model, the cost function plus a regularization term measuring the complexity of the model can be used as the training objective function, and the loss value of the whole training data set can be obtained based on this objective function. There are many kinds of common loss functions, such as the 0-1 loss function, square loss function, absolute loss function, logarithmic loss function, cross entropy loss function, etc., all of which can be used as the loss function of a machine learning model and will not be described in detail here. In embodiments of the present application, one of these may be selected to determine the training loss value, such as the cross entropy loss function. Based on the training loss value, the parameters of the model are updated by a back propagation algorithm, and after several rounds of iteration the trained offline control model is obtained. The number of iteration rounds may be preset, or training may be deemed complete when the model meets the accuracy requirement on a test set.
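The core value update behind both Q-Learning and DQN is the temporal-difference step, where the squared TD error plays the role of the per-sample loss discussed above. The sketch below uses a tiny Q-table in place of the deep network, and the states, actions, discount factor, and learning rate are all invented for illustration.

```python
GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.1  # learning rate (assumed value)

ACTIONS = ("on", "off", "hold")
Q = {}  # (state, action) -> value; a table standing in for the Q-network


def td_update(state, action, reward, next_state):
    """One update step: move Q(s, a) toward the target r + GAMMA * max_a' Q(s', a'),
    and return the squared TD error, i.e. the per-sample loss."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    target = reward + GAMMA * best_next
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (target - old)
    return 0.5 * (target - old) ** 2


loss = td_update(state="warm", action="on", reward=-1.0, next_state="cool")
```

In the DQN case the table lookup is replaced by a neural network forward pass, and the loss is minimized over mini-batches of historical transitions by back propagation, matching the training procedure described above.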
For example, for a compressor of an air conditioning apparatus, the execution actions provided in the embodiments of the present application are on, off, and hold, and the environmental factors are the internal environment temperature and the external environment temperature. The corresponding reward and punishment mechanism may be: each time the compressor is started, the value of the DQN algorithm is reduced by 1 point; each time the compressor is turned off, the value of the DQN algorithm is increased by 1 point; when the internal environment temperature is greater than the first temperature threshold, the value of the DQN algorithm is reduced by 10 points; and when the external environment temperature is smaller than the second temperature threshold, the value of the DQN algorithm is increased by 10 points. The specific values of the first time threshold, the first temperature threshold and the second temperature threshold may be set according to practical situations, for example, the first time threshold is 15 minutes, the first temperature threshold is 29 degrees Celsius, and the second temperature threshold is 10 degrees Celsius, so as to meet practical requirements.
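The compressor reward and punishment mechanism above translates directly into a scoring function. The temperature thresholds (29 °C and 10 °C) follow the example in the text; the function name and sample readings are illustrative.

```python
FIRST_TEMP_THRESHOLD = 29.0   # degrees Celsius, internal environment
SECOND_TEMP_THRESHOLD = 10.0  # degrees Celsius, external environment


def reward(action: str, indoor_temp: float, outdoor_temp: float) -> int:
    """Score one compressor action per the reward/punishment mechanism above."""
    score = 0
    if action == "on":
        score -= 1   # each compressor start costs 1 point
    elif action == "off":
        score += 1   # each compressor stop earns 1 point
    if indoor_temp > FIRST_TEMP_THRESHOLD:
        score -= 10  # room too hot: punish
    if outdoor_temp < SECOND_TEMP_THRESHOLD:
        score += 10  # cool outside air available: reward
    return score


r = reward("on", indoor_temp=30.0, outdoor_temp=8.0)  # -1 - 10 + 10 = -1
```

During training, this score would serve as the reward signal fed into the DQN value update for each recorded transition.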
Referring to fig. 5, the interaction layer is configured to acquire the state information of the air conditioning equipment and obtain a first control instruction according to the state information.
The interaction layer comprises an interface module, a strategy setting module and a statistical report module;
the interface module is used for carrying out visual processing on the state information of the air conditioning equipment to obtain a state visual interface of the air conditioning equipment;
The strategy setting module is used for responding to an air conditioning strategy instruction issued by a user and generating a first control instruction corresponding to the air conditioning strategy instruction;
and the statistical report module is used for carrying out statistical processing on the state information and the execution result to generate a statistical report corresponding to the air conditioning equipment.
In the embodiment of the application, the interaction layer is used for visual interaction with the user. Specifically, the interface module can visually display all acquired state information to generate a state visual interface; the strategy setting module can provide preset control strategies for the user, and after the user browses and selects a suitable air conditioning strategy instruction, the strategy setting module generates a corresponding first control instruction according to the air conditioning strategy instruction issued by the user and transmits the first control instruction to the centralized control layer; the statistical report module can statistically display the running time of each air conditioner in the machine room, the region where each air conditioner is located, the average running time of the air conditioners in the machine room, the total running time, the PUE energy consumption of the machine room, the energy consumption of a single air conditioner, and the like.
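A few of the statistics listed above can be computed as a short sketch. PUE is the standard ratio of total facility energy to IT equipment energy; all readings below are invented sample data, and the variable names are assumptions.

```python
# Per-air-conditioner run times (hours) for one machine room, invented data.
runtimes_h = {"ac-01": 120.0, "ac-02": 80.0}

total_runtime = sum(runtimes_h.values())
average_runtime = total_runtime / len(runtimes_h)

# PUE = total facility energy / IT equipment energy (both in kWh here).
total_facility_kwh = 500.0  # IT load + cooling + power distribution, etc.
it_equipment_kwh = 400.0    # energy consumed by IT equipment alone
pue = total_facility_kwh / it_equipment_kwh
```

The statistical report module would aggregate such figures per machine room and per air conditioner for display to the user.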
In addition, referring to fig. 6, fig. 6 is an optional flowchart of an air conditioning control method based on reinforcement learning provided in an embodiment of the present application, fig. 7 is an optional timing flowchart of an air conditioning control method based on reinforcement learning provided in the present application, and the air conditioning control method based on reinforcement learning in fig. 6 may include, but is not limited to, steps S110 to S140.
Step S110, acquiring state information of the air conditioning equipment through the signal transmission layer and sending the state information to the interaction layer;
step S120, receiving, through the interaction layer, the state information of the air conditioning equipment, and obtaining a first control instruction issued by the interaction layer according to the state information;
step S130, performing first conversion processing on the first control instruction through the centralized control layer to obtain a second control instruction;
step S140, sending the second control instruction to the air conditioning equipment through the centralized control layer, so as to obtain an execution result of the air conditioning equipment, where the execution result includes an execution action of the air conditioning equipment, and an internal environment temperature and an external environment temperature of a machine room where the air conditioning equipment is located;
referring to fig. 8, the step S140 of sending the second control instruction to the air conditioning equipment to obtain an execution result of the air conditioning equipment includes:
step S141, the second control instruction is sent to the air conditioning equipment, and an intermediate result is obtained;
and step S142, performing second conversion processing on the intermediate result to obtain an execution result of the air conditioning equipment.
In the embodiment of the present application, steps S110 to S130 are similar to the content of the air conditioner control system described above, can be obtained by simple analogy, and will not be repeated here. For step S140, since the data formats required by the air conditioning apparatuses of the respective machine rooms are not necessarily the same, in the embodiment of the present application, the second control instruction may first be sent to the air conditioning apparatus; after being executed by the air conditioning apparatus, an intermediate result is obtained, and the centralized control layer performs the second conversion processing on the intermediate result to obtain the execution result of the air conditioning apparatus.
And step S150, performing control adjustment processing on the execution result through an offline control model in the edge calculation layer to obtain and send a first adjustment instruction to the air conditioning equipment.
Referring to FIG. 9, in some embodiments, the offline control model of the edge computation layer is obtained by:
step S160, acquiring and updating parameters of the initialized offline control model according to the historical data and the first instruction through the centralized control layer to obtain a trained offline control model;
and step S170, transmitting the trained offline control model to the edge calculation layer through the centralized control layer.
It can be appreciated that, for steps S150 to S170, which are similar to the content of the air conditioner control system described above, it can be simply analogized, and the description thereof will not be repeated here.
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application, including:
at least one processor 980;
at least one memory 920 for storing at least one program;
the at least one program, when executed by the at least one processor 980, causes the at least one processor 980 to implement the methods as described in the various embodiments previously discussed.
The embodiments of the present application also provide a computer readable storage medium, in which a processor executable program is stored, the processor executable program being configured to implement the method according to the foregoing embodiments when executed by the processor 980.
In particular, the computer device may be a user terminal or a server, as shown in fig. 10.
In this embodiment, the case where the computer device is a user terminal is taken as an example, and the details are as follows:
as shown in fig. 10, the computer device 900 may include RF (Radio Frequency) circuitry 910, memory 920 including one or more computer-readable storage media, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a WiFi module 970, a processor 980 including one or more processing cores, and a power supply 990. It will be appreciated by those skilled in the art that the device structure shown in fig. 10 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The RF circuit 910 may be used for receiving and transmitting signals during messaging or a call; in particular, after downlink information of a base station is received, it is processed by one or more processors 980, and uplink data is transmitted to the base station. Typically, the RF circuitry 910 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuitry 910 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), and the like.
Memory 920 may be used to store software programs and modules. Processor 980 performs various functional applications and data processing by executing software programs and modules stored in memory 920. The memory 920 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the device 900 (such as audio data, phonebooks, etc.), and the like. In addition, memory 920 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, memory 920 may also include a memory controller to provide access to memory 920 by processor 980 and input unit 930. While fig. 10 shows RF circuitry 910, it is to be understood that it is not a necessary component of device 900 and may be omitted entirely as desired without changing the essence of the invention.
The input unit 930 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 930 may comprise a touch-sensitive surface 932 and other input devices 931. The touch-sensitive surface 932, also referred to as a touch display screen or a touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch-sensitive surface 932 or thereabout using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection device according to a predetermined program. Alternatively, the touch-sensitive surface 932 may include both a touch-detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 980, and can receive commands from the processor 980 and execute them. In addition, the touch-sensitive surface 932 may be implemented in a variety of types, such as resistive, capacitive, infrared, and surface acoustic waves. In addition to the touch-sensitive surface 932, the input unit 930 may also comprise other input devices 931. In particular, other input devices 931 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 940 may be used to display information entered by the user or information provided to the user and the various graphical user interfaces of device 900, which may be composed of graphics, text, icons, video and any combination thereof. The display unit 940 may include a display panel 941; optionally, the display panel 941 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 932 may be overlaid on the display panel 941; upon detecting a touch operation on or near it, the touch-sensitive surface 932 passes the operation to the processor 980 to determine the type of the touch event, and the processor 980 then provides a corresponding visual output on the display panel 941 based on the type of the touch event. Although in fig. 10 the touch-sensitive surface 932 and the display panel 941 are implemented as two separate components for input and output functions, in some embodiments the touch-sensitive surface 932 may be integrated with the display panel 941 to implement input and output functions.
The computer device 900 may also include at least one sensor 950, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 941 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 941 and/or the backlight when the device 900 is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the mobile phone is stationary, and can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the device 900 are not described in detail herein.
Audio circuitry 960, speaker 961, and microphone 962 may provide an audio interface between the user and device 900. The audio circuit 960 may transmit an electrical signal converted from received audio data to the speaker 961, where it is converted into a sound signal for output; on the other hand, the microphone 962 converts collected sound signals into electrical signals, which are received by the audio circuit 960 and converted into audio data; after being processed by the processor 980, the audio data may be transmitted to another device via the RF circuit 910, or output to memory 920 for further processing. The audio circuitry 960 may also include an earbud jack to provide communication between peripheral headphones and device 900.
The device 900 may exchange information with external devices provided with wireless transmission modules via the WiFi module 970.
Processor 980 is the control center of device 900: it connects the various parts of the whole device using various interfaces and lines, and performs the various functions of device 900 and processes data by running or executing software programs and/or modules stored in memory 920 and invoking data stored in memory 920, thereby monitoring the device as a whole. Optionally, processor 980 may include one or more processing cores; alternatively, processor 980 may integrate an application processor with a modem processor, where the application processor primarily handles the operating system, user interfaces, application programs, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may also not be integrated into processor 980.
The device 900 also includes a power source 990 (e.g., a battery) that supplies power to the various components; preferably, the power source is logically connected to the processor 980 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power source 990 may also include one or more of any components such as a direct current or alternating current power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the device 900 may also include a camera, a bluetooth module, etc., which will not be described in detail herein.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program, which when executed by a processor, causes the processor to perform the method described in the foregoing embodiments.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present application described herein may, for example, be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A, only B, and both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of" or similar expressions means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be singular or plural.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or all or part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The step numbers in the above method embodiments are set for convenience of illustration, and the order of steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present invention, and these equivalent modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. An air conditioner control system based on reinforcement learning, comprising: the system comprises a signal transmission layer, an edge calculation layer, a centralized control layer and an interaction layer;
the signal transmission layer is used for acquiring and transmitting state information of the air conditioning equipment, a first control instruction issued by the interaction layer, a second control instruction forwarded by the centralized control layer and an execution result of the air conditioning equipment, wherein the execution result comprises an execution action of the air conditioning equipment, and an inner environment temperature and an outer environment temperature of a machine room in which the air conditioning equipment is located;
The edge calculation layer is used for receiving an offline control model issued by the centralized control layer, performing control adjustment processing on the execution result according to the offline control model, and obtaining and sending a first adjustment instruction to the air conditioning equipment;
the centralized control layer is used for performing first conversion processing on the first control instruction, obtaining and sending the second control instruction to the air conditioning equipment, and updating and sending an offline control model to the edge calculation layer according to the state information and the first control instruction;
and the interaction layer is used for acquiring and obtaining a first control instruction according to the state information of the air conditioning equipment.
2. The reinforcement learning based air conditioner control system of claim 1, wherein the signal transmission layer comprises a first communication module and a second communication module;
the first communication module is used for acquiring and transmitting the state information of the air conditioning equipment and transmitting the second control instruction forwarded by the centralized control layer;
the second communication module is used for transmitting the first control instruction issued by the interaction layer and the execution result of the air conditioning equipment.
3. The reinforcement learning-based air conditioner control system of claim 1, wherein the edge calculation layer comprises an edge calculation module and an edge storage module;
the edge storage module is used for receiving an offline control model issued by the centralized control layer and receiving and sending the first adjustment instruction to the air conditioning equipment;
and the edge operation module is used for performing control adjustment processing on the execution result according to the offline control model to obtain and send the first adjustment instruction to the edge storage module.
4. The reinforcement learning-based air conditioner control system of claim 1, wherein the centralized control layer comprises a reinforcement learning module and an instruction conversion module;
the instruction conversion module is used for performing first conversion processing on the first control instruction to obtain and send the second control instruction to the air conditioning equipment;
and the reinforcement learning module is used for acquiring and updating parameters of the initialized offline control model according to the historical data and the first instruction to obtain a trained offline control model, and sending the trained offline control model to the edge calculation layer.
5. The reinforcement learning based air conditioner control system of claim 1, wherein the interaction layer comprises an interface module, a policy setting module, and a statistics reporting module;
the interface module is used for carrying out visual processing on the state information of the air conditioning equipment to obtain a state visual interface of the air conditioning equipment;
the strategy setting module is used for responding to an air conditioning strategy instruction issued by a user and generating a first control instruction corresponding to the air conditioning strategy instruction;
and the statistical report module is used for carrying out statistical processing on the state information and the execution result to generate a statistical report corresponding to the air conditioning equipment.
6. An air conditioning control method based on reinforcement learning, characterized in that air conditioning control is performed by the air conditioning control system based on reinforcement learning according to any one of claims 1 to 5, the air conditioning control method based on reinforcement learning comprising:
acquiring state information of the air conditioning equipment through the signal transmission layer and sending the state information to the interaction layer;
receiving the state information of the air conditioning equipment through the interaction layer, and obtaining, according to the state information, a first control instruction issued by the interaction layer;
performing first conversion processing on the first control instruction through the centralized control layer to obtain a second control instruction;
sending the second control instruction to the air conditioning equipment through the centralized control layer, and obtaining an execution result of the air conditioning equipment, wherein the execution result comprises an execution action of the air conditioning equipment, and an indoor ambient temperature and an outdoor ambient temperature of the machine room where the air conditioning equipment is located;
and performing control adjustment processing on the execution result through the offline control model in the edge calculation layer, so as to obtain a first adjustment instruction and send it to the air conditioning equipment.
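The data flow of claim 6 can be walked end to end with each layer reduced to a plain function. Only the flow mirrors the claim; the instruction formats, temperatures, and the toy edge model are invented for illustration.

```python
# Hypothetical end-to-end walk through the method of claim 6.
def signal_layer_read_state():
    # step 1: signal transmission layer collects equipment state
    return {"power": "on", "setpoint": 25.0}


def interaction_layer_issue(state):
    # step 2: interaction layer turns the state (plus the user's policy)
    # into a first control instruction
    return {"op": "set_temp", "value": 24.0}


def central_layer_convert(first_instruction):
    # step 3: first conversion processing maps the abstract instruction
    # to an equipment-protocol command
    return f"SET TEMP {first_instruction['value']:.1f}"


def equipment_execute(second_instruction):
    # step 4: the equipment executes and reports its action plus the
    # indoor/outdoor temperatures of the machine room
    return {"action": second_instruction, "indoor_temp": 26.4, "outdoor_temp": 31.0}


def edge_layer_adjust(result, model):
    # step 5: the edge layer's offline control model produces a first
    # adjustment instruction from the execution result
    return model(result)


model = lambda r: {"op": "set_temp", "value": 23.5} if r["indoor_temp"] > 26 else None
state = signal_layer_read_state()
first = interaction_layer_issue(state)
second = central_layer_convert(first)
result = equipment_execute(second)
adjustment = edge_layer_adjust(result, model)
```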
7. The reinforcement learning-based air conditioner control method according to claim 6, wherein the sending the second control instruction to the air conditioning equipment to obtain the execution result of the air conditioning equipment comprises:
sending the second control instruction to the air conditioning equipment to obtain an intermediate result;
and performing second conversion processing on the intermediate result to obtain an execution result of the air conditioning equipment.
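Claim 7's second conversion processing could be, for example, parsing a raw equipment response back into the structured execution result used by the rest of the system. The wire format below is entirely hypothetical.

```python
# Hypothetical second conversion: wire-format response -> execution result.
def second_conversion(intermediate):
    # intermediate result is assumed to be a line like
    # "ACK;action=SET TEMP 24.0;tin=26.4;tout=31.0"
    fields = dict(part.split("=", 1) for part in intermediate.split(";")[1:])
    return {
        "action": fields["action"],
        "indoor_temp": float(fields["tin"]),
        "outdoor_temp": float(fields["tout"]),
    }


result = second_conversion("ACK;action=SET TEMP 24.0;tin=26.4;tout=31.0")
```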
8. The reinforcement learning-based air conditioner control method according to claim 6, wherein the offline control model of the edge calculation layer is obtained by:
acquiring historical data through the centralized control layer, and updating the parameters of the initialized offline control model according to the historical data and the first control instruction to obtain a trained offline control model;
and sending the trained offline control model to the edge calculation layer through the centralized control layer.
9. A computer device, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the reinforcement learning-based air conditioning control method of any of claims 6-8.
10. A computer-readable storage medium in which a processor-executable program is stored, characterized in that the processor-executable program, when executed by the processor, implements the reinforcement learning-based air conditioner control method according to any one of claims 6 to 8.
CN202311471208.7A 2023-11-07 2023-11-07 Air conditioner control system, method, equipment and medium based on reinforcement learning Pending CN117545228A (en)

Publications (1)

Publication Number Publication Date
CN117545228A true CN117545228A (en) 2024-02-09

Family

ID=89791049



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination