CN115930384A

CN115930384A - Intelligent air conditioner control device and control method using reinforcement learning and thermal imaging

Info

Publication number: CN115930384A
Application number: CN202310231267.0A
Authority: CN
Inventors: 崔璨; 薛佳慧; 刘运涛; 李春晓; 黎明
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2023-03-13
Filing date: 2023-03-13
Publication date: 2023-04-07
Anticipated expiration: 2043-03-13
Also published as: CN115930384B

Abstract

The application relates to intelligent air conditioner control equipment and a control method by means of reinforcement learning and thermal imaging, and belongs to the technical field of control. The apparatus comprises: the system comprises a data acquisition unit, a pre-state processing unit, a database establishing unit and a reinforcement learning unit, wherein the data acquisition unit takes the current human body thermal imaging image and the indoor temperature value as input, and trains the air conditioning intelligent body by using a reinforcement learning algorithm so as to complete the regulation and control of the indoor temperature and improve the comfort level of the human body. The temperature is adjusted according to the human body thermal imaging image, and the technical problems that in the prior art, the stability of indoor temperature is only considered, the human body comfort level cannot be effectively improved, the acquisition cost of input variables needed by models such as PMV (pulse-modulated visual velocimetry) and the like is high, and the poor prediction performance is displayed when the model is applied to individuals are solved.

Description

Intelligent air conditioner control device and control method using reinforcement learning and thermal imaging

Technical Field

The present invention relates to the field of control technology, and more particularly, to an intelligent air conditioner control apparatus and control method using reinforcement learning and thermal imaging.

Background

Air conditioners are the most common tools for regulating indoor temperature and are always the key factors affecting the comfort of people. The skin surface is the most direct place for exchanging the human body with the environment, and a plurality of influencing factors can influence the surface temperature of the skin surface, so that the comfort level state of the human body is influenced. Therefore, the skin temperature is one of the parameters that most directly characterize the comfort state of the human body. In previous studies, it has been demonstrated that an optimal skin temperature range is comfortable, while too high or too low a skin temperature can cause discomfort. Therefore, the control of the indoor temperature enables the skin surface temperature to reach a certain range, improves the comfort of people, and has important significance in improving the work and learning efficiency of people.

At present, there are many control methods for air conditioning systems, such as classical proportional-integral-derivative control, linear quadratic regulator control, fuzzy control, model predictive control. Most of the previous control methods focus on ensuring the stability of the system, such as the indoor temperature. But the control strategy for stabilizing the indoor temperature is not effective in improving the comfort of people. And due to the complexity and uncertainty of the thermal models of the air conditioning system and the room, conventional model-based control methods often fail to achieve satisfactory results in practical applications.

In addition, an air conditioner control method using a PMV (Predicted Mean volume) model as a control target exists, and although the PMV model contains more parameters influencing the thermal comfort of a human body compared with the simple temperature control, the PMV model has greater comfort and energy-saving potential. However, the input variables (such as human body metabolic rate and clothing thermal resistance) required by the PMV model are expensive to obtain and difficult to obtain in the actual building use process. Second, when applied to individuals, all showed poor prediction performance. This is because these models assume that the thermal sensations of the persons in the room are static and do not differ from one another, but that the thermal comfort varies greatly from person to person, and therefore their accuracy is reduced when predicting the thermal comfort response of an individual.

Accordingly, it is desirable to provide an improved intelligent air conditioning solution.

Disclosure of Invention

The embodiment of the application provides intelligent air conditioner control equipment and a control method by utilizing reinforcement learning and thermal imaging, which can adjust the temperature of an intelligent air conditioner based on reinforcement learning by using a thermal imaging image, thereby improving the comfort level of a human body.

According to an aspect of the present application, there is provided an intelligent air conditioner control apparatus using reinforcement learning and thermal imaging, including: the data acquisition unit is used for acquiring a current thermal imaging image and an indoor temperature value of the human body; the preprocessing unit is used for preprocessing the thermal imaging image and the indoor temperature value to obtain a preprocessing state tensor; the database establishing unit is used for determining the correlation between the indoor temperature value, the gray value corresponding to the thermal imaging image and the comfort level of the human body; and the reinforcement learning unit is used for taking the intelligent air conditioner as an intelligent agent, taking the regulated temperature value as the action of the intelligent agent, training the intelligent agent through reinforcement learning, and taking the current thermal imaging image of the human body and the indoor temperature value as input to output the control target value of the indoor temperature.

In the above-described intelligent air conditioning control apparatus using reinforcement learning and thermal imaging, the pre-state processing unit includes: an image extraction subunit for extracting a plurality of region partial images of a predetermined region of a human body, respectively; the temperature value processing subunit is used for processing the indoor temperature value into a temperature data matrix with the same dimensionality as the area partial image; and the dimension combining subunit is used for combining the plurality of area partial images and the temperature data matrix in a channel dimension to obtain the preprocessed state tensor.

In the above-mentioned intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging, the reward function r of the reinforcement learning unit _t Comprises the following steps:

，

where Y is the mean gray value.

In the above intelligent air conditioner control device using reinforcement learning and thermal imaging, the reinforcement learning unit employs a DQN algorithm as a reinforcement control algorithm of the reinforcement learning unit, and an update expression of a state-reward action value function of the DQN algorithm is represented as:

；

wherein

Indicating that the state-action value function is maximized after adjusting the action value a), -based on the state-action value function, or>

Is a discount factor->

Is the learning rate and r is the reward function.

In the above-mentioned intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging, the agent control part of the reinforcement learning unit using DQN algorithm includes: the playback memory unit is used for storing the interaction state, action and reward of the intelligent agent and the environment; a current value network for outputting a parameterized action value function; a target value network for deriving a tag value based on parameters of the current value network; and an error function for training the reinforcement learning unit based thereon.

In the above-mentioned utilization enhancementIn the intelligent air conditioner control device for learning and thermal imaging, the playback memory unit stores the transition state(s) in each time step t → t +1 _t , a _t , r _t , s _t+1 ) To the replay memory.

In the above intelligent air-conditioning control device using reinforcement learning and thermal imaging, the current value network is trained by using RMSProp (Root Mean Square Prop: root Mean Square transfer) method, and the neural network parameter updating process is represented as:

；

；

；

；

where w is the number of samples, y is the corresponding sample target, g is the gradient, r is the cumulative gradient, ρ is a hyperparameter representing the rate of decay of the cumulative gradient,

for the dot product operation, η is the overall learning rate, and δ is a constant.

In the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the application, in the DQN algorithm, parameters are adjusted

Get the action value function>

The value of Bellman equation is represented by

It is estimated that, where i =1, 2, \ 8230, is the current iteration number,/>

is the parameter of the previous iteration.

In the intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application, the target value network

Generating a target value, wherein for any i =1,2, \8230, the target value is network +>

Are updated periodically and the parameters are fixed during the update interval.

In the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application, for the ith iteration, the loss function is defined as:

。

in the above intelligent air conditioning control apparatus using reinforcement learning and thermal imaging, the agent selects an action in a state using the current value network and an epsilon greedy method, including:

the probability of the agent randomly selecting the action is epsilon, 0< epsilon <1;

the agent selecting an action

The probability of (a) is 1-epsilon;

where ε is calculated as:

，

wherein epsilon _start For randomly selecting an initial value of the probability of an action, epsilon, at the beginning of the training _end Is the final value of ε, the parameter t _total Representing the total number of time steps that have passed through the training process. For example, ε may be set _start Is 0.99,. Epsilon _end Is 0.1,t _total Is 800.

In the above-described intelligent air conditioning control apparatus using reinforcement learning and thermal imaging, the current value network includes three convolutional layers, the input is a preprocessing state tensor of 50 × 50 × 5, the first convolutional layer is set to include 32 convolutional kernels of 10 × 10, the step size is 2, the second convolutional layer is set to include 64 convolutional kernels of 5 × 5, the step size is 2, the third convolutional layer is set to include 64 convolutional kernels of 3 × 3, the step size is 1, and all the convolutional layers use a rectifying linear unit as an activation function.

According to another aspect of the present application, there is provided an intelligent air conditioner control method using reinforcement learning and thermal imaging, including: acquiring a thermal imaging image by using a thermal imaging camera; preprocessing the thermal imaging graph; obtaining an initial state based on the indoor temperature value and the preprocessed thermal imaging graph; and training a reinforcement learning algorithm.

The intelligent air conditioner control device and the control method utilizing reinforcement learning and thermal imaging provided by the embodiment of the application can adjust the temperature of the intelligent air conditioner based on reinforcement learning by using the thermal imaging image, so that the comfort level of a human body is improved.

Drawings

Various other advantages and benefits of the present application will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. It is obvious that the drawings described below are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. Also, like parts are designated by like reference numerals throughout the drawings.

Fig. 1 shows a schematic block diagram of an intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application.

Fig. 2 shows an updated schematic diagram of the DQN algorithm employed in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to an embodiment of the present application.

Fig. 3 shows a schematic flowchart of an intelligent air conditioner control method using reinforcement learning and thermal imaging according to an embodiment of the present application.

Fig. 4 is a schematic diagram illustrating an intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application.

Detailed Description

Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.

As shown in fig. 1, an intelligent air conditioner control apparatus 100 using reinforcement learning and thermal imaging according to an embodiment of the present application includes the following units.

A data obtaining unit 110, configured to obtain a current thermal imaging image of the human body and an indoor temperature value. Here, the thermal imaging image acquisition unit 110 is used for data acquisition of a thermal imaging image of a human body, and may be a thermal imager, for example. Moreover, because the thermal comfort of people has a great relationship with the skin temperature, the thermal imaging image of the human body can effectively capture the thermal information of the relevant body part and directly reflect the comfort degree of people.

A preprocessing unit 120, configured to preprocess the thermal imaging image and the indoor temperature value to obtain a preprocessing state tensor. Specifically, since different colors on the thermal imaging image represent different skin temperatures of the human body under test, the thermal imaging image collected by the thermal imager is the entire image area including the face, neck and palm interior of the human body. Previous researches show that the skin temperature of four areas of the forehead, the cheeks on two sides and the center of the hand can better sense the human body temperature and better improve the comfort level. Therefore, in the embodiment of the present application, images of four region portions of the forehead, the cheeks, and the center of one hand are performedExtracted separately, adjusted to a three-dimensional matrix of, for example, 50 × 50 × 3 in size. Where the parameter 50 × 50 is the spatial dimension, i.e. the pixel size of the image, and the parameter 3 indicates that the image has three RGB channels, where RGB is the color model of the image under red (R), green (G), and blue (B) light. Each point is divided into 256 levels from 0 to 255, with 0 representing the darkest and 255 the brightest. Then, the image is converted into a gray image, an array with the size of 50 × 50 × 1 is formed, and four parts are overlapped to form a tensor with the size of 50 × 50 × 4, for example, which is marked as T _g . The RGB image is usually converted into a grayscale image by the following methods:

Y=0.299R+0.578G+0.114B

wherein, Y is the gray value of the (M, N) th point in the image extracted independently, and R, G and B are the RGB three-component values of the (M, N) th point respectively. And respectively randomly selecting 50 points from the extracted four parts of human body area images to carry out gray value calculation, and finally carrying out average calculation to obtain gray value pixels of the specific human body thermal imaging image. Therefore, in the embodiment of the present application, different skin temperatures are sensed by comparing pixels, and then whether the skin temperature is comfortable or not is determined.

The acquired room temperature values then need to be expanded to the same size as the extracted thermographic partial image and then combined to form the overall pre-processing state. Specifically, first, the temperature of the gas flow is, for example, denoted as t _air Extending to the range of gray value 0-255 based on the following formula to obtain t _expand ：

，

Wherein t is _airup And t _airlow The highest and lowest temperature values representing the room temperature.

The temperature values are then expanded into a matrix T by _e ：

T _e =I _50×50 ×t _expand

Wherein I _50×50 Is a 50 x 50 matrix composed of constants 1, whose dimensions can be seen to be the same as the grayscale portion of the thermal imaging map. The matrix is then generalized to dimensions 50 x 5Tensor of 0 x 1.

Finally, T is merged in the channel dimension _g And T _e A complete preprocessed state matrix is obtained, for example a 50 x 5 matrix.

Therefore, in the intelligent air conditioner control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application, the pre-state processing unit includes: an image extraction subunit for extracting a plurality of region partial images of a predetermined region of a human body, respectively; the temperature value processing subunit is used for processing the indoor temperature value into a temperature data matrix with the same dimensionality as the area partial image; and the dimension combining subunit is used for combining the plurality of area partial images and the temperature data matrix in a channel dimension to obtain the preprocessed state tensor.

The database establishing unit 130 is configured to determine an association relationship between the indoor temperature value, the gray value corresponding to the thermal imaging image, and the comfort level of the human body.

Specifically, in order to simulate the variation of human skin temperature, i.e. comfort, with indoor temperature, in the embodiment of the present application, a temperature-gray value-comfort database is established to simulate human performance at different temperatures. In particular, the details of the temperature-grey value-comfort database may be as shown in table 1 below:

table 1: temperature-gray value-comfort level database

Indoor temperature	Gray value Y	Comfort level
			18/30	60-100	±3
20/28	130-170	±2
			22/26	170-200	±1
24	200-220	0

Here, when the temperature is too high or too low, the skin temperature (mapped to the gradation value) of a person also changes, and the person feels uncomfortable. When the temperature is appropriate, the skin temperature (gray value) is in an appropriate state. In the present example, the room temperature was set to 7 values, limited to between 18 ℃ and 30 ℃. Comfort was set to-3 to 3, where 0 is indicated as most comfortable and spreading to both sides indicates cooler or warmer, i.e., less comfortable. The applicant of the present application has found experimentally that most people are most comfortable when the room temperature is 24 ℃.

And the reinforcement learning unit 140 is configured to train the intelligent agent by using the intelligent air conditioner as the intelligent agent and using the adjusted temperature value as the motion of the intelligent agent, and output a control target value of the indoor temperature by using the current thermal imaging image of the human body and the indoor temperature value as inputs.

That is, the reinforcement learning unit 140 includes two parts, i.e., a training part and an inference part, and is configured to train the intelligent air conditioner as an intelligent agent for reinforcement learning, and to output a control target value of the indoor temperature by training the intelligent agent, when the current thermal imaging image of the human body and the indoor temperature value are input, so as to automatically adjust the indoor temperature. Wherein the content of the first and second substances,the reinforcement learning unit 140 defines the feedback obtained by the intelligent agent as a reward, and is used for judging whether the temperature increment and the comfort level change obtained by the intelligent agent in the current environment state are good or bad, and the intelligent agent obtains a feedback reward at each time step t until the intelligent agent is in a termination state, so as to guide the intelligent agent to achieve the target adjustment of the comfort level of the human body. Wherein the reward function r _t It is important for the agent because it tells the agent what the learning objective is. Here, considering that the object of the present application is to satisfy the human body comfort control by temperature adjustment, the reward function is designed as:

；

therefore, in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the reward function r of the reinforcement learning unit _t Comprises the following steps:

，

where Y is the mean gray value.

The reward function is in the form of a formalized, numerical table of agent control objectives. When the action selected by the intelligent agent causes the human body comfort level to deviate from the set value, the environment makes direct feedback (punishment) on the action of the intelligent agent, the punishment reflects the real-time interaction condition of the intelligent agent and the room environment, the intelligent agent is convenient to timely adjust the action value selection according to the punishment given by the reward function, the convergence of a reinforcement learning algorithm is promoted, and the demand control effect is more quickly and effectively achieved.

Specifically, in the embodiment of the present application, a DQN (Deep Q-Learning Network) algorithm may be used as the reinforcement control algorithm of the reinforcement Learning unit, where the DQN algorithm may approximate a parameterized action value function using a Deep neural Network

Referred to as a Q network. Where θ is the dimensionAnd (4) randomly distributing an initial value by using a limited parameter vector, and updating by using a target value network in the introduction of a DQN algorithm. And fitting the Q value by adopting a deep neural network, and introducing parameters of the neural network. The updated expression of the state-reward action value function of DQN becomes:

，

wherein

Is a discount factor->

Is the learning rate and r is the reward function.

Fig. 2 shows a block diagram of the updated DQN algorithm, where fig. 2 shows a diagram of the updated DQN algorithm adopted in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application. The system comprises an environment, a playback memory unit, a DQN error function, a current value network and a target value network. In the DQN algorithm according to an embodiment of the present application, a replay memory unit, a current value network, a target value network, and a DQN error function constitute an agent control portion in the reinforcement learning unit.

Specifically, the playback memory unit stores the state, action and reward of interaction between the intelligent agent and the environment. The data samples in the playback memory unit can overcome the correlation among data by using a random extraction method in the DQN, enhance the learning randomness, improve the utilization rate of each sample and prevent local optimization of network updating.

For example, the playback memory unit may use an experience replay mechanism, that is, store the transition state(s) in each time step, i.e., t → t +1 _t , a _t , r _t , s _t+1 ) To a replay memory D, so that there is a replay memoryThe experience in the reservoir D is always new.

The current value network, i.e. the Q network as described above, is used to output a parameterized action value function. In the embodiment of the present application, a RMSProp (Root Mean Square Prop: root Mean Square transfer) method may be adopted to train the Q network, and the neural network parameter updating process of the method is as follows:

（1）

（2）

（3）

（4）

where w is the number of samples, y is the corresponding sample object, i.e. the label value, g is the gradient, r is the cumulative gradient, p is a hyperparameter representing the decay rate of the cumulative gradient,

for the dot product operation, η is the overall learning rate and δ is a constant to avoid division by zero.

In addition, a loss function and the target value network need to be introduced. In the DQN algorithm, by adjusting parameters

Can obtain the action value function

The value of Bellman equation is represented by

It is estimated that, where i =1,2, \8230, for the current iteration number,

is the parameter of the previous iteration. The target value is formed by the target value network

Generation, structure and

the same is true. For any i =1,2, \ 8230;, the target value network

Will be updated periodically and the parameters are fixed during the update interval. Thus, for the ith iteration, the loss function is defined as:

。

in addition, the method for the intelligent agent to explore the environment can be further increased. In particular, the agent selects actions that are in states using a Q-network and an epsilon greedy approach. This means that the probability ε,0 of an agent selecting an action at random<ε<1, selection action

Has a probability of 1-epsilon. ε may be calculated as:

，

In addition, in the embodiment of the present application, the Q network uses a convolutional neural network whose inputs are preprocessed states including a thermal imaging map and an indoor temperature and outputs action values as set effective actions, for example, a total of three types of operations, that is, raising, maintaining, and lowering a temperature set point of an air conditioner may be set, and the magnitude of the increase/decrease amount may be set to 2 ℃, for example. Specifically, the convolutional neural network may be composed of three convolutional layers, the input is a preprocessing state tensor of 50 × 50 × 5, the first layer is set to contain 32 convolutional kernels of 10 × 10 with a step size of 2, the second layer is set to contain 64 convolutional kernels of 5 × 5 with a step size of 2, the third layer is set to contain 64 convolutional kernels of 3 × 3 with a step size of 1, all convolutional layers use a rectifying linear unit (ReLU) as an activation function. When the state tensor passes through the convolutional layer, the spatial dimension of the preprocessed state tensor is reduced and the channel dimension is increased. The output three-dimensional tensor of the convolutional layer will be flattened into a vector before passing through the linear hidden layer. The last hidden layer is a fully connected linear layer consisting of 512 neurons. The output layer is a fully connected layer, and comprises 3 neurons, namely the number of effective actions. Thus, the convolutional neural network estimates the current optimal action based on the observed environmental conditions.

Therefore, in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the reinforcement learning unit employs a DQN algorithm as a reinforcement control algorithm of the reinforcement learning unit, and an updated expression of a state-reward action value function of the DQN algorithm is expressed as:

，

wherein

Means for maximizing a state-action value function after adjusting an action value a, and>

is the discount factor that is to be used,

is the learning rate and r is the reward function.

In the intelligent air conditioner control device using reinforcement learning and thermal imaging according to an embodiment of the present application, the agent control part of the reinforcement learning unit employing the DQN algorithm includes:

the playback memory unit is used for storing the interaction state, action and reward of the intelligent agent and the environment;

a current value network for outputting a parameterized action value function;

a target value network for outputting a target value as a tag value; and

an error function for training the reinforcement learning unit based thereon.

In the intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to the embodiment of the present application, the playback memory unit stores the transition state(s) in each time step t → t +1 _t , a _t , r _t , s _t+1 ) To the replay memory.

In the intelligent air-conditioning control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the current value network is trained by RMSProp (Root Mean Square Prop: root Mean Square transfer), and the neural network parameter updating process is represented as follows:

；

；

；

；

In the intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to the embodiment of the application, the air-conditioning control device is arranged in the air-conditioning control deviceIn the DQN algorithm, parameters are adjusted

Obtaining a function of action values

The value of Bellman equation is represented by

It is estimated that, where i =1, 2, \ 8230, is the current iteration number,

is the parameter of the previous iteration.

Generating a target value, wherein, for any i =1, 2, 8230the target value network>

Periodically updated and the parameters are fixed during the update interval.

In the intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application, for the ith iteration, the loss function is defined as:

。

in the intelligent air conditioning control device utilizing reinforcement learning and thermal imaging according to the embodiment of the application, the agent selects the action in the state by using the current value network and an epsilon greedy method, and the action comprises the following steps:

the agent selecting an action

The probability of (a) is 1-epsilon;

where ε is calculated as:

，

In the intelligent air-conditioning control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the current value network includes three convolutional layers, the input is a preprocessing state tensor of 50 × 50 × 5, the first convolutional layer is set to include 32 convolutional kernels of 10 × 10, the step size is 2, the second convolutional layer is set to include 64 convolutional kernels of 5 × 5, the step size is 2, the third convolutional layer is set to include 64 convolutional kernels of 3 × 3, the step size is 1, and all the convolutional layers use a rectifying linear unit as an activation function.

As shown in fig. 3, the intelligent air conditioner control method using reinforcement learning and thermal imaging according to an embodiment of the present application includes the following steps.

In step S210, a thermal imaging graph is acquired using a thermal imager.

Step S220, preprocessing the thermal imaging map, specifically including: cutting out images of four areas of the forehead, two cheeks and the center of one hand from the thermal imaging image, and adjusting the size to 50 multiplied by 3; converting the extracted image into a gray image to form an array with the size of 50 multiplied by 1; combining 4 partial arrays to form a 50 x 4 tensor T _g (ii) a According to Y =0.299R +0.578G +0.114B, 50 points are randomly selected on an image to carry out gray value calculation, and a specific gray value pixel is obtained through averaging calculation.

Step S230, combining the preprocessed thermal imaging graph to obtain an initial temperature value based on the indoor temperature valueThe initial state specifically comprises: obtaining a temperature matrix T based on _e ：

；

Will tensor T _g And a temperature matrix T _e The 50 × 50 × 5 tensors are combined to obtain the initial state.

Step S240, training the reinforcement learning algorithm, including the following steps:

step S240-1: initializing Q network with random weight and weight of

Let i =1, initialize state s _t And a total time step t _total =1, initialize parameters, calculate ε, ε _start 、ε _end And t _decay ；

Step S240-2: initializing the initial size and the total size of a replay memory D, the number E of training sets, an updating interval K and the maximum time step number of each set T, and initializing the replay memory D;

step S240-3: preprocessing the state to obtain a preprocessed state;

step S240-4: defining H epicode;

step S240-5: according to the formula:

，

calculating epsilon, selecting a random action a by probability _t Otherwise, select

；

Step S240-6-execution of action a _t And observe the next state s _t+1 To obtain a reward r _t ；

Step S240-7: to s _t+1 Performing pretreatment to obtain

；

Step S240-8: sampling a small batch containing D to w transitions;

step S240-9: for each transition

If->

If the status is a terminal pre-processing status, setting >>

Otherwise, set the value>

；

Step S240-10: training the Q network using RMSprop;

step S240-11: order to

；

Step S240-12: copying parameters of the Q network to a target network every K steps;

step S240-13: repeating steps S230-S240-7 until S _t Is in a termination state;

step S240-14: returning to the deep Q network.

Here, it can be understood by those skilled in the art that other details of the intelligent air conditioner control method using reinforcement learning and thermal imaging according to the embodiment of the present application are the same as those of the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application described previously, and are not described again here to avoid redundancy.

In summary, the control device and the control method using reinforcement learning and thermal imaging according to the embodiment of the present application abandon the idea of ensuring temperature stability and indiscriminate comfort prediction in the previous research, and adjust the temperature of the air conditioning system according to the skin temperature, thereby improving the comfort of the human body. In the embodiment of the application, the thermal imaging image can display the surface temperature of the human body, the air conditioning system can be regarded as an intelligent agent through an air conditioning temperature adjusting scheme based on reinforcement learning and taking the human thermal imaging image as input, the collected thermal imaging image of the people and the initial temperature in a room are used as input, the indoor temperature is adjusted through self-learning and learning of a proper control strategy, and then the skin temperature is adjusted, so that the comfort level of people is improved. And the simulation result shows that the intelligent air conditioner control equipment and the control method utilizing reinforcement learning and thermal imaging can effectively improve the comfort level of people according to the embodiment of the application by combining with EnergyPlus building energy simulation software to carry out real environment simulation and temperature change combined control.

Fig. 4 is a schematic diagram illustrating an intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application. As shown in fig. 4, a thermal imaging camera is used to shoot a human target to generate a thermal imaging image, a temperature sensor is used to collect indoor temperature, the current thermal imaging image of a human body and an indoor temperature value are preprocessed to be used as input, and the indoor temperature is output through a convolution layer and a full connection layer to control an air conditioning system so as to achieve the purpose of automatically adjusting the indoor temperature.

EnergyPlus is widely used open source building and HVAC simulation software at present, but the EnergyPlus software has certain limitation in the development and optimization of HVAC system control algorithms, and it is very difficult to directly apply some advanced control algorithms in built-in software. Aiming at the problem, the invention establishes a collaborative simulation test bed combining the building HVAC system and the HVAC system control module, and realizes the dynamic data transmission and interaction between the two modules. Python is an open source programming software, and is convenient for realizing DRL control algorithm based on a neural network. Functional Model Interface (FMI) is a standard that provides a uniform model interface for model exchange and co-simulation between multiple modeling and simulation software. Therefore, the invention uses an EnergyPlus-Python collaborative simulation mode with FMI standard for controlling the air conditioning system. The building HVAC system is modeled by EnergyPlus and the DQN-based HVAC system control module is implemented by Python.

The energy plus based building HVAC module uses an external interface, the functional model unit derives: to: scheduling objects of the external interface group are packaged into a Functional Model Unit (FMU) model with FMI standards. The original file is modified using the DQN controller to replace the built-in controller. PyFMI is a Python package based on FMI library, and supports FMU model loading and interaction for model exchange and co-simulation. In the interactive process, firstly, an EnergyPlus simulation model is packaged into an FMU model based on EnergyPlusToFMU; then, the air conditioner intelligent agent sends the action to the packaged FMU model, and the FMU model executes one-step model simulation; and finally, returning the simulation state and the obtained reward to the intelligent agent of the air conditioning system, and repeating the whole process until the simulation is finished. Specifically, at the beginning of each round, the indoor temperature is initialized to a supercooling or overheating value, and according to the current state, the air conditioner can send out one of three actions of increasing the temperature set value, keeping the temperature set value unchanged and decreasing the temperature set value, which is actually represented by changing the temperature set value by 2 ℃, and when the most comfortable state is reached, the iteration is finished.

It can be seen that the embodiment of the application realizes innovation of an air conditioner control system algorithm, and the air conditioner temperature setting point is controlled by collecting human thermal imaging pictures and indoor temperature, so that the room temperature is adaptively in the most comfortable state of the human body, and manual setting by people is not needed. In addition, the defect of a static model represented by PMV and the like is overcome, the PMV model assumes that the thermal senses of indoor people are static and are not different from each other, but actually the thermal comfort degree difference between people is large, so when the thermal comfort degree response of the individual is predicted, the accuracy is greatly reduced.

The basic principles of the present application have been described above with reference to specific embodiments, but it should be noted that advantages, effects, etc. mentioned in the present application are only examples and are not limiting, and the advantages, effects, etc. must not be considered to be possessed by various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.

The block diagrams of devices, apparatuses, systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, configurations, etc. must be made in the manner shown in the block diagrams. These devices, apparatuses, devices, systems may be connected, arranged, configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. As used herein, the words "or" and "refer to, and are used interchangeably with, the word" and/or, "unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

It should also be noted that in the devices, apparatuses, and methods of the present application, each component or step can be decomposed and/or re-combined. These decompositions and/or recombinations are to be considered as equivalents of the present application.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. An intelligent air conditioner control apparatus using reinforcement learning and thermal imaging, comprising:

the data acquisition unit is used for acquiring a current thermal imaging image and an indoor temperature value of the human body;

the preprocessing unit is used for preprocessing the thermal imaging image and the indoor temperature value to obtain a preprocessing state tensor;

the database establishing unit is used for determining the correlation between the indoor temperature value, the gray value corresponding to the thermal imaging image and the comfort level of the human body;

and the reinforcement learning unit is used for taking the intelligent air conditioner as an intelligent agent, taking the regulated temperature value as the action of the intelligent agent, training the intelligent agent through reinforcement learning, and taking the current thermal imaging image of the human body and the indoor temperature value as input to output the control target value of the indoor temperature.

2. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 1, wherein the pre-state processing unit comprises:

an image extraction subunit for extracting a plurality of region partial images of a predetermined region of a human body, respectively;

the temperature value processing subunit is used for processing the indoor temperature value into a temperature data matrix with the same dimensionality as the area partial image; and the number of the first and second groups,

and the dimension combining subunit is used for combining the plurality of region partial images and the temperature data matrix in a channel dimension to obtain the preprocessed state tensor.

3. The intelligent air-conditioning control device using reinforcement learning and thermal imaging according to claim 1, wherein the reward function r of the reinforcement learning unit _t Comprises the following steps:

，

where Y is the mean gray value.

4. An intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to claim 1, wherein the reinforcement learning unit adopts DQN algorithm as the reinforcement control algorithm of the reinforcement learning unit, and the updated expression of the state-reward action value function of the DQN algorithm is expressed as:

；

wherein

is a discount factor->

Is the learning rate and r is the reward function.

5. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 1, wherein the agent control part of the reinforcement learning unit adopting DQN algorithm comprises:

a current value network for outputting a parameterized action value function;

a target value network for obtaining a tag value based on the parameters of the current value network; and

an error function for training the reinforcement learning unit based thereon.

6. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 5, wherein the playback memory unit stores the transition state(s) in each time step t → t +1 _t , a _t , r _t , s _t+1 ) To the replay memory.

7. The intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging as claimed in claim 5, wherein the current value network is trained by using RMSProp root mean square transfer method, and the neural network parameter updating process is expressed as:

；

；

；

；

8. The intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to claim 5, wherein in the DQN algorithm, parameters are adjusted

Get the action value function>

The value of Bellman equation is represented by

Estimate, where i =12' \ 82305 indicates the current number of iterations>

Is the parameter of the previous iteration.

9. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 8, wherein the target value network

10. The intelligent climate control device utilizing reinforcement learning and thermal imaging according to claim 8, wherein for the ith iteration, the loss function is defined as:

。

11. the intelligent climate control device utilizing reinforcement learning and thermal imaging of claim 5, wherein the agent selects the action in the state using the current value network and an epsilon greedy approach, comprising:

the agent selecting an action

The probability of (a) is 1-epsilon;

where ε is calculated as:

，

wherein epsilon _start For an initial value, ε, of the probability of randomly selecting an action at the beginning of training _end Is the final value of ε, the parameter t _total Representing the total number of time steps that have passed through the training process.

12. The intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to claim 5, wherein the current value network comprises three convolutional layers, the input is a preprocessing state tensor of 50 x 5, a first convolutional layer is set to contain 32 convolutional kernels of 10 x 10, a convolutional layer with a step size of 2, a second convolutional layer is set to contain 64 convolutional kernels of 5 x 5, a convolutional layer with a step size of 2, a third convolutional layer is set to contain 64 convolutional kernels of 3 x 3, a convolutional layer with a step size of 1, and all convolutional layers use a rectifying linear unit as an activation function.

13. An intelligent air conditioner control method utilizing reinforcement learning and thermal imaging is characterized by comprising the following steps:

acquiring a thermal imaging image by using a thermal imaging camera;

preprocessing the thermal imaging graph;

obtaining an initial state based on the indoor temperature value and the preprocessed thermal imaging graph; and

and training a reinforcement learning algorithm.