CN115930384A - Intelligent air conditioner control device and control method using reinforcement learning and thermal imaging - Google Patents

Intelligent air conditioner control device and control method using reinforcement learning and thermal imaging

Info

Publication number
CN115930384A
Authority
CN
China
Prior art keywords
thermal imaging
reinforcement learning
value
intelligent air
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310231267.0A
Other languages
Chinese (zh)
Other versions
CN115930384B (en)
Inventor
崔璨
薛佳慧
刘运涛
李春晓
黎明
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202310231267.0A priority Critical patent/CN115930384B/en
Publication of CN115930384A publication Critical patent/CN115930384A/en
Application granted granted Critical
Publication of CN115930384B publication Critical patent/CN115930384B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B 30/00: Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B 30/70: Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating

Landscapes

  • Air Conditioning Control Device (AREA)

Abstract

The application relates to an intelligent air conditioner control device and control method using reinforcement learning and thermal imaging, and belongs to the technical field of control. The device comprises a data acquisition unit, a preprocessing unit, a database establishing unit and a reinforcement learning unit; it takes the current human body thermal imaging image and the indoor temperature value as input and trains the air conditioning agent with a reinforcement learning algorithm, so as to regulate the indoor temperature and improve human comfort. By adjusting the temperature according to the human body thermal imaging image, the application addresses the technical problems in the prior art that only the stability of the indoor temperature is considered and human comfort cannot be effectively improved, that the input variables required by models such as PMV (Predicted Mean Vote) are expensive to acquire, and that such models show poor prediction performance when applied to individuals.

Description

Intelligent air conditioner control device and control method using reinforcement learning and thermal imaging
Technical Field
The present invention relates to the field of control technology, and more particularly, to an intelligent air conditioner control apparatus and control method using reinforcement learning and thermal imaging.
Background
Air conditioners are the most common tools for regulating indoor temperature and have always been a key factor affecting people's comfort. The skin surface is the most direct site of exchange between the human body and the environment, and many factors influence its temperature and thereby the comfort state of the human body. The skin temperature is therefore one of the parameters that most directly characterize the comfort state of the human body. Previous studies have demonstrated that an optimal skin temperature range is comfortable, while a skin temperature that is too high or too low causes discomfort. Controlling the indoor temperature so that the skin surface temperature falls within a certain range therefore improves people's comfort, which is of great significance for improving the efficiency of people's work and study.
At present, there are many control methods for air conditioning systems, such as classical proportional-integral-derivative control, linear quadratic regulator control, fuzzy control, and model predictive control. Most previous control methods focus on ensuring the stability of the system, for example of the indoor temperature. However, a control strategy that merely stabilizes the indoor temperature is not effective in improving people's comfort. Moreover, due to the complexity and uncertainty of the thermal models of the air conditioning system and the room, conventional model-based control methods often fail to achieve satisfactory results in practical applications.
In addition, there exist air conditioner control methods that use a PMV (Predicted Mean Vote) model as the control target. Because the PMV model contains more parameters influencing human thermal comfort than simple temperature control, it has greater comfort and energy-saving potential. However, the input variables required by the PMV model (such as the human metabolic rate and clothing thermal resistance) are expensive and difficult to obtain in the actual use of a building. Second, when applied to individuals, such models all show poor prediction performance. This is because these models assume that the thermal sensations of the persons in a room are static and do not differ from one another, whereas thermal comfort in fact varies greatly from person to person; their accuracy is therefore reduced when predicting the thermal comfort response of an individual.
Accordingly, it is desirable to provide an improved intelligent air conditioning solution.
Disclosure of Invention
The embodiment of the application provides intelligent air conditioner control equipment and a control method by utilizing reinforcement learning and thermal imaging, which can adjust the temperature of an intelligent air conditioner based on reinforcement learning by using a thermal imaging image, thereby improving the comfort level of a human body.
According to an aspect of the present application, there is provided an intelligent air conditioner control apparatus using reinforcement learning and thermal imaging, including: a data acquisition unit for acquiring the current thermal imaging image of the human body and the indoor temperature value; a preprocessing unit for preprocessing the thermal imaging image and the indoor temperature value to obtain a preprocessed state tensor; a database establishing unit for determining the association among the indoor temperature value, the gray value corresponding to the thermal imaging image, and human comfort; and a reinforcement learning unit for taking the intelligent air conditioner as an agent and the adjusted temperature value as the agent's action, training the agent through reinforcement learning, and taking the current thermal imaging image of the human body and the indoor temperature value as input to output the control target value of the indoor temperature.
In the above-described intelligent air conditioning control apparatus using reinforcement learning and thermal imaging, the preprocessing unit includes: an image extraction subunit for separately extracting a plurality of region partial images of predetermined regions of the human body; a temperature value processing subunit for processing the indoor temperature value into a temperature data matrix with the same dimensions as the region partial images; and a dimension merging subunit for merging the plurality of region partial images and the temperature data matrix in the channel dimension to obtain the preprocessed state tensor.
In the above-mentioned intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging, the reward function r_t of the reinforcement learning unit is:

r_t = 0, if 200 ≤ Y ≤ 220; r_t = −1, otherwise

where Y is the mean gray value.
In the above intelligent air conditioner control device using reinforcement learning and thermal imaging, the reinforcement learning unit employs a DQN algorithm as its reinforcement control algorithm, and the update expression of the state-reward action value function of the DQN algorithm is:

Q(s_t, a_t) ← Q(s_t, a_t) + α·[r + γ·max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t)]

where max_{a'} Q(s_{t+1}, a') indicates maximizing the state-action value function after adjusting the action value a, γ is the discount factor, α is the learning rate, and r is the reward function.
In the above-mentioned intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging, the agent control part of the reinforcement learning unit employing the DQN algorithm includes: a playback memory unit for storing the states, actions and rewards of the agent's interaction with the environment; a current value network for outputting a parameterized action value function; a target value network for deriving the label values based on the parameters of the current value network; and an error function on the basis of which the reinforcement learning unit is trained.
In the above-mentioned intelligent air conditioner control device using reinforcement learning and thermal imaging, the playback memory unit stores the transition (s_t, a_t, r_t, s_{t+1}) of each time step t → t+1 to the replay memory.
In the above intelligent air-conditioning control device using reinforcement learning and thermal imaging, the current value network is trained using the RMSProp (Root Mean Square Propagation) method, whose neural network parameter updating process is:

g ← (1/w)·∇_θ Σ_i L(f(x_i; θ), y_i)
r ← ρ·r + (1 − ρ)·g ⊙ g
Δθ = −(η / (δ + √r)) ⊙ g
θ ← θ + Δθ

where w is the number of samples, y is the corresponding sample target, g is the gradient, r is the accumulated gradient, ρ is a hyperparameter representing the decay rate of the accumulated gradient, ⊙ is the element-wise product, η is the global learning rate, and δ is a constant.
In the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the application, in the DQN algorithm, the parameters θ_i are adjusted to obtain the action value function Q(s, a; θ_i), and the value of the Bellman equation is estimated by Q(s', a'; θ_{i−1}), where i = 1, 2, … is the current iteration number and θ_{i−1} is the parameter of the previous iteration.
In the intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application, the target value network Q(s, a; θ^−) generates the target value, wherein for any i = 1, 2, …, the parameters θ_i^− of the target value network are updated periodically and are kept fixed during the update interval.
In the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application, for the i-th iteration, the loss function is defined as:

L_i(θ_i) = E[(y_i − Q(s, a; θ_i))²],  where y_i = r + γ·max_{a'} Q(s', a'; θ_i^−) is the target value.
In the above intelligent air conditioning apparatus using reinforcement learning and thermal imaging, the agent selects an action in a state using the current value network and an ε-greedy method, including:

the agent randomly selects an action with probability ε, 0 < ε < 1;

the agent selects the greedy action a = argmax_a Q(s, a; θ) with probability 1 − ε;

where ε is calculated as:

ε = ε_start − (ε_start − ε_end)·(t / t_total)

wherein ε_start is the initial value of the probability of randomly selecting an action at the beginning of training, ε_end is the final value of ε, and the parameter t_total represents the total number of time steps of the training process. For example, ε_start may be set to 0.99, ε_end to 0.1, and t_total to 800.
In the above-described intelligent air conditioning control apparatus using reinforcement learning and thermal imaging, the current value network includes three convolutional layers; the input is a preprocessed state tensor of 50 × 50 × 5; the first convolutional layer contains 32 convolution kernels of 10 × 10 with a stride of 2, the second contains 64 convolution kernels of 5 × 5 with a stride of 2, and the third contains 64 convolution kernels of 3 × 3 with a stride of 1; all convolutional layers use a rectified linear unit (ReLU) as the activation function.
According to another aspect of the present application, there is provided an intelligent air conditioner control method using reinforcement learning and thermal imaging, including: acquiring a thermal imaging image with a thermal imaging camera; preprocessing the thermal imaging image; obtaining an initial state based on the indoor temperature value and the preprocessed thermal imaging image; and training the reinforcement learning algorithm.
The intelligent air conditioner control device and the control method utilizing reinforcement learning and thermal imaging provided by the embodiment of the application can adjust the temperature of the intelligent air conditioner based on reinforcement learning by using the thermal imaging image, so that the comfort level of a human body is improved.
Drawings
Various other advantages and benefits of the present application will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. It is obvious that the drawings described below are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. Also, like parts are designated by like reference numerals throughout the drawings.
Fig. 1 shows a schematic block diagram of an intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application.
Fig. 2 shows an updated schematic diagram of the DQN algorithm employed in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to an embodiment of the present application.
Fig. 3 shows a schematic flowchart of an intelligent air conditioner control method using reinforcement learning and thermal imaging according to an embodiment of the present application.
Fig. 4 is a schematic diagram illustrating an intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Fig. 1 shows a schematic block diagram of an intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application.
As shown in fig. 1, an intelligent air conditioner control apparatus 100 using reinforcement learning and thermal imaging according to an embodiment of the present application includes the following units.
A data acquisition unit 110, configured to acquire the current thermal imaging image of the human body and the indoor temperature value. Here, the data acquisition unit 110 performs the data acquisition of the human body thermal imaging image and may, for example, be a thermal imager. Moreover, because people's thermal comfort is closely related to skin temperature, the thermal imaging image of the human body can effectively capture the thermal information of the relevant body parts and directly reflect people's comfort.
A preprocessing unit 120, configured to preprocess the thermal imaging image and the indoor temperature value to obtain a preprocessed state tensor. Specifically, since different colors on the thermal imaging image represent different skin temperatures of the human body under test, the thermal imaging image collected by the thermal imager is the entire image area including the face, neck and palm of the human body. Previous research has shown that the skin temperatures of four areas, namely the forehead, the two cheeks and the center of the hand, best reflect the sensed human body temperature and best serve to improve comfort. Therefore, in the embodiment of the present application, the images of the four region portions (the forehead, the two cheeks and the center of one hand) are extracted separately and adjusted to a three-dimensional matrix of, for example, 50 × 50 × 3 in size. The parameter 50 × 50 is the spatial dimension, i.e. the pixel size of the image, and the parameter 3 indicates that the image has three RGB channels, RGB being the color model of the image under red (R), green (G) and blue (B) light. Each point is divided into 256 levels from 0 to 255, with 0 representing the darkest and 255 the brightest. Then each image is converted into a grayscale image, forming an array of size 50 × 50 × 1, and the four parts are stacked to form a tensor of size, for example, 50 × 50 × 4, denoted T_g. An RGB image is usually converted into a grayscale image by:
Y = 0.299R + 0.587G + 0.114B
where Y is the gray value of the (m, n)-th point in the separately extracted image, and R, G and B are the RGB component values of the (m, n)-th point, respectively. From each of the four extracted human body region images, 50 points are randomly selected for gray value calculation, and finally the average is computed to obtain the gray value of the specific human body thermal imaging image. Therefore, in the embodiment of the present application, different skin temperatures are sensed by comparing pixels, and whether the skin temperature is comfortable is then determined.
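The grayscale conversion and random-sampling average described above can be sketched in Python with NumPy. The 50 × 50 region size and the 50 sampled points per region follow the text; the fixed random seed and uniform test image are illustrative assumptions:

```python
import numpy as np

def rgb_to_gray(img_rgb):
    """Convert an H x W x 3 RGB image to an H x W grayscale array
    using the luma weights Y = 0.299R + 0.587G + 0.114B."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def mean_gray(regions, points_per_region=50, rng=None):
    """Randomly sample `points_per_region` pixels from each extracted
    body-region image and average all sampled gray values."""
    rng = rng or np.random.default_rng(0)
    samples = []
    for region in regions:                    # each region: 50 x 50 x 3 RGB
        gray = rgb_to_gray(region)
        flat = gray.reshape(-1)
        idx = rng.choice(flat.size, size=points_per_region, replace=False)
        samples.append(flat[idx])
    return float(np.mean(samples))

# A uniformly mid-gray image (R = G = B = 128) maps to gray value 128
region = np.full((50, 50, 3), 128.0)
print(mean_gray([region] * 4))
```

Since the luma weights sum to 1, a pixel with equal R, G and B keeps its channel value after conversion, which gives a quick sanity check on the weights.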
The acquired room temperature value then needs to be expanded to the same size as the extracted thermal imaging partial images and combined with them to form the overall preprocessed state. Specifically, the air temperature, denoted t_air, is first mapped to the gray-value range 0-255 based on the following formula to obtain t_expand:

t_expand = 255 × (t_air − t_airlow) / (t_airup − t_airlow)

where t_airup and t_airlow represent the highest and lowest values of the room temperature.
The temperature value is then expanded into a matrix T_e by:

T_e = I_{50×50} × t_expand

where I_{50×50} is a 50 × 50 matrix of ones, whose dimensions are the same as those of the grayscale portions of the thermal imaging map. The matrix is then reshaped into a tensor of dimensions 50 × 50 × 1. Finally, T_g and T_e are merged in the channel dimension to obtain the complete preprocessed state tensor, for example of size 50 × 50 × 5.
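A minimal sketch of this state-tensor construction, assuming the expansion is a linear mapping of t_air from [t_airlow, t_airup] = [18, 30] onto the gray range 0-255 (the exact mapping and the bounds are assumptions taken from the temperature limits given later in the description):

```python
import numpy as np

def expand_temperature(t_air, t_low=18.0, t_high=30.0):
    """Map the room temperature onto the 0-255 gray range:
    t_expand = 255 * (t_air - t_low) / (t_high - t_low)."""
    return 255.0 * (t_air - t_low) / (t_high - t_low)

def build_state(gray_regions, t_air):
    """Stack four 50x50 grayscale region images with a constant 50x50
    temperature plane (T_e = I * t_expand) into a 50x50x5 state tensor."""
    t_plane = np.ones((50, 50)) * expand_temperature(t_air)
    channels = list(gray_regions) + [t_plane]
    return np.stack(channels, axis=-1)

state = build_state([np.zeros((50, 50))] * 4, t_air=24.0)
print(state.shape)  # (50, 50, 5)
```

Merging in the channel dimension keeps the spatial layout of the thermal regions intact while letting the convolutional network see the room temperature at every spatial location.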
Therefore, in the intelligent air conditioner control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application, the preprocessing unit includes: an image extraction subunit for separately extracting a plurality of region partial images of predetermined regions of the human body; a temperature value processing subunit for processing the indoor temperature value into a temperature data matrix with the same dimensions as the region partial images; and a dimension merging subunit for merging the plurality of region partial images and the temperature data matrix in the channel dimension to obtain the preprocessed state tensor.
The database establishing unit 130 is configured to determine an association relationship between the indoor temperature value, the gray value corresponding to the thermal imaging image, and the comfort level of the human body.
Specifically, in order to simulate the variation of human skin temperature, i.e. comfort, with indoor temperature, in the embodiment of the present application, a temperature-gray value-comfort database is established to simulate human performance at different temperatures. In particular, the details of the temperature-grey value-comfort database may be as shown in table 1 below:
table 1: temperature-gray value-comfort level database
Indoor temperature Gray value Y Comfort level
18/30 60-100 ±3
20/28 130-170 ±2
22/26 170-200 ±1
24 200-220 0
Here, when the temperature is too high or too low, the skin temperature (mapped to the gray value) of a person also changes, and the person feels uncomfortable. When the temperature is appropriate, the skin temperature (gray value) is in an appropriate state. In the present example, the room temperature is limited to between 18 ℃ and 30 ℃ and set to 7 values. Comfort is set from -3 to 3, where 0 indicates most comfortable and spreading to either side indicates cooler or warmer, i.e., less comfortable. The applicant of the present application has found experimentally that most people are most comfortable when the room temperature is 24 ℃.
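The temperature-gray value-comfort association of Table 1 can be sketched as a simple lookup. The exact band edges below are illustrative assumptions, and the sketch returns only the magnitude of the comfort deviation, since the table's ± sign depends on whether the room is too cool or too warm:

```python
def comfort_from_gray(y):
    """Look up the comfort-deviation magnitude for a mean gray value Y,
    following the bands of Table 1 (edges are illustrative assumptions)."""
    if 200 <= y <= 220:
        return 0          # most comfortable
    if 170 <= y < 200:
        return 1
    if 130 <= y < 170:
        return 2
    return 3              # gray value far from the comfortable band

print(comfort_from_gray(210))  # 0
print(comfort_from_gray(150))  # 2
```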
A reinforcement learning unit 140, configured to take the intelligent air conditioner as the agent and the adjusted temperature value as the agent's action, train the agent through reinforcement learning, and output the control target value of the indoor temperature with the current thermal imaging image of the human body and the indoor temperature value as inputs.
That is, the reinforcement learning unit 140 includes a training part and an inference part, and is configured to train the intelligent air conditioner as an agent for reinforcement learning so that, when the current thermal imaging image of the human body and the indoor temperature value are input, the trained agent outputs the control target value of the indoor temperature to automatically adjust the indoor temperature. The reinforcement learning unit 140 defines the feedback obtained by the agent as a reward, used to judge whether the temperature increment and the comfort change obtained by the agent in the current environment state are good or bad; the agent obtains a feedback reward at each time step t until it reaches a terminal state, so as to guide the agent toward the target adjustment of human comfort. The reward function r_t is important for the agent because it tells the agent what the learning objective is. Here, considering that the object of the present application is to satisfy human comfort control through temperature adjustment, the reward function is designed as:
r_t = 0, if 200 ≤ Y ≤ 220; r_t = −1, otherwise

Therefore, in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the reward function r_t of the reinforcement learning unit is:

r_t = 0, if 200 ≤ Y ≤ 220; r_t = −1, otherwise

where Y is the mean gray value.
The reward function is a formalized, numerical expression of the agent's control objective. When the action selected by the agent causes the human comfort to deviate from the set value, the environment gives direct feedback (a punishment) on the agent's action. The punishment reflects the real-time interaction between the agent and the room environment and lets the agent adjust its action selection in time according to the punishment given by the reward function, which promotes the convergence of the reinforcement learning algorithm and achieves the demanded control effect more quickly and effectively.
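The source does not reproduce the exact reward formula, so the following is a hypothetical sketch consistent with Table 1: zero reward while the mean gray value Y lies in the most-comfortable band (200-220), and a penalty of −1 otherwise:

```python
def reward(mean_gray_y, low=200, high=220):
    """Hypothetical reward: zero inside the comfortable gray band
    (comfort level 0 in Table 1), a -1 punishment otherwise."""
    return 0.0 if low <= mean_gray_y <= high else -1.0

print(reward(210))  # 0.0
print(reward(90))   # -1.0
```

A sparse 0/−1 shape like this matches the text's description of the environment punishing actions that push comfort away from the set value.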
Specifically, in the embodiment of the present application, a DQN (Deep Q-Learning Network) algorithm may be used as the reinforcement control algorithm of the reinforcement learning unit. The DQN algorithm uses a deep neural network to approximate a parameterized action value function Q(s, a; θ), referred to as a Q network, where θ is a finite-dimensional parameter vector that is randomly initialized and is updated with the help of the target value network introduced by the DQN algorithm. With the deep neural network fitting the Q value and the neural network parameters introduced, the update expression of the state-reward action value function of DQN becomes:

Q(s_t, a_t) ← Q(s_t, a_t) + α·[r + γ·max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t)]

where max_{a'} Q(s_{t+1}, a') indicates maximizing the state-action value function after adjusting the action value a, γ is the discount factor, α is the learning rate, and r is the reward function.
Fig. 2 shows an update schematic diagram of the DQN algorithm adopted in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application. It comprises the environment, a playback memory unit, a DQN error function, a current value network and a target value network. In the DQN algorithm according to an embodiment of the present application, the playback memory unit, the current value network, the target value network and the DQN error function constitute the agent control portion of the reinforcement learning unit.
Specifically, the playback memory unit stores the states, actions and rewards of the interaction between the agent and the environment. By randomly drawing data samples from the playback memory unit, the DQN overcomes the correlation among data, enhances the randomness of learning, improves the utilization rate of each sample, and prevents the network update from falling into local optima.
For example, the playback memory unit may use an experience replay mechanism, that is, it stores the transition (s_t, a_t, r_t, s_{t+1}) of each time step t → t+1 to a replay memory D, so that the experience in the replay memory D is always fresh.
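The experience replay mechanism can be sketched as a fixed-capacity buffer with uniform random sampling; the capacity and the toy transitions below are illustrative, not values from the source:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity replay memory D storing (s_t, a_t, r_t, s_{t+1})
    transitions; the deque's maxlen evicts the oldest experience, and
    uniform random sampling breaks the correlation between consecutive
    experiences."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=1000)
for t in range(5):
    memory.push(t, t % 3, -1.0, t + 1)  # toy (s, a, r, s') transitions
print(len(memory), len(memory.sample(3)))  # 5 3
```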
The current value network, i.e. the Q network described above, is used to output the parameterized action value function. In the embodiment of the present application, the RMSProp (Root Mean Square Propagation) method may be adopted to train the Q network; its neural network parameter updating process is as follows:

g ← (1/w)·∇_θ Σ_i L(f(x_i; θ), y_i)   (1)

r ← ρ·r + (1 − ρ)·g ⊙ g   (2)

Δθ = −(η / (δ + √r)) ⊙ g   (3)

θ ← θ + Δθ   (4)

where w is the number of samples, y is the corresponding sample target, i.e. the label value, g is the gradient, r is the accumulated gradient, ρ is a hyperparameter representing the decay rate of the accumulated gradient, ⊙ is the element-wise product, η is the global learning rate, and δ is a small constant to avoid division by zero.
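A NumPy sketch of one RMSProp parameter update following equations (2)-(4) above; the hyperparameter values are common illustrative defaults, not values from the source:

```python
import numpy as np

def rmsprop_step(theta, g, r, rho=0.9, eta=0.001, delta=1e-6):
    """One RMSProp update: accumulate the squared gradient with decay
    rate rho, then scale the step by the root of the accumulator."""
    r = rho * r + (1.0 - rho) * g * g              # r <- rho*r + (1-rho) g (*) g
    delta_theta = -(eta / (delta + np.sqrt(r))) * g
    return theta + delta_theta, r

theta = np.zeros(3)
r = np.zeros(3)
g = np.array([0.5, -0.5, 0.0])                     # toy gradient
theta, r = rmsprop_step(theta, g, r)
print(np.sign(theta))  # parameters move opposite the gradient
```

Dividing by the root of the accumulated squared gradient gives each parameter its own effective step size, which is why the method suits the noisy gradients of Q-network training.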
In addition, a loss function and the target value network need to be introduced. In the DQN algorithm, by adjusting the parameters θ_i, the action value function Q(s, a; θ_i) can be obtained, and the value of the Bellman equation is estimated by Q(s', a'; θ_{i−1}), where i = 1, 2, … is the current iteration number and θ_{i−1} is the parameter of the previous iteration. The target value is generated by the target value network Q(s, a; θ^−), whose structure is the same as that of the Q network. For any i = 1, 2, …, the parameters θ_i^− of the target value network are updated periodically and are kept fixed during the update interval. Thus, for the i-th iteration, the loss function is defined as:

L_i(θ_i) = E[(y_i − Q(s, a; θ_i))²],  where y_i = r + γ·max_{a'} Q(s', a'; θ_i^−) is the target value.
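The target computation y_i = r + γ·max_{a'} Q(s', a'; θ^−) and the squared-error loss can be illustrated numerically; the Q values below are made-up toy numbers:

```python
import numpy as np

def dqn_targets(rewards, q_next_target, gamma=0.99):
    """y_i = r + gamma * max_a' Q(s', a'; theta^-): bootstrap the target
    from the frozen target network's best next-state action value."""
    return rewards + gamma * q_next_target.max(axis=1)

def dqn_loss(q_pred, targets):
    """Mean squared error between predicted Q(s, a; theta_i) and y_i."""
    return float(np.mean((targets - q_pred) ** 2))

rewards = np.array([0.0, -1.0])
q_next = np.array([[1.0, 2.0, 0.5],    # max over actions: 2.0
                   [0.0, 0.0, 0.0]])   # max over actions: 0.0
y = dqn_targets(rewards, q_next)
print(y)  # [ 1.98 -1.  ]
```

Holding θ^− fixed between periodic updates keeps the targets stationary for a while, which stabilizes training compared with chasing a target that moves on every gradient step.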
in addition, the method for the intelligent agent to explore the environment can be further increased. In particular, the agent selects actions that are in states using a Q-network and an epsilon greedy approach. This means that the probability ε,0 of an agent selecting an action at random<ε<1, selection action
Figure SMS_41
Has a probability of 1-epsilon. ε may be calculated as:
Figure SMS_42
wherein epsilon start For randomly selecting an initial value of the probability of an action, epsilon, at the beginning of the training end Is the final value of ε, the parameter t total Representing the total number of time steps that have passed through the training process. For example, ε may be set start Is 0.99,. Epsilon end Is 0.1,t total Is 800.
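The source does not reproduce the exact decay formula, so the sketch below assumes a linear decay from ε_start to ε_end over t_total steps (with clamping after t_total as an additional assumption), matching the example values 0.99, 0.1 and 800:

```python
def epsilon(t, eps_start=0.99, eps_end=0.1, t_total=800):
    """Exploration probability at training step t, assuming a linear
    decay from eps_start to eps_end over t_total steps, then clamped."""
    frac = min(t / t_total, 1.0)
    return eps_start - (eps_start - eps_end) * frac

print(epsilon(0))            # 0.99
print(round(epsilon(800), 2))  # 0.1
```

Starting near 1 makes the agent explore the temperature actions almost uniformly at first, then increasingly exploit the learned Q values as ε decays.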
In addition, in the embodiment of the present application, the Q network uses a convolutional neural network whose input is the preprocessed state comprising the thermal imaging map and the indoor temperature and whose outputs are the action values of the set of effective actions; for example, three operations in total may be set, namely raising, maintaining and lowering the temperature set point of the air conditioner, and the magnitude of the increase/decrease may be set to, for example, 2 ℃. Specifically, the convolutional neural network may be composed of three convolutional layers; the input is a preprocessed state tensor of 50 × 50 × 5; the first layer contains 32 convolution kernels of 10 × 10 with a stride of 2, the second layer contains 64 convolution kernels of 5 × 5 with a stride of 2, and the third layer contains 64 convolution kernels of 3 × 3 with a stride of 1; all convolutional layers use the rectified linear unit (ReLU) as the activation function. As the state tensor passes through the convolutional layers, its spatial dimensions shrink and its channel dimension grows. The three-dimensional output tensor of the convolutional layers is flattened into a vector before passing through the linear hidden layer. The last hidden layer is a fully connected linear layer consisting of 512 neurons. The output layer is a fully connected layer containing 3 neurons, i.e., the number of effective actions. Thus, the convolutional neural network estimates the current optimal action based on the observed environmental state.
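The layer dimensions quoted above can be checked with the standard valid-convolution output formula (zero padding is assumed, since the source does not state it): the spatial size shrinks 50 → 21 → 9 → 7, giving 7 × 7 × 64 = 3136 flattened inputs to the 512-neuron hidden layer:

```python
def conv_out(size, kernel, stride):
    """Output spatial size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

size = 50                       # 50 x 50 x 5 preprocessed state tensor
for kernel, stride in [(10, 2), (5, 2), (3, 1)]:
    size = conv_out(size, kernel, stride)
    print(size)                 # 21, then 9, then 7

flattened = size * size * 64    # 64 channels after the third layer
print(flattened)                # 3136
```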
Therefore, in the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the reinforcement learning unit employs the DQN algorithm as its reinforcement control algorithm, and the update expression of the state-action value function of the DQN algorithm is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

where $\max_{a'} Q(s_{t+1}, a')$ denotes the maximum of the state-action value function over the adjusted action value $a'$, $\gamma$ is the discount factor, $\alpha$ is the learning rate, and $r$ is the reward function.
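The update rule above can be sketched in its tabular form; the dict-backed Q-table here is an illustrative stand-in for the convolutional Q network, which replaces the table with a function approximator.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Q is a dict mapping (state, action) pairs to values; unseen pairs default to 0.
    """
    q_sa = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
    return Q[(s, a)]

# illustrative states are room temperatures, actions are setpoint shifts
Q = {}
actions = (-2, 0, 2)
q_update(Q, 20, 2, 1.0, 22, actions, alpha=0.5, gamma=0.9)
print(Q)
```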
In the intelligent air conditioner control device using reinforcement learning and thermal imaging according to an embodiment of the present application, the agent control part of the reinforcement learning unit employing the DQN algorithm includes:
the playback memory unit is used for storing the interaction state, action and reward of the intelligent agent and the environment;
a current value network for outputting a parameterized action value function;
a target value network for outputting a target value as a tag value; and
an error function, based on which the reinforcement learning unit is trained.
In the intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to the embodiment of the present application, at each time step t → t+1, the playback memory unit stores the transition $(s_t, a_t, r_t, s_{t+1})$ to the replay memory.
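A minimal sketch of such a replay memory follows; the fixed-capacity deque and uniform random sampling are standard choices assumed here, not details given in the text.

```python
import random
from collections import deque

class ReplayMemory:
    """Stores transitions (s_t, a_t, r_t, s_{t+1}) and serves random minibatches."""

    def __init__(self, capacity):
        # once full, the oldest transitions are evicted first
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=1000)
for t in range(5):
    memory.push(t, 0, -1.0, t + 1)
print(len(memory), memory.sample(2))
```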
In the intelligent air-conditioning control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the current value network is trained by RMSProp (Root Mean Square Propagation), and the neural network parameter update process is expressed as:

$$g \leftarrow \frac{1}{w} \nabla_\theta \sum_i L\big(f(x^{(i)}; \theta),\, y^{(i)}\big)$$

$$r \leftarrow \rho r + (1 - \rho)\, g \odot g$$

$$\Delta\theta = -\frac{\eta}{\sqrt{\delta + r}} \odot g$$

$$\theta \leftarrow \theta + \Delta\theta$$

where w is the number of samples, y is the corresponding sample target, g is the gradient, r is the accumulated squared gradient, ρ is a hyperparameter representing the decay rate of the accumulated gradient, ⊙ is the element-wise product, η is the global learning rate, and δ is a small constant for numerical stability.
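The RMSProp update above can be sketched on a toy quadratic loss; the variable names follow the text, while the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def rmsprop_step(theta, grad, r, rho=0.9, eta=0.1, delta=1e-6):
    """One RMSProp update: accumulate squared gradients, take an element-wise scaled step."""
    r = rho * r + (1.0 - rho) * grad * grad      # r <- rho*r + (1-rho) g (.) g
    step = -(eta / np.sqrt(delta + r)) * grad    # delta_theta = -(eta/sqrt(delta+r)) (.) g
    return theta + step, r

# minimize sum(theta**2), whose gradient is 2*theta
theta = np.array([5.0, -3.0])
r = np.zeros_like(theta)
for _ in range(500):
    grad = 2.0 * theta
    theta, r = rmsprop_step(theta, grad, r)
print(theta)
```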
In the intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to the embodiment of the application, in the DQN algorithm, the parameters $\theta_i$ are adjusted to obtain the action value function $Q(s, a; \theta_i)$, and the value of the Bellman equation is estimated by

$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_{i-1})$$

where i = 1, 2, … is the current iteration number and $\theta_{i-1}$ are the parameters of the previous iteration.
In the intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application, the target value network $\hat{Q}(s, a; \theta^-)$ generates the target value, wherein, for any i = 1, 2, …, the target value network $\hat{Q}$ is updated periodically and its parameters are fixed during the update interval.
In the intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application, for the i-th iteration, the loss function is defined as:

$$L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim D}\Big[\big(y_i - Q(s, a; \theta_i)\big)^2\Big], \qquad y_i = r + \gamma \max_{a'} \hat{Q}(s', a'; \theta^-)$$
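The loss computation can be sketched as the mean squared TD error between the target-network value and the current network's prediction for a minibatch; the array layout (one row of target-network action values per transition) is an assumption.

```python
import numpy as np

def dqn_loss(q_pred, rewards, q_next_target, gamma=0.9, terminal=None):
    """Mean squared TD error for a minibatch of w transitions.

    q_pred:        Q(s_j, a_j) from the current network, shape (w,)
    rewards:       r_j, shape (w,)
    q_next_target: Qhat(s_{j+1}, a') from the target network, shape (w, n_actions)
    terminal:      boolean mask; terminal transitions use y_j = r_j only
    """
    if terminal is None:
        terminal = np.zeros_like(rewards, dtype=bool)
    y = rewards + gamma * q_next_target.max(axis=1) * (~terminal)
    return np.mean((y - q_pred) ** 2)

loss = dqn_loss(np.array([1.0]), np.array([1.0]), np.array([[2.0, 0.5]]))
print(loss)
```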
in the intelligent air conditioning control device utilizing reinforcement learning and thermal imaging according to the embodiment of the application, the agent selects the action in the state by using the current value network and an epsilon greedy method, and the action comprises the following steps:
the probability of the agent randomly selecting the action is epsilon, 0< epsilon <1;
the agent selecting an action
Figure SMS_59
The probability of (a) is 1-epsilon;
where ε is calculated as:
Figure SMS_60
wherein epsilon start For randomly selecting an initial value of the probability of an action, epsilon, at the beginning of the training end Is the final value of ε, the parameter t total Representing the total number of time steps that have passed through the training process. For example, ε may be set start Is 0.99,. Epsilon end Is 0.1,t total Is 800.
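The decaying exploration schedule can be sketched as follows; the exponential form is reconstructed from the parameters $\varepsilon_{start}$, $\varepsilon_{end}$, and $t_{decay}$ named in the text, so the exact functional shape is an assumption.

```python
import math

def epsilon(t_total, eps_start=0.99, eps_end=0.1, t_decay=800):
    """Exploration probability after t_total elapsed training steps."""
    return eps_end + (eps_start - eps_end) * math.exp(-t_total / t_decay)

# decays from eps_start toward eps_end as training progresses
print(epsilon(0), epsilon(800), epsilon(10_000))
```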
In the intelligent air-conditioning control device using reinforcement learning and thermal imaging according to the embodiment of the present application, the current value network includes three convolutional layers, and the input is a preprocessed state tensor of 50 × 50 × 5; the first convolutional layer contains 32 convolution kernels of 10 × 10 with a stride of 2, the second convolutional layer contains 64 convolution kernels of 5 × 5 with a stride of 2, the third convolutional layer contains 64 convolution kernels of 3 × 3 with a stride of 1, and all convolutional layers use the rectified linear unit as the activation function.
Fig. 3 shows a schematic flowchart of an intelligent air conditioner control method using reinforcement learning and thermal imaging according to an embodiment of the present application.
As shown in fig. 3, the intelligent air conditioner control method using reinforcement learning and thermal imaging according to an embodiment of the present application includes the following steps.
In step S210, a thermal imaging graph is acquired using a thermal imager.
Step S220, preprocessing the thermal imaging map, specifically including: cutting out images of four areas, namely the forehead, the two cheeks, and the center of one hand, from the thermal imaging image, and resizing each to 50 × 50 × 3; converting each extracted image into a grayscale image to form an array of size 50 × 50 × 1; combining the 4 partial arrays to form a 50 × 50 × 4 tensor $T_g$; and, according to Y = 0.299R + 0.587G + 0.114B, randomly selecting 50 points on the image, computing their gray values, and averaging them to obtain the mean gray value.
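The grayscale conversion and mean-gray sampling of step S220 can be sketched as below. The luma weights follow the standard Y = 0.299R + 0.587G + 0.114B; the assumption that the region crops are already resized to 50 × 50 × 3 RGB arrays is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_gray(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W, 1) grayscale array."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb @ weights)[..., None]

def mean_gray(gray, n_points=50):
    """Average the gray value of n randomly sampled pixels."""
    ys = rng.integers(0, gray.shape[0], n_points)
    xs = rng.integers(0, gray.shape[1], n_points)
    return float(gray[ys, xs, 0].mean())

crop = rng.random((50, 50, 3))        # stand-in for one resized region crop
gray = to_gray(crop)
print(gray.shape, round(mean_gray(gray), 3))
```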
Step S230, obtaining the initial state based on the indoor temperature value and the preprocessed thermal imaging map, specifically including: obtaining a 50 × 50 temperature matrix $T_e$ in which every element is the indoor temperature value; and combining the tensor $T_g$ and the temperature matrix $T_e$ into a 50 × 50 × 5 tensor to obtain the initial state.
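The state assembly of step S230 can be sketched as stacking the four gray crops with the temperature matrix along the channel dimension. The constant-fill form of $T_e$ is an assumption consistent with the "same dimensionality" description of the temperature value processing subunit.

```python
import numpy as np

def initial_state(gray_crops, room_temp):
    """Stack four 50x50 gray crops (T_g) with a 50x50 temperature matrix (T_e)."""
    t_g = np.stack(gray_crops, axis=-1)              # (50, 50, 4)
    t_e = np.full(t_g.shape[:2] + (1,), room_temp)   # (50, 50, 1), every entry = room temp
    return np.concatenate([t_g, t_e], axis=-1)       # (50, 50, 5)

crops = [np.zeros((50, 50)) for _ in range(4)]       # forehead, cheeks, hand
state = initial_state(crops, room_temp=26.0)
print(state.shape)
```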
Step S240, training the reinforcement learning algorithm, including the following steps:
Step S240-1: initializing the Q network with random weights $\theta$; letting i = 1; initializing the state $s_t$ and the total time step $t_{total}$ = 1; initializing the parameters $\varepsilon_{start}$, $\varepsilon_{end}$ and $t_{decay}$ used to calculate ε;
Step S240-2: initializing the initial size and the total size of a replay memory D, the number E of training sets, an updating interval K and the maximum time step number of each set T, and initializing the replay memory D;
step S240-3: preprocessing the state to obtain a preprocessed state;
Step S240-4: defining H episodes;
Step S240-5: calculating ε according to the formula $\varepsilon = \varepsilon_{end} + (\varepsilon_{start} - \varepsilon_{end})\, e^{-t_{total}/t_{decay}}$; selecting a random action $a_t$ with probability ε, otherwise selecting $a_t = \arg\max_a Q(s_t, a; \theta)$;
Step S240-6: executing action $a_t$, observing the next state $s_{t+1}$, and obtaining the reward $r_t$;
Step S240-7: preprocessing $s_{t+1}$ to obtain the preprocessed state;
Step S240-8: sampling a minibatch of w transitions from the replay memory D;
Step S240-9: for each transition $(s_j, a_j, r_j, s_{j+1})$ in the minibatch, if $s_{j+1}$ is a terminal preprocessed state, setting $y_j = r_j$; otherwise, setting $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^-)$;
Step S240-10: training the Q network using RMSprop;
Step S240-11: letting $t_{total} = t_{total} + 1$;
Step S240-12: copying parameters of the Q network to a target network every K steps;
Step S240-13: repeating steps S230 to S240-7 until $s_t$ is a terminal state;
Step S240-14: returning the deep Q network.
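The training loop of steps S240-1 through S240-14 can be sketched in compact, runnable form, with the deep Q network replaced by a Q-table and the room replaced by a toy environment: the state is the setpoint, the three actions shift it by −2/0/+2 ℃, and the reward is highest at an assumed comfort temperature of 24 ℃. Every numeric value here is an illustrative assumption.

```python
import math
import random

random.seed(0)
ACTIONS = (-2, 0, 2)
COMFORT, LOW, HIGH = 24, 16, 30
Q = {}

def step(temp, action):
    """Toy environment: apply the setpoint shift, reward closeness to comfort."""
    nxt = min(max(temp + action, LOW), HIGH)
    reward = -abs(nxt - COMFORT)
    return nxt, reward, nxt == COMFORT     # terminal once comfort is reached

def greedy(temp):
    return max(ACTIONS, key=lambda a: Q.get((temp, a), 0.0))

for episode in range(500):
    temp = random.choice(range(LOW, HIGH + 1, 2))   # supercooled or overheated start
    for _ in range(20):
        eps = 0.1 + 0.89 * math.exp(-episode / 60)  # decaying exploration
        a = random.choice(ACTIONS) if random.random() < eps else greedy(temp)
        nxt, r, done = step(temp, a)
        best_next = max(Q.get((nxt, a2), 0.0) for a2 in ACTIONS)
        target = r if done else r + 0.9 * best_next
        Q[(temp, a)] = Q.get((temp, a), 0.0) + 0.5 * (target - Q.get((temp, a), 0.0))
        temp = nxt
        if done:
            break

print(greedy(18), greedy(28))   # expect raising below comfort, lowering above
```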
Here, it can be understood by those skilled in the art that other details of the intelligent air conditioner control method using reinforcement learning and thermal imaging according to the embodiment of the present application are the same as those of the intelligent air conditioner control device using reinforcement learning and thermal imaging according to the embodiment of the present application described previously, and are not described again here to avoid redundancy.
In summary, the control device and control method using reinforcement learning and thermal imaging according to the embodiment of the present application abandon the ideas of merely ensuring temperature stability and of undifferentiated comfort prediction used in previous research, and instead adjust the temperature of the air conditioning system according to skin temperature, thereby improving human comfort. In the embodiment of the application, the thermal imaging image displays the surface temperature of the human body. In the reinforcement-learning-based temperature adjustment scheme that takes human thermal images as input, the air conditioning system is regarded as an agent: the collected thermal images of occupants and the initial room temperature serve as inputs, and the agent self-learns an appropriate control strategy to adjust the indoor temperature and, in turn, the skin temperature, thereby improving occupant comfort. Simulation results obtained by combining EnergyPlus building energy simulation software for realistic environment and temperature-change co-simulation show that the intelligent air conditioner control device and control method using reinforcement learning and thermal imaging according to the embodiment of the present application can effectively improve occupant comfort.
Fig. 4 is a schematic diagram illustrating an intelligent air conditioning control apparatus using reinforcement learning and thermal imaging according to an embodiment of the present application. As shown in fig. 4, a thermal imaging camera is used to shoot a human target to generate a thermal imaging image, a temperature sensor is used to collect indoor temperature, the current thermal imaging image of a human body and an indoor temperature value are preprocessed to be used as input, and the indoor temperature is output through a convolution layer and a full connection layer to control an air conditioning system so as to achieve the purpose of automatically adjusting the indoor temperature.
EnergyPlus is currently widely used open-source building and HVAC simulation software, but it has certain limitations in the development and optimization of HVAC control algorithms: it is very difficult to apply advanced control algorithms directly within the built-in software. Aiming at this problem, the invention establishes a co-simulation test bed combining the building HVAC system and the HVAC system control module, and realizes dynamic data transmission and interaction between the two modules. Python is open-source programming software that is convenient for implementing a neural-network-based DRL control algorithm. The Functional Mock-up Interface (FMI) is a standard that provides a uniform model interface for model exchange and co-simulation among multiple modeling and simulation tools. Therefore, the invention uses an EnergyPlus-Python co-simulation scheme based on the FMI standard to control the air conditioning system: the building HVAC system is modeled in EnergyPlus, and the DQN-based HVAC system control module is implemented in Python.
The EnergyPlus-based building HVAC module uses the ExternalInterface:FunctionalMockupUnitExport:To:Schedule objects of the External Interface group, which are packaged into a Functional Mock-up Unit (FMU) model complying with the FMI standard. The original file is modified so that the DQN controller replaces the built-in controller. PyFMI is a Python package based on the FMI library that supports loading of and interaction with FMU models for model exchange and co-simulation. In the interactive process, first, the EnergyPlus simulation model is packaged into an FMU model using EnergyPlusToFMU; then, the air conditioner agent sends an action to the packaged FMU model, and the FMU model executes one step of model simulation; finally, the simulation state and the obtained reward are returned to the air conditioning system agent, and the whole process is repeated until the simulation is finished. Specifically, at the beginning of each round, the indoor temperature is initialized to a supercooled or overheated value; according to the current state, the air conditioner issues one of three actions (increasing the temperature set point, keeping it unchanged, or decreasing it), each change being realized as a 2 ℃ adjustment of the set point, and the iteration of the round ends when the most comfortable state is reached.
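The action → simulate → reward cycle described above can be sketched with the EnergyPlus FMU replaced by a stub so the loop is runnable. In the real setup the FMU would be loaded with PyFMI (e.g. `pyfmi.load_fmu`) and stepped through its co-simulation interface; the `StubFMU` class, its relaxation dynamics, and the rule-based controller below are all illustrative assumptions standing in for the trained DQN agent and the building model.

```python
COMFORT = 24.0

class StubFMU:
    """Stand-in for the EnergyPlus FMU: one zone relaxing toward the setpoint."""

    def __init__(self, room_temp=30.0):
        self.room_temp = room_temp
        self.setpoint = 24.0

    def do_step(self):
        # the room temperature relaxes halfway toward the setpoint each step
        self.room_temp += 0.5 * (self.setpoint - self.room_temp)
        return self.room_temp

def bang_bang(t):
    """Trivial rule-based controller standing in for the trained DQN policy."""
    if t < COMFORT - 1:
        return 2      # raise the setpoint by 2 degrees
    if t > COMFORT + 1:
        return -2     # lower the setpoint by 2 degrees
    return 0

def run_episode(fmu, controller, max_steps=50):
    """Agent sends an action, FMU simulates one step, reward comes back."""
    for _ in range(max_steps):
        action = controller(fmu.room_temp)
        fmu.setpoint += action
        temp = fmu.do_step()
        reward = -abs(temp - COMFORT)        # returned to the agent
        if abs(temp - COMFORT) < 0.5:        # "most comfortable" -> round ends
            return temp
    return fmu.room_temp

print(round(run_episode(StubFMU(), bang_bang), 2))
```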
It can be seen that the embodiment of the application realizes an innovation in the air conditioner control algorithm: by collecting human thermal images and the indoor temperature to control the air conditioner temperature set point, the room temperature adaptively reaches the state most comfortable for the human body, without manual setting. In addition, it overcomes the defect of static models such as PMV, which assume that the thermal sensations of indoor occupants are static and identical across individuals; in reality, thermal comfort differs considerably from person to person, so the accuracy of such models drops greatly when predicting an individual's thermal comfort response.
The basic principles of the present application have been described above with reference to specific embodiments, but it should be noted that advantages, effects, etc. mentioned in the present application are only examples and are not limiting, and the advantages, effects, etc. must not be considered to be possessed by various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including", "comprising", and "having" are open-ended words that mean "including, but not limited to", and are used interchangeably therewith. As used herein, the words "or" and "and" refer to, and are used interchangeably with, the word "and/or", unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, each component or step can be decomposed and/or re-combined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (13)

1. An intelligent air conditioner control apparatus using reinforcement learning and thermal imaging, comprising:
the data acquisition unit is used for acquiring a current thermal imaging image and an indoor temperature value of the human body;
the preprocessing unit is used for preprocessing the thermal imaging image and the indoor temperature value to obtain a preprocessing state tensor;
the database establishing unit is used for determining the correlation between the indoor temperature value, the gray value corresponding to the thermal imaging image and the comfort level of the human body;
and the reinforcement learning unit is used for taking the intelligent air conditioner as an intelligent agent, taking the regulated temperature value as the action of the intelligent agent, training the intelligent agent through reinforcement learning, and taking the current thermal imaging image of the human body and the indoor temperature value as input to output the control target value of the indoor temperature.
2. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 1, wherein the preprocessing unit comprises:
an image extraction subunit for extracting a plurality of region partial images of predetermined regions of the human body;
a temperature value processing subunit for processing the indoor temperature value into a temperature data matrix with the same dimensionality as the region partial images; and
a dimension combining subunit for combining the plurality of region partial images and the temperature data matrix in the channel dimension to obtain the preprocessed state tensor.
3. The intelligent air-conditioning control device using reinforcement learning and thermal imaging according to claim 1, wherein the reward function $r_t$ of the reinforcement learning unit is defined as a function of the mean gray value Y of the thermal imaging image.
4. The intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to claim 1, wherein the reinforcement learning unit adopts the DQN algorithm as its reinforcement control algorithm, and the update expression of the state-action value function of the DQN algorithm is expressed as:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

where $\max_{a'} Q(s_{t+1}, a')$ denotes the maximum of the state-action value function over the adjusted action value $a'$, $\gamma$ is the discount factor, $\alpha$ is the learning rate, and $r$ is the reward function.
5. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 1, wherein the agent control part of the reinforcement learning unit adopting DQN algorithm comprises:
the playback memory unit is used for storing the interaction state, action and reward of the intelligent agent and the environment;
a current value network for outputting a parameterized action value function;
a target value network for obtaining a label value based on the parameters of the current value network; and
an error function, based on which the reinforcement learning unit is trained.
6. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 5, wherein, at each time step t → t+1, the playback memory unit stores the transition $(s_t, a_t, r_t, s_{t+1})$ to the replay memory.
7. The intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging as claimed in claim 5, wherein the current value network is trained by the RMSProp (Root Mean Square Propagation) method, and the neural network parameter update process is expressed as:

$$g \leftarrow \frac{1}{w} \nabla_\theta \sum_i L\big(f(x^{(i)}; \theta),\, y^{(i)}\big)$$

$$r \leftarrow \rho r + (1 - \rho)\, g \odot g$$

$$\Delta\theta = -\frac{\eta}{\sqrt{\delta + r}} \odot g$$

$$\theta \leftarrow \theta + \Delta\theta$$

where w is the number of samples, y is the corresponding sample target, g is the gradient, r is the accumulated squared gradient, ρ is a hyperparameter representing the decay rate of the accumulated gradient, ⊙ is the element-wise product, η is the global learning rate, and δ is a small constant.
8. The intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to claim 5, wherein, in the DQN algorithm, the parameters $\theta_i$ are adjusted to obtain the action value function $Q(s, a; \theta_i)$, and the value of the Bellman equation is estimated by

$$y_i = r + \gamma \max_{a'} Q(s', a'; \theta_{i-1})$$

where i = 1, 2, … is the current iteration number and $\theta_{i-1}$ are the parameters of the previous iteration.
9. The intelligent air-conditioning control apparatus using reinforcement learning and thermal imaging according to claim 8, wherein the target value network $\hat{Q}(s, a; \theta^-)$ generates the target value, wherein, for any i = 1, 2, …, the target value network $\hat{Q}$ is updated periodically and its parameters are fixed during the update interval.
10. The intelligent climate control device utilizing reinforcement learning and thermal imaging according to claim 8, wherein, for the i-th iteration, the loss function is defined as:

$$L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim D}\Big[\big(y_i - Q(s, a; \theta_i)\big)^2\Big], \qquad y_i = r + \gamma \max_{a'} \hat{Q}(s', a'; \theta^-)$$
11. The intelligent climate control device utilizing reinforcement learning and thermal imaging of claim 5, wherein the agent selects the action in a given state using the current value network and the ε-greedy method, comprising:
the agent randomly selects an action with probability ε, 0 < ε < 1;
the agent selects the action $a_t = \arg\max_a Q(s_t, a; \theta)$ with probability 1 − ε;
where ε is calculated as:

$$\varepsilon = \varepsilon_{end} + (\varepsilon_{start} - \varepsilon_{end})\, e^{-t_{total}/t_{decay}}$$

wherein $\varepsilon_{start}$ is the initial value of the probability of randomly selecting an action at the beginning of training, $\varepsilon_{end}$ is the final value of ε, and the parameter $t_{total}$ represents the total number of time steps that have elapsed in the training process.
12. The intelligent air-conditioning control device utilizing reinforcement learning and thermal imaging according to claim 5, wherein the current value network comprises three convolutional layers, and the input is a preprocessed state tensor of 50 × 50 × 5; the first convolutional layer contains 32 convolution kernels of 10 × 10 with a stride of 2, the second convolutional layer contains 64 convolution kernels of 5 × 5 with a stride of 2, the third convolutional layer contains 64 convolution kernels of 3 × 3 with a stride of 1, and all convolutional layers use the rectified linear unit as the activation function.
13. An intelligent air conditioner control method utilizing reinforcement learning and thermal imaging is characterized by comprising the following steps:
acquiring a thermal imaging image by using a thermal imaging camera;
preprocessing the thermal imaging graph;
obtaining an initial state based on the indoor temperature value and the preprocessed thermal imaging graph; and
and training a reinforcement learning algorithm.
CN202310231267.0A 2023-03-13 2023-03-13 Intelligent air conditioner control equipment and control method using reinforcement learning and thermal imaging Active CN115930384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310231267.0A CN115930384B (en) 2023-03-13 2023-03-13 Intelligent air conditioner control equipment and control method using reinforcement learning and thermal imaging

Publications (2)

Publication Number Publication Date
CN115930384A true CN115930384A (en) 2023-04-07
CN115930384B CN115930384B (en) 2023-06-06

Family

ID=85834033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310231267.0A Active CN115930384B (en) 2023-03-13 2023-03-13 Intelligent air conditioner control equipment and control method using reinforcement learning and thermal imaging

Country Status (1)

Country Link
CN (1) CN115930384B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111561771A (en) * 2020-06-16 2020-08-21 重庆大学 Intelligent air conditioner temperature adjusting method
US20200333033A1 (en) * 2017-10-30 2020-10-22 Daikin Industries, Ltd. Air-conditioning control device
CN111832697A (en) * 2019-04-21 2020-10-27 于长河 Intelligent human body recognition energy-saving temperature regulating system
CN113888737A (en) * 2021-09-16 2022-01-04 杭州英集动力科技有限公司 Hot user room temperature measuring method and system based on temperature measurement image shot by intelligent equipment
CN114355915A (en) * 2021-12-27 2022-04-15 杭州电子科技大学 AGV path planning based on deep reinforcement learning
CN114370698A (en) * 2022-03-22 2022-04-19 青岛理工大学 Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning
CN114627539A (en) * 2022-02-15 2022-06-14 华侨大学 Thermal comfort degree prediction method and system and air conditioner adjusting method and device
CN114969452A (en) * 2022-05-05 2022-08-30 南京邮电大学 Personal thermal comfort prediction method and system
CN114964506A (en) * 2022-05-13 2022-08-30 同济大学 Indoor human body thermal comfort intelligent regulation and control method and system based on infrared thermal imaging

Also Published As

Publication number Publication date
CN115930384B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
Peng et al. Temperature-preference learning with neural networks for occupant-centric building indoor climate controls
US11371739B2 (en) Predictive building control system with neural network based comfort prediction
JP2019522163A (en) Controller for operating air conditioning system and method for controlling air conditioning system
Zhuang et al. Data-driven predictive control for smart HVAC system in IoT-integrated buildings with time-series forecasting and reinforcement learning
CN107120782B (en) A kind of HVAC system control method based on multi-user&#39;s hot comfort data
Şencan et al. A new approach using artificial neural networks for determination of the thermodynamic properties of fluid couples
CA2550180C (en) Robust modeling
CN112963946B (en) Heating, ventilating and air conditioning system control method and device for shared office area
CN111609534B (en) Temperature control method and device and central temperature control system
CN111461466B (en) Heating valve adjusting method, system and equipment based on LSTM time sequence
Baghaee et al. User comfort and energy efficiency in HVAC systems by Q-learning
CN113485498B (en) Indoor environment comfort level adjusting method and system based on deep learning
Kim et al. Building energy management for demand response using kernel lifelong learning
Zhang et al. Two-stage reinforcement learning policy search for grid-interactive building control
CN115585538A (en) Indoor temperature adjusting method and device, electronic equipment and storage medium
CN115682312A (en) Air conditioner energy-saving control method, device and equipment and readable storage medium
CN114811713B (en) Two-level network inter-user balanced heat supply regulation and control method based on mixed deep learning
WO2020179686A1 (en) Equipment control system
CN115930384B (en) Intelligent air conditioner control equipment and control method using reinforcement learning and thermal imaging
EP3771957A1 (en) Method and system for controlling of heating, ventilation and air conditioning
CN113719975B (en) Human body thermal comfort real-time sensing and indoor environment intelligent regulation and control method and system
CN116227883A (en) Intelligent household energy management system prediction decision-making integrated scheduling method based on deep reinforcement learning
US11280514B1 (en) System and method for thermal control based on invertible causation relationship
Wang et al. A Comparison of Classical and Deep Reinforcement Learning Methods for HVAC Control
Leow et al. Occupancy-moderated zonal space-conditioning under a demand-driven electricity price

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant