TWI802034B - Method for controlling air conditioner demand response - Google Patents


Info

Publication number
TWI802034B
TWI802034B (application TW110137199A)
Authority
TW
Taiwan
Prior art keywords: air conditioning, response control, machine learning, learning model
Prior art date
Application number
TW110137199A
Other languages
Chinese (zh)
Other versions
TW202316327A (en)
Inventor
楊宏澤
蔡漢蒼
Original Assignee
國立成功大學
Priority date
Filing date
Publication date
Application filed by 國立成功大學
Priority to TW110137199A
Publication of TW202316327A
Application granted
Publication of TWI802034B

Abstract

A method for controlling air conditioner demand response, implemented by an air-conditioning demand response control system to adjust the temperature of each of N field spaces, where N is a positive integer and N ≥ 2. The method comprises the following steps. When the air-conditioning response control system receives a demand response request command, it executes a double deep Q network machine learning model to calculate a customer baseline load for each field space according to a reward function. The system then calculates an electricity limit for each field space from the N customer baseline loads and a total electricity limit. The double deep Q network machine learning model trains an exploratory action based on the N customer baseline loads, the N electricity limits, and N temperature feedback values; the exploratory action is directed at finding a maximum reward while meeting a minimum standard of comfort.

Description

Air-Conditioning Response Control Method

The present invention relates to a control method, and more particularly to an air-conditioning response control method.

With global warming in recent years, the number of installed air conditioners has kept growing in order to maintain occupant comfort in buildings. Heavy air-conditioning use, however, not only burdens the power grid but can also waste electricity when temperature is poorly controlled. Having air-conditioning equipment participate in demand response gives system operators an effective way to shave peak loads and improve grid reliability. Yet during a demand response event, cutting the air-conditioning load too aggressively forces occupants to endure an uncomfortable environment and reduces their willingness to participate, while temperature control that is too loose cannot deliver enough load shedding. How to strike this balance is therefore the central problem for an energy management system controlling air-conditioning loads. The shortcomings of the prior art are summarized below:

Disadvantage 1: using fuzzy control theory for air-conditioning load shedding and energy saving cannot properly account for the amount of electricity that must be shed during demand response, and cannot guarantee that the shedding target is met.

Disadvantage 2: reducing air-conditioning load by rotating shutdowns across zones cannot properly determine the number of air conditioners per zone or their downtime; moreover, shutting air conditioners off outright causes the indoor temperature to rise sharply, failing to maintain occupant comfort.

Disadvantage 3: a rule-based air-conditioning control strategy that considers comfort can keep the indoor temperature within a specific range, but it cannot account for the demand response shedding amount; and as the climate changes, fixed temperature limits lack adjustment flexibility and give the model no self-correction capability.

The object of the present invention is therefore to provide an air-conditioning response control method that overcomes at least one shortcoming of the prior art while simultaneously meeting the load-shedding requirement and maintaining comfort.

The air-conditioning response control method of the present invention is executed by an air-conditioning response control system to adjust the temperature of N field spaces, where N is a positive integer and N ≥ 2. The method comprises the following steps (s2)–(s4).

Step (s2): the air-conditioning response control system executes a double deep Q network algorithm that performs machine learning training according to a reward function to build a double deep Q network machine learning model. The reward function relates to an expected dissatisfaction rate, the expected power consumption of a demand response event, and the cumulative power consumption of that event. The demand response is defined with respect to a total power consumption limit set with Taipower (the Taiwan Power Company).

Step (s3): when the air-conditioning response control system receives a demand response request command, it executes the double deep Q network machine learning model to calculate, according to the reward function, a baseline electrical load for each field space; each baseline load is proportional to that space's historical average power consumption. From the N baseline loads and the total power consumption limit, the model then calculates a curtailment allocation for each field space.

Step (s4): the double deep Q network machine learning model trains an exploration action based on the N baseline electrical loads, the N curtailment allocations, and N temperature feedback values. The exploration action is directed at finding a maximum reward while meeting a minimum standard of comfort; comfort is inversely proportional to the expected dissatisfaction rate, and the maximum reward is defined at a minimum threshold of comfort.

The effect of the present invention is that it solves the inability of previous air-conditioning response control methods to properly calculate the air-conditioning load-shedding amount, ensures that the overall air-conditioning load meets the demand response shedding amount contracted with Taipower, and at the same time delivers better comfort.

Before the present invention is described in detail, it should be noted that in the following description, similar elements are denoted by the same reference numerals.

Referring to FIG. 1 and FIG. 2, which together form a flow chart illustrating the steps of an embodiment of the air-conditioning response control method according to the present invention, the method is executed by an air-conditioning response control system to adjust the temperature of N field spaces, where N is a positive integer and N ≥ 2.

In steps 100–101, the air-conditioning response control system runs an equivalent thermal model to generate simulated temperature data for each field space and stores the data in a temperature database. The system then executes a recurrent neural network algorithm that trains on the temperature database to build a recurrent neural network machine learning model.

Because large amounts of temperature data from real field spaces are currently hard to obtain, the equivalent thermal model is used to generate the simulated temperature data. This is also useful in practice: the building materials of different buildings have their own typical thermal-model parameters, so the simulated data produced by the equivalent thermal model can pre-train the recurrent neural network machine learning model. After deployment, the model therefore does not need large amounts of measured environmental temperature data for training and converges faster to the thermal model of the actual environment.
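The patent does not give the equivalent thermal model's equations; a common choice for this kind of simulator is a first-order resistance–capacitance (RC) model, and the sketch below assumes that form. The parameters `R`, `C`, and the cooling-power term are illustrative, not values from the patent.

```python
import numpy as np

def simulate_room_temperature(t_in0, t_out, p_cool, R=2.0, C=10.0, dt=1/60):
    """First-order RC thermal model (a hedged stand-in for the patent's
    equivalent thermal model): dT_in/dt = (T_out - T_in)/(R*C) - P_cool/C.

    t_in0  : initial indoor temperature (deg C)
    t_out  : outdoor temperature per time step (deg C)
    p_cool : cooling power removed by the AC per time step (kW)
    R, C   : illustrative thermal resistance / capacitance parameters
    dt     : time step in hours
    """
    t_in = np.empty(len(t_out) + 1)
    t_in[0] = t_in0
    for k in range(len(t_out)):
        dT = (t_out[k] - t_in[k]) / (R * C) - p_cool[k] / C
        t_in[k + 1] = t_in[k] + dT * dt
    return t_in

# Example: 2 hours at 1-minute resolution, 33 degC outside, 1.5 kW of cooling
temps = simulate_room_temperature(28.0, np.full(120, 33.0), np.full(120, 1.5))
```

Sweeping the parameters of such a model over the typical values for different building materials would yield the varied pre-training data the paragraph above describes.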

Because indoor temperature varies with time, adjacent temperature samples are strongly correlated, which lets the recurrent neural network machine learning model accurately predict the indoor temperature at the next time point. The model takes the current indoor temperature, the outdoor temperature, and the air conditioner's set temperature as inputs, and from them predicts the indoor temperature at the next time point.
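As a sketch of this predictor, the model below regresses the next indoor temperature from short sequences of (indoor temperature, outdoor temperature, AC setpoint). The patent does not specify the recurrent cell or the layer sizes; the LSTM and the dimensions here are assumptions.

```python
import torch
import torch.nn as nn

class TempRNN(nn.Module):
    """Recurrent predictor of the next indoor temperature from sequences of
    (indoor temp, outdoor temp, AC setpoint). Sizes are illustrative."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, seq_len, 3)
        out, _ = self.rnn(x)
        return self.head(out[:, -1, :])  # predicted T_in at the next step

model = TempRNN()
x = torch.randn(8, 12, 3)                # 8 samples, 12 past time steps each
t_next = model(x)                        # shape (8, 1)
```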

In step 102, the air-conditioning response control system executes a double deep Q network (DDQN) algorithm that performs machine learning training according to a reward function to build a double deep Q network machine learning model. A single deep Q network (DQN) is a deep reinforcement learning model, but it tends to overestimate Q values during learning; using two deep Q networks with identical architectures, i.e., the double deep Q network machine learning model, mitigates this overestimation to the greatest extent.
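The core of the double-network idea is how the learning target is formed: the online network selects the next action and the separate target network evaluates it, which curbs the max-operator's overestimation bias. A minimal sketch of that target computation follows; both networks are assumed to be torch modules mapping a batch of states to per-action Q-values, and the hyperparameters are illustrative.

```python
import torch

def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN learning target: the online network picks the argmax action
    for each next state, and the target network supplies its Q-value. This
    decoupling is what mitigates single-DQN's Q-value overestimation."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones)
```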

The reward function combines a comfort term with a demand-response energy term that compares the event's cumulative power consumption with its expected power consumption (the combined expression appears only as equation images in the original publication; the recoverable pieces are reconstructed below). PPD is the predicted percentage of dissatisfied, computed from the predicted mean vote (PMV); PMV is calculated with the parameters of Equations 1–4 in Section 4.1 of ISO 7730, third edition (published 2005-11-15), and the PPD–PMV relationship follows Figure 1 in Section 5 of that standard:

$$\mathrm{PPD} = 100 - 95\, e^{-0.03353\,\mathrm{PMV}^4 - 0.2179\,\mathrm{PMV}^2}$$

The PPD value is mapped onto a quadratic function \(a \cdot \mathrm{PPD}^2 + b \cdot \mathrm{PPD} + c\) to compute the comfort reward; the purpose is to guide the double deep Q network machine learning model to take exploration actions of the highest possible comfort. The reward ranges from −1 to 1, and a, b, c are constants that keep the quadratic within that range. Writing \(T_{\mathrm{end}}\) for the last time step of a demand response event, \(E_{\mathrm{cum}}\) for the event's cumulative power consumption, and \(E_{\mathrm{exp}}\) for its expected power consumption (these symbols are reconstructions; the originals are equation images), the demand response itself is defined with respect to a total power consumption limit set with Taipower.
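For reference, the ISO 7730 PPD–PMV relationship is in closed form and can be evaluated directly; the quadratic comfort-reward mapping below follows the description above, but its coefficients are pure illustration (the patent only states that a, b, c keep the reward within [−1, 1], not their values).

```python
import math

def ppd_from_pmv(pmv):
    """ISO 7730 relationship between PMV and PPD (%)."""
    return 100.0 - 95.0 * math.exp(-0.03353 * pmv**4 - 0.2179 * pmv**2)

def comfort_reward(ppd):
    """Quadratic mapping of PPD to a reward in [-1, 1]. Illustrative
    coefficients: r = 1 at the ISO 7730 floor of PPD = 5% and r = -1 at
    PPD = 100%; the patent does not disclose a, b, c."""
    return 1.0 - 2.0 * ((ppd - 5.0) / 95.0) ** 2

print(ppd_from_pmv(0.0))                  # ~5.0: neutral PMV gives minimum PPD
print(comfort_reward(ppd_from_pmv(1.0)))  # warmer sensation -> lower reward
```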

In step 103, when the air-conditioning response control system receives a demand response request command, it executes the double deep Q network machine learning model to calculate, according to the reward function, a baseline electrical load for each field space. The baseline load is proportional to the historical average power consumption of that space, for example the average consumption over the five days preceding the demand response request. From the N baseline loads and the total power consumption limit, the model then calculates a curtailment allocation for each field space. When the system curtails the N field spaces according to the N allocations, the recurrent neural network machine learning model generates the N corresponding temperature feedback values from the actual temperatures of the N spaces.
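A minimal sketch of the allocation step, assuming each space's curtailment share is simply proportional to its baseline (customer baseline load, in the abstract's terminology) — the function name and structure are illustrative, but the result reproduces Table 1 in the worked example later in the description:

```python
def allocate_limits(baselines, total_curtailment):
    """Split a total curtailment across spaces in proportion to their baseline
    loads; return (curtailment share, limit) per space, where
    limit = baseline - curtailment share."""
    total_baseline = sum(baselines)
    shares = [b * total_curtailment / total_baseline for b in baselines]
    limits = [b - s for b, s in zip(baselines, shares)]
    return shares, limits

# Baselines of rooms 1-20 from Table 1 (kWh); total curtailment 40 kWh.
baselines = [4.11, 4.11, 6.40, 3.98, 5.16, 4.02, 5.00, 3.07, 4.27, 4.91,
             3.94, 5.10, 6.79, 3.58, 3.04, 4.01, 4.42, 3.24, 5.56, 3.13]
shares, limits = allocate_limits(baselines, 40.0)
print(round(shares[0], 2), round(limits[0], 2))  # 1.87 2.24, matching room 1
```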

In step 104, the double deep Q network machine learning model trains an exploration action based on the N baseline electrical loads, the N curtailment allocations, and the N temperature feedback values. The exploration action is directed at finding a maximum reward while meeting a minimum standard of comfort, where comfort is inversely proportional to the expected dissatisfaction rate. For example, when occupants feel very hot (say, above 29 °C) or very cold (below 22 °C), the expected dissatisfaction rate rises; conversely, when they feel comfortable and mild (between 25 °C and 27 °C), it falls. The maximum reward is defined at a minimum threshold of comfort. The exploration follows an epsilon-greedy policy: with probability ε a random exploration action is taken, and otherwise the action \(a = \arg\max_a Q(s, a)\) that can obtain the maximum reward is taken, where s is the state of a field space and a is the exploration action. ε is decayed toward a minimum exploration probability \(\varepsilon_{\min}\) by a greedy-factor decay rate d over the number of interactions n, for instance \(\varepsilon_n = \max(\varepsilon_{\min},\, d^{\,n}\,\varepsilon_0)\) (the original equations appear only as images; this decay form is a reconstruction consistent with the surrounding description). Note that ε gradually decreases as training of the double deep Q network machine learning model progresses: early in training this raises the probability of random exploration, and as training proceeds and better exploration actions are identified, the probability of random exploration is gradually reduced.
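A minimal sketch of this ε-greedy schedule — random exploration with probability ε, greedy action otherwise, with ε decayed down to a minimum exploration probability. The decay constants are assumptions, not the patent's values.

```python
import random

def epsilon_greedy(q_values, eps):
    """With probability eps take a random exploratory action; otherwise take
    the action with the maximum estimated Q-value (the maximum-reward action)."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(n, eps0=1.0, decay=0.995, eps_min=0.05):
    """Epsilon after n interactions: eps0 * decay**n floored at eps_min, so
    early training explores heavily and later training exploits what it learned."""
    return max(eps_min, eps0 * decay ** n)
```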

In step 105, the air-conditioning response control system judges whether the current time falls within an execution interval of a demand response event.

If so, then in step 106 the double deep Q network machine learning model intervenes in the air conditioners' control, performing the exploration action to find the maximum reward.

If not, then in step 107 the air-conditioning response control system generates a temperature setpoint for each field space according to an input command, for example a setpoint a person enters with the air conditioner's remote control.

In step 108, the air-conditioning response control system transmits the actual temperature data of the N field spaces to the temperature database and returns to step 101 so that the recurrent neural network machine learning model undergoes corrective training.
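Putting steps 100–108 together, the run-time flow looks roughly like the outline below. This is a hedged sketch, not code from the patent: every method name (`equivalent_thermal_model`, `in_demand_response_interval`, and so on) is a placeholder for the components described above, and `allocate_limits` is the sketch shown earlier.

```python
def control_loop(system):
    """Hedged outline of steps 100-108: pre-train the RNN on simulated data,
    train the DDQN against the reward function, then at run time either let
    the DDQN agent act during a demand-response interval or pass user
    setpoints through, always feeding measured temperatures back."""
    sim_data = system.equivalent_thermal_model()           # steps 100-101
    system.rnn.train_on(sim_data)
    system.ddqn.train_with(system.reward_function)         # step 102
    while True:
        if system.in_demand_response_interval():           # step 105
            cbl = system.ddqn.baseline_loads()             # step 103
            shares, limits = allocate_limits(cbl, system.total_limit)
            system.ddqn.explore_for_max_reward(limits)     # steps 104, 106
        else:
            system.apply_user_setpoints()                  # step 107
        system.temperature_db.append(system.measure())     # step 108
        system.rnn.train_on(system.temperature_db.recent())
```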

For example, a building has 20 rooms, each equipped with one air conditioner. The demand response curtailment required by Taipower is a load reduction of 20 kWh per hour, sustained for 2 hours; in other words, the total power consumption limit is 40 kWh. Each baseline electrical load is taken as the average consumption over the five days preceding the demand response request. Table 1 lists each room's baseline load, curtailment amount, and consumption limit, where limit = baseline load − curtailment.

Table 1. Baseline load, curtailment, and consumption limit of each room (kWh)

| Room | Baseline load | Curtailment | Limit | Room | Baseline load | Curtailment | Limit |
|------|------|------|------|------|------|------|------|
| 1 | 4.11 | 1.87 | 2.24 | 11 | 3.94 | 1.79 | 2.15 |
| 2 | 4.11 | 1.87 | 2.24 | 12 | 5.10 | 2.32 | 2.78 |
| 3 | 6.40 | 2.92 | 3.48 | 13 | 6.79 | 3.09 | 3.70 |
| 4 | 3.98 | 1.81 | 2.17 | 14 | 3.58 | 1.63 | 1.95 |
| 5 | 5.16 | 2.35 | 2.81 | 15 | 3.04 | 1.39 | 1.65 |
| 6 | 4.02 | 1.83 | 2.19 | 16 | 4.01 | 1.83 | 2.18 |
| 7 | 5.00 | 2.28 | 2.72 | 17 | 4.42 | 2.01 | 2.41 |
| 8 | 3.07 | 1.40 | 1.67 | 18 | 3.24 | 1.48 | 1.76 |
| 9 | 4.27 | 1.94 | 2.33 | 19 | 5.56 | 2.53 | 3.03 |
| 10 | 4.91 | 2.23 | 2.68 | 20 | 3.13 | 1.43 | 1.70 |
| | | | | Total | 87.84 | 40.00 | 47.84 |

Table 2 lists each room's curtailment amount, consumption limit, actual consumption, and actual curtailment, as follows.

Table 2. Power consumption information of each room (kWh)

| Room | Curtailment | Limit | Actual consumption | Actual curtailment | Room | Curtailment | Limit | Actual consumption | Actual curtailment |
|------|------|------|------|------|------|------|------|------|------|
| 1 | 1.87 | 2.24 | 2.03 | 2.08 | 11 | 1.79 | 2.15 | 1.99 | 1.95 |
| 2 | 1.87 | 2.24 | 1.80 | 2.30 | 12 | 2.32 | 2.78 | 2.63 | 2.47 |
| 3 | 2.92 | 3.48 | 3.44 | 2.96 | 13 | 3.09 | 3.70 | 3.59 | 3.20 |
| 4 | 1.81 | 2.17 | 2.07 | 1.91 | 14 | 1.63 | 1.95 | 1.82 | 1.75 |
| 5 | 2.35 | 2.81 | 2.58 | 2.58 | 15 | 1.39 | 1.65 | 1.52 | 1.53 |
| 6 | 1.83 | 2.19 | 2.07 | 1.95 | 16 | 1.83 | 2.18 | 2.10 | 1.91 |
| 7 | 2.28 | 2.72 | 2.24 | 2.76 | 17 | 2.01 | 2.41 | 2.38 | 2.04 |
| 8 | 1.40 | 1.67 | 1.53 | 1.55 | 18 | 1.48 | 1.76 | 1.62 | 1.63 |
| 9 | 1.94 | 2.33 | 2.20 | 2.07 | 19 | 2.53 | 3.03 | 2.82 | 2.74 |
| 10 | 2.23 | 2.68 | 2.39 | 2.51 | 20 | 1.43 | 1.70 | 1.61 | 1.52 |
| | | | | | Total | 40.00 | 47.84 | 44.43 | 43.41 |

Referring to FIG. 3, which shows the result of shutting down every room's air conditioner after receiving Taipower's demand response notice in order to meet the required 40 kWh curtailment: the temperature rises sharply from 28 °C to nearly 31 °C within 10 minutes, while the expected dissatisfaction rate jumps from about 20% to over 60% in the same 10 minutes and exceeds 80% after more than 20 minutes of shutdown. In other words, although shutting down every room's air conditioner meets Taipower's required demand response curtailment, the occupants are left uncomfortable.

Referring to FIG. 4, which shows the result of raising every room's set temperature after receiving Taipower's demand response notice in order to meet the required 40 kWh curtailment: the temperature rises from 28 °C to nearly 29 °C within 10 minutes, while the expected dissatisfaction rate climbs from about 20% to nearly 25% in the same 10 minutes and approaches 30% after the setpoint has been raised for more than 30 minutes. That is, although raising every room's set temperature meets Taipower's required curtailment, the occupants still feel uncomfortable.

Referring to FIG. 5, which shows the result of letting the double deep Q network machine learning model intervene in the air conditioners' control after receiving Taipower's demand response notice, performing the exploration action to find the maximum reward while meeting the required 40 kWh curtailment: the temperature stays below 28 °C throughout the 120-minute curtailment period, reaching a low of 26 °C, while the expected dissatisfaction rate stays around 20%, reaching a low of 10%. In other words, the model's intervention meets Taipower's required demand response curtailment while keeping the occupants comfortable.

Table 3 compares the average expected dissatisfaction rate of the three curtailment approaches described above.

Table 3. Average expected dissatisfaction rate: shutdown, raised set temperature, and the double deep Q network machine learning model

| Curtailment approach | Expected dissatisfaction rate (%) |
|------|------|
| Shutdown | 78.63 |
| Raised set temperature | 28.30 |
| Double deep Q network machine learning model | 10.58 |

As the table shows, intervening in air-conditioner control through the double deep Q network machine learning model yields a lower expected dissatisfaction rate than either shutting the air conditioners down or raising their set temperature.

In summary, the above embodiment has the following advantages. Advantage 1: the double deep Q network machine learning model trains an exploration action based on the N baseline electrical loads, the N curtailment allocations, and the N temperature feedback values, the exploration action being directed at finding a maximum reward while meeting the minimum standard of comfort. This solves the inability of previous air-conditioning response control methods to properly calculate the load-shedding amount and ensures that the overall air-conditioning load meets the demand response curtailment contracted with Taipower. At the same time, the method delivers better comfort than shutting the air conditioners down or simply raising their temperature.

Advantage 2: when the number of air conditioners in a field space changes, the change can be made directly; neither the double deep Q network machine learning model nor the recurrent neural network machine learning model needs modification. This improves the flexibility of applying the models.

Advantage 3: the method can be applied to spaces with large dedicated air-conditioning installations, such as schools, residential buildings, and office buildings. Besides earning incentive payments by participating in the demand response programs announced by Taipower, it also avoids unnecessary waste of electricity. The objectives of the present invention are thus indeed achieved.

The foregoing is merely an embodiment of the present invention and shall not limit the scope of its implementation; all simple equivalent changes and modifications made according to the claims and the specification of the present invention remain within the scope covered by this patent.

100–101: steps of building a recurrent neural network machine learning model
102: step of building a double deep Q network machine learning model
103–106: steps in which an air-conditioning response control system executes a demand response
107: step of controlling temperature without the double deep Q network machine learning model
108: step of transmitting actual temperature data to a temperature database

Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which:
FIG. 1 is a flow chart of an embodiment of the air-conditioning response control method of the present invention;
FIG. 2 is a flow chart of the embodiment;
FIG. 3 is a line graph of temperature change and a line graph of the expected dissatisfaction rate when the air conditioners of the embodiment are controlled by shutdown;
FIG. 4 is a line graph of temperature change and a line graph of the expected dissatisfaction rate when the air conditioners are controlled by raising the set temperature; and
FIG. 5 is a line graph of temperature change and a line graph of the expected dissatisfaction rate under the air-conditioning response control method of the present invention.

104–106: steps in which an air-conditioning response control system executes a demand response
107: step of controlling temperature without the double deep Q network machine learning model
108: step of transmitting actual temperature data to a temperature database

Claims (7)

1. A method for controlling air conditioner demand response, executed by an air-conditioning response control system to adjust the temperature of N field spaces, N being a positive integer and N ≥ 2, the method comprising:
(s2) the air-conditioning response control system executing a double deep Q network algorithm so that the algorithm performs machine learning training according to a reward function to build a double deep Q network machine learning model, the reward function relating to an expected dissatisfaction rate, an expected power consumption of a demand response, and a cumulative power consumption of the demand response, wherein the demand response is defined with respect to a total power consumption limit set with Taipower;
(s3) when the air-conditioning response control system receives a demand response request command, executing the double deep Q network machine learning model to calculate, according to the reward function, a baseline electrical load corresponding to each of the field spaces, the baseline electrical load being proportional to the historical average power consumption of that field space, the double deep Q network machine learning model further calculating a curtailment allocation for each of the field spaces according to the N baseline electrical loads and the total power consumption limit; and
(s4) the double deep Q network machine learning model training an exploration action according to the N baseline electrical loads, N curtailment allocations, and N temperature feedback values, the exploration action relating to finding a maximum reward to meet a minimum standard of comfort, the comfort being inversely proportional to the expected dissatisfaction rate, and the maximum reward being defined at a minimum threshold of the comfort.

2. The air-conditioning response control method of claim 1, further comprising:
(s0) the air-conditioning response control system executing an equivalent thermal model to generate simulated temperature data for each of the field spaces and transmitting the simulated temperature data to a temperature database; and
(s1) the air-conditioning response control system executing a recurrent neural network algorithm so that the algorithm performs the machine learning training according to the data of the temperature database to build a recurrent neural network machine learning model.

3. The air-conditioning response control method of claim 2, wherein step (s3) further includes: when the air-conditioning response control system curtails the N field spaces according to the N curtailment allocations, the recurrent neural network machine learning model generating the corresponding N temperature feedback values according to an actual temperature of each of the N field spaces.

4. The air-conditioning response control method of claim 2, further comprising:
(s5) the air-conditioning response control system judging whether the current time is within an execution interval of the demand response;
(s6) if so, the double deep Q network machine learning model intervening in the control of the air conditioners to perform the exploration action to find the maximum reward; and
(s7) the air-conditioning response control system transmitting the actual temperature data of the N field spaces to the temperature database and returning to step (s1) so that the recurrent neural network machine learning model undergoes corrective training.

5. The air-conditioning response control method of claim 4, further comprising step (s6′): if the judgment in step (s5) is negative, the air-conditioning response control system generating a temperature setpoint corresponding to each of the field spaces according to an input command.

6. The air-conditioning response control method of claim 1, wherein the exploration action in step (s4) relates to the temperature setpoints of the air conditioners.

7. The air-conditioning response control method of claim 1, wherein the reward function in step (s2) is used to guide the double deep Q network machine learning model to take exploration actions of higher comfort.
TW110137199A 2021-10-06 2021-10-06 Method for controlling air conditioner demand response TWI802034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110137199A TWI802034B (en) 2021-10-06 2021-10-06 Method for controlling air conditioner demand response

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110137199A TWI802034B (en) 2021-10-06 2021-10-06 Method for controlling air conditioner demand response

Publications (2)

Publication Number Publication Date
TW202316327A TW202316327A (en) 2023-04-16
TWI802034B true TWI802034B (en) 2023-05-11

Family

ID=86943158

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110137199A TWI802034B (en) 2021-10-06 2021-10-06 Method for controlling air conditioner demand response

Country Status (1)

Country Link
TW (1) TWI802034B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776696B2 (en) * 2017-02-03 2020-09-15 Milestone Entertainment Llc Architectures, systems and methods having segregated secure and public functions
TW201913013A (en) * 2017-08-23 2019-04-01 恩可語股份有限公司 Control device and method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Evan McKee, "Deep Reinforcement Learning for Real-Time Residential HVAC Control," Masters Theses, 2019. https://trace.tennessee.edu/cgi/viewcontent.cgi?article=7113&context=utk_gradthes
Y. Wang, et al., "A long-short term memory recurrent neural network based reinforcement learning controller for office heating ventilation and air conditioning systems," Processes, vol. 5, no. 3, p. 46, 2017. https://www.researchgate.net/publication/321666649_A_Long-Short_Term_Memory_Recurrent_Neural_Network_Based_Reinforcement_Learning_Controller_for_Office_Heating_Ventilation_and_Air_Con *

Also Published As

Publication number Publication date
TW202316327A (en) 2023-04-16

Similar Documents

Publication Publication Date Title
Mossolly et al. Optimal control strategy for a multi-zone air conditioning system using a genetic algorithm
Liu et al. Energy flexibility of a nearly zero-energy building with weather predictive control on a convective building energy system and evaluated with different metrics
Spindler et al. Naturally ventilated and mixed-mode buildings—Part II: Optimal control
Zain et al. Hot and humid climate: prospect for thermal comfort in residential building
Kummert et al. Optimal heating control in a passive solar commercial building
Calvino et al. Comparing different control strategies for indoor thermal comfort aimed at the evaluation of the energy cost of quality of building
Yang et al. Optimal control strategy for HVAC system in building energy management
Ahn et al. Anti-logic or common sense that can hinder machine’s energy performance: Energy and comfort control models based on artificial intelligence responding to abnormal indoor environments
Yonezawa Comfort air-conditioning control for building energy-saving
CN113991655A (en) Method, device and medium for evaluating load aggregation demand response potential of fixed-frequency air conditioner
Biyik et al. Cloud-based model predictive building thermostatic controls of commercial buildings: Algorithm and implementation
Yang et al. Building electrification and carbon emissions: Integrated energy management considering the dynamics of the electricity mix and pricing
Hernández et al. Impact of zoning heating and air conditioning control systems in users comfort and energy efficiency in residential buildings
Xiong et al. A demand response method for an active thermal energy storage air-conditioning system using improved transactive control: On-site experiments
Yang et al. Control strategy optimization for energy efficiency and comfort management in HVAC systems
TWI802034B (en) Method for controlling air conditioner demand response
CN108224692B (en) Consider the air-conditioning flexible control responding ability prediction technique of outside air temperature prediction error
Wolisz Dynamic activation of structural thermal mass in a multi-zonal building with due regard to thermal comfort
Momoh et al. Optimizing renewable energy control for building using model predictive control
WO2022198734A1 (en) Response priority-based dual optimization method for public building power demand response
JP2019143909A (en) Control device, air conditioning control system, control method and program
CN110942262B (en) Regional regulation and control method for air-conditioning demand response in incremental power distribution park
Guo et al. Peak load reduction by thermostatically controlled load dispatch with thermal comfort model
WO2014148165A1 (en) Energy network operation control method and device
Ariwoola et al. Integrated hybrid thermal dynamics model for residential building