TWI767868B - Method and apparatus for planning energy usage of charging station based on reinforcement learning - Google Patents
- Publication number
- TWI767868B (application TW110141537A)
- Authority
- TW
- Taiwan
- Prior art keywords
- reinforcement learning
- charging station
- energy
- reward
- learning table
- Prior art date
Description
The present invention relates to a reinforcement learning method and apparatus, and in particular to a reinforcement-learning-based method and apparatus for planning the energy usage of a charging station.
In recent years, growing environmental awareness has led many people to adopt electric vehicles, and with the rapid increase in electric vehicle users, demand for electric vehicle charging stations has risen accordingly. However, because users' charging habits differ, their demands on charging stations vary; when a large number of electric vehicles charge simultaneously, charging across stations becomes uncoordinated and adversely affects the overall power grid.
Previous approaches to energy usage planning among multiple electric vehicle charging stations employed nonlinear programming algorithms, which require real-time forecasts of prices, electric vehicle demand, and renewable energy data, making performance difficult to improve. To address this, some literature has proposed model-free reinforcement learning algorithms, but these converge slowly, resulting in higher costs and wasted energy, and thus fail to maximize the overall benefit of the charging stations.
The present invention provides a reinforcement-learning-based method and apparatus for planning the energy usage of charging stations, which can properly schedule the battery charging/discharging and energy supply of each charging station so as to maximize the overall benefit of the charging stations.
The present invention provides a reinforcement-learning-based method for planning the energy usage of a charging station, suitable for planning energy usage by a designated charging station among multiple charging stations within an energy-sharing area. The method includes: defining multiple system states using the station's own power demand, its remaining battery level, and the global power demand and internal electricity price of the energy-sharing area, and estimating the expected reward of scheduling each energy usage action in each system state to construct a reinforcement learning table, where the global power demand is obtained by a partner device integrating the power demands uploaded by the charging stations; selecting, according to the reinforcement learning table, an energy usage action suitable for the current system state and uploading it to the partner device, and updating the reinforcement learning table according to the traded power arranged by the partner device and the computed reward for adopting that energy usage action; and recording the current system state, the energy usage action, the reward, and the visit count of the current system state to generate a simulated environment, in which the reward obtained by scheduling energy usage actions in each system state is computed so as to update the reinforcement learning table.
The present invention further provides a reinforcement-learning-based apparatus for planning the energy usage of a charging station, which is deployed at a designated charging station. The apparatus includes a connection device, a storage device, and a processor. The connection device connects to a partner device that manages multiple charging stations, including the designated charging station, within an energy-sharing area. The storage device stores a computer program. The processor, coupled to the connection device and the storage device, is configured to load and execute the computer program to: define multiple system states using the designated charging station's power demand, its remaining battery level, and the global power demand and internal electricity price of the energy-sharing area, and estimate the expected reward of scheduling each energy usage action in each system state to construct a reinforcement learning table, where the global power demand is obtained by the partner device integrating the power demands uploaded by the charging stations; select, according to the reinforcement learning table, an energy usage action suitable for the current system state and upload it to the partner device, and update the reinforcement learning table according to the traded power arranged by the partner device and the computed reward for adopting that energy usage action; and record the current system state, the energy usage action, the reward, and the visit count of the current system state to generate a simulated environment, in which the reward obtained by scheduling energy usage actions in each system state is computed so as to update the reinforcement learning table.
To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
Embodiments of the present invention apply reinforcement learning to charging stations. Based on power demand information from the outside, a model-based multi-agent reinforcement learning algorithm performs energy usage planning for electric vehicle charging stations through iterative updates at fixed time intervals, scheduling each station's battery charging/discharging and the energy supplied to electric vehicles so as to maximize the overall benefit of the charging stations.
FIG. 1 is a schematic diagram of an energy sharing system according to an embodiment of the present invention. Referring to FIG. 1, the energy sharing system 10 of this embodiment applies to a cooperative area of electric vehicle charging stations, which includes multiple electric vehicle charging stations EVCS 1 to EVCS I (where I is a positive integer) and at least one partner device 12 responsible for relaying information. Each charging station EVCS 1 to EVCS I in the area is equipped with an energy storage system (ESS) and can sell surplus power to, or purchase shortfall power from, the other electric vehicle charging stations. Based on the real-time price provided by the power plant 14, the charging demands of its own electric vehicle users, the remaining capacity of its energy storage system, and the charging demands of all electric vehicle users, each charging station EVCS 1 to EVCS I determines the power supplied to electric vehicles and adjusts its charging/discharging strategy accordingly.
FIG. 2 is a block diagram of a reinforcement-learning-based apparatus for planning the energy usage of a charging station according to an embodiment of the present invention. Referring to FIG. 1 and FIG. 2 together, the charging station energy usage planning apparatus 20 of this embodiment is, for example, deployed at charging station EVCS 1 of FIG. 1, but in other embodiments it may be deployed at any other charging station of FIG. 1. The apparatus 20 is, for example, a computing device with computational capability, such as a file server, database server, application server, workstation, or personal computer, and includes components such as a connection device 22, a storage device 24, and a processor 26, whose functions are described below.
The connection device 22 is, for example, any wired or wireless interface device capable of connecting to the partner device 12; it can upload the power demand of charging station EVCS 1 itself to the partner device 12 and receive the global power demand returned by the partner device 12. For wired connections, the connection device 22 may be a universal serial bus (USB), RS232, universal asynchronous receiver/transmitter (UART), inter-integrated circuit (I2C), serial peripheral interface (SPI), DisplayPort, or Thunderbolt interface, but is not limited thereto. For wireless connections, the connection device 22 may be a device supporting communication protocols such as wireless fidelity (Wi-Fi), RFID, Bluetooth, infrared, near-field communication (NFC), or device-to-device (D2D), likewise without limitation. In some embodiments, the connection device 22 may also include a network card supporting Ethernet or wireless network standards such as 802.11g, 802.11n, or 802.11ac, so that the charging station energy usage planning apparatus 20 can connect to the partner device 12 via a network to upload or receive data such as power demand, global power demand, and traded power.
The storage device 24 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk, similar element, or a combination thereof, and is used to store the computer program executable by the processor 26. In some embodiments, the storage device 24 may also store the reinforcement learning table built by the processor 26 and the global power demand received by the connection device 22 from the partner device 12.
The processor 26 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, microcontroller, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), other similar device, or a combination thereof; the invention is not limited in this respect. In this embodiment, the processor 26 can load the computer program from the storage device 24 to execute the reinforcement-learning-based charging station energy usage planning method of embodiments of the present invention.
The reinforcement-learning-based charging station energy usage planning method of embodiments of the present invention models, for example, the operation of multiple electric vehicle charging stations as a Markov decision process (MDP), treats each electric vehicle charging station as an agent for learning, and discretizes the operating time into time slots; to improve planning efficiency, an episodic setting is adopted, for example.
In detail, FIG. 3 is a flowchart of a reinforcement-learning-based charging station energy usage planning method according to an embodiment of the present invention. Referring to FIG. 1, FIG. 2, and FIG. 3 together, the method of this embodiment applies to the above charging station energy usage planning apparatus 20; the detailed steps of the method are described below with reference to the components of the apparatus 20.
In step S302, the processor 26 of the charging station energy usage planning apparatus 20 defines multiple system states using the power demand of charging station EVCS 1, its remaining battery level, and the global power demand and internal electricity price of the energy-sharing area, and estimates the expected reward of scheduling each energy usage action in each system state to construct a reinforcement learning table. The processor 26, for example, uses the connection device 22 to upload the station's own power demand to the partner device 12 of the energy-sharing area, and receives the global power demand (i.e., the total power demand of the charging stations) that the partner device 12 obtains by integrating the power demands uploaded by charging stations EVCS 1 to EVCS I; the partner device 12 notifies each charging station of the current global power demand according to the area in which the station is located.
In detail, the processor 26 is given, for example, a state space $S$ and an action space $A$, denotes the state at time slot $t$ as $s_t$, where $s_t \in S$, and denotes the action selected at time slot $t$ in state $s_t$ as $a_t$, where $a_t \in A$. After action $a_t$ is selected in state $s_t$, the environment transitions to the next state $s_{t+1}$ and produces an overall profit $P(t)$. The probability of selecting action $a_t$ in state $s_t$ is denoted by the policy $\pi(a_t \mid s_t)$, and the action-value function (i.e., the Q function) $Q^{\pi}(s_t, a_t)$, which evaluates the expected cumulative profit of following policy $\pi$ from time slot $t$, can be defined as:

$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} P(t+k) \;\middle|\; s_t, a_t\right],$$

where $\gamma$ is the discount factor.
In this embodiment, the processor 26 defines, for example, the state of the $i$-th charging station at time slot $t$ as:

$$s_t^{i} = \left(D_t, B_t^{i}, d_t^{i}, p_t\right),$$

where $D_t$ is the global power demand of the energy-sharing area at time slot $t$, $B_t^{i}$ is the battery level of the $i$-th charging station, $d_t^{i}$ is the power demand of the $i$-th charging station, and $p_t$ is the internal electricity price of the energy-sharing area, such as the real-time price provided by the power plant. Here $D_t$ serves as an observation indicator, which helps a charging station learn the effects of the other charging stations' actions and improves learning efficiency.
In other embodiments, the processor 26 defines, for example, the state of the $i$-th charging station at time slot $t$ as:

$$s_t^{i} = \left(D_t, B_t^{i}, u_t^{i}, c_t^{i}, r_t^{i}, p_t\right),$$

where $D_t$ is the global power demand of the energy-sharing area at time slot $t$, $B_t^{i}$ is the battery level of the $i$-th charging station, $u_t^{i}$ is the urgent demand of the $i$-th charging station, $c_t^{i}$ is the regular demand of the $i$-th charging station, $r_t^{i}$ is the renewable energy amount of the $i$-th charging station, and $p_t$ is the internal electricity price of the energy-sharing area. The urgent demand is, for example, a power demand that satisfies at least one urgency condition, such as a charging-time limit (e.g., one hour), a charging-amount limit, or another charging-related constraint, without limitation.
The action of each charging station can be defined as:

$$a_t^{i} = \left(e_t^{i}, b_t^{i}\right),$$

where $e_t^{i}$ is the charging/discharging demand (the amount of power to trade) and $b_t^{i}$ is the battery charging/discharging amount. When $e_t^{i}$ is positive, the charging station needs to purchase power; when $e_t^{i}$ is negative, the charging station can sell power.
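As an illustration, the state and action tuples described above might be represented as follows. This is a sketch under assumptions: the field names (`global_demand`, `battery`, `demand`, `price`, `trade`, `battery_delta`) are our own, since the patent does not prescribe a concrete encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """State of the i-th charging station at a time slot (fields illustrative)."""
    global_demand: float   # global power demand of the energy-sharing area
    battery: float         # remaining battery level of station i
    demand: float          # power demand of station i
    price: float           # internal electricity price

@dataclass(frozen=True)
class Action:
    """Action of a charging station: (trading demand, battery charge/discharge)."""
    trade: float           # > 0 means the station must buy power, < 0 means it can sell
    battery_delta: float   # battery charge (+) / discharge (-) amount

def needs_to_buy(action: Action) -> bool:
    """A positive trading demand means the station must purchase power."""
    return action.trade > 0

s = State(global_demand=120.0, battery=30.0, demand=8.0, price=2.5)
a_buy = Action(trade=2.0, battery_delta=-0.5)
a_sell = Action(trade=-3.0, battery_delta=1.5)
```

Frozen dataclasses are hashable, so `(state, action)` pairs can directly index a tabular Q function.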
In step S304, the processor 26 selects, according to the reinforcement learning table, an energy usage action suitable for the current system state and uploads it to the partner device 12, and updates the reinforcement learning table according to the traded power arranged by the partner device 12 and the computed reward for adopting that energy usage action.
In some embodiments, the processor 26, for example, selects the optimal action among the multiple energy usage actions recorded in the reinforcement learning table for the current system state, and uses the connection device 22 to transmit the current system state together with the selected energy usage action to the partner device 12; the partner device 12 then computes the traded power arranged for the charging station, including the amount of energy shared with the other charging stations and the amount of power purchased from or sold to the power plant 14. The processor 26, for example, also transmits the profit information obtained by the charging station to the partner device 12, and the partner device 12 computes the overall profit of all charging stations, which can serve as the reward for the processor 26 adopting this energy usage action.
In detail, the optimization problem of each charging station is to find, given the current system state, the optimal policy $\pi^{*}$ that maximizes the expected overall profit, and the optimal action-value function can be denoted $Q^{*}(s_t, a_t)$. The optimal policy $\pi^{*}$ is selected according to:

$$\pi^{*}(s_t) = \arg\max_{a_t \in A} Q^{*}(s_t, a_t), \quad s_t \in S,$$

where $S$ is the state range of the charging station and $A$ is its action range, which is, for example, the range of power supply to electric vehicles and charging/discharging amounts that must be satisfied at time slot $t$, and is related to the charging station's own power demand and the remaining capacity of its storage device.
According to the traded power calculated by the partner device 12 and the overall profit of all charging stations, the processor 26 can update the learning value $Q(s_t, a_t)$ in the reinforcement learning table according to:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ R_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right],$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, $R_t$ is the reward, and $\max_{a} Q(s_{t+1}, a)$ is the learning value obtained by scheduling the traded power in system state $s_{t+1}$. The learning rate $\alpha$ is, for example, any value between 0.1 and 0.5, and determines how strongly the new system state $s_{t+1}$ influences the learning value of the original system state $s_t$. The discount factor $\gamma$ is, for example, any value between 0.9 and 0.99, and determines the weight of the new system state's learning value relative to the reward $R_t$ fed back.
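As a minimal sketch, the tabular update above can be written as follows, assuming a dictionary keyed by (state, action) pairs (a layout of our own choosing) and parameter values inside the stated ranges:

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.3, gamma=0.95):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a)).
    Missing table entries are treated as 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

q = {}
actions = ["buy", "sell"]
# With Q initially all zero: 0 + 0.5 * (10 + 0.9 * 0 - 0) = 5.0
new_val = q_update(q, "s0", "buy", reward=10.0, next_state="s1",
                   actions=actions, alpha=0.5, gamma=0.9)
```

A larger `alpha` makes the table track recent rewards more aggressively; a `gamma` close to 1 weights long-term profit more heavily, matching the 0.9–0.99 range given above.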
In step S306, the processor 26 records the current system state, the energy usage action, the reward, and the visit count of the current system state to generate a simulated environment, and in this simulated environment computes the rewards obtained by scheduling energy usage actions in each system state, updating the reinforcement learning table accordingly.
In detail, when a charging station is in actual operation, at each time slot it records its system state, the action performed, the reward obtained by performing the action, and the number of times each system state has been visited, and it can use the recorded data to generate a simulated environment for learning. The higher the visit count of a system state, the more likely that state is to occur in the future; the visit count therefore determines the priority of the system state in the planning process.
After the reinforcement learning table has been built, the generated simulated environment can be used for learning at the local end. In some embodiments, to ensure sufficient data for local learning, planning is executed, for example, on a per-episode basis (i.e., planning is executed at fixed intervals). To avoid wasting system resources on unnecessary planning, whether to enter planning can further be decided according to changes in the overall profit.
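The record-keeping that produces the simulated environment might be sketched as follows. The class and field names are hypothetical, and storing only the most recent transition per (state, action) pair is a simplifying assumption; the patent fixes no data layout.

```python
from collections import defaultdict

class ExperienceModel:
    """Records (state, action) -> (reward, next_state) transitions and
    per-state visit counts, forming the simulated environment used for
    local planning."""

    def __init__(self):
        self.transitions = {}           # (s, a) -> (reward, s')
        self.visits = defaultdict(int)  # s -> number of times visited

    def record(self, state, action, reward, next_state):
        self.transitions[(state, action)] = (reward, next_state)
        self.visits[state] += 1

    def states_by_priority(self):
        """States ordered by visit count: frequently seen states are more
        likely to recur, so they are planned first."""
        return sorted(self.visits, key=self.visits.get, reverse=True)

m = ExperienceModel()
m.record("s0", "buy", 4.0, "s1")
m.record("s0", "sell", 2.0, "s1")
m.record("s1", "buy", 1.0, "s0")
```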
In detail, FIG. 4 is a flowchart of a reinforcement-learning-based charging station energy usage planning method according to an embodiment of the present invention. Referring to FIG. 1, FIG. 2, FIG. 3, and FIG. 4 together, this embodiment describes the detailed steps of step S306 of the embodiment of FIG. 3.
In step S402, the processor 26 records the current system state, the energy usage action, the reward, and the visit count of the current system state to generate a simulated environment, and in this simulated environment computes the rewards obtained by scheduling energy usage actions in each system state. The processor 26, for example, transmits to the partner device 12 the profit information that the charging station can obtain by selecting the energy usage action in the current system state, and takes the overall profit of all charging stations computed by the partner device 12 as the reward for adopting this energy usage action.
In step S404, the processor 26 computes the change rate of the overall profit and determines whether this change rate exceeds a preset threshold. The change rate $\Delta_t$ is given by:

$$\Delta_t = \frac{\bar{P}_t - \bar{P}_{t-1}}{\bar{P}_{t-1}},$$

where $\bar{P}_t$ denotes the overall profit (average profit) calculated at time $t$ by the partner device 12 from the profit information of the charging stations, and $\Delta_t$ denotes the change rate of the overall profit from time $t-1$ to time $t$. In step S404, the processor 26 determines from this change rate $\Delta_t$ whether planning needs to be entered.
If the change rate $\Delta_t$ is greater than the preset threshold $\delta$, then in step S406 the processor 26 performs planning over the energy usage actions in the reinforcement learning table. Otherwise, if $\Delta_t$ does not exceed the preset threshold $\delta$, or once planning of the reinforcement learning table is complete, then in step S408 the processor 26 waits until the charging station enters the next system state, and then selects, according to the updated reinforcement learning table, an energy usage action suitable for that next system state and performs the update of the reinforcement learning table.
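A minimal sketch of this planning trigger. Two assumptions are made here: the change rate is taken as an absolute relative change (the patent does not spell out sign handling), and the 5% threshold is purely illustrative.

```python
def should_plan(profit_now, profit_prev, threshold=0.05):
    """Enter planning only when the overall profit changed by more than
    `threshold` relative to the previous time slot, avoiding planning runs
    that would waste system resources."""
    if profit_prev == 0:
        return True  # no meaningful baseline yet; plan by default (assumption)
    return abs(profit_now - profit_prev) / abs(profit_prev) > threshold
```

For example, a profit change from 100 to 120 (20%) triggers planning under a 5% threshold, while a change from 100 to 101 (1%) does not.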
FIG. 5 is a flowchart of a reinforcement-learning-based charging station energy usage planning method according to an embodiment of the present invention. Referring to FIG. 1, FIG. 2, FIG. 4, and FIG. 5 together, this embodiment describes the detailed steps of step S406 of the embodiment of FIG. 4.
In step S502, the processor 26 selects one system state in order, according to the visit counts of the system states recorded in the reinforcement learning table. For example, the processor 26 sorts the system states by their recorded visit counts and selects the system state with the highest visit count for local learning.
In step S504, the processor 26 randomly selects one of the energy usage actions recorded in the reinforcement learning table for that system state, and computes the reward obtained by adopting the selected energy usage action in the selected system state. In this simulation process, the processor 26, for example, adopts the method described in the preceding embodiments, uploading the currently selected system state and energy usage action to the partner device 12; the partner device 12 computes the overall profit of all charging stations and provides it to the processor 26 as the reward, which is used to decide whether to update the reinforcement learning table.
In step S506, the processor 26 determines whether the reward obtained by adopting the selected energy usage action in the currently selected system state is greater than the previously recorded reward. If the current reward is not greater than the previous reward, the flow returns to step S504 to reselect an energy usage action and recompute the reward.
If the current reward is greater than the previous reward, then in step S508 the processor 26 updates the reinforcement learning table using the currently selected energy usage action, for example by updating the learning value in the reinforcement learning table for selecting this energy usage action in the selected system state.
In step S510, the processor 26 determines whether the number of system states that have been selected during the planning process for computing rewards and updating the reinforcement learning table exceeds a predetermined proportion. The proportion is, for example, one quarter or another value, without limitation.
If the number of selected system states does not exceed the predetermined proportion, the flow returns to step S502 and the processor 26 selects the next system state for updating the reinforcement learning table. Otherwise, if the number of selected system states exceeds the predetermined proportion, then in step S512 the processor 26 ends the planning process. Limiting the number of system states selected during planning substantially accelerates learning. After the planning process ends, the updated reinforcement learning table has accumulated sufficient experience to provide the charging station, during actual operation, with a charging/discharging strategy suited to the current system state, thereby maximizing the overall benefit of the charging stations.
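The planning pass of steps S502 to S512 might be sketched as follows, in a greatly simplified form: states are visited in descending order of visit count, the Q table is updated only when a better reward is found, and the pass stops after a fixed fraction (here one quarter) of the recorded states. All names are our own, and the deterministic sweep over actions stands in for the repeated random draws of step S504.

```python
def plan(q, model, actions, fraction=0.25, alpha=0.3, gamma=0.95):
    """Local planning over the simulated environment: sweep the most
    frequently visited states first; update Q only when an action's
    recorded reward beats the best reward seen so far for that state;
    stop after a fixed fraction of the recorded states."""
    ordered = sorted(model["visits"], key=model["visits"].get, reverse=True)
    budget = max(1, int(len(ordered) * fraction))
    for state in ordered[:budget]:
        best = float("-inf")
        for action in actions:  # deterministic sweep instead of random draws
            reward, nxt = model["transitions"][(state, action)]
            if reward > best:   # only better-rewarded actions update the table
                best = reward
                best_next = max(q.get((nxt, a), 0.0) for a in actions)
                old = q.get((state, action), 0.0)
                q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q

# Toy model: state "s0" was visited more often, so it is planned first;
# with a one-quarter budget over two states, only "s0" is planned.
model = {
    "visits": {"s0": 3, "s1": 1},
    "transitions": {
        ("s0", "buy"): (5.0, "s1"), ("s0", "sell"): (2.0, "s1"),
        ("s1", "buy"): (1.0, "s0"), ("s1", "sell"): (0.5, "s0"),
    },
}
q = plan({}, model, actions=["buy", "sell"])
```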
In summary, in the reinforcement-learning-based charging station energy usage planning method and apparatus of embodiments of the present invention, the amount of power supplied to electric vehicles is determined from information within the charging stations' energy-sharing area and the current environmental data, and the reinforcement learning table is updated through iterative updates and energy usage planning performed at fixed intervals. This accelerates the learning of the multi-agent learning model so that it adapts quickly to the environment, and maximizes the overall benefit of the charging stations.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary skill in the art may make modifications and refinements without departing from the spirit and scope of the present invention; the protection scope of the present invention shall therefore be defined by the appended claims.
10: energy sharing system
12: partner device
14: power plant
20: charging station energy usage planning apparatus
22: connection device
24: storage device
26: processor
EVCS 1–EVCS I: charging stations
S302–S306, S402–S408, S502–S512: steps
FIG. 1 is a schematic diagram of an energy sharing system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a reinforcement-learning-based apparatus for planning the energy usage of a charging station according to an embodiment of the present invention.
FIG. 3 is a flowchart of a reinforcement-learning-based charging station energy usage planning method according to an embodiment of the present invention.
FIG. 4 is a flowchart of a reinforcement-learning-based charging station energy usage planning method according to an embodiment of the present invention.
FIG. 5 is a flowchart of a reinforcement-learning-based charging station energy usage planning method according to an embodiment of the present invention.
S302–S306: steps
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110141537A TWI767868B (en) | 2021-11-08 | 2021-11-08 | Method and apparatus for planning energy usage of charging station based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI767868B true TWI767868B (en) | 2022-06-11 |
TW202320002A TW202320002A (en) | 2023-05-16 |
Family
ID=83103861
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI767868B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200108732A1 (en) * | 2018-10-09 | 2020-04-09 | Regents Of The University Of Minnesota | Physical model-guided machine learning framework for energy management of vehicles |
CN111934335A (en) * | 2020-08-18 | 2020-11-13 | 华北电力大学 | Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning |
CN112396223A (en) * | 2020-11-10 | 2021-02-23 | 华北电力大学 | Electric vehicle charging station energy management method under interactive energy mechanism |
CN113159578A (en) * | 2021-04-22 | 2021-07-23 | 杭州电子科技大学 | Charging optimization scheduling method of large-scale electric vehicle charging station based on reinforcement learning |
EP3863882A1 (en) * | 2018-10-11 | 2021-08-18 | Vitesco Technologies GmbH | Method and back end device for predictively controlling a charging process for an electric energy store of a motor vehicle |
- 2021-11-08: TW application TW110141537A granted as patent TWI767868B (active)
Also Published As
Publication number | Publication date |
---|---|
TW202320002A (en) | 2023-05-16 |