NL2026738B1

NL2026738B1 - Cooperative-optimization control method of charging station based on double-center q-learning method

Info

Publication number: NL2026738B1
Application number: NL2026738A
Authority: NL
Inventors: Tang Ziyu; Tang Hao; Zhao Chuanxin; Fang Daohong; Fang Mingxing
Original assignee: Univ Anhui Normal; Univ Hefei Technology
Priority date: 2019-12-19
Filing date: 2020-10-23
Publication date: 2022-03-18
Also published as: CN110991931B; NL2026738A; CN110991931A

Abstract

The present invention discloses a cooperative-optimization control method of a charging station based on a double-center Q-learning method, including: 1, describing a control process of charging service requests of two charging forms of electric vehicles which arrive randomly as an event-driven decision-making process; 2, describing a process of controlling the electric vehicles which are charged in the charging station to respond to a power-grid peak regulation electricity price plan as a sequential decision-making process; 3, taking a peak regulation electricity price and the online service state of a charging point as a system state, taking the arrival of an electric vehicle making a service request as an event, and selecting whether the electric vehicle is admitted and provided with charging service or not as an admission control action; 4, at the epoch when the peak regulation electricity price is issued, selecting charging and discharging actions of all the AC charging electric vehicles which are served as a peak regulation control action; and 5, performing online cooperative optimization on an electric-vehicle admission control center and a control center for peak regulation response of a system with a Q-learning algorithm. With the present invention, effective intelligent electric-vehicle admission control and peak regulation response control may be performed on the charging station, thereby adapting to peak regulation demands of a power grid.

Description

COOPERATIVE-OPTIMIZATION CONTROL METHOD OF CHARGING STATION BASED ON DOUBLE-CENTER Q-LEARNING METHOD

TECHNICAL FIELD

[0001] The present invention pertains to the technical field of intelligent control and optimization, and particularly relates to a cooperative-optimization control method of a charging station based on a double-center Q-learning method.

BACKGROUND

[0002] At present, China is the largest vehicle consumption market in the world. Vehicle manufacturers are shifting their research, development and production emphases from vehicles powered by traditional energy to new energy vehicles, and electric vehicles will remain the mainstream of new energy vehicle development for a long period of time, with huge consumption potential and an increasing market share. Charging points are important infrastructure for providing charging service for the electric vehicles, and also are an important link in the industrialization and commercialization process of the electric vehicles. With the rapid development of the electric vehicle industry and a great increase in the number of electric vehicles in use, a charging station where centralized management and operation are performed on a plurality of charging points will be an important business mode and service form in the future. In addition, with the increasing penetration of new energy, such as wind power, photovoltaic energy, or the like, as well as the maturing of the interaction technology between the electric vehicles and a power grid (V2G technology), the intelligence and the adaptability of electricity production and service will be improved in the future, and effective management and guidance of electricity consumption of a power consumer, such as the charging station, or the like, will be a trend. For example, a dispatching center at each level may make an electricity peak regulation plan according to source-load prediction data and issue a real-time electricity price, thereby guiding the power consumer, for example, the charging station for the electric vehicle, to consume electricity reasonably and perform V2G electricity feedback, and promoting an autonomous peak shaving or peak shifting action at the consumer side.

[0003] The existing power-grid electricity price adopts a time-of-use electricity price mechanism which is quite simple and fixed: a power-grid peak regulation electricity price plan is not dynamically made or regulated according to actual source-load prediction conditions of the power grid, and a service system of the charging station also does not adaptively perform admission control on a charging request of the electric vehicle or peak regulation response control on charging and discharging actions of the electric vehicle according to the actual power-grid peak regulation electricity price plan. Therefore, under a real-time power-grid peak regulation electricity price mechanism, a problem remains to be researched and solved: how the intelligent service system of the charging station adaptively responds to the charging service request of the electric vehicle which arrives randomly and to an instruction of issuing the power-grid peak regulation electricity price in real time, that is, how it controls admission of the electric vehicle into the service and energy interaction between the electric vehicle and the power grid according to the real-time power-grid peak regulation electricity price and the online service states of all the charging points in the station, thereby improving the running economy of the charging station and adapting to a power-grid peak regulation demand.

SUMMARY

[0004] In order to solve the defects in the prior art, the present invention provides a cooperative-optimization control method of a charging station based on a double-center Q-learning method, so as to cooperatively optimize online the admission control performed by an admission control center on service requests of electric vehicles and the peak regulation control performed by a control center for peak regulation response in the charging station, thereby improving the running economy of the charging station and adapting to the power-grid peak regulation demand.

[0005] In order to solve the technical problems, the following technical solution is adopted in the present invention:

[0006] the cooperative-optimization control method of the charging station based on the double-center Q-learning method according to the present invention is characterized by being applied to a service system of the charging station, which is provided with $J_D$ DC charging points, $J_A$ AC charging points and $J_{AD}$ AC and DC-hybrid charging points and provides paid charging service for $M_D$ DC fast-charging electric vehicles which arrive randomly and $M_A$ AC slow-charging electric vehicles which arrive randomly;

[0007] each DC charging point is enabled to meet charging power demands of the $M_D$ DC fast-charging electric vehicles, each AC charging point is enabled to meet charging power demands of the $M_A$ AC slow-charging electric vehicles, each AC and DC-hybrid charging point is enabled to meet the charging power demands of the $M_D$ DC fast-charging electric vehicles and the $M_A$ AC slow-charging electric vehicles, and one charging point is enabled to provide the charging service for only one electric vehicle at a time;

[0008] the $J_D$ DC charging points are denoted as $CS_1^D, CS_2^D, \dots, CS_j^D, \dots, CS_{J_D}^D$ respectively, the $J_A$ AC charging points are denoted as $CS_1^A, CS_2^A, \dots, CS_j^A, \dots, CS_{J_A}^A$ respectively, and the $J_{AD}$ AC and DC-hybrid charging points are denoted as $CS_1^{AD}, CS_2^{AD}, \dots, CS_j^{AD}, \dots, CS_{J_{AD}}^{AD}$ respectively; $CS_j^D$ represents the jth DC charging point, $j \in D_{J_D}$, and $D_{J_D} = \{1, 2, \dots, J_D\}$ represents a set of codes of the DC charging points; $CS_j^A$ represents the jth AC charging point, $j \in D_{J_A}$, and $D_{J_A} = \{1, 2, \dots, J_A\}$ represents a set of codes of the AC charging points; $CS_j^{AD}$ represents the jth AC and DC-hybrid charging point, $j \in D_{J_{AD}}$, and $D_{J_{AD}} = \{1, 2, \dots, J_{AD}\}$ represents a set of codes of the AC and DC-hybrid charging points;

[0009] the charging power demands of the $M_D$ DC fast-charging electric vehicles are denoted as $P_1^D, P_2^D, \dots, P_m^D, \dots, P_{M_D}^D$, and the charging power demands of the $M_A$ AC slow-charging electric vehicles are denoted as $P_1^A, P_2^A, \dots, P_m^A, \dots, P_{M_A}^A$; $P_m^D$ represents the charging power demand of the mth DC charging electric vehicle, $m \in D_{M_D}$, and $D_{M_D} = \{1, 2, \dots, M_D\}$ represents a set of codes of types of the DC charging electric vehicles; $P_m^A$ represents the charging power demand of the mth AC charging electric vehicle, $m \in D_{M_A}$, and $D_{M_A} = \{1, 2, \dots, M_A\}$ represents a set of codes of types of the AC charging electric vehicles;

[0010] it is assumed that the power-grid peak regulation electricity price is periodically issued according to a dispatching instruction, $K$ is the number of issued periods of the dispatching instruction in one day, a corresponding total time length is $T$, the power-grid peak regulation electricity price at any epoch $t$ under the total time length $T$ is denoted as $PR_t$, $PR_t \in D_{PR}$, and $D_{PR}$ is a finite electricity-price state space; if $\tau_k$ is the epoch when the kth peak regulation electricity price $PR_k$ is issued, a peak regulation electricity price sequence is denoted as $\{PR_k \mid k = 0, 1, 2, \dots, K\}$, with $\tau_0 = 0$ and $\tau_K = T$;

[0011] it is assumed that the $m_t$th electric vehicle randomly arrives at the charging station at the epoch $t$ to apply for charging service, and $m_t \in D_{M_D} \cup D_{M_A}$; if the current state of charge (SOC) of a battery of the $m_t$th electric vehicle is $SOC_{m_t}(t)$, the arrival event of the $m_t$th electric vehicle is denoted as $E(m_t, SOC_{m_t}(t))$;

[0012] the combined state of the three types of charging points at the epoch $t$ is denoted as $C_t = \{CS_t^D, CS_t^A, CS_t^{AD}\}$; $CS_t^D = (CS_1^D(t), CS_2^D(t), \dots, CS_j^D(t), \dots, CS_{J_D}^D(t))$, and $CS_j^D(t) = (m_j^D(t), SOC_{m_j^D}(t))$ represents the service state of the jth DC charging point; $m_j^D(t)$ represents the type of the electric vehicle which is served by the jth DC charging point $CS_j^D$ at the epoch $t$, $m_j^D(t) = 0$ indicates that no vehicle is admitted at the jth DC charging point $CS_j^D$ at the epoch $t$, and $m_j^D(t) \in D_{M_D}$ indicates that the jth DC charging point $CS_j^D$ is charging one electric vehicle in $D_{M_D}$; $SOC_{m_j^D}(t)$ represents the current SOC of the battery of the $m_j^D(t)$th electric vehicle which is served by the jth DC charging point $CS_j^D$ at the epoch $t$;

[0013] $CS_t^A = (CS_1^A(t), CS_2^A(t), \dots, CS_j^A(t), \dots, CS_{J_A}^A(t))$, and $CS_j^A(t) = (m_j^A(t), SOC_{m_j^A}(t))$ represents the service state of the jth AC charging point; $m_j^A(t)$ represents the type of the electric vehicle which is served by the jth AC charging point $CS_j^A$ at the epoch $t$, $m_j^A(t) = 0$ indicates that no vehicle is admitted at the jth AC charging point $CS_j^A$ at the epoch $t$, and $m_j^A(t) \in D_{M_A}$ indicates that the jth AC charging point $CS_j^A$ is charging one electric vehicle in $D_{M_A}$; $SOC_{m_j^A}(t)$ represents the current SOC of the battery of the $m_j^A(t)$th AC charging electric vehicle which is served by the jth AC charging point $CS_j^A$ at the epoch $t$;

[0014] $CS_t^{AD} = (CS_1^{AD}(t), CS_2^{AD}(t), \dots, CS_j^{AD}(t), \dots, CS_{J_{AD}}^{AD}(t))$, and $CS_j^{AD}(t) = (m_j^{AD}(t), SOC_{m_j^{AD}}(t))$ represents the service state of the jth AC and DC-hybrid charging point; $m_j^{AD}(t)$ represents the type of the electric vehicle which is served by the jth AC and DC-hybrid charging point $CS_j^{AD}$ at the epoch $t$, $m_j^{AD}(t) = 0$ indicates that no vehicle is admitted at the jth AC and DC-hybrid charging point $CS_j^{AD}$ at the epoch $t$, and $m_j^{AD}(t) \in D_{M_D} \cup D_{M_A}$ indicates that the jth AC and DC-hybrid charging point $CS_j^{AD}$ is charging one electric vehicle in $D_{M_D} \cup D_{M_A}$; $SOC_{m_j^{AD}}(t)$ represents the current SOC of the battery of the $m_j^{AD}(t)$th DC or AC charging electric vehicle which is served by the jth AC and DC-hybrid charging point $CS_j^{AD}$ at the epoch $t$;

[0015] the state of the service system of the charging station at any epoch $t$ is denoted as $s_t = \{t, C_t, PR_t\}$;

[0016] the epoch $\tau_k$ of issuing the kth peak regulation electricity price $PR_k$ is taken as a decision-making epoch for peak regulation of the control center for peak regulation response, and the energy exchange directions of all the AC charging electric vehicles admitted by the service system of the charging station are denoted as actions $d_k$ at the decision-making epoch for peak regulation, $d_k = \{(d_k^A(1), d_k^A(2), \dots, d_k^A(j), \dots, d_k^A(J_A)), (d_k^{AD}(1), d_k^{AD}(2), \dots, d_k^{AD}(j), \dots, d_k^{AD}(J_{AD}))\}$; $d_k^A(j) \in D_r = \{-1, 0, 1\}$, $j \in D_{J_A}$; $d_k^{AD}(j) \in D_r = \{-1, 0, 1\}$, $j \in D_{J_{AD}}$; $-1$ represents discharging peak regulation, $0$ represents no charging and discharging action, and $1$ represents a charging action; $D_r$ represents a set of peak regulation control actions for each electric vehicle;

[0017] it is assumed that the DC charging electric vehicle admitted into the service in the service system of the charging station does not participate in peak regulation, and the AC charging electric vehicle admitted into the service may participate in peak regulation;

[0018] at the kth decision-making epoch $\tau_k$ for peak regulation, if $m_j^{AD}(t) \in D_{M_D}$ in the jth AC and DC-hybrid charging point, $d_k^{AD}(j) = 1$, and if $m_j^{AD}(t) = 0$, $d_k^{AD}(j) = 0$, and $j \in D_{J_{AD}}$; if $m_j^A(t) = 0$ in the jth AC charging point, $d_k^A(j) = 0$, and $j \in D_{J_A}$;

[0019] at the kth decision-making epoch $\tau_k$ for peak regulation, a set of feasible peak regulation control actions $d_k$ of the charging station is defined as $D_r^k$, and $D_r^k \subseteq D_s$, wherein $D_s$ is a Cartesian product of $J_A + J_{AD}$ sets $D_r$ of peak regulation control actions;
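As an illustrative sketch only, the feasible set of joint peak regulation actions can be enumerated by restricting each factor of the Cartesian product according to what occupies the charging point. The function name `feasible_actions` and the occupant labels `None`, `"AC"` and `"DC"` are assumptions made for this example and do not appear in the specification.

```python
from itertools import product

def feasible_actions(ac_occupants, hybrid_occupants):
    """Enumerate feasible joint actions d_k (hypothetical helper).

    Each occupant is None (idle point), "AC", or "DC" (hybrid points only).
    The result lists one tuple per joint action, AC points first and
    AC and DC-hybrid points second.
    """
    def choices(occupant):
        if occupant is None:
            return (0,)          # idle point: no charging or discharging action
        if occupant == "DC":
            return (1,)          # DC vehicles only charge, no peak regulation
        return (-1, 0, 1)        # AC vehicle: discharge, hold, or charge

    per_point = [choices(o) for o in list(ac_occupants) + list(hybrid_occupants)]
    return list(product(*per_point))

# One served AC vehicle, one idle AC point, one hybrid point serving a DC vehicle.
print(feasible_actions(["AC", None], ["DC"]))
```

Restricting each factor before taking the product keeps the enumerated set small, since idle points and DC-occupied points contribute a single choice each.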

[0020] the occurrence epoch $t$ of the arrival event $E(m_t, SOC_{m_t}(t))$ of the $m_t$th electric vehicle is taken as a decision-making epoch for admission control of the admission control center for the electric vehicle, and event information of the decision-making epoch for admission control and state information of the service system of the charging station are combined and defined as an event-extended state $s_t^e = \{t, C_t, PR_t, m_t, SOC_{m_t}(t)\}$;
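As an illustrative sketch only, the system state and the event-extended state may be held in immutable records so that they can serve directly as keys of a Q-value table. All class and field names below are assumptions made for this example, as is the convention that SOC lies in [0, 1].

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class PointState:
    """Service state of one charging point: (vehicle type, battery SOC)."""
    vehicle_type: int = 0   # 0 means no vehicle is admitted at this point
    soc: float = 0.0        # assumed to lie in [0, 1]

@dataclass(frozen=True)
class StationState:
    """System state: epoch, combined charging-point state, electricity price."""
    t: float
    dc: Tuple[PointState, ...]
    ac: Tuple[PointState, ...]
    hybrid: Tuple[PointState, ...]
    price: float

@dataclass(frozen=True)
class EventExtendedState:
    """System state combined with the arrival event information."""
    station: StationState
    arriving_type: int
    arriving_soc: float

def extend(station, arriving_type, arriving_soc):
    """Combine the observed system state with an arrival event."""
    return EventExtendedState(station, arriving_type, arriving_soc)
```

Frozen dataclasses with tuple fields are hashable, which is why tuples rather than lists are used for the charging-point states.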

[0021] at the decision-making epoch for admission control, whether the service system of the charging station admits the electric vehicle and provides the charging service is denoted as an admission control action $a$, and the action at the nth decision-making epoch $T_n$ for admission control is denoted as $a_n$, and $a_n \in D_a = \{0, 1\}$, wherein 0 represents service refusal, 1 represents service admission, and $D_a$ represents a set of actions of the admission control center;

[0022] at the nth decision-making epoch $T_n$ for admission control, if the type $m_t \in D_{M_D}$ of the arriving electric vehicle, and all the DC charging points and all the AC and DC-hybrid charging points are in service, i.e., $\{m_j^D(t) \in D_{M_D}, \forall j \in D_{J_D}\}$ and $\{m_j^{AD}(t) \in D_{M_D} \cup D_{M_A}, \forall j \in D_{J_{AD}}\}$, $a_n = 0$; if the type $m_t \in D_{M_A}$ of the arriving electric vehicle, and all the AC charging points and all the AC and DC-hybrid charging points are in service, i.e., $\{m_j^A(t) \in D_{M_A}, \forall j \in D_{J_A}\}$ and $\{m_j^{AD}(t) \in D_{M_D} \cup D_{M_A}, \forall j \in D_{J_{AD}}\}$, $a_n = 0$;
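As an illustrative sketch only, the forced-refusal rule described above can be expressed as a check on whether any charging point compatible with the arriving vehicle is idle. The function names and the `"DC"`/`"AC"` labels are assumptions made for this example.

```python
def must_refuse(vehicle_kind, dc_busy, ac_busy, hybrid_busy):
    """Return True when no compatible charging point is idle.

    vehicle_kind is "DC" or "AC"; each *_busy argument is a sequence of
    booleans, True meaning the point is currently serving a vehicle.
    """
    if vehicle_kind == "DC":
        compatible = list(dc_busy) + list(hybrid_busy)
    elif vehicle_kind == "AC":
        compatible = list(ac_busy) + list(hybrid_busy)
    else:
        raise ValueError("vehicle_kind must be 'DC' or 'AC'")
    return all(compatible)

def allowed_admission_actions(vehicle_kind, dc_busy, ac_busy, hybrid_busy):
    """Admission actions available: 0 = refuse, 1 = admit."""
    if must_refuse(vehicle_kind, dc_busy, ac_busy, hybrid_busy):
        return (0,)
    return (0, 1)

# A DC vehicle arrives while every DC and hybrid point is occupied.
print(allowed_admission_actions("DC", [True, True], [False], [True]))
```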

[0023] the cooperative-optimization control method of the charging station based on the double-center Q-learning method is divided into electric-vehicle admission control and peak regulation response control;

[0024] in the electric-vehicle admission control, a control process of charging service requests of the DC charging electric vehicles and the AC charging electric vehicles which arrive randomly is described as an event-driven decision-making process, the arrival of an electric vehicle making a service request is taken as an event, the peak regulation electricity price and the online service state of the charging point are taken as the state of the service system of the charging station, when the event occurs, the event information and the state information of the service system are combined into the event-extended state, whether the electric vehicle is admitted and provided with the charging service or not is selected as the admission control action, sample data feedback is thus obtained, a Q-value table for admission control is updated with the Q-learning method, and finally, a strategy table for admission control is obtained;

[0025] in the peak regulation response control, a process of controlling the electric vehicles which are charged in the charging station to respond to a power-grid peak regulation electricity price plan is described as a sequential decision-making process, and at the epoch when the peak regulation electricity price is issued, the charging and discharging actions of all the AC charging electric vehicles which are served are selected as the peak regulation control actions according to the state of the service system of the charging station, sample data feedback is thus obtained, a Q-value table for peak regulation control is updated with a Q-learning method, and finally, a strategy table for peak regulation control is obtained.

[0026] The cooperative-optimization control method of the charging station based on the double-center Q-learning method is also characterized in that the admission control of the electric vehicle includes the following steps:

[0027] step 1: defining and initializing an exploration rate of the admission control action at the nth decision-making epoch $T_n$ for admission control as $\varepsilon_n$ and letting $0 < \varepsilon_n < 1$;

[0028] defining elements in the Q-value table for admission control as discretization event-extended state-action pair learning values, and initializing the elements in the Q-value table for admission control;

[0029] defining a current greedy strategy table $v$ for admission control as a set formed by actions corresponding to the maximum discretization event-extended state-action pair learning value of each row in the Q-value table for admission control;

[0030] step 2: initializing $t = 0$ and $n = 1$; assigning the current exploration rate $\varepsilon_n$ for the admission control action to an initial exploration rate $\varepsilon_0$; assigning the current greedy strategy table $v$ for admission control to an original strategy table $v_0$;

[0031] step 3: at the nth decision-making epoch $T_n$ for admission control of the service system of the charging station when the arrival event $E(m_t, SOC_{m_t}(t))$ occurs, observing the current state $s_t$ of the service system of the charging station to form the event-extended state $s_t^e$;

[0032] denoting the discretization state corresponding to the event-extended state $s_t^e$ of the nth decision-making epoch $T_n$ for admission control in the Q-value table as $s_n^e$;

[0033] denoting the action which is actually taken in the event-extended state $s_t^e$ at the nth decision-making epoch $T_n$ for admission control as $v(s_t^e)$, wherein $v(s_t^e) \in D_a$;

[0034] in the event-extended state $s_t^e$ at the nth decision-making epoch $T_n$ for admission control, extracting a greedy action in the discretization state $s_n^e$ corresponding to $s_t^e$ from the Q-value table and denoting it as $v(s_n^e)$;

[0035] in the event-extended state $s_t^e$ at the nth decision-making epoch $T_n$ for admission control, if the type $m_t \in D_{M_D}$ of the arriving electric vehicle, and all the DC charging points and all the AC and DC-hybrid charging points are in service, or the type $m_t \in D_{M_A}$ of the arriving electric vehicle, and all the AC charging points and all the AC and DC-hybrid charging points are in service, letting $v(s_t^e) = 0$; otherwise, assigning $v(s_n^e)$ to $v(s_t^e)$ with a probability $1 - \varepsilon_n$, or selecting an action other than $v(s_n^e)$ from the action set $D_a$ at the exploration rate $\varepsilon_n$ as an exploration action and assigning the action to $v(s_t^e)$;
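As an illustrative sketch only, the action selection just described may be written as follows. The function name, the list-based Q-value row and the injectable random source `rng` are assumptions made for this example.

```python
import random

def select_admission_action(q_row, epsilon, forced_refusal, rng=random):
    """Choose the admission action actually taken in the current state.

    q_row holds the learning values of the current discretization state,
    indexed by action (0 = refuse, 1 = admit).
    """
    if forced_refusal:
        return 0                                  # no compatible point is idle
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    if rng.random() < epsilon:
        # Explore: pick an action other than the greedy one.
        others = [a for a in range(len(q_row)) if a != greedy]
        return rng.choice(others)
    return greedy                                 # exploit with probability 1 - epsilon

# With epsilon = 0 the greedy action is always taken.
print(select_admission_action([0.2, 0.5], epsilon=0.0, forced_refusal=False))
```

Passing a seeded `random.Random` instance as `rng` makes the exploration reproducible during testing.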

[0036] after the admission control center of the charging station takes the action $v(s_t^e)$, observing and obtaining a system transition sample track $(s_t^e, v(s_t^e), s_{t'}^e)$ transited from the nth decision-making epoch $T_n$ for admission control to the (n+1)th decision-making epoch $T_{n+1}$ for admission control or the epoch $T$, wherein $t = T_n$, and $t' = T_{n+1} < T$ or $t' = T$; when $t' = T$, letting $s_{t'}^e = \{T, C_T, PR_T, 0, 0\}$;

[0037] step 4: observing and calculating the combined quantity $r(s_t^e, v(s_t^e), s_{t'}^e)$ of charging rewards and peak regulation rewards obtained in the state transition process of the service system of the charging station from taking the current action $v(s_t^e)$ in the state $s_t^e = \{t, C_t, PR_t, m_t, SOC_{m_t}(t)\}$ at the nth decision-making epoch $T_n$ for admission control to the state $s_{t'}^e = \{t', C_{t'}, PR_{t'}, m_{t'}, SOC_{m_{t'}}(t')\}$ at the (n+1)th decision-making epoch $T_{n+1}$ for admission control or the epoch $T$;

[0038] step 5: updating the discretization event-extended state-action pair learning value $Q(s_n^e, v(s_t^e))$ for taking the action $v(s_t^e)$ in the discretization state $s_n^e$ corresponding to $s_t^e$ in the Q-value table for admission control by using a difference formula and a Q-value updating formula shown in Equ. (1) and Equ. (2), and assigning the value to $Q(s_n^e, v(s_t^e))$:

[0039] $d(s_t^e, v(s_t^e), s_{t'}^e) = r(s_t^e, v(s_t^e), s_{t'}^e) + \max_{a \in D_a} Q(s_{n+1}^e, a) - Q(s_n^e, v(s_t^e))$ (1)

[0040] $Q(s_n^e, v(s_t^e)) := Q(s_n^e, v(s_t^e)) + \gamma(s_n^e, v(s_t^e)) \, d(s_t^e, v(s_t^e), s_{t'}^e)$ (2)

[0041] wherein in Equ. (1), $Q(s_{n+1}^e, a)$ represents the discretization event-extended state-action pair learning value for taking the action $a$ in the discretization state $s_{n+1}^e$ corresponding to the state $s_{t'}^e$ of transition to the (n+1)th decision-making epoch $T_{n+1}$ for admission control or the epoch $T$;

[0042] in Equ. (2), the operator ":=" indicates that the value of the right formula is calculated first and then given to the left variable; $\gamma(s_n^e, v(s_t^e))$ is a learning step length for taking the action $v(s_t^e)$ in the discretization state $s_n^e$ at the nth decision-making epoch $T_n$ for admission control;
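As an illustrative sketch only, the difference formula of Equ. (1) and the updating formula of Equ. (2) can be applied to a dictionary-backed Q-value table as follows. The function name `q_update`, the `(state, action)` key layout and the constant step length are assumptions made for this example; as in the equations, no discount factor is applied.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions, step):
    """Apply one temporal-difference update and return the difference d."""
    best_next = max(q[(next_state, a)] for a in next_actions)
    delta = reward + best_next - q[(state, action)]   # Equ. (1)
    q[(state, action)] += step * delta                # Equ. (2)
    return delta

# Toy example: unseen entries default to 0.0.
q = defaultdict(float)
q[("s1", 1)] = 2.0
d = q_update(q, "s0", 1, reward=1.0, next_state="s1",
             next_actions=(0, 1), step=0.5)
print(d, q[("s0", 1)])   # 3.0 1.5
```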

[0043] step 6: selecting the action corresponding to the maximum discretization event-extended state-action pair learning value of each row in the updated Q-value table for admission control to form the current action set for admission control, taking the current action set as the updated greedy strategy table for admission control, and assigning it to the current greedy strategy $v$ for admission control; degrading the exploration rate $\varepsilon_n$, thereby obtaining the updated exploration rate and assigning it to $\varepsilon_{n+1}$;
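As an illustrative sketch only, the greedy strategy table can be read off the Q-value table row by row, and the exploration rate can be degraded after each decision. The specification does not state a particular degradation schedule; the multiplicative schedule with a lower bound shown here, and all function and parameter names, are assumptions made for this example.

```python
def greedy_policy(q_table):
    """Map each discretization state to its highest-valued action.

    q_table is a dict: state -> dict of action -> learning value.
    """
    return {state: max(row, key=row.get) for state, row in q_table.items()}

def decay_exploration(epsilon, rate=0.99, floor=0.01):
    """Assumed schedule: shrink epsilon geometrically, never below floor."""
    return max(floor, epsilon * rate)

q_table = {"s0": {0: 0.1, 1: 0.4}, "s1": {0: 0.3, 1: 0.2}}
print(greedy_policy(q_table))                # {'s0': 1, 's1': 0}
print(decay_exploration(0.5, rate=0.5))      # 0.25
```

Keeping a positive floor preserves the stated condition that the exploration rate stays strictly between 0 and 1.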

[0044] step 7: if $t' < T$, assigning $n+1$ to $n$, and returning to the step 3; otherwise, indicating $t' = T$, and performing step 8; and

[0045] step 8: judging whether the strategy table $v$ for admission control is equal to $v_0$ or not; if so, stopping updating and performing admission control on the random charging service requests of the M electric vehicles with the current strategy table $v$ for admission control; otherwise, returning to the step 2 for execution;

[0046] the peak regulation response control includes the following steps:

[0047] step -1: defining and initializing an exploration rate of the peak regulation control action at the kth decision-making epoch $\tau_k$ for peak regulation control as $\varepsilon_k$ and letting $0 < \varepsilon_k < 1$;

[0048] defining elements in the Q-value table for peak regulation control as state-action pair learning values of the service system of the charging station, and initializing the elements in the Q-value table for peak regulation control;

[0049] defining a current greedy strategy table $v$ for peak regulation control as a set formed by actions corresponding to the maximum discretization event-extended state-action pair learning value of each row in the Q-value table for peak regulation control;

[0050] step -2: initializing $t = 0$ and $k = 0$; assigning the current exploration rate $\varepsilon_k$ for the peak regulation control action to an original exploration rate $\varepsilon_0$; assigning the current greedy strategy table $v$ for peak regulation control to an original strategy table $v_0$;

[0051] step -3: at the kth decision-making epoch $\tau_k$ for peak regulation control of the service system of the charging station, observing the current state $s_t$ of the service system of the charging station;

[0052] denoting the discretization state corresponding to the system state $s_t$ of the kth decision-making epoch $\tau_k$ for peak regulation control in the Q-value table for peak regulation control as $s_k$;

[0053] denoting the peak regulation control action which is actually taken in the system state $s_t$ at the kth decision-making epoch $\tau_k$ for peak regulation control as $v(s_t)$, wherein $v(s_t) \in D_r^k$;

[0054] in the system state $s_t$ at the kth decision-making epoch $\tau_k$ for peak regulation control, extracting a greedy action in the discretization state $s_k$ corresponding to the current state $s_t$ from the Q-value table for peak regulation control and denoting it as $v(s_k)$;

[0055] in the system state $s_t$ at the kth decision-making epoch $\tau_k$ for peak regulation control, randomly selecting an action from the feasible action set $D_r^k$ according to the current exploration rate $\varepsilon_k$ for peak regulation control and assigning the action to $v(s_t)$, and assigning $v(s_k)$ to $v(s_t)$ with the probability $1 - \varepsilon_k$;

[0056] after the control center for peak regulation of the charging station takes the action $v(s_t)$, observing and obtaining a system transition sample track $(s_t, v(s_t), s_{t'})$ transited from the kth decision-making epoch $\tau_k$ for peak regulation control to the (k+1)th decision-making epoch $\tau_{k+1}$ for peak regulation control, wherein $t = \tau_k$ and $t' = \tau_{k+1}$;

[0057] step -4: observing and calculating the combined quantity $r(s_t, v(s_t), s_{t'})$ of charging rewards and peak regulation rewards obtained in the state transition process of the service system of the charging station from taking the current action $v(s_t)$ in the state $s_t$ at the kth decision-making epoch $\tau_k$ for peak regulation control to the state $s_{t'}$ at the (k+1)th decision-making epoch $\tau_{k+1}$ for peak regulation control;

[0058] step -5: updating the discretization state-action pair learning value $Q(s_k, v(s_t))$ for taking the action $v(s_t)$ in the discretization state $s_k$ corresponding to $s_t$ in the Q-value table for peak regulation control by using a difference formula and a Q-value updating formula shown in Equ. (3) and Equ. (4), and assigning the value to $Q(s_k, v(s_t))$:

[0059] $d(s_t, v(s_t), s_{t'}) = r(s_t, v(s_t), s_{t'}) + \max_{d \in D_r^{k+1}} Q(s_{k+1}, d) - Q(s_k, v(s_t))$ (3)

[0060] $Q(s_k, v(s_t)) := Q(s_k, v(s_t)) + \gamma(s_k, v(s_t)) \, d(s_t, v(s_t), s_{t'})$ (4)

[0061] wherein in Equ. (3), $Q(s_{k+1}, d)$ represents the discretization state-action pair learning value for taking the feasible action $d$ in the discretization state $s_{k+1}$ corresponding to the state $s_{t'}$ of transition to the (k+1)th decision-making epoch $\tau_{k+1}$ for peak regulation control;

[0062] in Equ. (4), the operator ":=" indicates that the value of the right formula is calculated first and then given to the left variable; $\gamma(s_k, v(s_t))$ is a learning step length for taking the action $v(s_t)$ in the discretization state $s_k$ at the kth decision-making epoch $\tau_k$ for peak regulation control;

[0063] step -6: selecting the action corresponding to the maximum discretization state-action pair learning value of each row in the updated Q-value table for peak regulation control to form the current action set for peak regulation control, taking the current action set as the updated greedy strategy table for peak regulation control, and assigning it to the current greedy strategy $v$ for peak regulation control; degrading the exploration rate $\varepsilon_k$, thereby obtaining the updated exploration rate and assigning it to $\varepsilon_{k+1}$;

[0064] step -7: if $k < K$, assigning $k+1$ to $k$, and returning to the step -3; otherwise, performing step -8; and

[0065] step -8: judging whether the strategy table $v$ for peak regulation control is equal to $v_0$ or not; if so, stopping updating and performing peak regulation control on the AC charging electric vehicles served by the charging station with the current greedy strategy table $v$ for peak regulation control; otherwise, returning to the step -2 for execution;
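As an illustrative sketch only, steps -2 to -8 may be arranged as an outer loop that repeats one day of K decision epochs until the greedy strategy table no longer changes. The callbacks `env_step` and `actions_for`, the constant step length, the geometric degradation of the exploration rate across sweeps and the `max_sweeps` cap are all assumptions made for this example; in practice the transitions would be observed from the running charging station.

```python
import random
from collections import defaultdict

def learn_peak_policy(env_step, initial_state, actions_for, K,
                      step=0.1, epsilon=0.3, decay=0.95,
                      max_sweeps=1000, seed=0):
    """Learn a peak regulation strategy table (hypothetical helper).

    env_step(state, action, k) -> (reward, next_state)
    actions_for(state) -> tuple of feasible actions in that state
    """
    rng = random.Random(seed)
    q = defaultdict(float)
    policy = {}
    for _ in range(max_sweeps):
        previous = dict(policy)                  # step -2: keep the original table
        state = initial_state
        for k in range(K):                       # steps -3 to -7
            actions = actions_for(state)
            greedy = max(actions, key=lambda d: q[(state, d)])
            explore = rng.random() < epsilon
            action = rng.choice(actions) if explore else greedy
            reward, nxt = env_step(state, action, k)
            best_next = max(q[(nxt, d)] for d in actions_for(nxt))
            delta = reward + best_next - q[(state, action)]   # Equ. (3)
            q[(state, action)] += step * delta                # Equ. (4)
            policy[state] = max(actions, key=lambda d: q[(state, d)])
            epsilon *= decay                     # step -6: degrade exploration
            state = nxt
        if policy == previous:                   # step -8: table unchanged
            break
    return policy, q

# Toy environment: the state is the period index, discharging (-1) earns the price.
prices = [1.0, 3.0]
policy, q = learn_peak_policy(
    env_step=lambda s, a, k: (-prices[k] * a, s + 1),
    initial_state=0,
    actions_for=lambda s: (-1, 0, 1),
    K=2,
    epsilon=0.0,
)
print(policy)
```

In the toy run, with exploration switched off, the loop stops after the second sweep because the strategy table matches the one stored at the start of that sweep.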

[0066] Compared with the prior art, the present invention has the following beneficial effects.

[0067] 1. In the present invention, the epoch when the power-grid peak regulation electricity price is issued is taken as the decision-making epoch for peak regulation of the control center for peak regulation response, the energy exchange directions of all the AC charging electric vehicles admitted by the service system of the charging station are taken as the decision-making actions, decisions are made according to the system state including the starting epoch of a peak regulation period, the real-time state of the charging points in the system and the current power-grid peak regulation electricity price, and the starting epoch of the peak regulation period and the current power-grid peak regulation electricity price are taken as part of the system state, thus facilitating reflection of the time sequence characteristic of peak regulation of the power grid, enabling the control strategy to adapt to the peak regulation demands of the power grid and better conform to actual situations, and improving the feasibility of the method.

[0068] 2. In the present invention, the power-grid peak regulation electricity price and the online service state of the charging point are taken as the state of the service system of the charging station; the charging service request of the electric vehicle which arrives randomly is taken as the event; the random event and the state of the service system of the charging station are combined into the event-extended state; whether the arriving electric vehicle is admitted into the charging station to be provided with the charging service is taken as the system action; the epoch when the charging service request of the electric vehicle arrives randomly is taken as the decision-making epoch for admission control; the intelligent admission control process of the electric vehicle at the charging station where the electric vehicle arrives randomly is described as a discrete event-driven decision-making process, and a corresponding action is taken according to the real-time event-extended state of the system; therefore, admission control of the electric vehicle of the charging station where the service request of the electric vehicle arrives randomly is processed effectively, and by optimization, the system may reasonably select the admission action, thus improving the running economy of the service system of the charging station, and adapting to the peak load regulation demands of the power grid.

[0069] 3. In the present invention, admission of the electric vehicle of the charging station is intelligently controlled and optimized with a Q-learning method of the electric-vehicle admission control center, and the energy interaction between the served AC electric vehicles of the charging station and the power grid is intelligently controlled and optimized with a Q-learning method of the control center for peak regulation response. Compared with a theoretical solution method, the present invention does not require a complete mathematical modeling process to be performed on a control system, and particularly, the random characteristics in the system are not required to be modeled precisely. With the present invention, a better control strategy may be obtained by observing running samples of the system to perform a real-time online learning process. In addition, when random parameters of the system change, operators are not required to modify an algorithm, the online learning process may still be performed according to the actual running process of the system, and a better intelligent admission control strategy of the electric vehicle may be obtained adaptively. In particular, the double-center Q-learning method in the present invention solves the asynchronous decision problem in the cooperative-optimization control of the charging station and overcomes the defects of a centralized synchronous decision method.

[0070] 4. The cooperative-optimization control method of the charging station based on the double-center Q-learning method according to the present invention is also suitable for the situation where charging prices are different in different periods of time and the situation where the power-grid peak regulation electricity price is issued non-periodically (or randomly).

BRIEF DESCRIPTION OF THE DRAWINGS

[0071] Fig. 1 is a flow chart of an electric-vehicle admission control center in a method according to the present invention;

[0072] Fig. 2 is a flow chart of a control center for peak regulation response in the method according to the present invention; and

[0073] Fig. 3 is a schematic diagram of a service system of a charging station according to the present invention.

DETAILED DESCRIPTION

[0074] In this embodiment, as shown in Fig. 3, a cooperative-optimization control method of a charging station based on a double-center Q-learning method is applied to a service system of the charging station, which includes $J_D$ DC charging points 1, $J_A$ AC charging points 2, $J_{AD}$ AC and DC-hybrid charging points 3, $M_D$ DC fast-charging electric vehicles 4 which arrive randomly, $M_A$ AC slow-charging electric vehicles 5 which arrive randomly, a power-grid peak regulation electricity price plan 6, an admission control center 7 and a control center 8 for peak regulation response;

[0075] each DC charging point is enabled to adaptively meet charging power demands of the $M_D$ DC fast-charging electric vehicles, each AC charging point is enabled to adaptively meet charging power demands of the $M_A$ AC slow-charging electric vehicles, each AC and DC-hybrid charging point is enabled to meet the charging power demands of the $M_D$ DC fast-charging electric vehicles and the $M_A$ AC slow-charging electric vehicles, and one charging point is enabled to provide charging service for only one electric vehicle at a time;

[0076] the jth DC charging point is denoted as $CS_j^D$, $j \in \Phi_{J_D} = \{1, 2, \dots, J_D\}$, and $\Phi_{J_D}$ represents a set of codes of the DC charging points, thereby denoting the $J_D$ DC charging points as $CS_1^D, CS_2^D, \dots, CS_j^D, \dots, CS_{J_D}^D$ respectively; the jth AC charging point is denoted as $CS_j^A$, $j \in \Phi_{J_A} = \{1, 2, \dots, J_A\}$, and $\Phi_{J_A}$ represents a set of codes of the AC charging points, thereby denoting the $J_A$ AC charging points as $CS_1^A, CS_2^A, \dots, CS_j^A, \dots, CS_{J_A}^A$ respectively; the jth AC and DC-hybrid charging point is denoted as $CS_j^{AD}$, $j \in \Phi_{J_{AD}} = \{1, 2, \dots, J_{AD}\}$, and $\Phi_{J_{AD}}$ represents a set of codes of the AC and DC-hybrid charging points, thereby denoting the $J_{AD}$ AC and DC-hybrid charging points as $CS_1^{AD}, CS_2^{AD}, \dots, CS_j^{AD}, \dots, CS_{J_{AD}}^{AD}$ respectively;

[0077] the charging power demand of the mth DC charging electric vehicle is denoted as $P_m^D$ kW, the total capacity of a battery of the electric vehicle is denoted as $E_m^D$ kWh, and the charging power demand and the total capacity are determined by the configuration of the electric vehicle; thus, the charging power demands of the $M_D$ DC fast-charging electric vehicles are denoted as $P_1^D, P_2^D, \dots, P_m^D, \dots, P_{M_D}^D$, $m \in \Phi_{M_D} = \{1, 2, \dots, M_D\}$, and $\Phi_{M_D}$ represents a set of codes of all types of the DC charging electric vehicles;

[0078] the charging power demand of the mth AC charging electric vehicle is denoted as $P_m^A$ kW, the total capacity of a battery of the electric vehicle is denoted as $E_m^A$ kWh, and the charging power demand and the total capacity are determined by the configuration of the electric vehicle; thus, the charging power demands of the $M_A$ AC slow-charging electric vehicles are denoted as $P_1^A, P_2^A, \dots, P_m^A, \dots, P_{M_A}^A$, $m \in \Phi_{M_A} = \{1, 2, \dots, M_A\}$, and $\Phi_{M_A}$ represents a set of codes of all types of the AC charging electric vehicles;

[0079] K is set as the maximum period number in one day, a corresponding total time length is T, a power-grid peak regulation electricity price at any epoch t under the total time length T is denoted as $PR_t$ yuan/kWh, $PR_t \in \Phi_{PR}$, and $\Phi_{PR}$ is a finite electricity-price state space; it is assumed that the power-grid peak regulation electricity price is periodically issued according to a dispatching instruction, $\tau_k$ is the epoch when the kth peak regulation electricity price $PR_{\tau_k}$ is issued, and the price is maintained until the epoch $\tau_{k+1}$ when the next peak regulation electricity price is issued; that is, $PR_t = PR_{\tau_k}$, $\tau_k \le t < \tau_{k+1}$, $k = 0, 1, 2, \dots, K-1$, and $\tau_0 = 0$; a peak regulation electricity price sequence is denoted as $\{(\tau_k, PR_{\tau_k}) \mid k = 0, 1, 2, \dots, K-1\}$, wherein $PR_{\tau_k} \in \Phi_{PR}$, $\tau_K = T$ and [a further relation among the price values is illegible in the source];
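The piecewise-constant price rule of this paragraph can be illustrated with a short lookup routine. This is a minimal sketch only; the function name `price_at` and the numeric plan below are hypothetical and are not taken from the embodiment.

```python
from bisect import bisect_right

def price_at(t, epochs, prices):
    """Return the peak regulation price in force at time t.

    epochs: sorted issue epochs tau_0 = 0 < tau_1 < ... < tau_{K-1}
    prices: the price issued at each epoch (same length as epochs)
    """
    if t < epochs[0]:
        raise ValueError("t precedes the first issue epoch")
    # bisect_right returns the position of the first epoch strictly
    # greater than t; the price in force was issued just before it.
    k = bisect_right(epochs, t) - 1
    return prices[k]

# Hypothetical plan: prices issued at hours 0, 8 and 18 (illustrative values).
epochs = [0.0, 8.0, 18.0]
prices = [0.4, 1.0, 0.6]
```

A price issued at an epoch applies from that epoch onward, so `price_at(8.0, epochs, prices)` already returns the second price.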

[0080] the charging station provides paid charging service, and the price of the charging service of the charging station is $PR_{ev}$ yuan/kWh; $PR_{ev}$ is at least less than the maximum peak regulation electricity price;

[0081] the event that the $m_t$th electric vehicle, with the battery having the state of charge (SOC) $SOC_{m_t}(t)$ at the epoch t, randomly arrives at the charging station to apply for the charging service is denoted as an arrival event $E(m_t, SOC_{m_t}(t))$, and $m_t \in \Phi_{M_D} \cup \Phi_{M_A}$;

[0082] the service state of the jth DC charging point at the epoch t is denoted as $CS_j^D(t) = (m_j^D(t), SOC_{m_j^D(t)}(t))$, thereby denoting the combined state of the $J_D$ DC charging points at the epoch t as $CS_t^D = (CS_1^D(t), CS_2^D(t), \dots, CS_j^D(t), \dots, CS_{J_D}^D(t))$. $m_j^D(t)$ represents the type of the electric vehicle which is served by the jth DC charging point $CS_j^D$ at the epoch t, $m_j^D(t) = 0$ indicates that no vehicle is admitted at the jth DC charging point $CS_j^D$ at the epoch t, and $m_j^D(t) \in \Phi_{M_D}$ indicates that the jth DC charging point $CS_j^D$ is charging one electric vehicle in $\Phi_{M_D}$; $SOC_{m_j^D(t)}(t)$ represents the SOC of the battery of the $m_j^D(t)$th DC charging electric vehicle which is served by the jth DC charging point $CS_j^D$ at the epoch t;

[0083] the service state of the jth AC charging point at the epoch t is denoted as $CS_j^A(t) = (m_j^A(t), SOC_{m_j^A(t)}(t))$, thereby denoting the combined state of the $J_A$ AC charging points at the epoch t as $CS_t^A = (CS_1^A(t), CS_2^A(t), \dots, CS_j^A(t), \dots, CS_{J_A}^A(t))$. $m_j^A(t)$ represents the type of the electric vehicle which is served by the jth AC charging point $CS_j^A$ at the epoch t, $m_j^A(t) = 0$ indicates that no vehicle is admitted at the jth AC charging point $CS_j^A$ at the epoch t, and $m_j^A(t) \in \Phi_{M_A}$ indicates that the jth AC charging point $CS_j^A$ is charging one electric vehicle in $\Phi_{M_A}$; $SOC_{m_j^A(t)}(t)$ represents the SOC of the battery of the $m_j^A(t)$th AC charging electric vehicle which is served by the jth AC charging point $CS_j^A$ at the epoch t;


[0084] the service state of the jth AC and DC-hybrid charging point at the epoch t is denoted as $CS_j^{AD}(t) = (m_j^{AD}(t), SOC_{m_j^{AD}(t)}(t))$, thereby denoting the combined state of the $J_{AD}$ AC and DC-hybrid charging points at the epoch t as $CS_t^{AD} = (CS_1^{AD}(t), CS_2^{AD}(t), \dots, CS_j^{AD}(t), \dots, CS_{J_{AD}}^{AD}(t))$. $m_j^{AD}(t)$ represents the type of the electric vehicle which is served by the jth AC and DC-hybrid charging point $CS_j^{AD}$ at the epoch t, $m_j^{AD}(t) = 0$ indicates that no vehicle is admitted at the jth AC and DC-hybrid charging point $CS_j^{AD}$ at the epoch t, and $m_j^{AD}(t) \in \Phi_{M_D} \cup \Phi_{M_A}$ indicates that the jth AC and DC-hybrid charging point $CS_j^{AD}$ is charging one electric vehicle in $\Phi_{M_D}$ or $\Phi_{M_A}$; $SOC_{m_j^{AD}(t)}(t)$ represents the current SOC of the battery of the $m_j^{AD}(t)$th DC or AC charging electric vehicle which is served by the jth AC and DC-hybrid charging point $CS_j^{AD}$ at the epoch t;

[0085] the combined state of the three types of charging points at the epoch t is denoted as $C_t = \{CS_t^D, CS_t^A, CS_t^{AD}\}$;

[0086] the state of the service system of the charging station at any epoch t is denoted as $s_t = \{t, C_t, PR_t\}$.

[0087] it is assumed that the DC charging electric vehicle admitted into the service in the service system of the charging station does not participate in peak regulation, and the AC charging electric vehicle admitted into the service may participate in peak regulation; it is assumed that discharge power of one AC charging electric vehicle is equal to charging power, and the discharge reward per unit time per unit discharge power at any epoch is equal to a real-time power-grid electricity price;

[0088] the epoch $\tau_k$ of issuing the kth peak regulation electricity price $PR_{\tau_k}$ is taken as a decision-making epoch for peak regulation of the control center for peak regulation response, and the energy exchange directions of all the AC charging electric vehicles admitted by the service system of the charging station are denoted as actions $d_k$ at the decision-making epoch for peak regulation, $d_k = \{(d_k^A(1), d_k^A(2), \dots, d_k^A(j), \dots, d_k^A(J_A)), (d_k^{AD}(1), d_k^{AD}(2), \dots, d_k^{AD}(j), \dots, d_k^{AD}(J_{AD}))\}$, wherein $d_k^A(j), d_k^{AD}(j) \in D_\delta = \{-1, 0, 1\}$; $-1$ represents discharging peak regulation, 0 represents no charging and discharging action, 1 represents a charging action, and $D_\delta$ represents a set of peak regulation control actions for each electric vehicle;

[0089] at the kth decision-making epoch $\tau_k$ for peak regulation, in the jth AC and DC-hybrid charging point, $j \in \Phi_{J_{AD}}$, if $m_j^{AD}(t) \in \Phi_{M_D}$, $d_k^{AD}(j)$ is fixed [value illegible in the source], and if $m_j^{AD}(t) = 0$, $d_k^{AD}(j) = 0$; in the jth AC charging point, $j \in \Phi_{J_A}$, if $m_j^A(t) = 0$, $d_k^A(j) = 0$;

[0090] at the kth decision-making epoch $\tau_k$ for peak regulation, a set of feasible peak regulation control actions $d_k$ of the charging station is denoted as $\hat{D}_\Gamma^k$, and $\hat{D}_\Gamma^k \subseteq D_\Gamma$, wherein $D_\Gamma$ is a Cartesian product of $J_A + J_{AD}$ sets $D_\delta$ of peak regulation control actions, i.e., $D_\Gamma = D_\delta \times D_\delta \times \cdots \times D_\delta$; the total number of actions in the set $D_\Gamma$ is denoted as C;
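The relation between the full action set and the feasible subset at one epoch can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the occupancy labels, the function names and the `dc_fixed` parameter are hypothetical, and the action pinned on a hybrid point that serves a DC vehicle is an assumption (default 0) because the source does not give a legible value.

```python
from itertools import product

PER_VEHICLE = (-1, 0, 1)  # discharge, no action, charge

def feasible_actions(occupancy, dc_fixed=0):
    """Enumerate the feasible joint actions over the AC and hybrid points.

    occupancy: one entry per point: "AC", "DC" (hybrid points only) or None.
    dc_fixed: ASSUMPTION, action pinned on a hybrid point serving a DC vehicle.
    """
    choices = []
    for occ in occupancy:
        if occ == "AC":
            choices.append(PER_VEHICLE)   # AC vehicles may take any direction
        elif occ == "DC":
            choices.append((dc_fixed,))   # DC vehicles do not take part
        else:
            choices.append((0,))          # an idle point can only do nothing
    return list(product(*choices))

def encode(action):
    """Map a joint action to a 0-based index in the full product set."""
    c = 0
    for d in action:
        c = c * 3 + (d + 1)
    return c
```

With J_A + J_AD points the full set has 3 to the power (J_A + J_AD) actions, so the table grows quickly with station size; the feasible subset is usually far smaller.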

[0091] all the actions in $D_\Gamma$ are encoded, $d(c)$ is set to represent the cth action, and $d(c) \in D_\Gamma$, $c = 1, 2, \dots, C$.

[0092] the occurrence epoch t of the arrival event $E(m_t, SOC_{m_t}(t))$ of the $m_t$th electric vehicle is taken as a decision-making epoch for admission control of the admission control center for the electric vehicle, and event information of the current epoch and current state information of the service system of the charging station are combined and denoted as an event-extended state $s_t^e = \{t, C_t, PR_t, m_t, SOC_{m_t}(t)\}$.

[0093] the epoch when the nth event-extended state $s_t^e$ occurs is denoted as $T_n$, i.e., $t = T_n$, and a corresponding peak regulation period for the power-grid electricity price is denoted as $[\tau_{k_n}, \tau_{k_n+1})$, $k_n \in \{0, 1, \dots, K-1\}$, $T_n \in [\tau_{k_n}, \tau_{k_n+1})$.

[0094] a change interval $[0, 1]$ of the SOC of the battery of the electric vehicle is discretized by using a small constant $\delta$ to obtain a discretization event-extended state $\bar{s}_n^e = \{k_n, \bar{C}_n, PR_{\tau_{k_n}}, m_n, \overline{SOC}_{m_n}(n)\}$ corresponding to $s_t^e$, wherein n represents a numerical value or discretization value corresponding to the nth decision-making epoch $T_n$ for admission control; $m_n$ represents $m_t$ at $t = T_n$; $\bar{C}_n = \{\overline{CS}_n^D, \overline{CS}_n^A, \overline{CS}_n^{AD}\}$ is the discretization combined state of the charging points corresponding to $C_t$; $\overline{CS}_n^D = (\overline{CS}_1^D(n), \overline{CS}_2^D(n), \dots, \overline{CS}_j^D(n), \dots, \overline{CS}_{J_D}^D(n))$ is the discretization combined state of the DC charging points corresponding to $CS_t^D$, $\overline{CS}_n^A = (\overline{CS}_1^A(n), \overline{CS}_2^A(n), \dots, \overline{CS}_j^A(n), \dots, \overline{CS}_{J_A}^A(n))$ is the discretization combined state of the AC charging points corresponding to $CS_t^A$, and $\overline{CS}_n^{AD} = (\overline{CS}_1^{AD}(n), \overline{CS}_2^{AD}(n), \dots, \overline{CS}_j^{AD}(n), \dots, \overline{CS}_{J_{AD}}^{AD}(n))$ is the discretization combined state of the AC and DC-hybrid charging points corresponding to $CS_t^{AD}$; $\overline{CS}_j^D(n) = (m_j^D(n), \overline{SOC}_{m_j^D}(n))$, $\overline{CS}_j^A(n) = (m_j^A(n), \overline{SOC}_{m_j^A}(n))$ and $\overline{CS}_j^{AD}(n) = (m_j^{AD}(n), \overline{SOC}_{m_j^{AD}}(n))$ are the discretization states corresponding to $CS_j^D(t)$, $CS_j^A(t)$ and $CS_j^{AD}(t)$ respectively, and $\overline{SOC}_{m_n}(n), \overline{SOC}_{m_j^D}(n), \overline{SOC}_{m_j^A}(n), \overline{SOC}_{m_j^{AD}}(n) \in \{0, \delta, 2\delta, \dots, 1-\delta, 1\}$, $PR_{\tau_{k_n}} \in \Phi_{PR}$.
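The SOC discretization described above maps a continuous value in [0, 1] onto a grid with a fixed step. A minimal sketch follows; the function name and the choice to round down to the nearest grid point are assumptions, since the text fixes only the grid itself.

```python
def discretize_soc(soc, delta=0.1):
    """Map a continuous SOC in [0, 1] to (grid index, grid value).

    delta is the grid step; 1/delta is assumed to be an integer.
    """
    if not 0.0 <= soc <= 1.0:
        raise ValueError("SOC must lie in [0, 1]")
    levels = round(1.0 / delta)
    # ASSUMPTION: round down. The small offset guards against values such
    # as 0.3 / 0.1 evaluating to 2.999... in floating point.
    index = min(int(soc / delta + 1e-9), levels)
    return index, index * delta
```

A smaller step gives a finer state description at the cost of a larger Q-value table, which is the trade-off behind choosing the constant.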

[0095] the state space formed by all possible discretization event-extended states is denoted as $\Phi^e$, i.e., $\bar{s}_n^e \in \Phi^e$, and the total number of the discretization event-extended states of the system is denoted as $S^e$;

[0096] all the possible discretization event-extended states are encoded, $\bar{s}_n^e(s)$ represents the sth discretization event-extended state, and $\bar{s}_n^e(s) \in \Phi^e$, $s = 1, 2, \dots, S^e$; a set of all possible discretization event-extended states where a DC-charging-electric-vehicle arriving event occurs and all the DC charging points and all the AC and DC-hybrid charging points are busy is denoted as $\Phi_e^D$; a set of all possible discretization event-extended states where an AC-charging-electric-vehicle arriving event occurs and all the AC charging points and all the AC and DC-hybrid charging points are busy is denoted as $\Phi_e^A$.


[0097] under the same discretization rule, the discretization state corresponding to the state $s_t$ of the service system of the charging station at any epoch t is denoted as $\bar{s}_k$, $\bar{s}_k = \{k, \bar{C}_k, PR_{\tau_k}\}$, and $\bar{C}_k$ and $\bar{C}_n$ have a consistent value space; the state space formed by all the possible discretization states of the service system of the charging station is denoted as $\Phi$, i.e., $\bar{s}_k \in \Phi$, and the total number of the discretization states of the system is denoted as S.

[0098] all the possible discretization states of the service system of the charging station are encoded, $\bar{s}_k(s)$ represents the sth discretization state, and $\bar{s}_k(s) \in \Phi$, $s = 1, 2, \dots, S$.

[0099] the decision-making epoch for admission control of the system is defined as the arrival epoch of any electric vehicle, i.e., the event occurrence epoch;

[00100] whether the service system of the charging station admits the charging request of the electric vehicle which arrives randomly and provides the charging service is taken as an admission control action a, and the action at the nth decision-making epoch $T_n$ for admission control is denoted as $a_n$, and $a_n \in D_a = \{0, 1\}$, wherein 0 represents service refusal, 1 represents service admission, and $D_a$ represents a set of actions of the admission control center;

[00101] at any decision-making epoch $T_n$ for admission control, if the type of the arriving electric vehicle is $m_n \in \Phi_{M_D}$ and all the DC charging points and all the AC and DC-hybrid charging points are in service, i.e., $m_j^D(t) \in \Phi_{M_D}$ for all $j \in \Phi_{J_D}$ and $m_j^{AD}(t) \in \Phi_{M_D} \cup \Phi_{M_A}$ for all $j \in \Phi_{J_{AD}}$, then $a_n = 0$; if the type of the arriving electric vehicle is $m_n \in \Phi_{M_A}$ and all the AC charging points and all the AC and DC-hybrid charging points are in service, i.e., $m_j^A(t) \in \Phi_{M_A}$ for all $j \in \Phi_{J_A}$ and $m_j^{AD}(t) \in \Phi_{M_D} \cup \Phi_{M_A}$ for all $j \in \Phi_{J_{AD}}$, then $a_n = 0$.

[00102] at the nth decision-making epoch $T_n$ for admission control, if $m_n \in \Phi_{M_D}$ and $a_n = 1$, the arriving DC charging electric vehicle is preferentially admitted into any idle DC charging point and charged immediately; if $m_n \in \Phi_{M_A}$ and $a_n = 1$, the arriving AC charging electric vehicle is preferentially admitted into any idle AC charging point and is charged immediately; it is assumed that the electric vehicle leaves the charging station once fully charged;
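The forced-refusal rule and the placement preference of the two preceding paragraphs can be sketched as below. The function names and the list-based representation of charging points are hypothetical; falling back to an idle hybrid point when no dedicated point is idle is inferred from the word "preferentially" and is an assumption.

```python
def choose_point(vehicle_kind, dc_points, ac_points, hybrid_points):
    """Return (point kind, index) for an arriving vehicle, or None if all busy.

    vehicle_kind: "DC" or "AC"
    *_points: lists holding the served vehicle type per point, 0 meaning idle
    """
    own = dc_points if vehicle_kind == "DC" else ac_points
    for j, m in enumerate(own):
        if m == 0:
            return vehicle_kind, j      # dedicated point preferred
    for j, m in enumerate(hybrid_points):
        if m == 0:
            return "AD", j              # ASSUMPTION: hybrid point as fallback
    return None

def allowed_admission_actions(vehicle_kind, dc_points, ac_points, hybrid_points):
    """Only refusal (0) is allowed when no compatible point is idle."""
    if choose_point(vehicle_kind, dc_points, ac_points, hybrid_points) is None:
        return (0,)
    return (0, 1)
```

Note that refusal stays available even when a point is idle, which is what lets the learner reserve capacity for more valuable arrivals.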

[00103] the cooperative-optimization control method of the charging station is divided into Q-learning control of the electric-vehicle admission control center and Q-learning control of the control center for peak regulation response;

[00104] as shown in Fig. 1, the Q-learning control method of the electric-vehicle admission control center of the charging station includes the following steps:

[00105] step 1: defining and initializing an exploration rate of the admission control action at the nth decision-making epoch $T_n$ for admission control as $\varepsilon_n$, and letting $0 < \varepsilon_n < 1$, for example, letting $\varepsilon_n = 0.8$;

[00106] defining elements in a Q-value table for admission control as discretization event-extended state-action pair learning values, and initializing the elements in the Q-value table for admission control, for example, randomly initializing the value of each element or setting it to 0, wherein the Q-value table for admission control takes the discretization event-extended state of the system at the time of the event as a row and the admission action of the system as a column, i.e.,
$$\begin{bmatrix} Q(\bar{s}_n^e(1), 0) & Q(\bar{s}_n^e(1), 1) \\ Q(\bar{s}_n^e(2), 0) & Q(\bar{s}_n^e(2), 1) \\ \vdots & \vdots \\ Q(\bar{s}_n^e(s), 0) & Q(\bar{s}_n^e(s), 1) \\ \vdots & \vdots \\ Q(\bar{s}_n^e(S^e), 0) & Q(\bar{s}_n^e(S^e), 1) \end{bmatrix}$$
and if $\bar{s}_n^e(s) \in \Phi_e^D \cup \Phi_e^A$, $s = 1, 2, \dots, S^e$, $Q(\bar{s}_n^e(s), 1)$ is a negative infinite value;
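The table layout described above (one row per discretized event-extended state, two columns for refuse and admit, negative infinity on inadmissible entries) can be sketched as follows. The function name and the zero initialization are illustrative choices; the text also allows random initialization.

```python
import math

def init_admission_q(num_states, blocked_states):
    """Build the admission Q-value table.

    num_states: number of discretized event-extended states
    blocked_states: indices of states in which every compatible point is busy
    """
    # Column 0 is "refuse", column 1 is "admit".
    q = [[0.0, 0.0] for _ in range(num_states)]
    for s in blocked_states:
        # Admission is impossible here, so it can never be the row maximum.
        q[s][1] = -math.inf
    return q
```

Because the blocked entries are negative infinity, a plain row-wise maximum already respects the feasibility rule without a separate mask.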

[00107] defining a current greedy control strategy table v as an action set formed by actions corresponding to the maximum discretization event-extended state-action pair learning value of each row in the Q-value table for admission control;

[00108] step 2: initializing variables $t = 0$ and $n = 1$; assigning the initial value $\varepsilon_1$ to the current exploration rate $\varepsilon_n$ for the admission control action; letting an original strategy table $v_0 = v$;

[00109] step 3: at the nth decision-making epoch $T_n$ of the service system of the charging station when the arrival event $E(m_t, SOC_{m_t}(t))$ occurs, observing the current state $s_t$ of the service system, and denoting the event-extended state as $s_t^e$;

[00110] denoting the discretization state corresponding to the current event-extended state $s_t^e$ of the nth decision-making epoch $T_n$ in the Q-value table as $\bar{s}_n^e$;

[00111] denoting the admission control action which is actually taken in the current event-extended state $s_t^e$ at the nth decision-making epoch $T_n$ as $v(s_t^e)$, wherein $v(s_t^e) \in D_a$;

[00112] in the event-extended state $s_t^e$ at the nth decision-making epoch $T_n$ for admission control, if the corresponding discretization state $\bar{s}_n^e$ meets $\bar{s}_n^e \in \Phi_e^D \cup \Phi_e^A$, letting $v(s_t^e) = 0$; otherwise, in the current event-extended state $s_t^e$, extracting a greedy action in the discretization state $\bar{s}_n^e$ corresponding to $s_t^e$ from the Q-value table, denoting the greedy action as $v(\bar{s}_n^e)$, and either assigning $v(\bar{s}_n^e)$ to $v(s_t^e)$ with a probability $1 - \varepsilon_n$, or, at the exploration rate $\varepsilon_n$, selecting an action other than $v(\bar{s}_n^e)$ from the action set $D_a$ as an exploration action $v_{\varepsilon_n}$ and assigning the action to $v(s_t^e)$;
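The action selection of step 3 is an epsilon-greedy rule over two actions with a forced refusal in blocked states. A minimal sketch, with a hypothetical function name and an injectable random source so that the behaviour can be checked deterministically:

```python
import random

def select_admission_action(q_row, epsilon, blocked, rng=random):
    """Pick 0 (refuse) or 1 (admit) for the current event-extended state.

    q_row: the two learning values [Q(s, 0), Q(s, 1)] of the current row
    epsilon: current exploration rate, between 0 and 1
    blocked: True when every compatible charging point is busy
    """
    if blocked:
        return 0                      # refusal is the only possible action
    # Ties are broken in favour of refusal; this tie rule is an assumption.
    greedy = 0 if q_row[0] >= q_row[1] else 1
    if rng.random() < epsilon:
        return 1 - greedy             # explore: the other action in {0, 1}
    return greedy                     # exploit with probability 1 - epsilon
```

With only two actions, "an action other than the greedy one" is unambiguous, which is why exploration reduces to flipping the greedy choice.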

[00113] after the service system of the charging station takes the action $v(s_t^e)$, observing and obtaining a transition sample track $\langle s_t^e, v(s_t^e), s_{t'}^e \rangle$ transited from the nth decision-making epoch $T_n$ for admission control to the (n+1)th decision-making epoch $T_{n+1}$ for admission control or the epoch T, wherein $t = T_n$, and $t' = T_{n+1} < T$ or $t' = T$; when $t' = T$, assuming $s_{t'}^e = \{T, C_T, PR_T, 0, 0\}$;

[00114] step 4: observing the service system of the charging station, and with Equ. (1), calculating the combined quantity $r(s_t^e, v(s_t^e), s_{t'}^e)$ of accumulated charging rewards and peak regulation rewards obtained in the state transition process of the system from the state $s_t^e = \{t, C_t, PR_t, m_t, SOC_{m_t}(t)\}$, in which the current action $v(s_t^e)$ is taken at the nth decision-making epoch $T_n$ for admission control, to the state $s_{t'}^e = \{t', C_{t'}, PR_{t'}, m_{t'}, SOC_{m_{t'}}(t')\}$ at the (n+1)th decision-making epoch $T_{n+1}$ for admission control or the epoch T;

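[00115]
$$r(s_t^e, v(s_t^e), s_{t'}^e) = \int_t^{t'} \left[ \sum_{j \in \Phi_{J_D}} \mathrm{sgn}(m_j^D(t)) P^D_{m_j^D(t)} + \sum_{j \in \Phi_{J_{AD}}} \chi_{\Phi_{M_D}}(m_j^{AD}(t)) P^D_{m_j^{AD}(t)} + \sum_{j \in \Phi_{J_A}} \mathrm{sgn}(m_j^A(t)) d_t^A(j) P^A_{m_j^A(t)} + \sum_{j \in \Phi_{J_{AD}}} \chi_{\Phi_{M_A}}(m_j^{AD}(t)) d_t^{AD}(j) P^A_{m_j^{AD}(t)} \right] \times (PR_{ev} - PR_t) \, dt \qquad (1)$$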

[00116] wherein in Equ. (1), [one relation illegible in the source]; it is defined that when $m_j^D(t) = 0$, $\mathrm{sgn}(m_j^D(t)) = 0$, and when $m_j^D(t) > 0$, $\mathrm{sgn}(m_j^D(t)) = 1$; when $m_j^A(t) = 0$, $\mathrm{sgn}(m_j^A(t)) = 0$, and when $m_j^A(t) > 0$, $\mathrm{sgn}(m_j^A(t)) = 1$; when $m_j^{AD}(t) \in \Phi_{M_D}$, $\chi_{\Phi_{M_D}}(m_j^{AD}(t)) = 1$, otherwise, $\chi_{\Phi_{M_D}}(m_j^{AD}(t)) = 0$; when $m_j^{AD}(t) \in \Phi_{M_A}$, $\chi_{\Phi_{M_A}}(m_j^{AD}(t)) = 1$, otherwise, $\chi_{\Phi_{M_A}}(m_j^{AD}(t)) = 0$; $P^D_{m_j^D(t)}$ and $P^A_{m_j^A(t)}$ represent the charging power demands of the $m_j^D(t)$th DC charging electric vehicle and the $m_j^A(t)$th AC charging electric vehicle respectively; $d_t^A(j), d_t^{AD}(j) \in D_\delta$ represent the power directions of the electric vehicles which are served by the jth AC charging point $CS_j^A$ and the jth AC and DC-hybrid charging point $CS_j^{AD}$ of the service system of the charging station at the current epoch t under the peak regulation control action of the control center for peak regulation response;
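Equ. (1) accumulates a reward over the interval between two decision epochs. The sketch below shows one way to evaluate such an integral when powers, directions and prices are constant on sub-intervals. It rests on a loudly stated assumption that is not confirmed by the source: the instantaneous reward is taken as net power times the margin between the service price and the grid price. All names are hypothetical.

```python
def reward_rate(dc_power, ac_power, ac_direction, service_price, grid_price):
    """Instantaneous reward in yuan per hour.

    ASSUMPTION: reward = (net power in kW) * (service price - grid price).
    dc_power: powers of DC vehicles being charged
    ac_power, ac_direction: powers and directions (-1, 0, 1) of AC vehicles
    """
    net = sum(dc_power) + sum(d * p for d, p in zip(ac_direction, ac_power))
    return net * (service_price - grid_price)

def accumulated_reward(segments, service_price):
    """Sum the reward over segments on which every quantity is constant.

    segments: iterable of (hours, dc_power, ac_power, ac_direction, grid_price)
    """
    return sum(
        hours * reward_rate(dc, ac, d, service_price, pr)
        for hours, dc, ac, d, pr in segments
    )
```

Under this assumed form, discharging (direction -1) earns a positive reward exactly when the grid price exceeds the service price, which matches the requirement that the service price lie below the maximum peak regulation price.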

[00117] step 5: updating the discretization event-extended state-action pair learning value $Q(\bar{s}_n^e, v(s_t^e))$ for taking the action $v(s_t^e)$ in the discretization state $\bar{s}_n^e$ corresponding to $s_t^e$ in the Q-value table for admission control by using a difference formula and a Q-value updating formula shown in Equ. (2) and Equ. (3), obtaining the updated learning value and assigning it to $Q(\bar{s}_n^e, v(s_t^e))$.

[00118]
$$d(s_t^e, v(s_t^e), s_{t'}^e) = r(s_t^e, v(s_t^e), s_{t'}^e) + \max_{a \in D_a} Q(\bar{s}_{n+1}^e, a) - Q(\bar{s}_n^e, v(s_t^e)) \qquad (2)$$

[00119]
$$Q(\bar{s}_n^e, v(s_t^e)) := Q(\bar{s}_n^e, v(s_t^e)) + \gamma(\bar{s}_n^e, v(s_t^e)) \, d(s_t^e, v(s_t^e), s_{t'}^e) \qquad (3)$$

[00120] wherein in Equ. (2), $Q(\bar{s}_{n+1}^e, a)$ represents the discretization event-extended state-action pair learning value for taking the action a in the discretization state $\bar{s}_{n+1}^e$ corresponding to the state $s_{t'}^e$ of transition to the (n+1)th decision-making epoch $T_{n+1}$ for admission control or the epoch T;


[00121] in Equ. (3), the operator ":=" indicates that the value of the right-hand side is calculated first and then assigned to the left variable; $\gamma(\bar{s}_n^e, v(s_t^e))$ is a learning step length for taking the action $v(s_t^e)$ in the discretization event-extended state $\bar{s}_n^e$ at the nth decision-making epoch $T_n$ for admission control;
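The difference formula and the update formula of step 5 amount to one temporal-difference step on a table. A minimal sketch follows; the function name is hypothetical, the table is a list of rows, and, as in the formulas as read here, no discount factor is applied to the next-state maximum.

```python
def q_update(q, s, a, reward, s_next, step):
    """Apply one temporal-difference update in place.

    q: Q-value table as a list of rows (one row per discretized state)
    s, a: index of the current state and of the action taken
    reward: accumulated reward observed over the transition
    s_next: index of the state reached at the next decision epoch
    step: learning step length for the pair (s, a)
    Returns the temporal difference.
    """
    best_next = max(q[s_next])        # infeasible entries are -inf and lose
    diff = reward + best_next - q[s][a]
    q[s][a] += step * diff
    return diff
```

For example, with `q = [[0.0, 1.0], [2.0, float("-inf")]]`, a reward of 5.0 for taking action 1 in state 0 and landing in state 1, and a step of 0.5, the difference is 6.0 and the entry moves from 1.0 to 4.0.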

[00122] step 6: selecting the action corresponding to the maximum discretization event-extended state-action pair learning value of each row in the updated Q-value table for admission control to form the current action set for admission control, taking the current action set as the updated greedy strategy table for admission control, and assigning it to the current greedy strategy v for admission control; decaying the exploration rate $\varepsilon_n$, thereby obtaining the updated exploration rate and assigning it to $\varepsilon_{n+1}$;

[00123] step 7: if $t' < T$, assigning n+1 to n and returning to the step 3; otherwise, $t' = T$, and performing step 8; and

[00124] step 8: judging whether the strategy table v for admission control is equal to $v_0$ or not; if so, stopping updating and performing admission control on the random charging service requests of the M electric vehicles with the current strategy table v for admission control; otherwise, returning to the step 2 for execution.
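Steps 2 to 8 form an outer loop that repeats one-day learning rounds until the greedy strategy table no longer changes. The loop structure can be sketched as below; the two callbacks and the `max_rounds` safeguard are hypothetical and only stand in for the per-day learning described in steps 3 to 7.

```python
def learn_until_stable(run_episode, greedy_policy, max_rounds=1000):
    """Repeat learning rounds until the greedy strategy table stops changing.

    run_episode: callable performing one full round of learning (one day)
    greedy_policy: callable returning the current greedy strategy table
    max_rounds: ASSUMPTION, a safeguard that is not part of the source
    Returns the number of rounds executed.
    """
    for rounds in range(1, max_rounds + 1):
        before = greedy_policy()      # snapshot taken at the start of a round
        run_episode()
        if greedy_policy() == before: # unchanged table: stop updating
            return rounds
    return max_rounds
```

The snapshot must be a copy rather than a reference to a table that the round mutates in place, otherwise the comparison is trivially true.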

[00125] As shown in Fig. 2, the Q-learning control method of the control center for peak regulation of the charging station includes the following steps:

[00126] step -1: defining and initializing an exploration rate of the peak regulation control action at the kth decision-making epoch $\tau_k$ for peak regulation control as $\varepsilon_k$, and letting $0 < \varepsilon_k < 1$, for example, letting $\varepsilon_k = 0.9$;

[00127] defining elements in a Q-value table for peak regulation control as state-action pair learning values of the service system of the charging station, and initializing the elements in the Q-value table for peak regulation control, for example, randomly initializing the value of each element or setting it to 0, wherein the Q-value table for peak regulation control takes the discretization state of the service system of the charging station as a row and the peak regulation control action of the system as a column, i.e.,
$$\begin{bmatrix} Q(\bar{s}_k(1), d(1)) & Q(\bar{s}_k(1), d(2)) & \cdots & Q(\bar{s}_k(1), d(c)) & \cdots & Q(\bar{s}_k(1), d(C)) \\ Q(\bar{s}_k(2), d(1)) & Q(\bar{s}_k(2), d(2)) & \cdots & Q(\bar{s}_k(2), d(c)) & \cdots & Q(\bar{s}_k(2), d(C)) \\ Q(\bar{s}_k(3), d(1)) & Q(\bar{s}_k(3), d(2)) & \cdots & Q(\bar{s}_k(3), d(c)) & \cdots & Q(\bar{s}_k(3), d(C)) \\ \vdots & \vdots & & \vdots & & \vdots \\ Q(\bar{s}_k(S), d(1)) & Q(\bar{s}_k(S), d(2)) & \cdots & Q(\bar{s}_k(S), d(c)) & \cdots & Q(\bar{s}_k(S), d(C)) \end{bmatrix}$$
and for any element $Q(\bar{s}_k(s), d(c))$, $s = 1, 2, \dots, S$, if $d(c)$ is not a peak regulation control action which is feasible in the system state $\bar{s}_k(s)$, i.e., $d(c) \notin \hat{D}_\Gamma^k$, $Q(\bar{s}_k(s), d(c))$ is a negative infinite value;

[00128] defining a current greedy strategy table V for peak regulation control as an action set formed by actions corresponding to the maximum discretization state-action pair learning value of each row in the Q-value table for peak regulation control;

[00129] step -2: initializing $t = 0$ and $k = 0$; assigning the initial value $\varepsilon_0$ to the current exploration rate $\varepsilon_k$ for the peak regulation control action; setting an original greedy strategy table for peak regulation control as $V_0 = V$;

[00130] step -3: at the kth decision-making epoch $\tau_k$ for peak regulation control of the service system of the charging station, observing the current state $s_t$ of the service system;

[00131] denoting the discretization state corresponding to the system state $s_t$ of the kth decision-making epoch $\tau_k$ for peak regulation control in the Q-value table for peak regulation control as $\bar{s}_k$;

[00132] denoting the peak regulation control action which is actually taken in the system state $s_t$ at the kth decision-making epoch $\tau_k$ for peak regulation control as $V(s_t)$, wherein $V(s_t) \in \hat{D}_\Gamma^k$;

[00133] in the system state $s_t$ at the kth decision-making epoch $\tau_k$ for peak regulation control, extracting a greedy action in the discretization state $\bar{s}_k$ corresponding to $s_t$ from the Q-value table for peak regulation control and denoting it as $V(\bar{s}_k)$;

[00134] in the system state $s_t$ at the kth decision-making epoch $\tau_k$ for peak regulation control, either randomly selecting an action $V_{\varepsilon_k}$ from the current feasible action set $\hat{D}_\Gamma^k$ according to the exploration rate $\varepsilon_k$ and assigning the action to $V(s_t)$, or assigning $V(\bar{s}_k)$ to $V(s_t)$ with the probability $1 - \varepsilon_k$;

[00135] after the control center for peak regulation of the charging station takes the action $V(s_t)$, observing and obtaining a system transition sample track $\langle s_t, V(s_t), s_{t'} \rangle$ transited from the kth decision-making epoch $\tau_k$ for peak regulation control to the (k+1)th decision-making epoch $\tau_{k+1}$ for peak regulation control, wherein $t = \tau_k$ and $t' = \tau_{k+1}$;

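[00136] step -4: observing the service system of the charging station, and with Equ. (4), calculating the combined quantity $r(s_t, V(s_t), s_{t'})$ of charging rewards and peak regulation rewards obtained in the state transition process of the system from the state $s_t = \{t, C_t, PR_t\}$, in which the current action $V(s_t)$ is taken at the kth decision-making epoch $\tau_k$ for peak regulation control, to the state $s_{t'} = \{t', C_{t'}, PR_{t'}\}$ at the (k+1)th decision-making epoch $\tau_{k+1}$ for peak regulation control;

[00137]
$$r(s_t, V(s_t), s_{t'}) = \int_t^{t'} \left[ \sum_{j \in \Phi_{J_D}} \mathrm{sgn}(m_j^D(t)) P^D_{m_j^D(t)} + \sum_{j \in \Phi_{J_{AD}}} \chi_{\Phi_{M_D}}(m_j^{AD}(t)) P^D_{m_j^{AD}(t)} + \sum_{j \in \Phi_{J_A}} \mathrm{sgn}(m_j^A(t)) d_t^A(j) P^A_{m_j^A(t)} + \sum_{j \in \Phi_{J_{AD}}} \chi_{\Phi_{M_A}}(m_j^{AD}(t)) d_t^{AD}(j) P^A_{m_j^{AD}(t)} \right] \times (PR_{ev} - PR_t) \, dt \qquad (4)$$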

[00138] step -5: updating the discretization state-action pair learning value $Q(\bar{s}_k, V(s_t))$ for taking the action $V(s_t)$ in the discretization state $\bar{s}_k$ corresponding to $s_t$ in the Q-value table for peak regulation control by using a difference formula and a Q-value updating formula shown in Equ. (5) and Equ. (6), obtaining the updated learning value and assigning it to $Q(\bar{s}_k, V(s_t))$.

[00139]
$$d(s_t, V(s_t), s_{t'}) = r(s_t, V(s_t), s_{t'}) + \max_{d \in \hat{D}_\Gamma^{k+1}} Q(\bar{s}_{k+1}, d) - Q(\bar{s}_k, V(s_t)) \qquad (5)$$

[00140]
$$Q(\bar{s}_k, V(s_t)) := Q(\bar{s}_k, V(s_t)) + \gamma(\bar{s}_k, V(s_t)) \, d(s_t, V(s_t), s_{t'}) \qquad (6)$$

[00141] wherein in Equ. (5), $Q(\bar{s}_{k+1}, d)$ represents the discretization state-action pair learning value for taking the feasible action d in the discretization state $\bar{s}_{k+1}$ corresponding to the state $s_{t'}$ of transition of the system to the (k+1)th decision-making epoch $\tau_{k+1}$ for peak regulation control;

[00142] in Equ. (6), the operator ":=" indicates that the value of the right-hand side is calculated first and then assigned to the left variable; $\gamma(\bar{s}_k, V(s_t))$ is a learning step length for taking the action $V(s_t)$ in the discretization state $\bar{s}_k$ at the kth decision-making epoch $\tau_k$ for peak regulation control;

[00143] step -6: selecting the action corresponding to the maximum discretization state-action pair learning value of each row in the updated Q-value table for peak regulation control to form the current action set for peak regulation control, taking the current action set as the updated greedy strategy table for peak regulation control, and assigning it to the current greedy strategy V for peak regulation control; decaying the exploration rate $\varepsilon_k$, thereby obtaining the updated exploration rate and assigning it to $\varepsilon_{k+1}$;
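Step -6 has two parts: reading a greedy strategy table off the Q-value table, and reducing the exploration rate. Both can be sketched briefly. The function names are hypothetical, and the geometric decay schedule with a floor is an assumption, since the text states only that the rate is reduced.

```python
def greedy_table(q):
    """Column index of the largest learning value in each row.

    Ties resolve to the first maximal column; infeasible entries stored as
    negative infinity are never selected while a finite entry exists.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in q]

def decay(epsilon, factor=0.99, floor=0.01):
    """ASSUMED schedule: geometric decay of the exploration rate with a floor."""
    return max(floor, epsilon * factor)
```

Keeping a small positive floor preserves some exploration, which matters if the random parameters of the system drift over time.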

[00144] step -7: if $k < K$, assigning k+1 to k, and then returning to the step -3; otherwise, performing step -8; and

[00145] step -8: judging whether the strategy table V for peak regulation control is equal to $V_0$ or not; if so, stopping updating and performing peak regulation control on the AC charging electric vehicles served by the charging station with the current greedy strategy table V for peak regulation control; otherwise, returning to the step -2 for execution.

Claims

1. A cooperative-optimization control method of a charging station based on a double-center Q-learning method, wherein the method is divided into electric-vehicle admission control and peak regulation response control; in the admission control, a control process of charging service requests of DC charging electric vehicles and AC charging electric vehicles which arrive randomly is described as an event-driven decision-making process, wherein the arrival of an electric vehicle making a service request is taken as an event, and a peak regulation electricity price and the online service state of the charging points are taken as the state of a service system of the charging station; when the event occurs, event information and state information of the service system are combined into an event-extended state, whether the electric vehicle is admitted and provided with charging service or not is selected as an admission control action, sample data feedback is thereby obtained, a Q-value table for admission control is updated with a Q-learning method, and finally a strategy table for admission control is obtained; in the peak regulation response control, a process of controlling the electric vehicles being charged in the charging station to respond to a power-grid peak regulation electricity price plan is described as a sequential decision-making process, wherein at the epoch when the peak regulation electricity price is issued, charging and discharging actions of all the AC charging electric vehicles being served are selected as peak regulation control actions according to the state of the service system of the charging station, sample data feedback is thereby obtained, a Q-value table for peak regulation control is updated with a Q-learning method, and finally a strategy table for peak regulation control is obtained.