CN114444802A - Electric vehicle charging guide optimization method based on graph neural network reinforcement learning - Google Patents
Electric vehicle charging guide optimization method based on graph neural network reinforcement learning
- Publication number
- CN114444802A (application CN202210109887.2A)
- Authority
- CN
- China
- Prior art keywords
- electric vehicle
- node
- neural network
- charging
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The present invention provides an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning, comprising the following steps. Step S1: initialize the collaborative optimization model of the power-transportation fusion network. Step S2: update the electric vehicle charging load. Step S3: generate the charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm. Step S4: execute the charging guidance behavior strategy a_i,t. Step S5: calculate the reward function of the graph neural network reinforcement learning algorithm. Step S6: update the state x_i,t of the partially observable Markov decision process. Step S7: store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D. Step S8: determine whether the predetermined time T_end is reached; if not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results. Applying this technical solution can effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.
Description
Technical Field

The invention relates to the technical field of collaborative optimization of power-transportation fusion networks, and in particular to an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning.

Background Art

With the large-scale operation of electric vehicles, the power system and the transportation system will interact and merge in many ways, forming a power-transportation fusion network. This fusion network involves multiple subjects, such as electric vehicles, the power system and the transportation system, and contains a variety of random, uncertain factors. The interaction of multiple subjects, the influence of multiple random factors and the coupling among those factors make it more difficult to understand the interaction mechanism between the power and transportation systems and to solve the collaborative optimization of the power-transportation fusion network. For example, the travel behavior, psychological behavior and driving behavior of electric vehicle users are all somewhat random, which affects the flow distribution of the traffic system and makes traffic flow uncertain; this in turn affects the time at which electric vehicles arrive at charging stations, so that the charging start time, queuing time and charging duration of electric vehicles are also highly uncertain. Unlike traditional electric loads, electric vehicles are movable loads whose randomness is stronger and harder to predict than that of traditional loads.

Current research on the power-transportation fusion network falls into three directions: 1) from the perspective of the power system, guiding electric vehicles to charge at the lowest cost by calculating nodal marginal cost electricity prices or optimizing charging station service pricing; 2) from the perspective of the transportation system, optimizing the charging path to minimize the charging cost; 3) comprehensively considering the interests of electric vehicles, the power system and the transportation system, and maximizing the overall benefit by jointly optimizing the charging strategies of electric vehicles and the dispatch decisions of the power system. However, most existing studies address static optimization problems and do not consider the coupling of electric vehicles, charging stations and the power system on a continuous time scale; most also ignore the influence of multiple uncertain factors and their correlated couplings on the collaborative optimization of the power-transportation fusion network. More importantly, existing research does not consider the impact of interactions among electric vehicles on this collaborative optimization.
Summary of the Invention

In view of this, the purpose of the present invention is to provide an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning which, while accounting for the various uncertain factors of the power-transportation fusion network, can effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.

To achieve the above purpose, the present invention adopts the following technical scheme: an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning, comprising the following steps:

Step S1: initialize the collaborative optimization model of the power-transportation fusion network;

Step S2: update the electric vehicle charging load, and optimize the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory;

Step S3: generate the electric vehicle charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm;

Step S4: execute the charging guidance behavior strategy a_i,t, and judge and update the state of each electric vehicle;

Step S5: calculate the reward function of the graph neural network reinforcement learning algorithm according to the power-transportation fusion environment;

Step S6: update the state x_i,t of the partially observable Markov decision process;

Step S7: store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D, and update the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent, where x_i,t denotes the current state of the graph neural network reinforcement learning, a_i,t denotes the electric vehicle behavior strategy, r_i,t denotes the reward value of the graph neural network reinforcement learning, and x_i,t' denotes the next state;

Step S8: determine whether the predetermined time T_end is reached; if not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results. A sketch of this training loop is given after this list of steps.
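The eight steps above constitute one training run of the method. The following Python sketch is only an illustrative outline, not the patented implementation: the environment, agent and memory objects and all of their methods (initialize, update_charging_load, compute_lmp, select_action, step, store, sgd_update, parameters) are hypothetical placeholders for the procedures detailed in the steps that follow.

```python
# Illustrative outline of steps S1-S8 (hypothetical helper objects, not the patented code).
def train(env, agent, memory, T_end):
    env.initialize()                                           # S1: init power-transportation model
    t = 0
    while t < T_end:                                           # S8: stop at the predetermined time
        env.update_charging_load()                             # S2: charging load update
        lmp = env.compute_lmp()                                #     nodal LMP via SOCR + duals
        for ev in env.vehicles:
            a = agent.select_action(ev.state, env.adjacency)   # S3: epsilon-greedy + GNN-RL
            next_state, reward = env.step(ev, a, lmp)          # S4-S6: execute, reward, state update
            memory.store((ev.state, a, reward, next_state))    # S7: replay memory D
            ev.state = next_state
        agent.sgd_update(memory)                               # S7: stochastic gradient descent update
        t += 1
    return agent.parameters()                                  # S8: output trained parameters
```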
In a preferred embodiment, initializing the collaborative optimization model of the power-transportation fusion network includes the following steps:

Step 21: determine the topology and parameters of the power network and the transportation network, including power system nodes, lines, initial voltages and the upper and lower limits of the optimization; the transportation network includes traffic nodes, road parameters, road capacities and maximum travel speeds;

Step 22: initialize the neural network parameters, including weight initialization and hyperparameter settings such as the learning rate α, the discount factor γ, the batch size B and the capacity of the memory unit D;

Step 23: regard each electric vehicle in the study area as an agent and treat it as a node n ∈ N, regard the connections between electric vehicles as edges e ∈ E, thereby forming the graph structure G = (N, E), and initialize the current state x_i,t of each electric vehicle i and the adjacency matrix A; a sketch of this graph construction is given below.
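As a minimal sketch of step 23, the adjacency matrix A could be built by connecting vehicles that lie within some neighborhood radius; the radius-based rule and the coordinates used here are assumptions for illustration, since the patent only states that inter-vehicle connections form the edges.

```python
import numpy as np

def build_ev_graph(positions, radius=1.0):
    """Treat each EV as a node; connect EVs closer than `radius` (assumed rule)."""
    n = len(positions)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(positions[i] - positions[j]) <= radius:
                A[i, j] = 1.0          # edge e in E between EV i and EV j
    return A

# Example: three EVs on a 2-D road map (coordinates are illustrative only)
positions = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0]])
A = build_ev_graph(positions, radius=1.0)   # EVs 0 and 1 are neighbors, EV 2 is isolated
```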
In a preferred embodiment, updating the electric vehicle charging load and optimizing the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory includes the following steps:

Step 31: update the electric vehicle charging load: calculate the charging load of each charging station from the number of electric vehicles in the station and the charging power; adding the base load of the node to the charging load of each station gives the final electrical load of that node;

Step 32: establish the optimal power flow model of the distribution network based on the branch flow model (BFM-OPF):

min f(p, q, P, Q, V, I)    (1)
subject to the branch flow (DistFlow) constraints over all nodes j ∈ EN and branches (i, j) ∈ EL:

$$\sum_{i\in\pi(j)}\left(P_{ij}-r_{ij}I_{ij}^{2}\right)+p_{j}^{g}+p_{j}^{w}=\sum_{k}P_{jk}+p_{j}^{d}$$

$$\sum_{i\in\pi(j)}\left(Q_{ij}-x_{ij}I_{ij}^{2}\right)+q_{j}^{g}+q_{j}^{w}=\sum_{s}Q_{js}+q_{j}^{d}$$

$$V_{j}^{2}=V_{i}^{2}-2\left(r_{ij}P_{ij}+x_{ij}Q_{ij}\right)+\left(r_{ij}^{2}+x_{ij}^{2}\right)I_{ij}^{2}$$

$$I_{ij}^{2}V_{i}^{2}=P_{ij}^{2}+Q_{ij}^{2}$$

$$0\le I_{ij}\le I_{ij}^{\max},\qquad \underline{V}_{j}\le V_{j}\le\overline{V}_{j},\qquad 0\le p_{j}^{w}\le \overline{p}_{j}^{w}$$

where EN and EL denote the sets of distribution network nodes and lines, respectively; P_ij and Q_ij denote the branch active and reactive power flowing from node i to node j; P_jk denotes the branch active power flowing from node j to node k; p_j^g and q_j^g denote the generator active and reactive output, i.e. the active and reactive power injected into node j; p_j^w and q_j^w denote the active and reactive power injected into node j by the wind turbine, with q_j^w tied to p_j^w through the power factor of the wind turbine connected to node j; Q_js denotes the branch reactive power flowing from node j to node s; r_ij and x_ij denote the resistance and reactance of the branch from node i to node j; I_ij denotes the current of the branch from node i to node j; π(j) denotes the set of branches connected to node j; p_j^d and q_j^d denote the active and reactive load connected at node j; V_i and V_j denote the voltage magnitudes of nodes i and j; z_ij denotes the impedance of the branch connecting nodes i and j, satisfying z_ij = r_ij + jx_ij; I_ij^max denotes the maximum current of the branch connecting nodes i and j; the lower and upper bars on V_j denote the minimum and maximum voltage of node j; and the upper bar on p_j^w denotes the maximum active output of the wind turbine connected to node j;

The load p_j^d of distribution network node j includes the base load p_j^base and the electric vehicle charging load p_j^EV, i.e. p_j^d = p_j^base + p_j^EV.
According to the actual demand of the distribution network, the objective function min f(p, q, P, Q, V, I) can finally be defined as

$$\min f=\sum_{i\in EN}\left[a_{i}\left(p_{i}^{g}\right)^{2}+b_{i}p_{i}^{g}\right]+\lambda_{t}^{grid}P_{t}^{grid}$$

where p_i^g denotes the active output of the generator injected at node i; a_i and b_i denote the quadratic and linear coal consumption coefficients of the generator, respectively; and λ_t^grid and P_t^grid denote, respectively, the electricity price and the active power of the electricity purchased from the main grid;
Step 33: convert the above nonlinear distribution network optimal power flow model into a second-order cone relaxation programming model:

Since the BFM-OPF is a nonlinear programming model, let the squared branch current magnitude be $l_{ij}=I_{ij}^{2}$ and the squared nodal voltage magnitude be $v_{i}=V_{i}^{2}$, and apply a second-order cone relaxation (SOCR) to the current-definition equation; the power balance and voltage-drop equations then become linear in $(P_{ij},Q_{ij},l_{ij},v_{i})$, and the current definition is relaxed into the cone constraint

$$\left\|\begin{bmatrix}2P_{ij}\\2Q_{ij}\\l_{ij}-v_{i}\end{bmatrix}\right\|_{2}\le l_{ij}+v_{i}$$

where ‖·‖₂ denotes the second-order cone (Euclidean norm) operation; the above equations constitute the basic form of the relaxed optimal power flow of the distribution network;
Step 34: use the Gurobi solver to solve the primal problem and the dual variables of the above model, and obtain the marginal cost electricity price λ_k of the node where each charging station is located. A simplified sketch of this step is given below.
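The following is a minimal, simplified sketch of step 34 using the Gurobi Python API. It replaces the full SOCR power flow with a toy linear dispatch so that the idea stays visible: the dual (shadow price) of each nodal power balance constraint is read as the marginal cost price λ_k. All node data are made up; for the actual second-order cone model, the QCPDual parameter would additionally need to be enabled to retrieve duals of cone constraints.

```python
import gurobipy as gp
from gurobipy import GRB

# Toy 2-node dispatch standing in for the SOCR optimal power flow (illustrative data only).
load = {0: 3.0, 1: 2.5}          # nodal load incl. EV charging load (MW)
cost = {0: 20.0, 1: 35.0}        # linear generation cost at each node ($/MWh)
cap  = {0: 4.0, 1: 4.0}          # generation capacity (MW)
line_cap = 1.0                   # transfer limit between the two nodes (MW)

m = gp.Model("toy_lmp")
g = m.addVars([0, 1], lb=0.0, name="g")               # generator output at each node
flow = m.addVar(lb=-line_cap, ub=line_cap, name="f")  # power flowing from node 0 to node 1
balance = {
    0: m.addConstr(g[0] - flow == load[0], name="bal0"),
    1: m.addConstr(g[1] + flow == load[1], name="bal1"),
}
m.addConstrs((g[i] <= cap[i] for i in (0, 1)), name="cap")
m.setObjective(gp.quicksum(cost[i] * g[i] for i in (0, 1)), GRB.MINIMIZE)
m.optimize()

# Dual of each nodal balance constraint = marginal cost price lambda_k at that node.
lmp = {i: balance[i].Pi for i in (0, 1)}
print(lmp)   # node 1's price exceeds node 0's once the line limit binds
```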
In a preferred embodiment, the epsilon-Greedy algorithm includes the following steps:

Step 41: generate a random number u and compare it with the decay factor ξ of the epsilon-Greedy algorithm;

Step 42: if u < ξ, generate a behavior a_i,t for each electric vehicle in the current state in a random manner; in this patent, this behavior represents the electric vehicle charging path strategy:

a_i,t = randint(N_action)    (19)

where N_action denotes the number of electric vehicle behavior decisions;

Step 43: if u ≥ ξ, generate a behavior a_i,t for each electric vehicle i under the current state x_i,t and the adjacency matrix A according to the experience of the graph neural network reinforcement learning algorithm, i.e.

$$a_{i,t}=\arg\max_{a}Q\left(x_{i,t},A,a;\theta_{t}\right)\qquad(20)$$

where θ_t denotes the parameters of the graph neural network reinforcement learning algorithm; argmax(·) selects the action whose Q value is largest; and x_i,t denotes the state of the i-th electric vehicle at time t, which consists of the electric vehicle state EV_i,t, the neighboring traffic road information Ro_i,t, the neighboring electric vehicle states Ne_i,t and the charging station information CS_t, i.e.

x_i,t = [EV_i,t, Ro_i,t, Ne_i,t, CS_t]    (21)

where the state EV_i,t of the i-th electric vehicle includes the next node on its way to the charging station, the number of the road it is on, its driving speed v_i,t and its remaining battery level SOC_i,t; the neighboring road information Ro_i,t includes, for the next road connected to the next node of electric vehicle i, its starting node, end node, road length and the number of electric vehicles on it; the neighboring electric vehicle state Ne_i,t includes the state of each neighboring electric vehicle k, such as the next node of the k-th electric vehicle adjacent to the i-th one, the number of the road it is on, its driving speed v_i,k,t and its remaining battery level SOC_i,k,t; and the charging station information CS_t includes the charging electricity price p_c,t of each charging station and the number of electric vehicles in it; an illustration of how this state could be assembled is given below.
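As an illustration of how the observation x_i,t could be assembled from the four component groups defined above, here is a small sketch; the field names and the flat-vector layout are hypothetical, not taken from the patent.

```python
import numpy as np

def build_state(ev, road, neighbors, stations):
    """Assemble x_i,t = [EV_i,t, Ro_i,t, Ne_i,t, CS_t] as one feature vector (illustrative)."""
    ev_part = [ev["next_node"], ev["road_id"], ev["speed"], ev["soc"]]
    road_part = [road["start_node"], road["end_node"], road["length"], road["n_ev"]]
    ne_part = [v for k in neighbors for v in (k["next_node"], k["road_id"], k["speed"], k["soc"])]
    cs_part = [v for s in stations for v in (s["price"], s["n_ev"])]
    return np.array(ev_part + road_part + ne_part + cs_part, dtype=float)
```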
The neural network of the graph neural network reinforcement learning algorithm consists of an input layer, one fully connected layer that extracts features x_i,t' from the input state x_i,t, two graph neural network layers that take the extracted features x_i,t' together with the adjacency matrix A and perform further feature extraction, and a final fully connected layer that outputs the electric vehicle charging path strategy a_i,t; the graph neural network used here is a graph attention network. A sketch of the epsilon-greedy action selection with such a network is given below.
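A minimal sketch of steps 41-43, assuming a q_network callable with the FC-GAT-FC structure just described that maps (state, adjacency matrix) to one Q value per candidate action; the function name and tensor shapes are illustrative, not taken from the patent.

```python
import random
import torch

def select_action(q_network, x_it, adjacency, n_action, xi):
    """Epsilon-greedy charging-path choice for one EV (illustrative sketch)."""
    u = random.random()                        # step 41: random number u
    if u < xi:                                 # step 42: explore with a random action
        return random.randrange(n_action)
    with torch.no_grad():                      # step 43: exploit the learned Q values
        q_values = q_network(x_it, adjacency)  # shape: (n_action,)
        return int(torch.argmax(q_values).item())
```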
In a preferred embodiment, the reward function r_i,t of the graph neural network reinforcement learning algorithm is the piecewise function

$$r_{i,t}=\begin{cases}0, & node_{cur}\neq node_{tar}\ \text{and}\ step<N_{step}\\ -penalty, & step\ge N_{step}\\ -\left[w_{i}\left(t_{i,k,t}^{dri}+t_{i,k,t}^{wait}+t_{i,k,t}^{cha}\right)+\lambda_{k,t}\left(1-SOC_{i,k,t}\right)E_{i}^{cap}\right], & node_{cur}=node_{tar}\ \text{and}\ step<N_{step}\end{cases}$$

where node_cur and node_tar denote the current node of the electric vehicle and the charging station node it is heading to; step denotes the number of steps the electric vehicle has already travelled; penalty denotes a large penalty factor; w_i denotes the unit time cost of the i-th electric vehicle; t^dri_{i,k,t}, t^wait_{i,k,t} and t^cha_{i,k,t} denote, at time t, the driving time, the charging waiting time and the charging time of the i-th electric vehicle when it goes to the k-th charging station; λ_k,t denotes the marginal cost electricity price of the node where charging station k is located at time t; SOC_i,k,t denotes the remaining battery level of the i-th electric vehicle when it reaches charging station k at time t; and E^cap_i denotes the rated capacity of the battery of the i-th electric vehicle;

As the expression shows, the reward function r_i,t is a piecewise function. If the i-th electric vehicle has not reached the charging station (node_cur ≠ node_tar) and the number of steps it has taken towards the charging station is within the given maximum (step < N_step), its reward is r_i,t = 0. If the number of steps is greater than or equal to the given maximum (step ≥ N_step), the charging-behavior exploration has failed and a large negative reward r_i,t = -penalty is given. If the i-th electric vehicle reaches the charging station (node_cur = node_tar) within the given maximum number of steps (step < N_step), the reward is calculated from the driving time, the charging time and the electricity charge paid while charging;
The travel time t_a,t of the i-th electric vehicle on road segment a is calculated with the Bureau of Public Roads (BPR) function in its standard form, i.e.

$$t_{a,t}=t_{a,t}^{0}\left[1+0.15\left(\frac{n_{a,t}}{c_{a}}\right)^{4}\right]$$

where n_a,t denotes the number of electric vehicles on road segment a at time t, and c_a and t^0_a,t denote the capacity upper limit of road segment a and the free-flow travel time at time t, respectively; from this, the time required for the i-th electric vehicle to travel to charging station k is obtained as $t_{i,k,t}^{dri}=\sum_{a\in path_{i,k}}t_{a,t}$, i.e. the sum over the road segments on its path;

In addition, the charging time of the i-th electric vehicle follows from its remaining battery level as $t_{i,t}^{cha}=\dfrac{\left(1-SOC_{t}\right)E^{cap}}{\eta P_{charging}}$, while its charging waiting time is determined by the queue of vehicles already at the station;

where SOC_t denotes the remaining battery level of the electric vehicle, E^cap denotes the rated capacity of the electric vehicle battery, η denotes the charging power factor, and P_charging denotes the rated charging power of the electric vehicle. A sketch of this reward computation is given below.
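A minimal sketch of how the reward pieces above could be computed, assuming the standard BPR coefficients 0.15 and 4 and the energy-based charging-duration relation; the default parameter values are illustrative, not from the patent.

```python
def bpr_travel_time(t_free, n_ev, capacity):
    """Standard BPR congestion function for one road segment."""
    return t_free * (1.0 + 0.15 * (n_ev / capacity) ** 4)

def reward(arrived, step, n_step, w_i, t_drive, t_wait, soc, e_cap, lmp,
           eta=0.9, p_charging=60.0, penalty=1000.0):
    """Piecewise reward of the i-th EV (illustrative parameter values)."""
    if step >= n_step:                        # exploration failed: large negative reward
        return -penalty
    if not arrived:                           # still travelling within the step budget
        return 0.0
    t_charge = (1.0 - soc) * e_cap / (eta * p_charging)   # charging duration (h)
    energy_cost = lmp * (1.0 - soc) * e_cap               # electricity charge at the nodal LMP
    return -(w_i * (t_drive + t_wait + t_charge) + energy_cost)
```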
In a preferred embodiment, updating the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent includes:

Step 61: randomly draw a certain number of samples (Sample) from the memory unit D;

Step 62: construct the loss function

$$L\left(\theta_{t}\right)=\mathbb{E}\left[\left(r+\gamma\max_{a'}Q\left(x',a';\theta'_{t}\right)-Q\left(x,a;\theta_{t}\right)\right)^{2}\right]$$

and, on the drawn samples, update the weights of the graph neural network reinforcement learning algorithm by stochastic gradient descent as

$$\theta_{t+1}=\theta_{t}-\alpha\nabla_{\theta_{t}}L\left(\theta_{t}\right)$$

where x, a, x' and a' denote the current state and action and the state and action at the next time step, respectively; r denotes the immediate reward of the graph neural network reinforcement learning; θ_t denotes the parameters of the graph neural network reinforcement learning algorithm at the current time t; 0 ≤ γ ≤ 1 denotes the discount factor, which reflects the influence of the future Q value on the current action; Q(x', a'; θ'_t) denotes the state-action value under the target network parameters θ'_t; ∇_θt denotes differentiation with respect to θ_t; and α denotes the learning rate;

Step 63: after every fixed number of steps, update the target graph neural network reinforcement learning parameters θ'_t from the current parameters θ_t. A PyTorch-style sketch of these steps follows.
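A minimal PyTorch sketch of steps 61-63 under the usual DQN-style reading of the loss above; the network class, the replay memory layout (tuples of state, action, reward and next-state tensors) and the update period are assumptions for illustration.

```python
import random
import torch
import torch.nn as nn

def sgd_update(q_net, target_net, optimizer, memory, batch_size=32, gamma=0.95):
    """One stochastic-gradient update of the Q-network from the replay memory D."""
    batch = random.sample(memory, batch_size)                    # step 61: draw samples
    x, a, r, x_next = (torch.stack(t) for t in zip(*batch))
    q_sa = q_net(x).gather(1, a.long().unsqueeze(1)).squeeze(1)  # Q(x, a; theta_t)
    with torch.no_grad():
        target = r + gamma * target_net(x_next).max(dim=1).values  # r + gamma * max_a' Q'
    loss = nn.functional.mse_loss(q_sa, target)                  # step 62: squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target(q_net, target_net):
    """Step 63: copy the current parameters theta_t into the target network theta'_t."""
    target_net.load_state_dict(q_net.state_dict())
```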
Compared with the prior art, the present invention has the following beneficial effects:

The present invention provides an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning. Based on graph theory, the mutual influences among electric vehicles are converted into a dynamic network graph structure, and an attention-based graph neural network reinforcement learning method is proposed to process irregular non-Euclidean structured data, so as to study communication and cooperation among multiple agents and to explore the mutual influences among electric vehicles. On the basis of an active distribution network that considers renewable energy output, the optimal power flow of the distribution network is solved through second-order cone optimization and dual optimization theory to obtain the nodal marginal cost electricity prices, thereby studying the collaborative optimization of the power-transportation fusion network. The proposed method can, while accounting for the various uncertain factors of the power-transportation fusion network, effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.
Brief Description of the Drawings

FIG. 1 is a flowchart of the electric vehicle charging guidance optimization method based on graph neural network reinforcement learning according to a preferred embodiment of the present invention.

Detailed Description of the Embodiments

The present invention is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It should also be noted that the terms used herein are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present application; as used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
As shown in FIG. 1, the electric vehicle charging guidance optimization method based on graph neural network reinforcement learning of the present invention includes the following steps:

S11: initialize the collaborative optimization model of the power-transportation fusion network;

S12: update the electric vehicle charging load, and optimize the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory;

S13: generate the electric vehicle charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm;

S14: execute the charging guidance behavior strategy a_i,t, and judge and update the state of each electric vehicle;

S15: calculate the reward function of the graph neural network reinforcement learning algorithm according to the power-transportation fusion environment;

S16: update the state x_i,t of the partially observable Markov decision process;

S17: store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D, and update the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent;

S18: determine whether the predetermined time T_end is reached. If not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results.
Specifically:

1. Initialize the collaborative optimization model of the power-transportation fusion network. The main steps include determining the topology and parameters of the power network and the transportation network, including power system nodes, lines, initial voltages and the upper and lower limits of the optimization; the transportation network includes traffic nodes, road parameters, road capacities and maximum driving speeds.

The neural network parameters are initialized, including weight initialization and hyperparameter settings such as the learning rate α, the discount factor γ, the batch size B and the capacity of the memory unit D;

Each electric vehicle in the study area is regarded as an agent and treated as a node n ∈ N, the connections between electric vehicles are regarded as edges e ∈ E, forming the graph structure G = (N, E), and the current state x_i,t of each electric vehicle i and the adjacency matrix A are initialized.
2. Update the electric vehicle charging load, and optimize the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory. This mainly includes the following steps:

Step 21: update the electric vehicle charging load: calculate the charging load of each charging station from the number of electric vehicles in the station and the charging power; adding the base load of the node to the charging load of each station gives the final electrical load of that node;

Step 22: establish the optimal power flow model of the distribution network based on the branch flow model:

min f(p, q, P, Q, V, I)    (1)
subject to the same branch flow constraints as given above, where EN and EL denote the sets of distribution network nodes and lines, respectively; P_ij and Q_ij denote the branch active and reactive power flowing from node i to node j; p_j^g and q_j^g denote the generator active and reactive output, i.e. the active and reactive power injected into node j; p_j^w and q_j^w denote the active and reactive power injected into node j by the wind turbine; r_ij and x_ij denote the resistance and reactance of the branch from node i to node j; I_ij denotes the current of the branch from node i to node j; π(j) denotes the set of branches connected to node j; p_j^d and q_j^d denote the active and reactive load connected at node j; V_i denotes the voltage magnitude of node i; z_ij denotes the impedance of the branch connecting nodes i and j, satisfying z_ij = r_ij + jx_ij; I_ij^max denotes the maximum current of the branch connecting nodes i and j; the lower and upper bars on V_j denote the minimum and maximum voltage of node j; the upper bar on p_j^w denotes the maximum active output of the wind turbine connected to node j; and the power factor of the wind turbine connected to node j ties its reactive output to its active output.

The load p_j^d of distribution network node j includes the base load p_j^base and the electric vehicle charging load p_j^EV, i.e. p_j^d = p_j^base + p_j^EV.

According to the actual demand of the distribution network, the objective function min f(p, q, P, Q, V, I) can finally be defined as

$$\min f=\sum_{i\in EN}\left[a_{i}\left(p_{i}^{g}\right)^{2}+b_{i}p_{i}^{g}\right]+\lambda_{t}^{grid}P_{t}^{grid}$$

where a_i and b_i denote the quadratic and linear coal consumption coefficients of the generator, respectively, and λ_t^grid and P_t^grid denote, respectively, the electricity price and the active power of the electricity purchased from the main grid.
Step 23: convert the above nonlinear distribution network optimal power flow model into a second-order cone relaxation programming model:

Since the BFM-OPF is a nonlinear programming model, let $l_{ij}=I_{ij}^{2}$ and $v_{i}=V_{i}^{2}$ and apply the second-order cone relaxation (SOCR) as described above, which relaxes the current-definition equation into the cone constraint $\left\|\left[2P_{ij},\,2Q_{ij},\,l_{ij}-v_{i}\right]^{\mathrm{T}}\right\|_{2}\le l_{ij}+v_{i}$,

where ‖·‖₂ denotes the second-order cone (Euclidean norm) operation; the above equations constitute the basic form of the relaxed optimal power flow of the distribution network.

Step 24: use the Gurobi solver to solve the primal problem and the dual variables of the above model, and obtain the marginal cost electricity price λ_k of the node where each charging station is located.
3. Generate the electric vehicle charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm. This mainly includes the following steps:

Step 31: generate a random number u and compare it with the decay factor ξ of the epsilon-Greedy algorithm.

Step 32: if u < ξ, generate a behavior a_i,t for each electric vehicle in the current state in a random manner; in this patent, this behavior represents the electric vehicle charging path strategy:

a_i,t = randint(N_action)    (19)

where N_action denotes the number of electric vehicle behavior decisions.

Step 33: if u ≥ ξ, generate a behavior a_i,t for each electric vehicle i under the current state x_i,t and the adjacency matrix A according to the experience of the graph neural network reinforcement learning algorithm, i.e.

$$a_{i,t}=\arg\max_{a}Q\left(x_{i,t},A,a;\theta_{t}\right)\qquad(20)$$

where θ_t denotes the parameters of the graph neural network reinforcement learning algorithm; argmax(·) selects the action whose Q value is largest; and x_i,t denotes the state of the i-th electric vehicle at time t, which consists of the electric vehicle state EV_i,t, the neighboring traffic road information Ro_i,t, the neighboring electric vehicle states Ne_i,t and the charging station information CS_t, i.e.

x_i,t = [EV_i,t, Ro_i,t, Ne_i,t, CS_t]    (21)

where the state EV_i,t of the i-th electric vehicle includes the next node on its way to the charging station, the number of the road it is on, its driving speed v_i,t and its remaining battery level SOC_i,t; the neighboring road information Ro_i,t includes, for the next road connected to the next node of electric vehicle i, its starting node, end node, road length and the number of electric vehicles on it; the neighboring electric vehicle state Ne_i,t includes the state of each neighboring electric vehicle k, such as the next node of the k-th electric vehicle adjacent to the i-th one, the number of the road it is on, its driving speed v_i,k,t and its remaining battery level SOC_i,k,t; and the charging station information CS_t includes the charging electricity price p_c,t of each charging station and the number of electric vehicles in it.
The neural network of the graph neural network reinforcement learning algorithm consists of an input layer, one fully connected layer that extracts features x_i,t' from the input state x_i,t, two graph neural network layers that take the extracted features x_i,t' together with the adjacency matrix A and perform further feature extraction, and a final fully connected layer that outputs the electric vehicle charging path strategy a_i,t. The graph neural network used in this patent is a graph attention network; a sketch of a single layer of this kind is given below.
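The patent names a graph attention network but does not give its equations; the following is a minimal single-head graph attention layer written from scratch, illustrating the kind of layer that could sit between the two fully connected layers. It follows the widely used GAT formulation (LeakyReLU attention scores masked by the adjacency matrix), and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention layer (illustrative, in the spirit of GAT)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared linear transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring vector
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, x, adj):
        # x: (n_nodes, in_dim) EV features, adj: (n_nodes, n_nodes) adjacency matrix A
        h = self.W(x)                                     # (n, out_dim)
        n = h.size(0)
        # pairwise concatenation [h_i || h_j] for every node pair
        h_i = h.unsqueeze(1).expand(n, n, -1)
        h_j = h.unsqueeze(0).expand(n, n, -1)
        e = self.leaky_relu(self.a(torch.cat([h_i, h_j], dim=-1))).squeeze(-1)
        mask = adj + torch.eye(n, device=adj.device)      # include self-loops
        e = e.masked_fill(mask == 0, float("-inf"))       # attend only to graph neighbors
        alpha = torch.softmax(e, dim=-1)                  # attention coefficients
        return torch.relu(alpha @ h)                      # aggregated node embeddings
```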
4. Execute the charging guidance behavior strategy a_i,t, and judge and update the state of each electric vehicle. An electric vehicle can be in one of three states: the decision state, the running state and the charging state. If the electric vehicle arrives at an intersection (node_cur = node_next) and that intersection is not a charging station node (node_cur ≠ node_tar), the vehicle is in the decision state; it executes the charging guidance behavior strategy a_i,t, the road state (such as the number of electric vehicles and the ideal driving speed) is updated, and the vehicle state (such as road position, driving speed and distance) is updated. If the electric vehicle has not reached an intersection (node_cur ≠ node_next), it is in the running state: it continues driving along its current road according to the charging guidance strategy a_i,t-1 of the previous step, and its position, speed and SOC are updated. If the electric vehicle is located at a charging station node (node_cur = node_tar), it is in the charging state: if the number of electric vehicles currently at the station exceeds the number of charging piles, the vehicle must queue and wait to charge; if a charging pile is available, the vehicle charges immediately; the charging waiting time, charging time and SOC of the vehicle are updated accordingly. A sketch of this state logic follows.
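A minimal sketch of the three-state update logic described above; the EV and station attribute names and methods (apply_route, advance) are illustrative placeholders, not the patented data structures.

```python
def update_ev(ev, station, action, prev_action):
    """Decision / running / charging state update for one EV (illustrative)."""
    if ev.node_cur == ev.node_tar:                 # charging state: at a station node
        if station.occupied >= station.n_piles:
            ev.wait_time += 1                      # queue until a charging pile frees up
        else:
            ev.charging = True                     # charge immediately, then update SOC
        return None
    if ev.node_cur == ev.node_next:                # decision state: at an intersection
        ev.apply_route(action)                     # follow the new charging guidance a_i,t
        return action
    ev.advance(prev_action)                        # running state: keep following a_i,t-1
    return prev_action
```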
5. Calculate the reward function of the graph neural network reinforcement learning algorithm according to the power-transportation fusion environment. Specifically, the reward function r_i,t is a piecewise function: if the i-th electric vehicle has not reached the charging station (node_cur ≠ node_tar) and the number of steps it has taken towards the charging station is within the given maximum (step < N_step), its reward is r_i,t = 0; if the number of steps is greater than or equal to the given maximum (step ≥ N_step), the charging-behavior exploration has failed and a large negative reward r_i,t = -penalty is given; if the i-th electric vehicle reaches the charging station (node_cur = node_tar) within the given maximum number of steps (step < N_step), the reward is calculated from the driving time, the charging waiting time, the charging time and the electricity charge paid while charging, as in the expression given above.

The driving time t^dri_{i,k,t}, the charging waiting time t^wait_{i,k,t} and the charging time t^cha_{i,k,t} are calculated as follows.

The travel time of the i-th electric vehicle on road segment a is calculated with the Bureau of Public Roads (BPR) function in its standard form, i.e. $t_{a,t}=t_{a,t}^{0}\left[1+0.15\left(\frac{n_{a,t}}{c_{a}}\right)^{4}\right]$,

where n_a,t denotes the number of electric vehicles on road segment a at time t, and c_a and t^0_a,t denote the capacity upper limit of road segment a and the free-flow travel time at time t, respectively. From this, the time required for the i-th electric vehicle to travel to charging station k is obtained as $t_{i,k,t}^{dri}=\sum_{a\in path_{i,k}}t_{a,t}$.

In addition, the charging time of the i-th electric vehicle is obtained from $t_{i,t}^{cha}=\dfrac{\left(1-SOC_{t}\right)E^{cap}}{\eta P_{charging}}$, and its charging waiting time from the queue of vehicles already at the station, where SOC_t denotes the remaining battery level of the electric vehicle, E^cap denotes the rated capacity of the electric vehicle battery, η denotes the charging power factor, and P_charging denotes the rated charging power of the electric vehicle.
6. Update the state x_i,t of the partially observable Markov decision process, including the electric vehicle state EV_i,t, the neighboring traffic road information Ro_i,t, the neighboring electric vehicle states Ne_i,t and the charging station information CS_t.

7. Store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D, and update the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent. This mainly includes the following steps:

Step 71: randomly draw a certain number of samples (Sample) from the memory unit D;

Step 72: construct the loss function $L(\theta_{t})=\mathbb{E}\big[\big(r+\gamma\max_{a'}Q(x',a';\theta'_{t})-Q(x,a;\theta_{t})\big)^{2}\big]$ and, on the drawn samples, update the weights of the graph neural network reinforcement learning algorithm by stochastic gradient descent as $\theta_{t+1}=\theta_{t}-\alpha\nabla_{\theta_{t}}L(\theta_{t})$;

where x, a, x' and a' denote the current state and action and the state and action at the next time step, respectively; r denotes the immediate reward of the graph neural network reinforcement learning; θ_t denotes the parameters of the graph neural network reinforcement learning algorithm at the current time t; 0 ≤ γ ≤ 1 denotes the discount factor, which reflects the influence of the future Q value on the current action; Q(x', a'; θ'_t) denotes the state-action value under the target network parameters θ'_t; ∇_θt denotes differentiation with respect to θ_t; and α denotes the learning rate.

Step 73: after every fixed number of steps, update the target graph neural network reinforcement learning parameters θ'_t from the current parameters θ_t.

8. Determine whether the predetermined time T_end is reached. If not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results.
The electric vehicle charging guidance optimization method based on graph neural network reinforcement learning of the present invention converts, on the basis of graph theory, the mutual influences among electric vehicles into a dynamic network graph structure, and proposes an attention-based graph neural network reinforcement learning method to process irregular non-Euclidean structured data, so as to study communication and cooperation among multiple agents and to explore the mutual influences among electric vehicles. On the basis of an active distribution network that considers renewable energy output, the optimal power flow of the distribution network is solved through second-order cone optimization and dual optimization theory to obtain the nodal marginal cost electricity prices, thereby studying the collaborative optimization of the power-transportation fusion network. The proposed method can, while accounting for the various uncertain factors of the power-transportation fusion network, effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.

The above-described embodiments merely represent several implementations of the present invention; their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of this patent. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210109887.2A CN114444802B (en) | 2022-01-29 | 2022-01-29 | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210109887.2A CN114444802B (en) | 2022-01-29 | 2022-01-29 | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114444802A true CN114444802A (en) | 2022-05-06 |
CN114444802B (en) | 2024-06-04 |
Family
ID=81372174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210109887.2A Active CN114444802B (en) | 2022-01-29 | 2022-01-29 | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114444802B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115016938A (en) * | 2022-06-09 | 2022-09-06 | 北京邮电大学 | An automatic partitioning method of computational graph based on reinforcement learning |
CN116436019A (en) * | 2023-04-12 | 2023-07-14 | 国网江苏省电力有限公司电力科学研究院 | A multi-resource coordination optimization method, device and storage medium |
CN118098000A (en) * | 2024-04-24 | 2024-05-28 | 哈尔滨华鲤跃腾科技有限公司 | Urban comprehensive management method based on artificial intelligence |
CN118438918A (en) * | 2024-05-09 | 2024-08-06 | 烟台开发区德联软件有限责任公司 | A mobile energy storage device charging control method and device |
WO2024165229A1 (en) * | 2023-02-08 | 2024-08-15 | E.On Se | Edge computing with ai at mains supply network interconnection points |
CN119231508A (en) * | 2024-09-26 | 2024-12-31 | 北京智芯微电子科技有限公司 | A method, device and system for orderly charging control of distribution network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110570050A (en) * | 2019-09-25 | 2019-12-13 | 国网浙江省电力有限公司经济技术研究院 | A charging guidance method for electric vehicles considering road-network-vehicle |
TWI687785B (en) * | 2019-02-25 | 2020-03-11 | 華碩電腦股份有限公司 | Method of returning to charging station |
CN111934335A (en) * | 2020-08-18 | 2020-11-13 | 华北电力大学 | Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning |
WO2021143075A1 (en) * | 2020-01-17 | 2021-07-22 | 南京东博智慧能源研究院有限公司 | Demand response method taking space-time distribution of electric vehicle charging loads into consideration |
CN113159578A (en) * | 2021-04-22 | 2021-07-23 | 杭州电子科技大学 | Charging optimization scheduling method of large-scale electric vehicle charging station based on reinforcement learning |
CN113515884A (en) * | 2021-04-19 | 2021-10-19 | 国网上海市电力公司 | Distributed electric vehicle real-time optimization scheduling method, system, terminal and medium |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI687785B (en) * | 2019-02-25 | 2020-03-11 | 華碩電腦股份有限公司 | Method of returning to charging station |
CN110570050A (en) * | 2019-09-25 | 2019-12-13 | 国网浙江省电力有限公司经济技术研究院 | A charging guidance method for electric vehicles considering road-network-vehicle |
WO2021143075A1 (en) * | 2020-01-17 | 2021-07-22 | 南京东博智慧能源研究院有限公司 | Demand response method taking space-time distribution of electric vehicle charging loads into consideration |
CN111934335A (en) * | 2020-08-18 | 2020-11-13 | 华北电力大学 | Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning |
CN113515884A (en) * | 2021-04-19 | 2021-10-19 | 国网上海市电力公司 | Distributed electric vehicle real-time optimization scheduling method, system, terminal and medium |
CN113159578A (en) * | 2021-04-22 | 2021-07-23 | 杭州电子科技大学 | Charging optimization scheduling method of large-scale electric vehicle charging station based on reinforcement learning |
Non-Patent Citations (1)
Title |
---|
夏冬 (Xia Dong): "Electric vehicle charging path planning under multi-information fusion", Electrical Measurement & Instrumentation, vol. 57, no. 22, 25 December 2019 (2019-12-25), pages 24-32 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115016938A (en) * | 2022-06-09 | 2022-09-06 | 北京邮电大学 | An automatic partitioning method of computational graph based on reinforcement learning |
WO2024165229A1 (en) * | 2023-02-08 | 2024-08-15 | E.On Se | Edge computing with ai at mains supply network interconnection points |
CN116436019A (en) * | 2023-04-12 | 2023-07-14 | 国网江苏省电力有限公司电力科学研究院 | A multi-resource coordination optimization method, device and storage medium |
CN116436019B (en) * | 2023-04-12 | 2024-01-23 | 国网江苏省电力有限公司电力科学研究院 | A multi-resource coordination and optimization method, device and storage medium |
CN118098000A (en) * | 2024-04-24 | 2024-05-28 | 哈尔滨华鲤跃腾科技有限公司 | Urban comprehensive management method based on artificial intelligence |
CN118438918A (en) * | 2024-05-09 | 2024-08-06 | 烟台开发区德联软件有限责任公司 | A mobile energy storage device charging control method and device |
CN118438918B (en) * | 2024-05-09 | 2024-12-31 | 烟台开发区德联软件有限责任公司 | Mobile energy storage device charging regulation and control method and device |
CN119231508A (en) * | 2024-09-26 | 2024-12-31 | 北京智芯微电子科技有限公司 | A method, device and system for orderly charging control of distribution network |
Also Published As
Publication number | Publication date |
---|---|
CN114444802B (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114444802A (en) | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning | |
Li et al. | Probabilistic charging power forecast of EVCS: Reinforcement learning assisted deep learning approach | |
CN109523051B (en) | A real-time optimal scheduling method for electric vehicle charging | |
CN109347149B (en) | Microgrid energy storage scheduling method and device based on deep Q-value network reinforcement learning | |
CN104463701B (en) | A kind of distribution system and the coordinated planning method of charging electric vehicle network | |
Luo et al. | Joint deployment of charging stations and photovoltaic power plants for electric vehicles | |
CN113078641B (en) | A method and device for reactive power optimization of distribution network based on estimator and reinforcement learning | |
CN110570050A (en) | A charging guidance method for electric vehicles considering road-network-vehicle | |
Chu et al. | A multiagent federated reinforcement learning approach for plug-in electric vehicle fleet charging coordination in a residential community | |
CN106651059A (en) | Optimal configuration method for electric automobile charging pile | |
CN106130007A (en) | A kind of active distribution network energy storage planing method theoretical based on vulnerability | |
CN105591433A (en) | Electric automobile charging load optimization method based on electric automobile charging power dynamic distribution | |
CN106096757A (en) | Based on the microgrid energy storage addressing constant volume optimization method improving quantum genetic algorithm | |
CN107067190A (en) | The micro-capacitance sensor power trade method learnt based on deeply | |
CN109840635A (en) | Electric automobile charging station planing method based on voltage stability and charging service quality | |
CN112116125A (en) | A method for electric vehicle charging and navigation based on deep reinforcement learning | |
CN114707292B (en) | Analysis method for voltage stability of distribution network containing electric automobile | |
CN114123256B (en) | Distributed energy storage configuration method and system adapting to random optimization decision | |
CN115344653A (en) | A site selection method for electric vehicle charging stations based on user behavior | |
CN106408452A (en) | Optimal configuration method for electric vehicle charging station containing multiple distributed power distribution networks | |
Yang et al. | Dynamic incentive pricing on charging stations for real-time congestion management in distribution network: an adaptive model-based safe deep reinforcement learning method | |
CN117522444A (en) | V2G-based dynamic electricity price setting method and system for electric vehicle charging station | |
Ma et al. | IMOCS based EV charging station planning optimization considering stakeholders’ interests balance | |
CN112097783A (en) | Planning method for electric taxi charging navigation path based on deep reinforcement learning | |
CN107776433A (en) | A kind of discharge and recharge optimal control method of electric automobile group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |