CN114444802A - Electric vehicle charging guide optimization method based on graph neural network reinforcement learning - Google Patents
Electric vehicle charging guide optimization method based on graph neural network reinforcement learning
- Publication number
- CN114444802A (application CN202210109887.2A)
- Authority
- CN
- China
- Prior art keywords
- electric vehicle
- node
- neural network
- charging
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0283—Price estimation or determination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The present invention provides an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning, comprising the following steps. Step S1: initialize the collaborative optimization model of the power-transportation fusion network. Step S2: update the electric vehicle charging load. Step S3: generate the charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm. Step S4: execute the charging guidance behavior strategy a_i,t. Step S5: calculate the reward function of the graph neural network reinforcement learning algorithm. Step S6: update the state x_i,t of the partially observable Markov decision process. Step S7: store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D. Step S8: determine whether the predetermined time T_end is reached; if not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results. Applying this technical solution can effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.
Description
Technical Field

The invention relates to the technical field of collaborative optimization of power-transportation fusion networks, and in particular to an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning.

Background Art

With the large-scale operation of electric vehicles, the power system and the transportation system will interact and merge in many ways, forming a power-transportation fusion network. This fusion network involves multiple subjects, such as electric vehicles, the power system and the transportation system, and contains a variety of random, uncertain factors. The interaction of multiple subjects, the influence of multiple random factors and the coupling among those factors make it more difficult to understand the interaction mechanism between the power and transportation systems and to solve the collaborative optimization of the power-transportation fusion network. For example, the travel behavior, psychological behavior and driving behavior of electric vehicle users are all somewhat random, which affects the flow distribution of the traffic system and makes traffic flow uncertain; this in turn affects the time at which electric vehicles arrive at charging stations, so that the charging start time, queuing time and charging duration of electric vehicles are also highly uncertain. Unlike traditional electric loads, electric vehicles are movable loads whose randomness is stronger and harder to predict than that of traditional loads.

Current research on the power-transportation fusion network falls into three directions: 1) from the perspective of the power system, guiding electric vehicles to charge at the lowest cost by calculating nodal marginal cost electricity prices or optimizing charging station service pricing; 2) from the perspective of the transportation system, optimizing the charging path to minimize the charging cost; 3) comprehensively considering the interests of electric vehicles, the power system and the transportation system, and maximizing the overall benefit by jointly optimizing the charging strategies of electric vehicles and the dispatch decisions of the power system. However, most existing studies address static optimization problems and do not consider the coupling of electric vehicles, charging stations and the power system on a continuous time scale; most also ignore the influence of multiple uncertain factors and their correlated couplings on the collaborative optimization of the power-transportation fusion network. More importantly, existing research does not consider the impact of interactions among electric vehicles on this collaborative optimization.
Summary of the Invention

In view of this, the purpose of the present invention is to provide an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning which, while accounting for the various uncertain factors of the power-transportation fusion network, can effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.

To achieve the above purpose, the present invention adopts the following technical scheme: an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning, comprising the following steps:

Step S1: initialize the collaborative optimization model of the power-transportation fusion network;

Step S2: update the electric vehicle charging load, and optimize the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory;

Step S3: generate the electric vehicle charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm;

Step S4: execute the charging guidance behavior strategy a_i,t, and judge and update the state of each electric vehicle;

Step S5: calculate the reward function of the graph neural network reinforcement learning algorithm according to the power-transportation fusion environment;

Step S6: update the state x_i,t of the partially observable Markov decision process;

Step S7: store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D, and update the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent, where x_i,t denotes the current state of the graph neural network reinforcement learning, a_i,t denotes the electric vehicle behavior strategy, r_i,t denotes the reward value of the graph neural network reinforcement learning, and x_i,t' denotes the next state;

Step S8: determine whether the predetermined time T_end is reached; if not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results. A sketch of this training loop is given after this list of steps.
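The eight steps above constitute one training run of the method. The following Python sketch is only an illustrative outline, not the patented implementation: the environment, agent and memory objects and all of their methods (initialize, update_charging_load, compute_lmp, select_action, step, store, sgd_update, parameters) are hypothetical placeholders for the procedures detailed in the steps that follow.

```python
# Illustrative outline of steps S1-S8 (hypothetical helper objects, not the patented code).
def train(env, agent, memory, T_end):
    env.initialize()                                           # S1: init power-transportation model
    t = 0
    while t < T_end:                                           # S8: stop at the predetermined time
        env.update_charging_load()                             # S2: charging load update
        lmp = env.compute_lmp()                                #     nodal LMP via SOCR + duals
        for ev in env.vehicles:
            a = agent.select_action(ev.state, env.adjacency)   # S3: epsilon-greedy + GNN-RL
            next_state, reward = env.step(ev, a, lmp)          # S4-S6: execute, reward, state update
            memory.store((ev.state, a, reward, next_state))    # S7: replay memory D
            ev.state = next_state
        agent.sgd_update(memory)                               # S7: stochastic gradient descent update
        t += 1
    return agent.parameters()                                  # S8: output trained parameters
```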
In a preferred embodiment, initializing the collaborative optimization model of the power-transportation fusion network includes the following steps:

Step 21: determine the topology and parameters of the power network and the transportation network, including power system nodes, lines, initial voltages and the upper and lower limits of the optimization; the transportation network includes traffic nodes, road parameters, road capacities and maximum travel speeds;

Step 22: initialize the neural network parameters, including weight initialization and hyperparameter settings such as the learning rate α, the discount factor γ, the batch size B and the capacity of the memory unit D;

Step 23: regard each electric vehicle in the study area as an agent and treat it as a node n ∈ N, regard the connections between electric vehicles as edges e ∈ E, thereby forming the graph structure G = (N, E), and initialize the current state x_i,t of each electric vehicle i and the adjacency matrix A; a sketch of this graph construction is given below.
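As a minimal sketch of step 23, the adjacency matrix A could be built by connecting vehicles that lie within some neighborhood radius; the radius-based rule and the coordinates used here are assumptions for illustration, since the patent only states that inter-vehicle connections form the edges.

```python
import numpy as np

def build_ev_graph(positions, radius=1.0):
    """Treat each EV as a node; connect EVs closer than `radius` (assumed rule)."""
    n = len(positions)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(positions[i] - positions[j]) <= radius:
                A[i, j] = 1.0          # edge e in E between EV i and EV j
    return A

# Example: three EVs on a 2-D road map (coordinates are illustrative only)
positions = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0]])
A = build_ev_graph(positions, radius=1.0)   # EVs 0 and 1 are neighbors, EV 2 is isolated
```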
In a preferred embodiment, updating the electric vehicle charging load and optimizing the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory includes the following steps:

Step 31: update the electric vehicle charging load: calculate the charging load of each charging station from the number of electric vehicles in the station and the charging power; adding the base load of the node to the charging load of each station gives the final electrical load of that node;

Step 32: establish the optimal power flow model of the distribution network based on the branch flow model (BFM-OPF):

min f(p, q, P, Q, V, I)    (1)
subject to the branch flow (DistFlow) constraints over all nodes j ∈ EN and branches (i, j) ∈ EL:

$$\sum_{i\in\pi(j)}\left(P_{ij}-r_{ij}I_{ij}^{2}\right)+p_{j}^{g}+p_{j}^{w}=\sum_{k}P_{jk}+p_{j}^{d}$$

$$\sum_{i\in\pi(j)}\left(Q_{ij}-x_{ij}I_{ij}^{2}\right)+q_{j}^{g}+q_{j}^{w}=\sum_{s}Q_{js}+q_{j}^{d}$$

$$V_{j}^{2}=V_{i}^{2}-2\left(r_{ij}P_{ij}+x_{ij}Q_{ij}\right)+\left(r_{ij}^{2}+x_{ij}^{2}\right)I_{ij}^{2}$$

$$I_{ij}^{2}V_{i}^{2}=P_{ij}^{2}+Q_{ij}^{2}$$

$$0\le I_{ij}\le I_{ij}^{\max},\qquad \underline{V}_{j}\le V_{j}\le\overline{V}_{j},\qquad 0\le p_{j}^{w}\le \overline{p}_{j}^{w}$$

where EN and EL denote the sets of distribution network nodes and lines, respectively; P_ij and Q_ij denote the branch active and reactive power flowing from node i to node j; P_jk denotes the branch active power flowing from node j to node k; p_j^g and q_j^g denote the generator active and reactive output, i.e. the active and reactive power injected into node j; p_j^w and q_j^w denote the active and reactive power injected into node j by the wind turbine, with q_j^w tied to p_j^w through the power factor of the wind turbine connected to node j; Q_js denotes the branch reactive power flowing from node j to node s; r_ij and x_ij denote the resistance and reactance of the branch from node i to node j; I_ij denotes the current of the branch from node i to node j; π(j) denotes the set of branches connected to node j; p_j^d and q_j^d denote the active and reactive load connected at node j; V_i and V_j denote the voltage magnitudes of nodes i and j; z_ij denotes the impedance of the branch connecting nodes i and j, satisfying z_ij = r_ij + jx_ij; I_ij^max denotes the maximum current of the branch connecting nodes i and j; the lower and upper bars on V_j denote the minimum and maximum voltage of node j; and the upper bar on p_j^w denotes the maximum active output of the wind turbine connected to node j;

The load p_j^d of distribution network node j includes the base load p_j^base and the electric vehicle charging load p_j^EV, i.e. p_j^d = p_j^base + p_j^EV.
According to the actual demand of the distribution network, the objective function min f(p, q, P, Q, V, I) can finally be defined as

$$\min f=\sum_{i\in EN}\left[a_{i}\left(p_{i}^{g}\right)^{2}+b_{i}p_{i}^{g}\right]+\lambda_{t}^{grid}P_{t}^{grid}$$

where p_i^g denotes the active output of the generator injected at node i; a_i and b_i denote the quadratic and linear coal consumption coefficients of the generator, respectively; and λ_t^grid and P_t^grid denote, respectively, the electricity price and the active power of the electricity purchased from the main grid;
Step 33: convert the above nonlinear distribution network optimal power flow model into a second-order cone relaxation programming model:

Since the BFM-OPF is a nonlinear programming model, let the squared branch current magnitude be $l_{ij}=I_{ij}^{2}$ and the squared nodal voltage magnitude be $v_{i}=V_{i}^{2}$, and apply a second-order cone relaxation (SOCR) to the current-definition equation; the power balance and voltage-drop equations then become linear in $(P_{ij},Q_{ij},l_{ij},v_{i})$, and the current definition is relaxed into the cone constraint

$$\left\|\begin{bmatrix}2P_{ij}\\2Q_{ij}\\l_{ij}-v_{i}\end{bmatrix}\right\|_{2}\le l_{ij}+v_{i}$$

where ‖·‖₂ denotes the second-order cone (Euclidean norm) operation; the above equations constitute the basic form of the relaxed optimal power flow of the distribution network;
Step 34: use the Gurobi solver to solve the primal problem and the dual variables of the above model, and obtain the marginal cost electricity price λ_k of the node where each charging station is located. A simplified sketch of this step is given below.
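The following is a minimal, simplified sketch of step 34 using the Gurobi Python API. It replaces the full SOCR power flow with a toy linear dispatch so that the idea stays visible: the dual (shadow price) of each nodal power balance constraint is read as the marginal cost price λ_k. All node data are made up; for the actual second-order cone model, the QCPDual parameter would additionally need to be enabled to retrieve duals of cone constraints.

```python
import gurobipy as gp
from gurobipy import GRB

# Toy 2-node dispatch standing in for the SOCR optimal power flow (illustrative data only).
load = {0: 3.0, 1: 2.5}          # nodal load incl. EV charging load (MW)
cost = {0: 20.0, 1: 35.0}        # linear generation cost at each node ($/MWh)
cap  = {0: 4.0, 1: 4.0}          # generation capacity (MW)
line_cap = 1.0                   # transfer limit between the two nodes (MW)

m = gp.Model("toy_lmp")
g = m.addVars([0, 1], lb=0.0, name="g")               # generator output at each node
flow = m.addVar(lb=-line_cap, ub=line_cap, name="f")  # power flowing from node 0 to node 1
balance = {
    0: m.addConstr(g[0] - flow == load[0], name="bal0"),
    1: m.addConstr(g[1] + flow == load[1], name="bal1"),
}
m.addConstrs((g[i] <= cap[i] for i in (0, 1)), name="cap")
m.setObjective(gp.quicksum(cost[i] * g[i] for i in (0, 1)), GRB.MINIMIZE)
m.optimize()

# Dual of each nodal balance constraint = marginal cost price lambda_k at that node.
lmp = {i: balance[i].Pi for i in (0, 1)}
print(lmp)   # node 1's price exceeds node 0's once the line limit binds
```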
In a preferred embodiment, the epsilon-Greedy algorithm includes the following steps:

Step 41: generate a random number u and compare it with the decay factor ξ of the epsilon-Greedy algorithm;

Step 42: if u < ξ, generate a behavior a_i,t for each electric vehicle in the current state in a random manner; in this patent, this behavior represents the electric vehicle charging path strategy:

a_i,t = randint(N_action)    (19)

where N_action denotes the number of electric vehicle behavior decisions;

Step 43: if u ≥ ξ, generate a behavior a_i,t for each electric vehicle i under the current state x_i,t and the adjacency matrix A according to the experience of the graph neural network reinforcement learning algorithm, i.e.

$$a_{i,t}=\arg\max_{a}Q\left(x_{i,t},A,a;\theta_{t}\right)\qquad(20)$$

where θ_t denotes the parameters of the graph neural network reinforcement learning algorithm; argmax(·) selects the action whose Q value is largest; and x_i,t denotes the state of the i-th electric vehicle at time t, which consists of the electric vehicle state EV_i,t, the neighboring traffic road information Ro_i,t, the neighboring electric vehicle states Ne_i,t and the charging station information CS_t, i.e.

x_i,t = [EV_i,t, Ro_i,t, Ne_i,t, CS_t]    (21)

where the state EV_i,t of the i-th electric vehicle includes the next node on its way to the charging station, the number of the road it is on, its driving speed v_i,t and its remaining battery level SOC_i,t; the neighboring road information Ro_i,t includes, for the next road connected to the next node of electric vehicle i, its starting node, end node, road length and the number of electric vehicles on it; the neighboring electric vehicle state Ne_i,t includes the state of each neighboring electric vehicle k, such as the next node of the k-th electric vehicle adjacent to the i-th one, the number of the road it is on, its driving speed v_i,k,t and its remaining battery level SOC_i,k,t; and the charging station information CS_t includes the charging electricity price p_c,t of each charging station and the number of electric vehicles in it; an illustration of how this state could be assembled is given below.
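As an illustration of how the observation x_i,t could be assembled from the four component groups defined above, here is a small sketch; the field names and the flat-vector layout are hypothetical, not taken from the patent.

```python
import numpy as np

def build_state(ev, road, neighbors, stations):
    """Assemble x_i,t = [EV_i,t, Ro_i,t, Ne_i,t, CS_t] as one feature vector (illustrative)."""
    ev_part = [ev["next_node"], ev["road_id"], ev["speed"], ev["soc"]]
    road_part = [road["start_node"], road["end_node"], road["length"], road["n_ev"]]
    ne_part = [v for k in neighbors for v in (k["next_node"], k["road_id"], k["speed"], k["soc"])]
    cs_part = [v for s in stations for v in (s["price"], s["n_ev"])]
    return np.array(ev_part + road_part + ne_part + cs_part, dtype=float)
```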
The neural network of the graph neural network reinforcement learning algorithm consists of an input layer, one fully connected layer that extracts features x_i,t' from the input state x_i,t, two graph neural network layers that take the extracted features x_i,t' together with the adjacency matrix A and perform further feature extraction, and a final fully connected layer that outputs the electric vehicle charging path strategy a_i,t; the graph neural network used here is a graph attention network. A sketch of the epsilon-greedy action selection with such a network is given below.
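A minimal sketch of steps 41-43, assuming a q_network callable with the FC-GAT-FC structure just described that maps (state, adjacency matrix) to one Q value per candidate action; the function name and tensor shapes are illustrative, not taken from the patent.

```python
import random
import torch

def select_action(q_network, x_it, adjacency, n_action, xi):
    """Epsilon-greedy charging-path choice for one EV (illustrative sketch)."""
    u = random.random()                        # step 41: random number u
    if u < xi:                                 # step 42: explore with a random action
        return random.randrange(n_action)
    with torch.no_grad():                      # step 43: exploit the learned Q values
        q_values = q_network(x_it, adjacency)  # shape: (n_action,)
        return int(torch.argmax(q_values).item())
```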
In a preferred embodiment, the reward function r_i,t of the graph neural network reinforcement learning algorithm is the piecewise function

$$r_{i,t}=\begin{cases}0, & node_{cur}\neq node_{tar}\ \text{and}\ step<N_{step}\\ -penalty, & step\ge N_{step}\\ -\left[w_{i}\left(t_{i,k,t}^{dri}+t_{i,k,t}^{wait}+t_{i,k,t}^{cha}\right)+\lambda_{k,t}\left(1-SOC_{i,k,t}\right)E_{i}^{cap}\right], & node_{cur}=node_{tar}\ \text{and}\ step<N_{step}\end{cases}$$

where node_cur and node_tar denote the current node of the electric vehicle and the charging station node it is heading to; step denotes the number of steps the electric vehicle has already travelled; penalty denotes a large penalty factor; w_i denotes the unit time cost of the i-th electric vehicle; t^dri_{i,k,t}, t^wait_{i,k,t} and t^cha_{i,k,t} denote, at time t, the driving time, the charging waiting time and the charging time of the i-th electric vehicle when it goes to the k-th charging station; λ_k,t denotes the marginal cost electricity price of the node where charging station k is located at time t; SOC_i,k,t denotes the remaining battery level of the i-th electric vehicle when it reaches charging station k at time t; and E^cap_i denotes the rated capacity of the battery of the i-th electric vehicle;

As the expression shows, the reward function r_i,t is a piecewise function. If the i-th electric vehicle has not reached the charging station (node_cur ≠ node_tar) and the number of steps it has taken towards the charging station is within the given maximum (step < N_step), its reward is r_i,t = 0. If the number of steps is greater than or equal to the given maximum (step ≥ N_step), the charging-behavior exploration has failed and a large negative reward r_i,t = -penalty is given. If the i-th electric vehicle reaches the charging station (node_cur = node_tar) within the given maximum number of steps (step < N_step), the reward is calculated from the driving time, the charging time and the electricity charge paid while charging;
The travel time t_a,t of the i-th electric vehicle on road segment a is calculated with the Bureau of Public Roads (BPR) function in its standard form, i.e.

$$t_{a,t}=t_{a,t}^{0}\left[1+0.15\left(\frac{n_{a,t}}{c_{a}}\right)^{4}\right]$$

where n_a,t denotes the number of electric vehicles on road segment a at time t, and c_a and t^0_a,t denote the capacity upper limit of road segment a and the free-flow travel time at time t, respectively; from this, the time required for the i-th electric vehicle to travel to charging station k is obtained as $t_{i,k,t}^{dri}=\sum_{a\in path_{i,k}}t_{a,t}$, i.e. the sum over the road segments on its path;

In addition, the charging time of the i-th electric vehicle follows from its remaining battery level as $t_{i,t}^{cha}=\dfrac{\left(1-SOC_{t}\right)E^{cap}}{\eta P_{charging}}$, while its charging waiting time is determined by the queue of vehicles already at the station;

where SOC_t denotes the remaining battery level of the electric vehicle, E^cap denotes the rated capacity of the electric vehicle battery, η denotes the charging power factor, and P_charging denotes the rated charging power of the electric vehicle. A sketch of this reward computation is given below.
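A minimal sketch of how the reward pieces above could be computed, assuming the standard BPR coefficients 0.15 and 4 and the energy-based charging-duration relation; the default parameter values are illustrative, not from the patent.

```python
def bpr_travel_time(t_free, n_ev, capacity):
    """Standard BPR congestion function for one road segment."""
    return t_free * (1.0 + 0.15 * (n_ev / capacity) ** 4)

def reward(arrived, step, n_step, w_i, t_drive, t_wait, soc, e_cap, lmp,
           eta=0.9, p_charging=60.0, penalty=1000.0):
    """Piecewise reward of the i-th EV (illustrative parameter values)."""
    if step >= n_step:                        # exploration failed: large negative reward
        return -penalty
    if not arrived:                           # still travelling within the step budget
        return 0.0
    t_charge = (1.0 - soc) * e_cap / (eta * p_charging)   # charging duration (h)
    energy_cost = lmp * (1.0 - soc) * e_cap               # electricity charge at the nodal LMP
    return -(w_i * (t_drive + t_wait + t_charge) + energy_cost)
```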
In a preferred embodiment, updating the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent includes:

Step 61: randomly draw a certain number of samples (Sample) from the memory unit D;

Step 62: construct the loss function

$$L\left(\theta_{t}\right)=\mathbb{E}\left[\left(r+\gamma\max_{a'}Q\left(x',a';\theta'_{t}\right)-Q\left(x,a;\theta_{t}\right)\right)^{2}\right]$$

and, on the drawn samples, update the weights of the graph neural network reinforcement learning algorithm by stochastic gradient descent as

$$\theta_{t+1}=\theta_{t}-\alpha\nabla_{\theta_{t}}L\left(\theta_{t}\right)$$

where x, a, x' and a' denote the current state and action and the state and action at the next time step, respectively; r denotes the immediate reward of the graph neural network reinforcement learning; θ_t denotes the parameters of the graph neural network reinforcement learning algorithm at the current time t; 0 ≤ γ ≤ 1 denotes the discount factor, which reflects the influence of the future Q value on the current action; Q(x', a'; θ'_t) denotes the state-action value under the target network parameters θ'_t; ∇_θt denotes differentiation with respect to θ_t; and α denotes the learning rate;

Step 63: after every fixed number of steps, update the target graph neural network reinforcement learning parameters θ'_t from the current parameters θ_t. A PyTorch-style sketch of these steps follows.
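A minimal PyTorch sketch of steps 61-63 under the usual DQN-style reading of the loss above; the network class, the replay memory layout (tuples of state, action, reward and next-state tensors) and the update period are assumptions for illustration.

```python
import random
import torch
import torch.nn as nn

def sgd_update(q_net, target_net, optimizer, memory, batch_size=32, gamma=0.95):
    """One stochastic-gradient update of the Q-network from the replay memory D."""
    batch = random.sample(memory, batch_size)                    # step 61: draw samples
    x, a, r, x_next = (torch.stack(t) for t in zip(*batch))
    q_sa = q_net(x).gather(1, a.long().unsqueeze(1)).squeeze(1)  # Q(x, a; theta_t)
    with torch.no_grad():
        target = r + gamma * target_net(x_next).max(dim=1).values  # r + gamma * max_a' Q'
    loss = nn.functional.mse_loss(q_sa, target)                  # step 62: squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target(q_net, target_net):
    """Step 63: copy the current parameters theta_t into the target network theta'_t."""
    target_net.load_state_dict(q_net.state_dict())
```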
Compared with the prior art, the present invention has the following beneficial effects:

The present invention provides an electric vehicle charging guidance optimization method based on graph neural network reinforcement learning. Based on graph theory, the mutual influences among electric vehicles are converted into a dynamic network graph structure, and an attention-based graph neural network reinforcement learning method is proposed to process irregular non-Euclidean structured data, so as to study communication and cooperation among multiple agents and to explore the mutual influences among electric vehicles. On the basis of an active distribution network that considers renewable energy output, the optimal power flow of the distribution network is solved through second-order cone optimization and dual optimization theory to obtain the nodal marginal cost electricity prices, thereby studying the collaborative optimization of the power-transportation fusion network. The proposed method can, while accounting for the various uncertain factors of the power-transportation fusion network, effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.
Brief Description of the Drawings

FIG. 1 is a flowchart of the electric vehicle charging guidance optimization method based on graph neural network reinforcement learning according to a preferred embodiment of the present invention.

Detailed Description of the Embodiments

The present invention is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It should also be noted that the terms used herein are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present application; as used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
As shown in FIG. 1, the electric vehicle charging guidance optimization method based on graph neural network reinforcement learning of the present invention includes the following steps:

S11: initialize the collaborative optimization model of the power-transportation fusion network;

S12: update the electric vehicle charging load, and optimize the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory;

S13: generate the electric vehicle charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm;

S14: execute the charging guidance behavior strategy a_i,t, and judge and update the state of each electric vehicle;

S15: calculate the reward function of the graph neural network reinforcement learning algorithm according to the power-transportation fusion environment;

S16: update the state x_i,t of the partially observable Markov decision process;

S17: store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D, and update the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent;

S18: determine whether the predetermined time T_end is reached. If not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results.
Specifically:

1. Initialize the collaborative optimization model of the power-transportation fusion network. The main steps include determining the topology and parameters of the power network and the transportation network, including power system nodes, lines, initial voltages and the upper and lower limits of the optimization; the transportation network includes traffic nodes, road parameters, road capacities and maximum driving speeds.

The neural network parameters are initialized, including weight initialization and hyperparameter settings such as the learning rate α, the discount factor γ, the batch size B and the capacity of the memory unit D;

Each electric vehicle in the study area is regarded as an agent and treated as a node n ∈ N, the connections between electric vehicles are regarded as edges e ∈ E, forming the graph structure G = (N, E), and the current state x_i,t of each electric vehicle i and the adjacency matrix A are initialized.
2. Update the electric vehicle charging load, and optimize the marginal cost electricity price of the node where each electric vehicle charging station is located based on second-order cone relaxation optimization and dual theory. This mainly includes the following steps:

Step 21: update the electric vehicle charging load: calculate the charging load of each charging station from the number of electric vehicles in the station and the charging power; adding the base load of the node to the charging load of each station gives the final electrical load of that node;

Step 22: establish the optimal power flow model of the distribution network based on the branch flow model:

min f(p, q, P, Q, V, I)    (1)
subject to the same branch flow constraints as given above, where EN and EL denote the sets of distribution network nodes and lines, respectively; P_ij and Q_ij denote the branch active and reactive power flowing from node i to node j; p_j^g and q_j^g denote the generator active and reactive output, i.e. the active and reactive power injected into node j; p_j^w and q_j^w denote the active and reactive power injected into node j by the wind turbine; r_ij and x_ij denote the resistance and reactance of the branch from node i to node j; I_ij denotes the current of the branch from node i to node j; π(j) denotes the set of branches connected to node j; p_j^d and q_j^d denote the active and reactive load connected at node j; V_i denotes the voltage magnitude of node i; z_ij denotes the impedance of the branch connecting nodes i and j, satisfying z_ij = r_ij + jx_ij; I_ij^max denotes the maximum current of the branch connecting nodes i and j; the lower and upper bars on V_j denote the minimum and maximum voltage of node j; the upper bar on p_j^w denotes the maximum active output of the wind turbine connected to node j; and the power factor of the wind turbine connected to node j ties its reactive output to its active output.

The load p_j^d of distribution network node j includes the base load p_j^base and the electric vehicle charging load p_j^EV, i.e. p_j^d = p_j^base + p_j^EV.

According to the actual demand of the distribution network, the objective function min f(p, q, P, Q, V, I) can finally be defined as

$$\min f=\sum_{i\in EN}\left[a_{i}\left(p_{i}^{g}\right)^{2}+b_{i}p_{i}^{g}\right]+\lambda_{t}^{grid}P_{t}^{grid}$$

where a_i and b_i denote the quadratic and linear coal consumption coefficients of the generator, respectively, and λ_t^grid and P_t^grid denote, respectively, the electricity price and the active power of the electricity purchased from the main grid.
Step 23: convert the above nonlinear distribution network optimal power flow model into a second-order cone relaxation programming model:

Since the BFM-OPF is a nonlinear programming model, let $l_{ij}=I_{ij}^{2}$ and $v_{i}=V_{i}^{2}$ and apply the second-order cone relaxation (SOCR) as described above, which relaxes the current-definition equation into the cone constraint $\left\|\left[2P_{ij},\,2Q_{ij},\,l_{ij}-v_{i}\right]^{\mathrm{T}}\right\|_{2}\le l_{ij}+v_{i}$,

where ‖·‖₂ denotes the second-order cone (Euclidean norm) operation; the above equations constitute the basic form of the relaxed optimal power flow of the distribution network.

Step 24: use the Gurobi solver to solve the primal problem and the dual variables of the above model, and obtain the marginal cost electricity price λ_k of the node where each charging station is located.
3. Generate the electric vehicle charging guidance behavior strategy a_i,t according to the epsilon-Greedy algorithm and the graph neural network reinforcement learning algorithm. This mainly includes the following steps:

Step 31: generate a random number u and compare it with the decay factor ξ of the epsilon-Greedy algorithm.

Step 32: if u < ξ, generate a behavior a_i,t for each electric vehicle in the current state in a random manner; in this patent, this behavior represents the electric vehicle charging path strategy:

a_i,t = randint(N_action)    (19)

where N_action denotes the number of electric vehicle behavior decisions.

Step 33: if u ≥ ξ, generate a behavior a_i,t for each electric vehicle i under the current state x_i,t and the adjacency matrix A according to the experience of the graph neural network reinforcement learning algorithm, i.e.

$$a_{i,t}=\arg\max_{a}Q\left(x_{i,t},A,a;\theta_{t}\right)\qquad(20)$$

where θ_t denotes the parameters of the graph neural network reinforcement learning algorithm; argmax(·) selects the action whose Q value is largest; and x_i,t denotes the state of the i-th electric vehicle at time t, which consists of the electric vehicle state EV_i,t, the neighboring traffic road information Ro_i,t, the neighboring electric vehicle states Ne_i,t and the charging station information CS_t, i.e.

x_i,t = [EV_i,t, Ro_i,t, Ne_i,t, CS_t]    (21)

where the state EV_i,t of the i-th electric vehicle includes the next node on its way to the charging station, the number of the road it is on, its driving speed v_i,t and its remaining battery level SOC_i,t; the neighboring road information Ro_i,t includes, for the next road connected to the next node of electric vehicle i, its starting node, end node, road length and the number of electric vehicles on it; the neighboring electric vehicle state Ne_i,t includes the state of each neighboring electric vehicle k, such as the next node of the k-th electric vehicle adjacent to the i-th one, the number of the road it is on, its driving speed v_i,k,t and its remaining battery level SOC_i,k,t; and the charging station information CS_t includes the charging electricity price p_c,t of each charging station and the number of electric vehicles in it.
The neural network of the graph neural network reinforcement learning algorithm consists of an input layer, one fully connected layer that extracts features x_i,t' from the input state x_i,t, two graph neural network layers that take the extracted features x_i,t' together with the adjacency matrix A and perform further feature extraction, and a final fully connected layer that outputs the electric vehicle charging path strategy a_i,t. The graph neural network used in this patent is a graph attention network; a sketch of a single layer of this kind is given below.
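The patent names a graph attention network but does not give its equations; the following is a minimal single-head graph attention layer written from scratch, illustrating the kind of layer that could sit between the two fully connected layers. It follows the widely used GAT formulation (LeakyReLU attention scores masked by the adjacency matrix), and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention layer (illustrative, in the spirit of GAT)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared linear transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring vector
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, x, adj):
        # x: (n_nodes, in_dim) EV features, adj: (n_nodes, n_nodes) adjacency matrix A
        h = self.W(x)                                     # (n, out_dim)
        n = h.size(0)
        # pairwise concatenation [h_i || h_j] for every node pair
        h_i = h.unsqueeze(1).expand(n, n, -1)
        h_j = h.unsqueeze(0).expand(n, n, -1)
        e = self.leaky_relu(self.a(torch.cat([h_i, h_j], dim=-1))).squeeze(-1)
        mask = adj + torch.eye(n, device=adj.device)      # include self-loops
        e = e.masked_fill(mask == 0, float("-inf"))       # attend only to graph neighbors
        alpha = torch.softmax(e, dim=-1)                  # attention coefficients
        return torch.relu(alpha @ h)                      # aggregated node embeddings
```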
4. Execute the charging guidance behavior strategy a_i,t, and judge and update the state of each electric vehicle. An electric vehicle can be in one of three states: the decision state, the running state and the charging state. If the electric vehicle arrives at an intersection (node_cur = node_next) and that intersection is not a charging station node (node_cur ≠ node_tar), the vehicle is in the decision state; it executes the charging guidance behavior strategy a_i,t, the road state (such as the number of electric vehicles and the ideal driving speed) is updated, and the vehicle state (such as road position, driving speed and distance) is updated. If the electric vehicle has not reached an intersection (node_cur ≠ node_next), it is in the running state: it continues driving along its current road according to the charging guidance strategy a_i,t-1 of the previous step, and its position, speed and SOC are updated. If the electric vehicle is located at a charging station node (node_cur = node_tar), it is in the charging state: if the number of electric vehicles currently at the station exceeds the number of charging piles, the vehicle must queue and wait to charge; if a charging pile is available, the vehicle charges immediately; the charging waiting time, charging time and SOC of the vehicle are updated accordingly. A sketch of this state logic follows.
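A minimal sketch of the three-state update logic described above; the EV and station attribute names and methods (apply_route, advance) are illustrative placeholders, not the patented data structures.

```python
def update_ev(ev, station, action, prev_action):
    """Decision / running / charging state update for one EV (illustrative)."""
    if ev.node_cur == ev.node_tar:                 # charging state: at a station node
        if station.occupied >= station.n_piles:
            ev.wait_time += 1                      # queue until a charging pile frees up
        else:
            ev.charging = True                     # charge immediately, then update SOC
        return None
    if ev.node_cur == ev.node_next:                # decision state: at an intersection
        ev.apply_route(action)                     # follow the new charging guidance a_i,t
        return action
    ev.advance(prev_action)                        # running state: keep following a_i,t-1
    return prev_action
```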
5. Calculate the reward function of the graph neural network reinforcement learning algorithm according to the power-transportation fusion environment. Specifically, the reward function r_i,t is a piecewise function: if the i-th electric vehicle has not reached the charging station (node_cur ≠ node_tar) and the number of steps it has taken towards the charging station is within the given maximum (step < N_step), its reward is r_i,t = 0; if the number of steps is greater than or equal to the given maximum (step ≥ N_step), the charging-behavior exploration has failed and a large negative reward r_i,t = -penalty is given; if the i-th electric vehicle reaches the charging station (node_cur = node_tar) within the given maximum number of steps (step < N_step), the reward is calculated from the driving time, the charging waiting time, the charging time and the electricity charge paid while charging, as in the expression given above.

The driving time t^dri_{i,k,t}, the charging waiting time t^wait_{i,k,t} and the charging time t^cha_{i,k,t} are calculated as follows.

The travel time of the i-th electric vehicle on road segment a is calculated with the Bureau of Public Roads (BPR) function in its standard form, i.e. $t_{a,t}=t_{a,t}^{0}\left[1+0.15\left(\frac{n_{a,t}}{c_{a}}\right)^{4}\right]$,

where n_a,t denotes the number of electric vehicles on road segment a at time t, and c_a and t^0_a,t denote the capacity upper limit of road segment a and the free-flow travel time at time t, respectively. From this, the time required for the i-th electric vehicle to travel to charging station k is obtained as $t_{i,k,t}^{dri}=\sum_{a\in path_{i,k}}t_{a,t}$.

In addition, the charging time of the i-th electric vehicle is obtained from $t_{i,t}^{cha}=\dfrac{\left(1-SOC_{t}\right)E^{cap}}{\eta P_{charging}}$, and its charging waiting time from the queue of vehicles already at the station, where SOC_t denotes the remaining battery level of the electric vehicle, E^cap denotes the rated capacity of the electric vehicle battery, η denotes the charging power factor, and P_charging denotes the rated charging power of the electric vehicle.
6. Update the state x_i,t of the partially observable Markov decision process, including the electric vehicle state EV_i,t, the neighboring traffic road information Ro_i,t, the neighboring electric vehicle states Ne_i,t and the charging station information CS_t.

7. Store the information of the current step (x_i,t, a_i,t, r_i,t, x_i,t') in the memory unit D, and update the weights of the graph neural network reinforcement learning algorithm based on stochastic gradient descent. This mainly includes the following steps:

Step 71: randomly draw a certain number of samples (Sample) from the memory unit D;

Step 72: construct the loss function $L(\theta_{t})=\mathbb{E}\big[\big(r+\gamma\max_{a'}Q(x',a';\theta'_{t})-Q(x,a;\theta_{t})\big)^{2}\big]$ and, on the drawn samples, update the weights of the graph neural network reinforcement learning algorithm by stochastic gradient descent as $\theta_{t+1}=\theta_{t}-\alpha\nabla_{\theta_{t}}L(\theta_{t})$;

where x, a, x' and a' denote the current state and action and the state and action at the next time step, respectively; r denotes the immediate reward of the graph neural network reinforcement learning; θ_t denotes the parameters of the graph neural network reinforcement learning algorithm at the current time t; 0 ≤ γ ≤ 1 denotes the discount factor, which reflects the influence of the future Q value on the current action; Q(x', a'; θ'_t) denotes the state-action value under the target network parameters θ'_t; ∇_θt denotes differentiation with respect to θ_t; and α denotes the learning rate.

Step 73: after every fixed number of steps, update the target graph neural network reinforcement learning parameters θ'_t from the current parameters θ_t.

8. Determine whether the predetermined time T_end is reached. If not, repeat steps (2) to (7); if so, output the graph neural network reinforcement learning algorithm parameters and the corresponding results.
The electric vehicle charging guidance optimization method based on graph neural network reinforcement learning of the present invention converts, on the basis of graph theory, the mutual influences among electric vehicles into a dynamic network graph structure, and proposes an attention-based graph neural network reinforcement learning method to process irregular non-Euclidean structured data, so as to study communication and cooperation among multiple agents and to explore the mutual influences among electric vehicles. On the basis of an active distribution network that considers renewable energy output, the optimal power flow of the distribution network is solved through second-order cone optimization and dual optimization theory to obtain the nodal marginal cost electricity prices, thereby studying the collaborative optimization of the power-transportation fusion network. The proposed method can, while accounting for the various uncertain factors of the power-transportation fusion network, effectively reduce the total charging cost of electric vehicles and realize orderly charging of electric vehicles and coordinated optimal dispatch of the power system.

The above-described embodiments merely represent several implementations of the present invention; their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of this patent. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210109887.2A CN114444802B (en) | 2022-01-29 | 2022-01-29 | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210109887.2A CN114444802B (en) | 2022-01-29 | 2022-01-29 | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114444802A true CN114444802A (en) | 2022-05-06 |
CN114444802B (en) | 2024-06-04 |
Family
ID=81372174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210109887.2A Active CN114444802B (en) | 2022-01-29 | 2022-01-29 | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114444802B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115016938A (en) * | 2022-06-09 | 2022-09-06 | 北京邮电大学 | An automatic partitioning method of computational graph based on reinforcement learning |
CN116436019A (en) * | 2023-04-12 | 2023-07-14 | 国网江苏省电力有限公司电力科学研究院 | A multi-resource coordination optimization method, device and storage medium |
CN118098000A (en) * | 2024-04-24 | 2024-05-28 | 哈尔滨华鲤跃腾科技有限公司 | Urban comprehensive management method based on artificial intelligence |
CN118438918A (en) * | 2024-05-09 | 2024-08-06 | 烟台开发区德联软件有限责任公司 | A mobile energy storage device charging control method and device |
WO2024165229A1 (en) * | 2023-02-08 | 2024-08-15 | E.On Se | Edge computing with ai at mains supply network interconnection points |
CN119231508A (en) * | 2024-09-26 | 2024-12-31 | 北京智芯微电子科技有限公司 | A method, device and system for orderly charging control of distribution network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110570050A (en) * | 2019-09-25 | 2019-12-13 | 国网浙江省电力有限公司经济技术研究院 | A charging guidance method for electric vehicles considering road-network-vehicle |
TWI687785B (en) * | 2019-02-25 | 2020-03-11 | 華碩電腦股份有限公司 | Method of returning to charging station |
CN111934335A (en) * | 2020-08-18 | 2020-11-13 | 华北电力大学 | Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning |
WO2021143075A1 (en) * | 2020-01-17 | 2021-07-22 | 南京东博智慧能源研究院有限公司 | Demand response method taking space-time distribution of electric vehicle charging loads into consideration |
CN113159578A (en) * | 2021-04-22 | 2021-07-23 | 杭州电子科技大学 | Charging optimization scheduling method of large-scale electric vehicle charging station based on reinforcement learning |
CN113515884A (en) * | 2021-04-19 | 2021-10-19 | 国网上海市电力公司 | Distributed electric vehicle real-time optimization scheduling method, system, terminal and medium |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI687785B (en) * | 2019-02-25 | 2020-03-11 | 華碩電腦股份有限公司 | Method of returning to charging station |
CN110570050A (en) * | 2019-09-25 | 2019-12-13 | 国网浙江省电力有限公司经济技术研究院 | A charging guidance method for electric vehicles considering road-network-vehicle |
WO2021143075A1 (en) * | 2020-01-17 | 2021-07-22 | 南京东博智慧能源研究院有限公司 | Demand response method taking space-time distribution of electric vehicle charging loads into consideration |
CN111934335A (en) * | 2020-08-18 | 2020-11-13 | 华北电力大学 | Cluster electric vehicle charging behavior optimization method based on deep reinforcement learning |
CN113515884A (en) * | 2021-04-19 | 2021-10-19 | 国网上海市电力公司 | Distributed electric vehicle real-time optimization scheduling method, system, terminal and medium |
CN113159578A (en) * | 2021-04-22 | 2021-07-23 | 杭州电子科技大学 | Charging optimization scheduling method of large-scale electric vehicle charging station based on reinforcement learning |
Non-Patent Citations (1)
Title |
---|
夏冬 (Xia Dong): "Electric vehicle charging path planning under multi-information fusion", Electrical Measurement & Instrumentation, vol. 57, no. 22, 25 December 2019 (2019-12-25), pages 24-32 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115016938A (en) * | 2022-06-09 | 2022-09-06 | 北京邮电大学 | An automatic partitioning method of computational graph based on reinforcement learning |
WO2024165229A1 (en) * | 2023-02-08 | 2024-08-15 | E.On Se | Edge computing with ai at mains supply network interconnection points |
CN116436019A (en) * | 2023-04-12 | 2023-07-14 | 国网江苏省电力有限公司电力科学研究院 | A multi-resource coordination optimization method, device and storage medium |
CN116436019B (en) * | 2023-04-12 | 2024-01-23 | 国网江苏省电力有限公司电力科学研究院 | A multi-resource coordination and optimization method, device and storage medium |
CN118098000A (en) * | 2024-04-24 | 2024-05-28 | 哈尔滨华鲤跃腾科技有限公司 | Urban comprehensive management method based on artificial intelligence |
CN118438918A (en) * | 2024-05-09 | 2024-08-06 | 烟台开发区德联软件有限责任公司 | A mobile energy storage device charging control method and device |
CN118438918B (en) * | 2024-05-09 | 2024-12-31 | 烟台开发区德联软件有限责任公司 | Mobile energy storage device charging regulation and control method and device |
CN119231508A (en) * | 2024-09-26 | 2024-12-31 | 北京智芯微电子科技有限公司 | A method, device and system for orderly charging control of distribution network |
Also Published As
Publication number | Publication date |
---|---|
CN114444802B (en) | 2024-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114444802A (en) | Electric vehicle charging guide optimization method based on graph neural network reinforcement learning | |
Li et al. | Probabilistic charging power forecast of EVCS: Reinforcement learning assisted deep learning approach | |
CN109523051B (en) | A real-time optimal scheduling method for electric vehicle charging | |
CN109347149B (en) | Microgrid energy storage scheduling method and device based on deep Q-value network reinforcement learning | |
CN104463701B (en) | A kind of distribution system and the coordinated planning method of charging electric vehicle network | |
Luo et al. | Joint deployment of charging stations and photovoltaic power plants for electric vehicles | |
CN113078641B (en) | A method and device for reactive power optimization of distribution network based on estimator and reinforcement learning | |
CN110570050A (en) | A charging guidance method for electric vehicles considering road-network-vehicle | |
Chu et al. | A multiagent federated reinforcement learning approach for plug-in electric vehicle fleet charging coordination in a residential community | |
CN106651059A (en) | Optimal configuration method for electric automobile charging pile | |
CN106130007A (en) | A kind of active distribution network energy storage planing method theoretical based on vulnerability | |
CN105591433A (en) | Electric automobile charging load optimization method based on electric automobile charging power dynamic distribution | |
CN106096757A (en) | Based on the microgrid energy storage addressing constant volume optimization method improving quantum genetic algorithm | |
CN107067190A (en) | The micro-capacitance sensor power trade method learnt based on deeply | |
CN109840635A (en) | Electric automobile charging station planing method based on voltage stability and charging service quality | |
CN112116125A (en) | A method for electric vehicle charging and navigation based on deep reinforcement learning | |
CN114707292B (en) | Analysis method for voltage stability of distribution network containing electric automobile | |
CN114123256B (en) | Distributed energy storage configuration method and system adapting to random optimization decision | |
CN115344653A (en) | A site selection method for electric vehicle charging stations based on user behavior | |
CN106408452A (en) | Optimal configuration method for electric vehicle charging station containing multiple distributed power distribution networks | |
Yang et al. | Dynamic incentive pricing on charging stations for real-time congestion management in distribution network: an adaptive model-based safe deep reinforcement learning method | |
CN117522444A (en) | V2G-based dynamic electricity price setting method and system for electric vehicle charging station | |
Ma et al. | IMOCS based EV charging station planning optimization considering stakeholders’ interests balance | |
CN112097783A (en) | Planning method for electric taxi charging navigation path based on deep reinforcement learning | |
CN107776433A (en) | A kind of discharge and recharge optimal control method of electric automobile group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |