CN111049125B

CN111049125B - Electric vehicle intelligent access control method based on machine learning

Info

Publication number: CN111049125B
Application number: CN201910904347.1A
Authority: CN
Inventors: 唐子昱; 李紫昕; 方明星
Original assignee: Anhui Normal University
Current assignee: Anhui Normal University
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2021-07-30
Anticipated expiration: 2039-09-24
Also published as: CN111049125A

Abstract

The invention discloses a machine-learning-based intelligent access control method for electric vehicles, comprising the following steps: (1) describing the access control process for randomly arriving electric vehicle charging service requests as an event-driven decision process; (2) taking the grid peak-shaving electricity price and the online service state of the charging piles as the joint state of the charging station service system; (3) treating each service request of an electric vehicle arriving at the charging station as an event and, when the event occurs, choosing as the system action whether to admit the arriving electric vehicle for charging service, according to the joint state of the charging station service system; and (4) optimizing the intelligent electric vehicle access service system online with the Q-learning algorithm. The method provides effective intelligent access control of electric vehicles for a charging station service system that takes the grid peak-shaving electricity price into account, thereby improving the operating economy of the charging station and adapting to the peak-shaving requirements of the power grid.

Description

Electric vehicle intelligent access control method based on machine learning

Technical Field

The invention belongs to the technical field of intelligent control and optimization, and particularly relates to an electric vehicle intelligent access control method based on machine learning.

Background

At present, China is the world's largest automobile market, and manufacturers have shifted the focus of research, development and production from conventionally fuelled vehicles to new energy vehicles. Electric vehicles will remain the mainstream of new energy vehicle development for a long time to come; they have huge consumption potential and a growing market share. The charging pile is key infrastructure for charging electric vehicles and an important link in their industrialization and commercialization. With the rapid development of the electric vehicle industry and the sharp growth in the number of electric vehicles in use, the charging station, which centrally manages and operates multiple charging piles, will be an important business model and service form. In addition, as the penetration of new energy sources such as wind power and photovoltaics increases, power production and services are becoming more intelligent and adaptive, and effectively managing and guiding consumers' electricity use is a trend. For example, dispatching centers at each level can draw up a peak-shaving plan from source and load forecast data and issue it through real-time electricity prices, guiding power users such as electric vehicle charging stations to use electricity rationally and promoting automatic peak shaving and valley filling, or peak shifting, on the user side.

Existing grid electricity pricing uses a simple, fixed time-of-use mechanism: the grid peak-shaving price plan is not formulated or adjusted dynamically according to the grid's actual source and load forecasts, and charging station service systems do not adaptively control access for electric vehicle charging requests according to the grid's actual peak-shaving requirements. The problem to be studied and solved is therefore the following: under a real-time peak-shaving price mechanism, how should the charging station's intelligent electric vehicle access service system respond adaptively to the charging service request of a randomly arriving electric vehicle, deciding whether to admit it according to the real-time grid peak-shaving price and the online service states of all charging piles in the station, so as to improve the operating economy of the charging station and adapt to the grid's peak-shaving requirements?

Disclosure of Invention

The invention aims to solve the defects in the prior art, and provides an electric vehicle intelligent access control method based on machine learning so as to carry out effective online optimization control on a charging station service system in which an electric vehicle service request arrives at random, thereby improving the operation economy of the charging station and adapting to the peak regulation requirement of a power grid.

The invention adopts the following technical scheme for solving the technical problems:

The invention relates to a machine-learning-based intelligent access control method for electric vehicles, characterized in that the method is applied to a charging station service system that has $J$ charging piles and provides paid charging service to $M$ randomly arriving electric vehicles, wherein each charging pile can meet the charging power requirements of the $M$ electric vehicles and serves only one electric vehicle at a time;

The $J$ charging piles are denoted $CS_1, CS_2, \dots, CS_j, \dots, CS_J$, and the charging power requirements of the $M$ electric vehicles are denoted $P_1, P_2, \dots, P_m, \dots, P_M$, where $CS_j$ denotes the $j$-th charging pile and $P_m$ denotes the charging power requirement of the $m$-th electric vehicle;

Let $K$ be the maximum number of cycles per day and $T$ the corresponding total duration, and record the grid peak-shaving electricity price at any time $t$ within the total duration $T$ as $PR_t$, so that $PR_t \in \Phi_{PR}$. Suppose the grid peak-shaving price is issued once per dispatching instruction period and $\tau_k$ is the time at which the $k$-th peak-shaving price $PR_k$ is issued; the peak-shaving price sequence is then recorded as $\{(\tau_k, PR_k) \mid k = 0, 1, 2, \dots, K-1,\ \tau_0 = 0\}$, where $PR_k \in \Phi_{PR}$ and $\Phi_{PR}$ is a finite electricity price state space;

The charging service price of the charging station service system is fixed at $PR_{ev}$;

Suppose that at time $t$ the $m_t$-th electric vehicle randomly arrives at the charging station and requests charging service, and let the current battery state of charge of the $m_t$-th electric vehicle be

Then the arrival event of the $m_t$-th electric vehicle is recorded as

The joint state of the $J$ charging piles at time $t$ is recorded as $C_t = (CS_1(t), CS_2(t), \dots, CS_j(t), \dots, CS_J(t))$, wherein

represents the service state of the $j$-th charging pile; $m_j(t)$ indicates the class of electric vehicle being served by charging pile $CS_j$ at time $t$: $m_j(t) = 0$ means that no vehicle is connected to $CS_j$ at time $t$, and $m_j(t) \in \{1, 2, \dots, M\}$ means that $CS_j$ is charging one of the electric vehicles in $\{1, 2, \dots, M\}$ at time $t$;

represents the current battery state of charge of the $m_j(t)$-th electric vehicle being served by charging pile $CS_j$ at time $t$;

When the arrival event of the $m_t$-th electric vehicle at time $t$,

occurs, the state of the charging station service system is recorded as $s_t = \{C_t, PR_t\}$. The occurrence time $t$ of the arrival event

is taken as a decision time, and the event extended state at that decision time is recorded as

Whether the charging station service system admits the electric vehicle and provides charging service is recorded as action $a$; the action at the $n$-th decision time $T_n$ is recorded as $a_n$, with $a_n \in D = \{0, 1\}$, where "0" denotes denial of service, "1" denotes admission to service, and $D$ denotes the action set;

the intelligent access control method of the electric vehicle comprises the following steps:

Step 1: define and initialize the action exploration rate $\epsilon_n$ at the $n$-th decision time $T_n$, with $0 < \epsilon_n < 1$;

Define the elements of the Q-value table as discretized event extended state-action pair learned values, and initialize the elements of the Q-value table;

Define the current greedy control strategy table $v$ as the set of actions that, in each row of the Q-value table, correspond to the largest discretized event extended state-action pair learned value;

Step 2: initialize $t = 0$ and $n = 1$; assign the current action exploration rate $\epsilon_n$ to $\epsilon_1$; assign the current greedy control strategy table $v$ to the original strategy table $v_0$;

Step 3: at the $n$-th decision time $T_n$ of the charging station service system, the arrival event

occurs; observe the current joint state $s_t$ of the charging station service system and the event extended state

Let the event extended state at the $n$-th decision time $T_n$,

correspond to the discretized state in the Q-value table recorded as

and let the action actually taken in the event extended state at the $n$-th decision time $T_n$,

be recorded as

At the $n$-th decision time $T_n$, if all charging piles are in service, i.e. $m_j(t) \in \{1, 2, \dots, M\}$ for all $j = 1, 2, \dots, J$, then let

Otherwise, in the current event extended state

extract from the Q-value table, for

and its corresponding discretized state

the greedy action

and, with probability $1 - \epsilon_n$, assign

to

With probability $\epsilon_n$ (the exploration rate), take the action in the action set $D$ other than the greedy action

record it as the exploration action

and assign it to

After the charging station service system takes action

observe the system transition sample trajectory from the $n$-th decision time $T_n$ to the $(n+1)$-th decision time $T_{n+1}$, or to time $T$,

where $t = T_n$, and $t' = T_{n+1}$ or $t' = T$; when $t' = T$, let

Step 4: observe and calculate, for the charging station service system in the current state at the $n$-th decision time $T_n$,

taking action

and transitioning to the state at the $(n+1)$-th decision time $T_{n+1}$ or at time $T$,

the charging reward obtained during the state transition

Step 5: using the difference formula and the Q-value update formula shown in formulas (1) and (2), update in the Q-value table, for

and its corresponding discretized state

under action

the discretized event extended state-action pair learned value

and reassign the result to

In formula (1),

denotes, for the state reached at the $(n+1)$-th decision time $T_{n+1}$ or at time $T$,

and its corresponding discretized state

the discretized event extended state-action pair learned value if action $a$ is taken;

In formula (2), the operator ":=" means that the right-hand side is evaluated first and its value is then assigned to the variable on the left;

denotes, at the $n$-th decision time $T_n$ and in the discretized state

under action

the learning step size;

Step 6: in each row of the updated Q-value table, select the action corresponding to the largest discretized event extended state-action pair learned value; the selected actions form the current action set, which is taken as the updated greedy control strategy table and assigned to the current greedy control strategy $v$; apply a decay operation to the exploration rate $\epsilon_n$ and assign the updated exploration rate to $\epsilon_{n+1}$;

Step 7: if $t' < T$, assign $n + 1$ to $n$ and return to step 3; otherwise $t' = T$, and step 8 is performed;

Step 8: determine whether the control strategy table $v$ equals $v_0$; if they are equal, stop updating and use the current control strategy table $v$ for access control of the charging service requests of the $M$ electric vehicles; otherwise, return to step 2.

Compared with the prior art, the invention has the beneficial effects that:

1. The invention treats a randomly arriving electric vehicle charging service request as an event. When the event occurs, the system decides whether to admit the arriving electric vehicle for charging service according to an event extended state composed of the event occurrence time, the real-time state of the charging piles in the system, the current grid peak-shaving electricity price, the type of the arriving electric vehicle and its SOC value. Including the event occurrence time and the current grid peak-shaving price in the event extended state reflects the time-sequence characteristics of grid peak shaving, lets the control strategy adapt to the grid's peak-shaving requirements, matches the actual situation more closely, and improves the feasibility of the method.

2. The method takes the grid peak-shaving electricity price and the online service state of the charging piles as the joint state of the charging station service system; takes a randomly arriving electric vehicle charging service request as an event; combines the randomly occurring event with the joint state of the charging station service system to form the event extended state; takes whether the arriving electric vehicle is admitted for charging service as the system action; and takes the moment at which a charging service request randomly arrives as the decision time. The intelligent access control process for randomly arriving electric vehicles at the charging station is thus described as a discrete event-driven decision process in which the action is chosen according to the real-time event extended state of the system. This effectively solves the access control problem caused by the random arrival of electric vehicle service requests, lets the system select access actions rationally through optimization, improves the operating economy of the charging station service system, and adapts to the grid's peak-shaving requirements.

3. Compared with theoretical solution methods, the intelligent charging station electric vehicle access control method requires neither a complete mathematical model of the control system nor, in particular, an accurate model of the random characteristics of the system. A good control strategy can be obtained simply by learning online, in real time, from observed operating samples of the system. In addition, when the random parameters of the system change, the operator does not need to modify the algorithm: online learning continues from the running process of the actual system, and a good intelligent access control strategy is obtained adaptively.

4. The intelligent access control method is also applicable when the charging price varies by time period and when grid peak-shaving prices are issued non-periodically.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

fig. 2 is a schematic diagram of a charging station service system according to the present invention.

Detailed Description

In this embodiment, as shown in fig. 2, an electric vehicle intelligent access control method based on machine learning is applied to a charging station service system composed of J charging piles 1, M electric vehicles 2 arriving at random, a power grid peak shaving electricity price plan 3, and an access control center 4, and each charging pile can adaptively meet the charging power requirements of the M electric vehicles;

The $j$-th charging pile is denoted $CS_j$ and serves only one electric vehicle at a time; the $J$ charging piles are thus denoted $CS_1, CS_2, \dots, CS_j, \dots, CS_J$, $j = 1, 2, \dots, J$;

The charging power requirement of the $m$-th electric vehicle is denoted $P_m$ (kW) and its total battery capacity $E_m$ (kWh); both are determined by the configuration of the vehicle itself. The charging power requirements of the $M$ electric vehicles are thus denoted $P_1, P_2, \dots, P_m, \dots, P_M$, $m = 1, 2, \dots, M$;

Let $K$ be the maximum number of cycles per day and $T$ the corresponding total duration, and record the grid peak-shaving electricity price state at any time $t$ within the total duration $T$ as $PR_t$ (yuan/kWh), with $PR_t \in \Phi_{PR}$, where $\Phi_{PR}$ is a finite electricity price state space. Suppose the grid peak-shaving price is issued once per dispatching instruction period and $\tau_k$ is the time at which the $k$-th peak-shaving price $PR_k$ is issued; the price is held until the next issue time $\tau_{k+1}$, i.e. $PR_t = PR_k$ for $\tau_k \le t < \tau_{k+1}$, $k = 0, 1, 2, \dots, K-1$, with $\tau_0 = 0$. The peak-shaving price sequence is recorded as $\{(\tau_k, PR_k) \mid k = 0, 1, 2, \dots, K-1,\ \tau_0 = 0\}$;
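The piecewise-constant price rule $PR_t = PR_k$ for $\tau_k \le t < \tau_{k+1}$ can be illustrated with a minimal Python sketch. The function name `peak_shaving_price`, the example issue times and the example prices are hypothetical and chosen only for illustration; they do not come from the patent.

```python
from bisect import bisect_right


def peak_shaving_price(t, issue_times, prices):
    """Return PR_t = PR_k for tau_k <= t < tau_{k+1}.

    issue_times is the increasing list [tau_0, ..., tau_{K-1}] with tau_0 = 0,
    and prices is the matching list [PR_0, ..., PR_{K-1}].
    """
    if not issue_times or issue_times[0] != 0:
        raise ValueError("issue_times must start at tau_0 = 0")
    if len(issue_times) != len(prices):
        raise ValueError("issue_times and prices must have the same length")
    if t < 0:
        raise ValueError("t must be non-negative")
    # Index of the last issue time that is <= t.
    k = bisect_right(issue_times, t) - 1
    return prices[k]


# Hypothetical schedule: four issue times (hours) and made-up prices (yuan/kWh).
TAU = [0.0, 6.0, 12.0, 18.0]
PR = [0.35, 0.60, 0.95, 0.60]
```

A binary search is used here only because the issue times are sorted; a linear scan would give the same result for small $K$.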

The charging station provides paid charging service, and its charging service price is $PR_{ev}$ (yuan/kWh);

Suppose that at time $t$ an electric vehicle whose battery SOC is

namely the $m_t$-th electric vehicle, randomly arrives at the charging station to request charging service; this is recorded as the arrival event

The service state of the $j$-th charging pile is recorded as

so that the joint state of the $J$ charging piles at time $t$ is recorded as $C_t = (CS_1(t), CS_2(t), \dots, CS_j(t), \dots, CS_J(t))$; all charging piles are assumed to be idle at $t = 0$; $m_j(t)$ indicates the class of electric vehicle being served by charging pile $CS_j$ at time $t$: $m_j(t) = 0$ means that no vehicle is connected to $CS_j$ at time $t$, and $m_j(t) \in \{1, 2, \dots, M\}$ means that $CS_j$ is charging one of the electric vehicles in $\{1, 2, \dots, M\}$ at time $t$;

represents the battery SOC of the $m_j(t)$-th electric vehicle being served by charging pile $CS_j$ at time $t$;

When the arrival event

occurs, the state of the charging station service system is recorded as $s_t = \{C_t, PR_t\}$, and the event extended state is recorded as

Let the occurrence time of the $n$-th event

be the decision time $T_n$, i.e. $t = T_n$, and let the corresponding grid peak-shaving price period be

that is,

Let $\tau_K = T$;

Discretize the variation interval $[0, 1]$ of the electric vehicle battery SOC with a small constant $\delta$; this yields, for

the corresponding discretized event extended state

where the subscript "n" indicates the value or discretized value at the corresponding $n$-th decision time $T_n$; $m_n$ denotes

is the discretized joint charging pile state corresponding to $C_t$,

is the discretized state corresponding to $CS_j(t)$, and

$\Phi$ is the state space formed by all possible discretized event extended states, and the total number of discretized event extended states of the system is recorded as $S$;
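The discretization of the SOC interval $[0, 1]$ with step $\delta$ can be sketched in Python as follows. The tuple layout returned by `discretize_event_state` (price period index, arriving vehicle class, arriving SOC bin, per-pile pairs) is an assumption made for illustration, since the exact symbols of the event extended state are not reproduced in the text; all function names are hypothetical.

```python
import math


def discretize_soc(soc, delta):
    """Map an SOC value in [0, 1] to an integer bin index of width delta."""
    if not 0.0 <= soc <= 1.0:
        raise ValueError("soc must lie in [0, 1]")
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    # Small tolerance guards against floating-point error such as 0.3 / 0.1.
    n_bins = math.ceil(1.0 / delta - 1e-9)
    return min(int(soc / delta + 1e-9), n_bins - 1)


def discretize_event_state(price_index, arriving_class, arriving_soc, piles, delta):
    """Build a hashable discretized event extended state (assumed layout).

    piles is a list of (m_j, soc_j) pairs; m_j == 0 marks an idle pile,
    whose SOC bin is set to 0 by convention in this sketch.
    """
    pile_bins = tuple(
        (m_j, discretize_soc(soc_j, delta) if m_j > 0 else 0)
        for m_j, soc_j in piles
    )
    return (price_index, arriving_class, discretize_soc(arriving_soc, delta), pile_bins)
```

Because the result is a tuple of integers, it can be used directly as a dictionary key or mapped to a row index of the Q-value table.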

defining the system decision time as the arrival time of any electric vehicle, namely the event occurrence time;

Whether the charging station service system admits the electric vehicle and provides charging service is taken as the control action $a$; the action at the $n$-th decision time $T_n$ is recorded as $a_n$, with $a_n \in D = \{0, 1\}$, where "0" denotes denial of service, "1" denotes admission to service, and $D$ denotes the action set; at any decision time $T_n$, if $m_j(t) \neq 0$ for all $j = 1, 2, \dots, J$, meaning that all charging piles are busy, then $a_n \equiv 0$;

Encode all possible discretized event extended states, letting

denote the $s$-th discretized event extended state, with

Encode all discretized event extended states in which every charging pile is busy as the last discretized event extended states, and record their number as $S_b$;

At the $n$-th decision time $T_n$, if $a_n = 1$, the arriving electric vehicle is connected to any idle charging pile and charging starts immediately; an electric vehicle is assumed to leave the charging station as soon as it is fully charged;
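The admission and departure rules can be illustrated with a short Python sketch. It assumes charging at constant power $P_m$ (kW) into a battery of capacity $E_m$ (kWh), so that the time to full charge is $(1 - \mathrm{SOC}) \cdot E_m / P_m$ hours; the constant-power assumption and the function names are illustrative and are not stated in the patent.

```python
def hours_to_full(soc, capacity_kwh, power_kw):
    """Hours needed to charge from soc to full at constant power (assumption)."""
    if not 0.0 <= soc <= 1.0:
        raise ValueError("soc must lie in [0, 1]")
    if capacity_kwh <= 0 or power_kw <= 0:
        raise ValueError("capacity and power must be positive")
    return (1.0 - soc) * capacity_kwh / power_kw


def connect_to_idle_pile(piles, vehicle_class, soc):
    """Connect an admitted vehicle to the first idle pile.

    piles is a mutable list of (m_j, soc_j) pairs, with m_j == 0 for idle.
    Returns the index of the pile used, or None if every pile is busy.
    """
    for j, (m_j, _) in enumerate(piles):
        if m_j == 0:
            piles[j] = (vehicle_class, soc)
            return j
    return None
```

For example, a vehicle arriving at 40% SOC with a 60 kWh battery and a 30 kW charging power requirement would occupy its pile for about 1.2 hours under this assumption.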

as shown in fig. 1, the intelligent access control method for the electric vehicle based on machine learning is performed according to the following steps:

Step 1: define and initialize the action exploration rate $\epsilon_n$ at the $n$-th decision time $T_n$, with $0 < \epsilon_n < 1$, e.g. $\epsilon_n = 0.8$;

Define the elements of the Q-value table as discretized event extended state-action pair learned values, and initialize them, e.g. by randomly initializing each element or setting it to 0; the Q-value table takes the discretized event extended states of the system as its rows and the access actions of the system as its columns, namely

wherein the actions corresponding to the last $S_b$ rows of the Q-value table are fixed to "0";

Define the current greedy control strategy table $v$ as the set of actions that, in each row of the Q-value table, correspond to the largest discretized event extended state-action pair learned value;
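A minimal Python sketch of the Q-value table and the greedy control strategy table follows. Rows are discretized event extended states and the two columns are the actions 0 (deny) and 1 (admit); the last $S_b$ rows are the all-busy states whose action is fixed to 0. The function names and the tie-breaking rule (ties resolve to action 0) are assumptions made for illustration.

```python
import random


def init_q_table(num_states, num_busy_states, rng=None):
    """Create an S x 2 Q-value table, zero-filled or randomly initialized.

    If rng (a random.Random instance) is given, the rows of non-busy states
    are filled with random values; the last num_busy_states rows stay at 0.
    """
    q = [[0.0, 0.0] for _ in range(num_states)]
    if rng is not None:
        for row in q[: num_states - num_busy_states]:
            row[0] = rng.random()
            row[1] = rng.random()
    return q


def greedy_policy(q, num_busy_states):
    """Return the greedy strategy table v: one action per state row."""
    free = len(q) - num_busy_states
    # max() returns the first maximal item, so ties resolve to action 0.
    policy = [max((0, 1), key=lambda a: row[a]) for row in q[:free]]
    # Actions of the all-busy rows are fixed to 0 (deny).
    return policy + [0] * num_busy_states
```

With an all-zero table the greedy table initially denies every request, so in this sketch early admissions come only from exploration.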

Step 2: initialize the variables $t = 0$ and $n = 1$; assign the current action exploration rate $\epsilon_n$ to $\epsilon_1$; let the original strategy table $v_0 = v$;

Step 3: at the $n$-th decision time $T_n$ of the charging station service system, the arrival event

occurs; observe the current joint state $s_t$ of the service system and record the event extended state as

Let the current event extended state at the $n$-th decision time $T_n$,

correspond to the discretized state in the Q-value table recorded as

and let the action actually taken in the current event extended state at the $n$-th decision time $T_n$,

be recorded as

At the $n$-th decision time $T_n$, if all charging piles are in service, i.e. $m_j(t) \in \{1, 2, \dots, M\}$ for all $j = 1, 2, \dots, J$, then let

Otherwise, in the current event extended state

extract from the Q-value table, for

and its corresponding state

the greedy action

and, with probability $1 - \epsilon_n$, assign

to

With probability $\epsilon_n$ (the exploration rate), take the action in the action set $D$ other than the greedy action

as the exploration action

and assign it to

After the charging station service system takes action

observe the transition sample trajectory from the $n$-th decision time $T_n$ to the $(n+1)$-th decision time $T_{n+1}$, or to time $T$,

where $t = T_n$, and $t' = T_{n+1}$ or $t' = T$; when $t' = T$, it is assumed that

Step 4: using formula (1), calculate, for the charging station service system in the current state at the $n$-th decision time $T_n$,

taking action

and then transitioning to the state at the $(n+1)$-th decision time $T_{n+1}$ or at time $T$,

the accumulated reward generated during the state transition

In formula (1), define $\mathrm{sgn}(m_j(t)) = 0$ when $m_j(t) = 0$ and $\mathrm{sgn}(m_j(t)) = 1$ when $m_j(t) > 0$; let $t' = \min\{T_{n+1}, T\}$;

denotes the charging power requirement of the $m_j(t)$-th electric vehicle;
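Formula (1) itself is not reproduced in the text, so the following Python sketch only illustrates one plausible form of the accumulated reward and should be read as an assumption: over an interval in which pile occupancy and the grid price are constant, each busy pile earns its charging power times the margin $PR_{ev} - PR_t$ times the interval length. The function names and the margin-based form are hypothetical.

```python
def sgn(m):
    """sgn(m_j(t)) as defined for formula (1): 0 for an idle pile, else 1."""
    return 1 if m > 0 else 0


def interval_reward(pile_classes, power_kw, pr_ev, pr_grid, duration_h):
    """Assumed reward over one interval with constant occupancy and price.

    pile_classes lists m_j for each pile (0 = idle); power_kw maps a vehicle
    class m to its charging power requirement P_m in kW.
    """
    active_kw = sum(sgn(m) * power_kw.get(m, 0.0) for m in pile_classes)
    return active_kw * (pr_ev - pr_grid) * duration_h


def accumulated_reward(segments, power_kw, pr_ev):
    """Sum the assumed reward over segments between t and t'.

    Each segment is (pile_classes, pr_grid, duration_h), so that price changes
    and vehicle departures inside [t, t'] start a new segment.
    """
    return sum(
        interval_reward(classes, power_kw, pr_ev, pr_grid, hours)
        for classes, pr_grid, hours in segments
    )
```

Under this assumed form, admitting vehicles while the grid price exceeds the service price yields a negative reward, which is the kind of signal that would steer the learned strategy toward denying service during price peaks.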

and 5, updating by using the difference formula and the Q value shown in the formula (2) and the formula (3)Formula, update in Q value table

Corresponding discretized states

Take action

Discretized event extended state-action pair learned values of

Obtaining updated learning value and assigning to

In the formula (2), the reaction mixture is,

indicating a transition to the n +1 th decision time T_nOr to a state at a time T

Corresponding discretized states

Discretized event expansion state-action pair learned value of action a is taken;

in equation (3), the operator ": "means that the value of the right formula is calculated first and then given to the left variable;

expanding state for current discretization event at nth decision time

Take action

The learning step length of (1);

Step 6: in each row of the updated Q-value table, select the action corresponding to the largest discretized event extended state-action pair learned value; the selected actions form the current action set, which is taken as the updated greedy control strategy table and assigned to the current greedy control strategy $v$; apply a decay operation to the exploration rate $\epsilon_n$ and assign the updated exploration rate to $\epsilon_{n+1}$;
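Steps 3, 5 and 6 can be sketched in Python as follows. Since the difference and update formulas are not reproduced in the text, the sketch uses the textbook Q-learning update; the discount factor `gamma`, the treatment of the end of day as terminal, and the multiplicative decay schedule are all assumptions, and the function names are hypothetical.

```python
import random

DENY, ADMIT = 0, 1


def choose_action(q_row, all_busy, epsilon, rng):
    """Step 3: pick the greedy action w.p. 1 - epsilon, else the other one."""
    if all_busy:
        return DENY  # a_n is fixed to 0 when every pile is in service
    greedy = ADMIT if q_row[ADMIT] > q_row[DENY] else DENY
    if rng.random() < epsilon:
        return 1 - greedy  # the only other action in D = {0, 1}
    return greedy


def q_update(q, s, a, reward, s_next, alpha, gamma, terminal):
    """Step 5: textbook temporal-difference update (assumed form)."""
    # Assumption: no future value is added when the transition ends at time T.
    target = reward if terminal else reward + gamma * max(q[s_next])
    diff = target - q[s][a]
    q[s][a] = q[s][a] + alpha * diff  # Q := Q + alpha * difference
    return diff


def decay(epsilon, rate=0.99, floor=0.01):
    """Step 6: multiplicative decay of the exploration rate (assumed form)."""
    return max(floor, epsilon * rate)
```

Because the action set has only two elements, exploration simply flips the greedy choice; with a larger action set one would sample among the non-greedy actions instead.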

Step 7: if $t' = T_{n+1} < T$, assign $n + 1$ to $n$ and return to step 3; otherwise $t' = T$, and step 8 is performed;

Step 8: determine whether the control strategy table $v$ equals $v_0$; if they are equal, stop updating and use the final control strategy table for access control of the random charging service requests of the $M$ types of electric vehicles; otherwise, return to step 2.
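The outer loop of steps 2 to 8 can be sketched in Python as follows. The callable `run_one_day`, which is expected to perform steps 3 to 7 for one total duration $T$ and update the Q-value table in place, and the `max_days` safeguard are hypothetical and not part of the patent.

```python
def greedy_table(q):
    """Greedy control strategy table: index of the largest value in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def train_until_stable(q, run_one_day, max_days=1000):
    """Repeat steps 2-8 until the greedy table is unchanged over a full day.

    Returns the final strategy table and the number of days simulated.
    """
    for day in range(1, max_days + 1):
        v0 = greedy_table(q)        # step 2: remember the original table v0
        run_one_day(q)              # steps 3-7: learn over one duration T
        if greedy_table(q) == v0:   # step 8: stop once v equals v0
            return greedy_table(q), day
    return greedy_table(q), max_days
```

Recomputing the greedy table at the end of the day gives the same result as updating it after every Q-value update, because the table depends only on the current Q-values. In practice a single unchanged day may be a weak stopping signal while the exploration rate is still high, which is why this sketch also caps the number of days.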

Claims

1. An electric vehicle intelligent access control method based on machine learning is characterized in that the method is applied to a charging station service system which is provided with J charging piles and provides paid charging service for M electric vehicles which arrive randomly, each charging pile can meet the charging power requirements of the M electric vehicles, and one charging pile only provides charging service for one electric vehicle at a time;

let $K$ be the maximum number of cycles per day and $T$ the corresponding total duration, and record the grid peak-shaving electricity price at any time $t$ within the total duration $T$ as $PR_t$, so that $PR_t \in \Phi_{PR}$; suppose the grid peak-shaving price is issued once per dispatching instruction period and $\tau_k$ is the time at which the $k$-th peak-shaving price $PR_k$ is issued; the peak-shaving price sequence is then recorded as $\{(\tau_k, PR_k) \mid k = 0, 1, 2, \dots, K-1,\ \tau_0 = 0\}$, where $PR_k \in \Phi_{PR}$ and $\Phi_{PR}$ is a finite electricity price state space;

the charging service price of the charging station service system is fixed at $PR_{ev}$;

the arrival event of the $m_t$-th electric vehicle is then recorded as

the arrival event of the $m_t$-th electric vehicle at time $t$,

define the elements of the Q-value table as discretized event extended state-action pair learned values, and initialize the elements of the Q-value table, namely by randomly initializing each element or setting it to 0; the Q-value table takes the discretized event extended states of the system as its rows and the access actions of the system as its columns, namely

step 2: initialize $t = 0$ and $n = 1$; assign the current action exploration rate $\epsilon_n$ to $\epsilon_1$; assign the current greedy control strategy table $v$ to the original strategy table $v_0$;

let the event extended state at the $n$-th decision time $T_n$,

correspond to the discretized state in the Q-value table recorded as

and let the action actually taken in the event extended state at the $n$-th decision time $T_n$,

be recorded as

otherwise, in the current event extended state

extract from the Q-value table, for

and its corresponding discretized state

the greedy action

and, with probability $1 - \epsilon_n$, assign

to

with probability $\epsilon_n$ (the exploration rate), take the action in the action set $D$ other than the greedy action

record it as the exploration action

and assign it to

the charging station service system takes action

where $t = T_n$, and $t' = T_{n+1}$ or $t' = T$; when $t' = T$, let

taking action

and transitioning to the state at the $(n+1)$-th decision time $T_{n+1}$ or at time $T$,

the charging reward obtained during the state transition

and its corresponding discretized state

under action

the discretized event extended state-action pair learned value

and reassign the result to

In formula (1),

and its corresponding discretized state

denotes, at the $n$-th decision time $T_n$ and in the discretized state

under action

the learning step size;

step 8: determine whether the control strategy table $v$ equals $v_0$; if they are equal, stop updating and use the current control strategy table $v$ for access control of the random charging service requests of the $M$ electric vehicles; otherwise, return to step 2.