CN116703062A

CN116703062A - Ordered charging method for electric automobile based on depth deterministic strategy gradient algorithm

Info

Publication number: CN116703062A
Application number: CN202310522988.7A
Authority: CN
Inventors: 韩帅; 肖静; 吴宁; 陈卫东; 郭敏; 吴晓锐; 龚文兰; 卢健斌; 姚知洋; 莫宇鸿; 郭小璇; 孙乐平; 赵立夏
Original assignee: Electric Power Research Institute of Guangxi Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority date: 2023-05-10
Filing date: 2023-05-10
Publication date: 2023-09-05

Abstract

The application discloses an ordered charging method for electric vehicles based on the deep deterministic policy gradient (DDPG) algorithm. The method takes into account the data fed back in real time by a charging monitoring system and the time-of-use electricity price signal, accounts for the uncertainty of electric vehicle travel patterns and charging demands, and optimizes the charging behavior of electric vehicles at the load aggregator level. The charging process of a single electric vehicle is modeled, and the optimal scheduling model is solved with the DDPG algorithm to obtain an optimal charging plan accurately and rapidly, so as to optimize operation within the charging station and effectively reduce its daily operating cost. The battery of the electric vehicle is protected while the charging demand of the electric vehicle user is met.

Description

Ordered charging method for electric automobile based on depth deterministic strategy gradient algorithm

Technical Field

The application relates to the technical field of electric automobiles, in particular to an ordered charging method for an electric automobile based on a depth deterministic strategy gradient algorithm.

Background

Electric vehicles have developed rapidly in recent years. As a new type of load, electric vehicles are both random and flexible. Their charging times overlap heavily with people's work and rest schedules, so the charging load is very likely to coincide with the base load of the existing power grid and produce a "peak on peak" overload. This further increases the grid load, causes the current at each node of the grid to rise sharply, makes heavily loaded lines and substations harder to operate, increases grid losses, and accelerates the aging of operation, distribution, and transmission equipment, all of which seriously harm the safe operation of the power system and the experience of electricity consumers. Meanwhile, meeting the maximum load at the moment of peak consumption places higher demands on grid capacity expansion, which is detrimental to the economy and development of grid construction.

To address the widening peak-valley difference of the load curve caused by electric vehicles connecting to the power grid, guiding users to charge in an orderly, coordinated manner is a focus of current research, and economic guidance through peak-valley electricity prices is an important direction. The smart grid is an important development direction of current power system construction. By analyzing and evaluating the data of three parties, namely the power grid, the charging equipment, and the electric vehicle users, an ordered guidance strategy based on peak-valley electricity prices is applied to charging users, so that peak shaving and valley filling are achieved, grid operating losses are further reduced, the stability and safety of the power system are improved, and the benefits of all three parties are maximized. This ultimately promotes larger-scale adoption of electric vehicles and the clean transition of the energy demand side. However, current ordered charging plans for charging stations do not take into account the uncertainty of time-of-use electricity prices and electric vehicle user behavior.

Disclosure of Invention

The embodiment of the application provides an electric vehicle ordered charging method based on a depth deterministic strategy gradient algorithm, which at least solves the problem of how to formulate an ordered charging plan of a charging station while considering uncertainty of time-of-use electricity prices and electric vehicle user behaviors.

According to an aspect of the embodiment of the application, there is provided an ordered charging method for an electric vehicle based on a depth deterministic strategy gradient algorithm, including:

from the aspect of load aggregation, comprehensively considering the charging demands of community electric automobile users, adjusting the charging behaviors of electric automobiles in a charging station, integrating SOC information fed back by a charging monitoring system and predicted vehicle taking time information of the users, and establishing an ordered charging optimization model of a community electric automobile cluster;

and solving the ordered charging optimization model of the community electric automobile cluster by adopting a depth deterministic strategy gradient algorithm to obtain an optimal charging plan, wherein the optimal charging plan achieves the aims of optimizing operation in a charging station and effectively reducing daily operation cost in the station.

Optionally, the optimization objective of the ordered charging optimization model of the community electric automobile cluster is as follows:

where $P_{n,t}$ is the charging power of the nth electric automobile in period t; $\rho_t$ is the time-of-use electricity price in period t; $N_t$ is the total number of electric vehicles connected to the power grid in the charging station in period t; $t_{n,arr}$ and $t_{n,lea}$ are the times at which the nth electric automobile arrives at and leaves the charging station, respectively; and F is the total electricity cost of the electric automobile cluster over all periods.
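The objective formula itself is not reproduced in this text. A form consistent with the symbol definitions above, given only as a hedged reconstruction, would be the following; the period length $\Delta t$ and the summation over scheduling periods are assumptions, and $P_{n,t}$ is taken to be zero outside the interval $[t_{n,arr}, t_{n,lea}]$.

$$\min F = \sum_{t} \sum_{n=1}^{N_t} \rho_t \, P_{n,t} \, \Delta t$$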

Optionally, the constraint of the ordered charging optimization model of the community electric automobile cluster includes: the method comprises the following steps of constraint of the state of charge of the electric automobile, constraint of charging expectancy of a user, constraint of operation of a charging pile of the electric automobile and constraint of charging time of the electric automobile.

Optionally, solving the ordered charge optimization model of the community electric automobile cluster by adopting a depth deterministic strategy gradient algorithm comprises:

approximating the policy function and the action-value function, respectively, with deep neural networks, i.e. determining the parameters $\theta_Q$ of the value network and the parameters $\theta_\mu$ of the policy network;

adding target networks with the same structure as the policy network and the value network to improve the performance of the depth deterministic strategy gradient algorithm, the optimized target network parameters being $\theta_{Q'}$ and $\theta_{\mu'}$, respectively;

and training the target network and the value network with the optimized parameters, so that the ordered charging optimization model of the community electric automobile cluster outputs an optimal strategy.

Optionally, the parameters $\theta_Q$ of the value network are updated by minimizing the loss function $L_Q$:

$$L_Q = E\left[\left(y_t - Q(s_t, a_t \mid \theta_Q)\right)^2\right]$$

where $Q(s_t, a_t \mid \theta_Q)$ is the output of the value network, i.e. the expected return when action $a_t$ is performed in state $s_t$ during period t; and $y_t$ is the target Q value:

$$y_t = r_t + \gamma Q'\left(s_{t+1}, \mu'(s_{t+1} \mid \theta_{\mu'}) \mid \theta_{Q'}\right)$$

where $r_t$ is the reward value for period t; and $Q'$ and $\mu'$ are the target value network and the target policy network, respectively.

Optionally, $r_t$ is expressed as the negative value of the cumulative reward J obtained from reinforcement learning training:

$$r_t = -J = -\omega_1 J_1 - \omega_2 J_2 - \omega_3 J_3 - \cdots$$

where $J_1$, $J_2$, $J_3$ are the reward terms obtained in each training, and $\omega_1$, $\omega_2$, $\omega_3$ are the corresponding weights of those reward terms.

Optionally, the parameters $\theta_\mu$ of the policy network are updated by minimizing the loss function $L_\mu$:

$$L_\mu = -E\left[Q(s_t, \mu(s_t))\right]$$

where $Q(s_t, \mu(s_t))$ is the Q value, i.e. the value of the action-value function in state $s_t$ during period t for the action output by the policy network;

the target network parameters $\theta_{Q'}$ and $\theta_{\mu'}$ are updated as follows:

$$\theta_{\mu'} \leftarrow \tau\theta_\mu + (1-\tau)\theta_{\mu'}$$

$$\theta_{Q'} \leftarrow \tau\theta_Q + (1-\tau)\theta_{Q'}$$

where τ is the soft update rate factor; the larger τ is, the faster the value network parameters $\theta_Q$ and the policy network parameters $\theta_\mu$ are transferred to the corresponding target network parameters $\theta_{Q'}$ and $\theta_{\mu'}$.

According to another aspect of the embodiment of the present application, there is also provided an ordered charging device for an electric vehicle based on a depth deterministic strategy gradient algorithm, including:

a model establishing module for the ordered charging optimization model of the community electric automobile cluster, configured to comprehensively consider, from the perspective of a load aggregator, the charging demands of community electric automobile users, adjust the charging behaviors of the electric automobiles in a charging station, integrate the SOC information fed back by the charging monitoring system and the users' predicted vehicle pick-up time information, and establish the ordered charging optimization model of the community electric automobile cluster;

and the model solving module is used for solving the ordered charging optimization model of the community electric automobile cluster by adopting a depth deterministic strategy gradient algorithm to obtain an optimal charging plan, and the optimal charging plan achieves the aims of optimizing operation in a charging station and effectively reducing daily operation cost in the station.

According to another aspect of the embodiment of the present application, there is further provided a computer readable storage medium, where the computer readable storage medium includes a stored program, and when the program runs, the device where the computer readable storage medium is located is controlled to execute the method for orderly charging an electric automobile based on the depth deterministic strategy gradient algorithm according to any one of the above embodiments.

According to another aspect of the embodiment of the present application, there is further provided a processor, configured to execute a program, where the program executes any one of the above-described methods for orderly charging of electric vehicles based on a depth deterministic strategy gradient algorithm.

Compared with the prior art, the application has the following beneficial effects:

In the embodiment of the application, the data fed back in real time by the charging monitoring system and the time-of-use electricity price signal are taken into account, the uncertainty of electric vehicle travel patterns and charging demands is accounted for, and the charging behavior of electric vehicles is optimized at the load aggregator level. The charging process of a single electric automobile is modeled, and the optimal scheduling model is solved with the deep deterministic policy gradient (DDPG) algorithm to obtain an optimal charging plan accurately and rapidly, so as to optimize operation within the charging station and effectively reduce its daily operating cost. The battery of the electric automobile is protected while the charging demand of the electric automobile user is met.

Drawings

In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawing in the description below is only one embodiment of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of an ordered charging method for an electric vehicle based on a depth deterministic strategy gradient algorithm according to an embodiment of the present application;

Fig. 2 is a schematic diagram of the relationship between the time-of-use electricity price, the electric vehicle user and the electric vehicle charging pile according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the flow by which DDPG solves the ordered charging optimization model of a community electric vehicle cluster according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a time sampling scenario according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a DDPG algorithm training scenario according to an embodiment of the present application.

Detailed Description

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.

Example 1

According to an embodiment of the present application, there is provided an embodiment of an electric vehicle ordered charging method based on a depth deterministic strategy gradient algorithm, it being noted that the steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.

Fig. 1 is a flowchart of an ordered charging method for an electric vehicle based on a depth deterministic strategy gradient algorithm according to an embodiment of the present application, as shown in fig. 1, the method includes the following steps:

step S1, aiming at the ordered charging problem of large-scale electric vehicles in a community charging station, comprehensively considering the charging demands of community electric vehicle users from the perspective of a load aggregator, adjusting the charging behavior of the electric vehicles in the charging station, integrating SOC information fed back by a charging monitoring system and predicted vehicle taking time information of the users, and establishing an ordered charging optimization model of a community electric vehicle cluster;

and S2, solving an ordered charging optimization model of the community electric vehicle cluster by adopting a depth deterministic strategy gradient algorithm (DDPG) to obtain an optimal charging plan so as to achieve the aims of optimizing operation in a charging station and effectively reducing daily operation cost in the station.

The application takes into account the data fed back in real time by the charging monitoring system and the time-of-use electricity price signal, accounts for the uncertainty of electric vehicle travel patterns and charging demands, and optimizes the charging behavior of electric vehicles at the load aggregator level. The charging process of a single electric automobile is modeled, and the optimal scheduling model is solved with the deep deterministic policy gradient (DDPG) algorithm to obtain an optimal charging plan accurately and rapidly, so as to optimize operation within the charging station and effectively reduce its daily operating cost. The battery of the electric automobile is protected while the charging demand of the electric automobile user is met.

As an alternative embodiment, as shown in Fig. 2, the load aggregator acts as an intermediate link between the power grid and the users, and its profit mainly comes from the difference between the charging management service fee charged to electric vehicle users and the cost of the electricity purchased from the power grid. For a given charging management service fee, optimizing the charging behavior of the electric automobile cluster in response to the time-of-use electricity price reduces the cost of the electricity purchased from the power grid and gives the load aggregator a larger profit margin. Therefore, the optimization objective of the ordered charging optimization model of the community electric automobile cluster is to minimize the total electricity cost F of the cluster over all periods.
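A minimal sketch of how the total electricity cost F of the cluster could be evaluated for a candidate charging schedule. It assumes that F sums price times power times period length over the periods in which each vehicle is connected; the exact formula is not reproduced in this text, and the function name, data layout, and example numbers are hypothetical.

```python
def total_charging_cost(power, price, arrive, leave, dt_h=1.0):
    """Return the cluster's electricity cost for one schedule.

    power[n][t] : charging power of vehicle n in period t (kW)
    price[t]    : time-of-use price in period t (currency per kWh)
    arrive[n], leave[n] : first and last connected period of vehicle n
    dt_h        : period length in hours (assumed, not from the source)
    """
    cost = 0.0
    for n, row in enumerate(power):
        for t, p in enumerate(row):
            # Only periods inside the connection window contribute.
            if arrive[n] <= t <= leave[n]:
                cost += price[t] * p * dt_h
    return cost


# Hypothetical four-period example with two vehicles.
price = [0.3, 0.3, 0.8, 0.8]
power = [
    [7.0, 7.0, 0.0, 0.0],  # vehicle 0 charges in the cheap periods
    [0.0, 3.0, 3.0, 0.0],  # vehicle 1 charges across the price step
]
arrive = [0, 1]
leave = [3, 2]
F = total_charging_cost(power, price, arrive, leave)
```

Shifting vehicle 1's energy into a cheaper period lowers F, which is the behavior the scheduling model is meant to exploit.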

As an optional embodiment, the constraints of the ordered charging optimization model of the community electric automobile cluster include: state of charge (SOC) constraints of the electric vehicle, user charging expectation constraints, electric vehicle charging pile operation constraints, and electric vehicle charging time constraints. Specifically:

1) In period t, the state of charge constraint of the electric vehicle can be expressed as:

where $SOC_{n,t}$ is the SOC of the nth electric automobile in period t; $SOC_n^{\max}$ and $SOC_n^{\min}$ are the upper and lower limit values of $SOC_{n,t}$; $Q_n$ is the battery capacity of the nth electric automobile; $\eta_{n,t}$ is the charging efficiency corresponding to the charging power $P_{n,t}$ of the nth electric automobile in period t; and $\Delta t$ is the length of the time step.

2) For an electric automobile charging pile with continuously adjustable power, the average charging power of the charging pile is strongly correlated with the charging power $P_{n,t}$, and the relationship between the two is approximately expressed as:

In order to meet the travel demands of users and reasonably avoid overcharging and undercharging of the electric automobile, the SOC of the battery when the user departs should lie within the interval expected by the user, so the user's charging expectation constraint is:

where $SOC_n^{exp}$ is the SOC expected by the user when the electric automobile leaves; ε is the allowable difference between the SOC when the electric automobile leaves and the expected SOC; and t is the current time.

3) Considering the safe and stable operation of the electric automobile charging pile, the charging power of the electric automobile must satisfy the following constraint (namely, the operation constraint of the electric automobile charging pile):

$$0 \le P_{n,t} \le P_{\max}$$

where $P_{\max}$ is the upper limit of the charging power of the electric automobile charging pile.

4) Because the period during which the electric automobile is connected to the power grid through the charging pile is the time range within which the power system can schedule it freely, the charging time t of the electric automobile is constrained by:

$$t_{n,arr} \le t \le t_{n,lea}$$

where $t_{n,arr}$ and $t_{n,lea}$ are the times at which the nth electric vehicle arrives at and leaves the charging station, respectively.
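The four constraints above can be sketched as small checks. Several formulas are not reproduced in this text, so the SOC recursion (SOC grows by efficiency times power times period length divided by capacity) and the symmetric band around the expected departure SOC are assumptions, and all function names and numbers are hypothetical.

```python
def next_soc(soc, power_kw, capacity_kwh, efficiency, dt_h=1.0):
    """Assumed SOC recursion: SOC_{t+1} = SOC_t + eta * P * dt / Q."""
    return soc + efficiency * power_kw * dt_h / capacity_kwh


def soc_in_limits(soc, soc_min, soc_max):
    """Constraint 1: SOC stays between its lower and upper limits."""
    return soc_min <= soc <= soc_max


def departure_soc_ok(soc_at_departure, soc_expected, eps):
    """Constraint 2 (assumed form): departure SOC within eps of the expectation."""
    return abs(soc_at_departure - soc_expected) <= eps


def feasible_power(requested_kw, t, t_arr, t_lea, p_max_kw):
    """Constraints 3 and 4: clip power to [0, P_max]; zero outside [t_arr, t_lea]."""
    if not (t_arr <= t <= t_lea):
        return 0.0
    return min(max(requested_kw, 0.0), p_max_kw)


# Hypothetical vehicle: 70 kWh battery, 7 kW pile, connected from period 18 to 23.
p = feasible_power(10.0, t=19, t_arr=18, t_lea=23, p_max_kw=7.0)  # request exceeds P_max
soc = next_soc(0.5, p, capacity_kwh=70.0, efficiency=0.9)         # about 0.59
```

Clipping the requested power rather than rejecting it is one possible design choice; a penalty term in the reward would be another way to steer the agent toward feasible actions.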

As an alternative embodiment, the reinforcement learning process is described by a Markov decision process (MDP), generally represented by a five-tuple (S, A, P, R, γ), where S denotes the state set, A denotes the action set, P denotes the transition probability, R denotes the reward function, and γ denotes the discount factor.

The state space S should contain all relevant information about the environment without being redundant; if too many factors are added to the state space, the model becomes too complex and difficult to train. For this reason, in view of the problem studied here, the arrival time $t_{n,arr}$ of the electric automobile, the departure time $t_{n,lea}$ of the electric automobile, the state of charge $SOC_{n,t}$ of the electric automobile, and the current period t are included in the state space. Thus, the state $s_t$ in period t can be expressed as $s_t = (t_{n,arr}, t_{n,lea}, SOC_{n,t}, t)$.

The action space A is the decision variable of the model, i.e. the next action the agent takes according to the state at the current time. In this study, the action is the charge/discharge power of the electric vehicle, so the action $a_t$ at time t can be expressed as $a_t = (P_{n,t})$.
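As an illustration of the state and action described above, the sketch below packs the four state components into a tuple and turns them into a feature vector for a neural network. Dividing the time indices by a 24-period horizon is an assumption added for illustration, and the names `EVState` and `to_feature_vector` are hypothetical.

```python
from typing import List, NamedTuple


class EVState(NamedTuple):
    t_arr: int   # arrival period of the vehicle
    t_lea: int   # departure period of the vehicle
    soc: float   # current state of charge, in [0, 1]
    t: int       # current period


def to_feature_vector(state: EVState, horizon: int = 24) -> List[float]:
    """Scale the time indices by the horizon so every feature lies in [0, 1]."""
    return [
        state.t_arr / horizon,
        state.t_lea / horizon,
        state.soc,
        state.t / horizon,
    ]


state = EVState(t_arr=6, t_lea=18, soc=0.4, t=12)
features = to_feature_vector(state)

# The action is a one-element tuple holding the charging power in kW (example value).
action = (3.5,)
```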

As an alternative embodiment, as shown in fig. 3, step S2 of solving the ordered charge optimization model of the community electric automobile cluster by using a depth deterministic strategy gradient algorithm includes:

step S21, approximating the policy function and the action-value function with deep neural networks, respectively, i.e. determining the parameters $\theta_Q$ of the value network and the parameters $\theta_\mu$ of the policy network; the neural networks comprise a value network and a policy network, where the value network consists of a target value network and an updated value network, and the policy network is structured in the same way;

step S22, adding target networks with the same structure as the policy network (Actor network) and the value network (Critic network) to improve the performance of the deep deterministic policy gradient algorithm (i.e. to optimize $\theta_Q$ and $\theta_\mu$), the optimized target network parameters being $\theta_{Q'}$ and $\theta_{\mu'}$, respectively;

and step S23, training the target network and the value network with the optimized parameters, so that the ordered charging optimization model of the community electric automobile cluster outputs an optimal strategy. The output of the ordered charging optimization model of the community electric automobile cluster is deterministic and represents the optimal action in the current state.

As an alternative embodiment, in step S22, the parameters $\theta_Q$ of the value network (Critic network) are updated by minimizing the loss function $L_Q$:

$$L_Q = E\left[\left(y_t - Q(s_t, a_t \mid \theta_Q)\right)^2\right]$$

$$y_t = r_t + \gamma Q'\left(s_{t+1}, \mu'(s_{t+1} \mid \theta_{\mu'}) \mid \theta_{Q'}\right)$$
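The two formulas above can be sketched in plain Python, with ordinary functions standing in for the networks. This is only an illustration of the target and loss computation: the toy critic and actor are hypothetical, the expectation is estimated as a batch mean, and no gradient step is shown.

```python
def td_target(reward, next_state, target_actor, target_critic, gamma):
    """y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


def critic_loss(batch, critic, target_actor, target_critic, gamma):
    """Mean squared error between the targets y_t and Q(s_t, a_t) over a batch.

    Each batch entry is a transition (s_t, a_t, r_t, s_{t+1}).
    """
    squared_errors = []
    for s, a, r, s_next in batch:
        y = td_target(r, s_next, target_actor, target_critic, gamma)
        squared_errors.append((y - critic(s, a)) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical scalar stand-ins for the networks.
critic = lambda s, a: s + a
target_critic = lambda s, a: s + a
target_actor = lambda s: 0.5 * s

batch = [(1.0, 1.0, -2.0, 2.0)]  # one transition (s, a, r, s_next)
loss = critic_loss(batch, critic, target_actor, target_critic, gamma=0.5)
```

In a real implementation the critic parameters would be adjusted by gradient descent on this loss, while the target networks stay fixed during the step.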

The DDPG algorithm belongs to model-free reinforcement learning, and the learning process can be completed without an explicit expression of the state transition function. That is, $r_t$ is expressed as the negative value of the cumulative reward J obtained from reinforcement learning training:

$$r_t = -J = -\omega_1 J_1 - \omega_2 J_2 - \omega_3 J_3 - \cdots$$

Through the above equation, the minimization objective is converted into a form in which the maximum reward is obtained by optimizing the decision function.
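The weighted sum in the reward can be sketched as follows. The text does not say what the individual terms $J_1$, $J_2$, $J_3$ are, so the two terms in the example (an electricity cost and a penalty for missing the expected SOC) and their weights are purely hypothetical.

```python
def reward(terms, weights):
    """r_t = -(w_1 * J_1 + w_2 * J_2 + ...), the negative weighted sum of the terms."""
    if len(terms) != len(weights):
        raise ValueError("terms and weights must have the same length")
    return -sum(w * j for w, j in zip(weights, terms))


# Hypothetical example: J_1 = electricity cost, J_2 = SOC shortfall penalty.
r = reward(terms=[2.0, 0.5], weights=[1.0, 4.0])
```

Because the agent maximizes the return, lowering any weighted term raises the reward, which is how the minimization objective becomes a maximization problem.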

As an alternative embodiment, in step S22, the parameters $\theta_\mu$ of the policy network (Actor network) are updated by minimizing the loss function $L_\mu$:

$$L_\mu = -E\left[Q(s_t, \mu(s_t))\right]$$

The target network parameters $\theta_{Q'}$ and $\theta_{\mu'}$ are updated as follows:

$$\theta_{\mu'} \leftarrow \tau\theta_\mu + (1-\tau)\theta_{\mu'}$$

$$\theta_{Q'} \leftarrow \tau\theta_Q + (1-\tau)\theta_{Q'}$$

where τ is the soft update rate factor; the larger τ is, the faster the value network parameters $\theta_Q$ and the policy network parameters $\theta_\mu$ are transferred to the corresponding target network parameters $\theta_{Q'}$ and $\theta_{\mu'}$.
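A minimal sketch of the actor loss and the soft update, with parameters held in flat Python lists and simple functions in place of the networks. The toy actor and critic are hypothetical, and the expectation is estimated as a batch mean.

```python
def actor_loss(states, actor, critic):
    """L_mu = -E[Q(s, mu(s))], estimated as the mean over a batch of states."""
    return -sum(critic(s, actor(s)) for s in states) / len(states)


def soft_update(target_params, online_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied element by element."""
    return [
        tau * w + (1.0 - tau) * w_target
        for w, w_target in zip(online_params, target_params)
    ]


# Hypothetical stand-ins: a linear actor and a critic that peaks when action == state.
actor = lambda s: 0.5 * s
critic = lambda s, a: -(a - s) ** 2
loss = actor_loss([2.0, 4.0], actor, critic)

# Small tau moves the target slowly; tau = 1 copies the online parameters outright.
theta_slow = soft_update([0.0, 1.0], [1.0, 3.0], tau=0.1)
theta_copy = soft_update([0.0, 1.0], [1.0, 3.0], tau=1.0)
```

A small τ keeps the targets used in the critic loss nearly stationary between updates, which is the usual reason for preferring a soft update over a hard copy.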

As an alternative embodiment, the method of embodiment 1 of the present application is applied as follows. As shown in Fig. 4, the arrival and departure of electric vehicles, i.e. the numbers of electric vehicles arriving and departing at different times of the sampled day, are taken as input. In combination with the time-of-use electricity price information, the scheduling horizon of the power distribution network is set to 24 hours, with an interval of 1 hour between two adjacent periods. The agent is trained with the DDPG algorithm, and training converges successfully to yield the corresponding operation strategy; the training result is shown in Fig. 5. The curve in Fig. 5 converges, which indicates that the problem is solved and that the method used is reasonable.
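To make the 24-period setup concrete, the sketch below shows a single-vehicle environment that an agent could interact with, one hourly period per step. It is an assumption-laden illustration rather than the embodiment's simulator: the class name, the battery and pile parameters, the flat price vector, the cap of SOC at 1.0, and the use of electricity cost as the only reward term are all hypothetical.

```python
class SingleEVEnv:
    """One vehicle, hourly periods, reward equal to the negative electricity cost."""

    def __init__(self, prices, t_arr, t_lea, soc0,
                 capacity_kwh=60.0, p_max_kw=7.0, efficiency=0.9, dt_h=1.0):
        self.prices = prices            # time-of-use price per period (24 entries)
        self.t_arr, self.t_lea = t_arr, t_lea
        self.soc0 = soc0
        self.capacity_kwh = capacity_kwh
        self.p_max_kw = p_max_kw
        self.efficiency = efficiency
        self.dt_h = dt_h
        self.reset()

    def _state(self):
        return (self.t_arr, self.t_lea, self.soc, self.t)

    def reset(self):
        self.t = self.t_arr
        self.soc = self.soc0
        return self._state()

    def step(self, power_kw):
        # Enforce the charging pile limit 0 <= P <= P_max.
        p = min(max(power_kw, 0.0), self.p_max_kw)
        cost = self.prices[self.t] * p * self.dt_h
        gained = self.efficiency * p * self.dt_h / self.capacity_kwh
        self.soc = min(1.0, self.soc + gained)  # cap at full charge (assumed)
        self.t += 1
        done = self.t > self.t_lea              # the vehicle has left
        return self._state(), -cost, done


prices = [0.5] * 24                      # hypothetical flat price
env = SingleEVEnv(prices, t_arr=20, t_lea=22, soc0=0.5)
state = env.reset()
state, r, done = env.step(10.0)          # request above P_max is clipped to 7 kW
```

Repeated calls to `step` until `done` is true yield the transitions that a DDPG agent would store and sample during training.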

Example 2

According to another aspect of the embodiment of the present application, there is also provided an electric vehicle ordered charging device based on a depth deterministic strategy gradient algorithm, the electric vehicle ordered charging device applying the electric vehicle ordered charging method based on the depth deterministic strategy gradient algorithm, the device including:

a model establishing module for the ordered charging optimization model of the community electric automobile cluster, configured to address the ordered charging problem of large-scale electric automobiles in a community charging station by comprehensively considering, from the perspective of a load aggregator, the charging demands of community electric automobile users, adjusting the charging behaviors of the electric automobiles in the charging station, integrating the SOC information fed back by the charging monitoring system and the users' predicted vehicle pick-up time information, and establishing the ordered charging optimization model of the community electric automobile cluster;

and the model solving module is used for solving the ordered charging optimization model of the community electric automobile cluster by adopting a depth deterministic strategy gradient algorithm (DDPG) to obtain an optimal charging plan, and the optimal charging plan achieves the aims of optimizing operation in a charging station and effectively reducing daily operation cost in the station.

The present application is not limited to the above embodiments, but is to be accorded the widest scope consistent with the principles and other features of the present application.

Example 3

According to another aspect of the embodiment of the present application, there is further provided a computer readable storage medium, where the computer readable storage medium includes a stored program, and when the program runs, the device where the computer readable storage medium is located is controlled to execute the method for orderly charging an electric automobile based on the depth deterministic strategy gradient algorithm according to any one of the above.

Alternatively, in this embodiment, the above-mentioned computer readable storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network or in any one of the mobile terminals in the mobile terminal group, and the above-mentioned computer readable storage medium includes a stored program.

Optionally, the computer readable storage medium is controlled to perform the following functions when the program is run: from the aspect of load aggregation, comprehensively considering the charging demands of community electric automobile users, adjusting the charging behaviors of electric automobiles in a charging station, integrating SOC information fed back by a charging monitoring system and predicted vehicle taking time information of the users, and establishing an ordered charging optimization model of a community electric automobile cluster; and solving the ordered charging optimization model of the community electric automobile cluster by adopting a depth deterministic strategy gradient algorithm to obtain an optimal charging plan, wherein the optimal charging plan achieves the aims of optimizing operation in a charging station and effectively reducing daily operation cost in the station.

Example 4

According to another aspect of the embodiment of the present application, there is further provided a processor for running a program, wherein the program executes the method for orderly charging an electric vehicle based on the depth deterministic strategy gradient algorithm according to any one of the above.

The embodiment of the application provides equipment, which comprises a processor, a memory and a program stored on the memory and capable of running on the processor, wherein the processor realizes the step of the electric vehicle ordered charging method based on a depth deterministic strategy gradient algorithm when executing the program.

The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces and the indirect coupling or communication connection of units or modules may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store program code.

The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims

1. An ordered charging method of an electric vehicle based on a depth deterministic strategy gradient algorithm is characterized by comprising the following steps:

2. The ordered charging method of the electric vehicles based on the depth deterministic strategy gradient algorithm according to claim 1, wherein the optimization objective of the ordered charging optimization model of the community electric vehicle cluster is:

3. The ordered charging method of electric vehicles based on depth deterministic strategy gradient algorithm according to claim 1, wherein the constraints of the ordered charging optimization model of the community electric vehicle cluster comprise: the method comprises the following steps of constraint of the state of charge of the electric automobile, constraint of charging expectancy of a user, constraint of operation of a charging pile of the electric automobile and constraint of charging time of the electric automobile.

4. The method for orderly charging electric vehicles based on the depth deterministic strategy gradient algorithm according to claim 1, wherein solving the orderly charging optimization model of the community electric vehicle cluster by using the depth deterministic strategy gradient algorithm comprises:

5. The ordered charging method for electric vehicles based on depth deterministic strategy gradient algorithm according to claim 4, wherein the parameters $\theta_Q$ of the value network are updated by minimizing the loss function $L_Q$:

$$L_Q = E\left[\left(y_t - Q(s_t, a_t \mid \theta_Q)\right)^2\right]$$

$$y_t = r_t + \gamma Q'\left(s_{t+1}, \mu'(s_{t+1} \mid \theta_{\mu'}) \mid \theta_{Q'}\right)$$

6. The depth deterministic strategy gradient algorithm-based ordered charging method for electric vehicles according to claim 5, wherein $r_t$ is expressed as the negative value of the cumulative reward J obtained from reinforcement learning training:

$$r_t = -J = -\omega_1 J_1 - \omega_2 J_2 - \omega_3 J_3 - \cdots$$

7. The ordered charging method for electric vehicles based on depth deterministic strategy gradient algorithm according to claim 4, wherein the parameters $\theta_\mu$ of the strategy network are updated by minimizing the loss function $L_\mu$:

$$L_\mu = -E\left[Q(s_t, \mu(s_t))\right]$$

$$\theta_{\mu'} \leftarrow \tau\theta_\mu + (1-\tau)\theta_{\mu'}$$

$$\theta_{Q'} \leftarrow \tau\theta_Q + (1-\tau)\theta_{Q'}$$

8. An electric automobile ordered charging device based on depth deterministic strategy gradient algorithm is characterized by comprising:

9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run controls a device in which the computer readable storage medium is located to execute the method for orderly charging an electric vehicle based on the depth deterministic strategy gradient algorithm according to any one of claims 1 to 7.

10. A processor, wherein the processor is configured to run a program, wherein the program when run performs the depth deterministic strategy gradient algorithm-based electric vehicle ordered charging method according to any one of claims 1 to 7.