CN117669993A - Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium - Google Patents

Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium

Info

Publication number
CN117669993A
CN117669993A (application CN202410124489.7A)
Authority
CN
China
Prior art keywords
charging
virtual world
reinforcement learning
model
charging facility
Prior art date
Legal status
Granted
Application number
CN202410124489.7A
Other languages
Chinese (zh)
Other versions
CN117669993B (en)
Inventor
嘉有为
黄麒霖
Current Assignee
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN202410124489.7A
Publication of CN117669993A
Application granted
Publication of CN117669993B
Status: Active (granted)


Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention provides a progressive charging facility planning method, device, terminal, and storage medium. The method comprises the following steps: dividing a target city into a plurality of regions, acquiring the charging layout history data corresponding to each region, and training a reinforcement learning model on all the charging layout history data; performing Monte Carlo tree search deduction with the reinforcement learning model in a pre-constructed initial virtual world model, and determining the current optimal construction area among all the regions; acquiring the real feedback data obtained after charging facilities are built in the current optimal construction area, and correcting the initial virtual world model according to the real feedback data to obtain a corrected virtual world model; and determining the next optimal construction area among all the regions based on the corrected virtual world model. The method can accurately select charging resource construction locations and continuously correct and optimize the virtual world model, ensuring the rationality and accuracy of charging facility planning.

Description

Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium
Technical Field
The present invention relates to the field of resource planning technologies, and in particular, to a progressive charging facility planning method, device, terminal, and storage medium.
Background
Existing charging resources are insufficient in scale, unevenly distributed, and unreasonably laid out. The lack of adequate infrastructure poses significant challenges to the power grid and the traffic network, including long charging queue and driving times and potential overload of the power grid. Electric vehicle charging infrastructure planning has therefore become a key research area for accelerating the adoption of electric vehicles worldwide.
However, existing research methods and techniques have drawbacks: methods based on solving optimization equations are over-idealized; heuristic methods ignore the trade-off between the long-term and short-term effects of building charging resources; and reinforcement learning methods depend heavily on a simulation environment. As a result, the rationality of charging facility planning is poor, and the effect of charging resource construction suffers.
Accordingly, the prior art has drawbacks and needs to be improved and developed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a progressive charging facility planning method, device, terminal, and storage medium, aiming at the problem of poor rationality of charging facility planning in the prior art.
The technical scheme adopted for solving the technical problems is as follows:
A progressive charging facility planning method, comprising:
dividing a target city into a plurality of areas, acquiring charging layout history data corresponding to each area, and training according to all the charging layout history data to obtain a reinforcement learning model;
performing Monte Carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, and determining a current optimal construction area in all the areas;
acquiring real feedback data obtained after the charging facility is built according to the current optimal building area, and correcting the initial virtual world model according to the real feedback data to obtain a corrected virtual world model;
and determining the next optimal construction area in all the areas based on the modified virtual world model.
In one implementation manner, the dividing the target city into a plurality of regions, obtaining charging layout history data corresponding to each region, and training according to all the charging layout history data to obtain a reinforcement learning model, including:
taking a city of which charging resources are to be built as a target city, and dividing the target city into a plurality of areas;
acquiring the charging layout history data corresponding to each region, and integrating each region's charging layout history data into a corresponding picture according to a plurality of features in the data, wherein the picture has a plurality of channels;
obtaining user behavior information from all the charging layout history data, constructing a simulation environment, and training a pre-constructed initial reinforcement learning model in the simulation environment according to the user behavior information and the pictures, to obtain a trained reinforcement learning model;
wherein the charging layout history data include: geographic data, traffic network data, power grid data, the total charging travel time of electric vehicles issuing charging requests, the charging queuing waiting time of electric vehicles, the annual electric vehicle ownership growth rate, and the base load growth rate; the simulation environment is used to simulate user behavior of electric vehicles and to evaluate the influence of charging resource construction in each area on the power grid and traffic network of the target city; the user behavior information is a mapping relationship obtained by modeling user behavior with a pre-constructed probability model, namely the correspondence between generated charging demand and the charging demand distribution.
In one implementation, obtaining user behavior information according to all the charging layout history data, constructing a simulation environment, training a pre-constructed initial reinforcement learning model in the simulation environment according to the user behavior information and the picture, and obtaining a trained reinforcement learning model, wherein the method comprises the following steps:
obtaining user behavior information from all the charging layout history data, and constructing a simulation environment;
obtaining the total running time and the total queuing waiting time of the electric vehicle corresponding to reinforcement learning of adjacent times according to the user behavior information in a simulation environment;
substituting the total running time and the total queuing waiting time into a preset feedback function calculation formula to obtain a feedback function;
respectively taking each picture as input data, taking a corresponding feedback function as a training target, and training a pre-constructed initial reinforcement learning model to obtain a trained reinforcement learning model;
wherein the initial reinforcement learning model includes an attention convolution neural network for processing the picture data and a full-connection layer neural network for evaluating the action value.
In one implementation, the reinforcement learning model performs Monte Carlo tree search deduction in a pre-constructed initial virtual world model, and determines a current optimal construction area in all the areas, including:
an initial virtual world model is built by adopting a bootstrap dynamic integrated learning algorithm in advance, wherein the initial virtual world model is formed by assembling a plurality of learnable models, and the learnable models are probability neural networks;
The Monte Carlo tree search establishes nodes and edges based on the data change of the initial virtual world model, records the virtual value of the edges from the root node to the second node, the corresponding action exploration times and the action value estimated by the reinforcement learning model, and carries out rapid deduction through reinforcement learning selection actions;
updating the virtual value and the action exploration times of the traversed edges, wherein each edge keeps records of the virtual value and the action exploration times;
after the last search is completed, starting from the initial root node, accessing the action with the highest action search frequency, and taking the area corresponding to the action with the highest action search frequency as the current optimal construction area;
when the reinforcement learning model adopts Monte Carlo tree searching in an initial virtual world model, each deduction adopts one independent learning model, and each batch of deductions shares all the learning models.
In one implementation, obtaining real feedback data obtained after the charging facility is built according to the current optimal building area, and correcting the initial virtual world model according to the real feedback data to obtain a corrected virtual world model, including:
Acquiring real feedback data of each area obtained after the charging facility construction is carried out according to the current optimal construction area, and acquiring prediction change data of the real world caused by the charging facility construction predicted for each area in an initial virtual world model in advance;
and correcting the model parameters of the initial virtual world model according to the real feedback data and the prediction change data of each region to obtain a corrected virtual world model.
In one implementation, determining a next best construction area in all of the areas based on the modified virtual world model includes:
acquiring initial model parameters corresponding to the initial virtual world model and corrected model parameters corresponding to the corrected virtual world model;
calculating the divergence and the mean square error between the initial model parameters and the corrected model parameters;
if the divergence and mean square error exceed a preset threshold range, retraining the reinforcement learning model based on the corrected virtual world model;
and carrying out Monte Carlo tree search deduction on the retrained reinforcement learning model in the corrected virtual world model, and determining the next optimal construction area in all the areas until the charging facility planning of the preset number is completed in the target city.
In one implementation, the progressive charging facility planning method further includes:
presetting construction periods, wherein each construction period is used for constructing a preset number of charging facilities;
and when the charging resources of the next construction period are built, adding a preset currency expansion coefficient into the reinforcement learning model, wherein the currency expansion coefficient is used for discounting returns between successive reinforcement learning steps.
The invention provides a progressive charging facility planning device, comprising:
the training module is used for dividing the target city into a plurality of areas, acquiring charging layout history data corresponding to each area, and training according to all the charging layout history data to obtain a reinforcement learning model;
the deduction module is used for carrying out Monte Carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, and determining the current optimal construction area in all the areas;
the correction module is used for acquiring real feedback data obtained after the charging facility construction is carried out according to the current optimal construction area, correcting the initial virtual world model according to the real feedback data, and obtaining a corrected virtual world model;
And the determining module is used for determining the next optimal construction area in all the areas based on the corrected virtual world model.
The invention provides a terminal, comprising: a memory, a processor, and a progressive charging facility planning program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the progressive charging facility planning method as described above.
The present invention provides a computer readable storage medium storing a computer program executable for implementing the steps of the progressive charging facility planning method as described above.
The invention provides a progressive charging facility planning method, which comprises the following steps: dividing a target city into a plurality of areas, acquiring charging layout history data corresponding to each area, and training according to all the charging layout history data to obtain a reinforcement learning model; performing Monte Carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, and determining a current optimal construction area in all the areas; acquiring real feedback data obtained after the charging facility is built according to the current optimal building area, and correcting the initial virtual world model according to the real feedback data to obtain a corrected virtual world model; and determining the next optimal construction area in all the areas based on the modified virtual world model. According to the invention, the charging resource construction position is accurately selected by adopting reinforcement learning and Monte Carlo tree search deduction based on the model, and the virtual world model is continuously corrected and optimized by comparing construction feedback in the real world and the virtual world, so that the rationality and accuracy of charging facility planning are ensured.
Drawings
FIG. 1 is a flow chart of a progressive charging facility planning method according to a preferred embodiment of the present invention;
FIG. 2 is a detailed logic diagram of a progressive charging facility planning method of the present invention;
FIG. 3 is a schematic illustration of a geographic meshing of the progressive charging facility planning method of the present invention;
FIG. 4 is a graph comparing the effect of charging resource construction under four schemes of the progressive charging facility planning method of the present invention;
FIG. 5 is a graph of the results of two schemes of charge resource construction in a dynamic scenario of the progressive charging facility planning method of the present invention;
FIG. 6 is a graph showing the environmental changes and charging resource construction results of two schemes in a dynamic scenario of the progressive charging facility planning method of the present invention;
fig. 7 is a comparison diagram of service capacity improvement brought by an operator in the process of constructing charging resources under two schemes in a dynamic scene of the progressive charging facility planning method of the invention;
FIG. 8 is a graph comparing the overall resource construction effect under four scheme multi-scenario tests of the progressive charging facility planning method of the present invention;
FIG. 9 is a graph comparing the effects of specific resource construction under four-scheme multi-scenario testing of the progressive charging facility planning method of the present invention;
FIG. 10 is a schematic diagram of an analysis system in the progressive charging facility planning method of the present invention;
FIG. 11 is a functional block diagram of a preferred embodiment of a progressive charging facility planning apparatus in accordance with the present invention;
fig. 12 is a functional block diagram of a preferred embodiment of a terminal in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear and clear, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the prior art, there is a lack of research guiding operators on how to distribute charging resource construction across a city that already has a charging resource distribution, so as to achieve optimal service capability at minimal construction cost; as a result, current charging resource construction is disordered and lacks a general planning scheme and analysis system.
Referring to fig. 1, fig. 1 is a flowchart of a progressive charging facility planning method according to the present invention. As shown in fig. 1, the progressive charging facility planning method according to the embodiment of the invention includes:
and step S100, dividing the target city into a plurality of areas, acquiring charging layout history data corresponding to each area, and training according to all the charging layout history data to obtain a reinforcement learning model.
Specifically, the charging layout history data of the target city are integrated, and the target city grid is divided into N×M areas.
In this embodiment of the present application, the step S100 specifically includes:
step S110, taking a city to be built with charging resources as a target city, and dividing the target city into a plurality of areas;
step S120, acquiring charging layout history data corresponding to each region, and integrating each charging layout history data into a corresponding picture according to a plurality of characteristics in each charging layout history data, wherein the picture is provided with a plurality of channels;
and step 130, obtaining user behavior information according to all the charging layout historical data, constructing a simulation environment, and training a pre-constructed initial reinforcement learning model in the simulation environment according to the user behavior information and the pictures to obtain a trained reinforcement learning model.
Wherein the charging layout history data includes: geographic data, traffic network data, power grid data, total charging travel time of an electric vehicle sending a charging request, charging queuing waiting time of the electric vehicle, annual electric vehicle maintenance volume increase rate and base load increase rate; the simulation environment is used for simulating user behaviors aiming at the electric automobile and evaluating influences of charging resource construction in each area on a power grid and a traffic network of a target city.
Specifically, the data of each region include: the geographic data of each region, the traffic network data of each region, and the power grid data of each region. The geographic data of each region include: longitude and latitude, population, POIs, and electric vehicle ownership; POI is an abbreviation for "Point of Interest" and is commonly used to represent something in the real world at a geographic location. The traffic network data of each region include: the number of roads, the number of electric vehicles in each period, and the regional average speed. The power grid data of each region include: transformer capacity, base load, and grid-node energy balance statistics. In addition, the total charging travel time of electric vehicles issuing charging requests per day in each region, the charging queuing waiting time of electric vehicles in each region, and each region's annual electric vehicle ownership growth rate and base load growth rate must also be estimated. Each feature is represented as a picture, and the final 12 features are integrated into one picture containing 12 channels.
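As an illustration, the 12 feature matrices can be stacked into the multi-channel picture described above. The following is a minimal sketch assuming NumPy; the feature names and their ordering are hypothetical, since the patent does not fix them.

```python
import numpy as np

# Hypothetical names for the 12 per-region features listed in the text.
FEATURES = [
    "poi_count", "road_count", "ev_count_per_period", "population",
    "ev_ownership", "existing_charge_capacity", "travel_time", "wait_time",
    "distributable_load", "grid_node_energy_balance",
    "charge_demand_growth_rate", "base_load_growth_rate",
]

def build_city_tensor(region_data: dict, n: int, m: int) -> np.ndarray:
    """Stack the 12 per-region feature matrices into a (12, N, M) 'picture'.

    region_data maps each feature name to an (N, M) array; cells outside the
    city boundary are assumed to be filled with 0, as in Fig. 3.
    """
    channels = [np.asarray(region_data[name], dtype=np.float32).reshape(n, m)
                for name in FEATURES]
    return np.stack(channels, axis=0)
```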
The calculation formula of the resource construction optimization target and the constraint condition is as follows:
the optimization objective is to maximize the Score function, i.e. to achieve optimal service capabilities with minimal construction costs. Where SW represents the service capability of the operator, related to the user charge travel time and queuing wait time. In order to plan the number of years the progress is in,for regional index +.>Index for expansion times>Representing the total planned time period,/->Represents the coefficient of expansion of the currency, < >>Indicating that region i is +.>Total travel time of all vehicles in year, +.>Indicating that region i is +.>Total waiting time of all vehicles in the year. Construction cost->Comprising the following steps: charging station cost->Expansion cost of distribution network>Pressure regulating cost->,/>Represents the line cost of region i, < >>Indicating the extension of the distribution line required for the new charging station in zone i,/->Representing the original line capacity of region i, +.>Line capacity of a node representing an extension area i, < ->Represents the extension costs of the substation in zone i, +.>Function representing net power increment, +.>Deviation voltage of voltage representing region i from nominal voltage, +.>Substation capacity representing capacity expansion area i +.>Representing the remaining capacity of the substation,/->Representation ofThe station of the area i adds the number of charging piles in the area i, < >>Indicating the actual power occupied by the charging resource with increased i-zone,/-, for example>Representing a set of regions of the target city partition.
The problem is expressed as a mixed-integer nonlinear optimization problem. Here $t$ indexes the charge-capacity expansions, $T_{max}$ is the maximum number of expansions, $\mathcal{N}$ is the set of all regions, and each expansion decision is selected from the combinations over $\mathcal{N}$. $B_{max}$ denotes the maximum budget, which limits the financial cost. According to queuing theory, the arrival rate must remain below the bound defined by the queuing-waiting-time formula for the waiting time to be meaningful. In any given region $i$, the sum of the charging load $L_i^{ev}$ and the base load $L_i^{base}$ must not exceed the capacity of the local transformer $S_i^{tr}$. Considering the financial cost under the budget constraint, $I_c$ denotes the investment cost of the $c$-th resource construction, which includes the land cost and the installation cost.
The user behavior information is a mapping relationship obtained by modeling user behavior with a pre-constructed probability model, the mapping being the correspondence between generated charging demand and the charging demand distribution. In the invention's user behavior modeling, user behavior influences changes in the simulation environment and those changes in turn influence user behavior: it is a dynamic, stochastic, mutually influencing model.
Specifically, the calculation formula related to the user behavior modeling is as follows:
wherein,representing the influence coefficient corresponding to the jth feature, hotspot intensity +.>Defined as the sum of the products of each feature and the corresponding influence coefficient, +.>The j-th feature matrix is represented, and the dimension is n×m. />Representing the attraction capacity of region y to region x, < - >Representing the excitation of region x to region y, +.>Indicating the maximum influencing radius of the charging station, +.>Influence coefficient representing the distance from region x to region y, < ->Random variables representing independent homodistribution and servant extremum distribution. As can be seen from the above formula, although the distance from region i to region y may be equal to the distance +.>The same, but parameter->Representing the unique attractions of zone y to zone x based on occupancy and employment of the x and y zones and trafficNetwork conditions, etc.
Assuming that electric vehicles with charging demand are attracted by the regions within the charging station's maximum influence radius, the number of electric vehicles of region x that choose to charge in region y is given by the following choice model: $P_{x \to y}$ is the probability that an electric vehicle in region $x$ chooses to charge in region $y$; $N_x$ and $N_{x \to y}$ are the number of electric vehicles with charging demand in region $x$ and the number of them that charge in region $y$; $Pop_x$ is the population of region $x$; $\xi$ is a random variable reflecting the randomness of the charging demand; $\eta_x$ is the per-capita electric vehicle ownership of region $x$ before the resource construction project starts; and $g_x^t$ is the growth rate of the charging demand of region $x$ in year $t$.
In this embodiment of the present application, the step S130 specifically includes:
step S131, obtaining user behavior information according to all the charging layout historical data, and constructing a simulation environment;
Step S132, obtaining the total running time and the total queuing waiting time of the electric vehicle corresponding to reinforcement learning of adjacent times according to the user behavior information in a simulation environment;
s133, substituting the total running time and the total queuing waiting time into a preset feedback function calculation formula to obtain a feedback function;
and step S134, respectively taking each picture as input data, taking a corresponding feedback function as a training target, and training the pre-constructed initial reinforcement learning model to obtain a trained reinforcement learning model.
Wherein the initial reinforcement learning model includes an attention convolution neural network for processing the picture data and a full-connection layer neural network for evaluating the action value.
Specifically, in the simulation environment, the simulated daily user behavior of an electric vehicle includes: sending a charging request, selecting a charging station, driving the journey, and queuing and charging at the charging station; the environment then evaluates the influence of charging resource construction in a given area on the city-wide power grid and traffic network.
The simulation environment also includes the evaluation calculations for resource construction: the effect of the charge capacity layout on the simulation environment must be assessed. The score of a charge capacity layout is computed from the social cost, measured as the sum of travel and waiting time:

$$SW = \frac{1}{K} \sum_{i=1}^{N} \left[ \left( W_i^{pre} + T_i^{pre} \right) - \left( W_i^{post} + T_i^{post} \right) \right]$$

where $SW$ is the difference in the total pre-charging time of all electric vehicles before and after implementing a given planning strategy, i.e., the improvement in social welfare; $N$ is the total number of divided regions, and the constant $K$ scales $SW$ down to a suitable analytical scale; $W_i^{pre}$ and $T_i^{pre}$ are the total waiting time and total charging travel time before resource construction, and $W_i^{post}$ and $T_i^{post}$ the corresponding totals after construction. The charging travel time of a region is the sum of the travel times of all its charging demands, so the demand of each region must be taken into account.
The impact on users and the traffic network is described as follows: $d_k$ is the charging travel distance of the $k$-th electric vehicle and $v_k$ its average speed, so its charging travel time is $d_k / v_k$. $v_k$ is related to the traffic situation of the traversed regions, expressed through a traffic-situation index; $n_i$ is the number of roads in region $i$, and an additional variable whose value depends on the road type also enters the index. The indices of all regions form a set, which is normalized and mapped onto the maximum-minimum speed range specified by the traffic rules to determine the average speed of the $k$-th vehicle in each region. $|\mathcal{P}_k|$ denotes the number of regions traversed by the $k$-th vehicle, $\mathcal{P}_k$ the set of regions on its path, $z$ an element of $\mathcal{P}_k$, and $v_z$ the average speed of the corresponding region $z$.
The queuing time is calculated based on the queuing theory according to the following formula:
wherein,indicating the ith partition->Charge capacity per year (day), for example>Representing the ith partitionAverage number of vehicles arrived in annual unit time (day),. About.>Indicating the ith partition->Average number of service vehicles per year (day), for example>Indicating the ith partition->Average service strength per year (day). At the same time (I)>And->Will vary depending on the dynamic total charge demand and its rate of increase.
Combining the above analysis, the final attraction calculation is modified as follows: $\alpha$ denotes the travel-time influence coefficient, $T_i^t$ the total electric vehicle travel time in region $i$ at stage $t$, $\beta$ the queuing-time influence coefficient, $W_i^t$ the total electric vehicle queuing time in region $i$ at stage $t$, and $R$ the maximum influence radius of the charging station.
The invention optimizes the running states of the power grid and the traffic network and effectively improves the resource construction efficiency.
The reinforcement learning model includes two parts: picture data processing by an attention convolutional neural network, and action value evaluation by an ordinary neural network. In the data integration module, the city data have already been organized into a picture containing 12 channels. The attention part of the convolutional neural network mainly learns the importance of the 12 channels' influence on resource construction, while the convolutional part mainly simplifies computation by condensing the three-dimensional picture information into a one-dimensional array. This one-dimensional array is fed into an ordinary fully connected neural network, which outputs the effect evaluation of each candidate resource construction area. The two neural networks are trained through a reinforcement learning mechanism, but this process is offline training.
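A plausible PyTorch sketch of such a two-part network, assuming a squeeze-and-excitation style channel attention and illustrative layer sizes; the patent specifies only the attention-CNN plus fully-connected structure (elsewhere it mentions Dueling Double DQN, omitted here for brevity), not these exact layers.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Learns per-channel weights for the 12 input channels; a plausible
    reading of "learns the importance of the 12 channels", not necessarily
    the patent's exact layer."""
    def __init__(self, channels: int = 12, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, 12, N, M)
        w = self.fc(self.pool(x).flatten(1))   # per-channel weights in (0, 1)
        return x * w.unsqueeze(-1).unsqueeze(-1)

class SiteValueNet(nn.Module):
    """Attention + CNN encoder, then fully connected layers that score each
    of the N*M candidate construction regions."""
    def __init__(self, n: int = 15, m: int = 15):
        super().__init__()
        self.attn = ChannelAttention(12)
        self.conv = nn.Sequential(
            nn.Conv2d(12, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.Flatten(),
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * n * m, 512), nn.ReLU(),
            nn.Linear(512, n * m),      # one action value per region
        )

    def forward(self, x):
        return self.fc(self.conv(self.attn(x)))
```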
The entire resource construction process must be modeled as a Markov decision process before reinforcement learning algorithms can be used.
The environment state is represented as a tensor of stacked feature matrices: the number of POIs, the number of roads, the number of electric vehicles in each period, the population, the electric vehicle ownership, the existing charging capacity, the travel time, the waiting time, the distributable load, the grid-node energy balance statistics, the charging demand growth rate, and the base load growth rate. Each matrix $F_j$ represents the spatial distribution of the corresponding feature and has dimension $M \times N$.
An action represents the operator increasing the charge capacity of a particular area, for example adding 20 charging piles in that area.
The state transition function is defined as the transition from time $t$ to time $t+1$, driven by the operator's action $a_t$ and random external data $\xi_t$; the relationship between the state $s_t$ and the state $s_{t+1}$ is $s_{t+1} = f(s_t, a_t, \xi_t)$.
the feedback function carries out rewarding calculation from the angle of an operator, and the calculation formula is as follows:
wherein,the t-1 th time of reinforcement learning is represented, and the total running time of the electric automobile in the i area is represented; />The t-1 th time of reinforcement learning is represented, and the total queuing waiting time of the electric automobile in the i area is represented; / >Indicating the total running time of the electric automobile in the i area at the t time of reinforcement learning; />Indicating the total queuing waiting time of the electric automobile in the i area at t time of reinforcement learning.
The state-action value function is

$$Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[ \sum_{k \ge 0} \gamma^{k} \, r_{t+k} \right]$$

where $Q^{\pi}$ expresses the expected improvement in social welfare from resource construction under policy $\pi$, the sum runs up to the final step of charging resource construction, and $\gamma$ is the discount factor.
As shown in fig. 1, the progressive charging facility planning method according to the present embodiment further includes:
and step 200, carrying out Monte Carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, and determining the current optimal construction area in all the areas.
In this embodiment of the present application, the step S200 specifically includes:
step S210, an initial virtual world model is built by adopting a bootstrap dynamic integrated learning algorithm in advance, wherein the initial virtual world model is formed by assembling a plurality of learnable models, and the learnable models are probability neural networks;
step S220, establishing nodes and edges based on the data change of the initial virtual world model by Monte Carlo tree search, recording the virtual value of the edges from the root node to the second node, corresponding action exploration times and the action value estimated by the reinforcement learning model, and carrying out quick deduction through reinforcement learning selection actions;
Step S230, updating the virtual value and the action exploration times of the traversed edges, wherein each edge keeps records of the virtual value and the action exploration times;
step S240, after the last search is completed, starting from the initial root node, accessing the action with the highest action search frequency, and taking the area corresponding to the action with the highest action search frequency as the current optimal construction area.
When the reinforcement learning model adopts Monte Carlo tree searching in an initial virtual world model, each deduction adopts one independent learning model, and each batch of deductions shares all the learning models.
The invention combines model-based reinforcement learning with an online ensemble-learned simulation environment, using Monte Carlo tree search for repeated sampling to overcome the shortage of data and the heavy dependence on a simulation environment in reinforcement learning.
Specifically, the learnable virtual world model outputs the probability distribution of the next reinforcement learning state and the corresponding action reward. The trained reinforcement learning model performs Monte Carlo tree search deduction on the virtual world model B times, and the site with the highest score is selected for resource construction. Based on the state, the virtual world model establishes and simulates user behavior modeling similar to that of the simulation environment. However, the virtual world model's fluctuations with respect to feature and region changes show more uncertainty than the simulation environment model. This uncertainty can be reduced by adding prior and posterior knowledge, derived from charging layout history data predictions and from the feedback of implemented plans, such as more accurate user behavior modeling parameters and the predicted range of feature change rates at time t.
Monte Carlo tree search builds nodes and connecting edges based on the data changes of the virtual world model, and the trained reinforcement learning model helps the search pick efficient actions through forward deduction. In the $q$-th simulation, the edge connecting the root node $s_{root}$ and a leaf node $s_{leaf}$ stores a virtual value $W(s, a)$, the corresponding action exploration count $N(s, a)$, and the action value $Q(s, a)$ estimated by the reinforcement learning model. The search tree is refined mainly by simulating in the virtual world: only the edges from the root node to the second layer record the virtual value, action exploration count, and estimated action value, after which fast deduction proceeds directly through actions selected by reinforcement learning, without the backtracking step of a full Monte Carlo tree search. Before each simulation starts, the root node selects the action $a^*$ by maximizing the evaluation value:

$$a^* = \arg\max_{a}\left[ \frac{Q(s, a)}{1 + N(s, a)} + \lambda \, W(s, a) \right]$$

where $\lambda$ is the virtual value weight. The evaluation value is determined by two components: the final benefit $Q(s,a)/(1+N(s,a))$ and the virtual value $W(s,a)$. The final benefit is proportional to the action value, but to encourage the mechanism to explore more of the space it decays as the number of repeated visits increases.
The reinforcement learning model computes the values of the root-node state only once; these values are stored as prior knowledge for determining the final benefit, which helps select an action from the root node in each simulation. When a leaf-node state is reached, the leaf node is expanded to generate new child nodes. The virtual value is computed from the virtual score, i.e., the final score obtained by following a fast rollout policy in the virtual environment from the leaf-node state until the planning process completes; a binary indicator records whether an edge was traversed in a given simulation.
During the whole simulation deduction process, only the virtual values and action exploration counts of the traversed edges are updated. In the end, each edge maintains a record of its virtual value and action exploration count. After the last search is completed, the algorithm selects the action with the highest visit count from the initial root position, and resource construction is carried out in the area that action represents.
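A compact sketch of the edge statistics and the two selection rules just described: per-simulation selection by final benefit plus weighted virtual value, and the final decision by visit count. The exact combination rule is a reconstruction from the text, not the patent's verbatim formula.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    q: float          # action value estimated by the RL model (prior knowledge)
    n: int = 0        # action exploration count
    w: float = 0.0    # virtual value accumulated from fast rollouts

def select_action(edges: dict, lam: float) -> int:
    """Root-level selection: final benefit Q/(1+N), which decays with repeated
    visits to encourage exploration, plus the weighted virtual value."""
    return max(edges, key=lambda a: edges[a].q / (1 + edges[a].n) + lam * edges[a].w)

def backup(path: list, virtual_score: float) -> None:
    """After a fast rollout, update only the traversed edges' statistics."""
    for edge in path:
        edge.n += 1
        edge.w += virtual_score

def best_region(edges: dict) -> int:
    """After the last simulation, pick the action with the most visits."""
    return max(edges, key=lambda a: edges[a].n)
```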
The virtual world modeling adopts a bootstrapped dynamic ensemble learning algorithm: the large model is assembled from $Q$ learnable models. Each member of the ensemble is a probabilistic neural network whose output parameterizes a Gaussian distribution with diagonal covariance:

$$\hat{f}_{\phi_q}\left( s_{t+1}, r_t \mid s_t, a_t \right) = \mathcal{N}\left( \mu_{\phi_q}(s_t, a_t), \, \Sigma_{\phi_q}(s_t, a_t) \right)$$

where $\phi_q$ denotes the parameters of the $q$-th learnable model, which, based on the data tuple $(s_t, a_t)$ at time $t$ and the available prior and posterior knowledge, predicts the return $r_t$ of action $a_t$ and the state $s_{t+1}$ at time $t+1$. Thus, when Monte Carlo tree search is used in the virtual world model, each deduction uses one of the independent models. The algorithm is then parallelized: each batch of deductions shares the $Q$ models and outputs $Q$ possible results simultaneously, which greatly improves the efficiency of the Monte Carlo tree search.
The ensemble learning data come first from the simulation environment and the charging layout history data of each region of the city, from which a prototype of the virtual world model is learned; fine-tuning is then carried out by collecting real-world changes and the real feedback of resource construction.
As shown in fig. 1, the progressive charging facility planning method according to the present embodiment further includes:
and step S300, acquiring real feedback data obtained after the charging facility construction is carried out according to the current optimal construction area, and correcting the initial virtual world model according to the real feedback data to obtain a corrected virtual world model.
Progressive resource construction in the present invention means that after each construction step, the real feedback and world changes are observed, data are collected and analyzed, and only then is the next construction step carried out, until construction is finished. During the construction process, decisions are made on the basis of current real data analysis, and the balance between short-term and long-term returns is considered, which effectively improves the effect of charging resource construction and reduces construction cost.
In this embodiment of the present application, the step S300 specifically includes:
step S310, obtaining real feedback data of each area obtained after charging facility construction according to the current optimal construction area, and obtaining predicted change data of the charging facility construction on the real world, which is predicted in advance in an initial virtual world model;
and step 320, correcting the model parameters of the initial virtual world model according to the real feedback data and the predicted change data of each region to obtain a corrected virtual world model.
Specifically, the virtual world model is corrected by collecting feedback from real-world resource construction and comparing it with the feedback predicted for the same construction in the virtual world. Before real resource construction is carried out at a site, the virtual world module predicts, for the nodes worth exploring, the benefit that real construction would bring to the real world. Once construction has taken place, real-world change data and feedback data are received, unreasonable model parameters are fine-tuned, and the real-world change trajectory is predicted more accurately on the basis of the new data. Note that any estimated parameter used in the analysis system of the present invention can be fine-tuned in this way, including the learnable virtual world model parameters, the influence coefficients of the features, the user behavior selection coefficients, the electric vehicle ownership estimates, and so on. Machine-learning model parameters are adjusted by soft gradient descent, and other model parameters are softly adjusted according to their new and old values. The real feedback data and the predicted change data include: the geographic data, traffic network data, and power grid data after charging facilities are built in the current optimal construction area, as well as the total charging travel time of electric vehicles issuing charging requests, the charging queuing waiting time of electric vehicles, the annual electric vehicle ownership growth rate, the base load growth rate, and other charging-resource-related data.
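As one reading of this "soft adjustment" of non-learned parameters, a minimal sketch under an assumed interpolation rule; the rule and the step size tau are assumptions, since the patent fixes neither.

```python
def soft_update(old_params: dict, observed: dict, tau: float = 0.1) -> dict:
    """Softly pull each estimated parameter toward the value implied by real
    feedback: new = (1 - tau) * old + tau * observed.

    Parameters missing from `observed` are left unchanged. Machine-learned
    parameters would instead be nudged by gradient descent on the
    real-feedback prediction error.
    """
    return {k: (1 - tau) * v + tau * observed.get(k, v)
            for k, v in old_params.items()}
```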
As shown in fig. 1, the progressive charging facility planning method according to the present embodiment further includes:
and step 400, determining the next optimal construction area in all the areas based on the corrected virtual world model.
In this embodiment of the present application, the step S400 specifically includes:
step S410, obtaining initial model parameters corresponding to the initial virtual world model and corrected model parameters corresponding to the corrected virtual world model;
step S420, calculating the divergence and the mean square error between the initial model parameters and the corrected model parameters;
step S430, if the divergence and the mean square error exceed a preset threshold range, retraining the reinforcement learning model based on the corrected virtual world model;
and step S440, carrying out Monte Carlo tree search deduction on the retrained reinforcement learning model in the corrected virtual world model, and determining the next optimal construction area in all the areas until the charging facility planning of the preset number is completed in the target city.
In particular, the present invention supports retraining the reinforcement learning model. In the progressive charging resource construction method provided by the invention, the output construction scheme is not determined directly by reinforcement learning; it is also shaped by the Monte Carlo tree search and the learnable virtual world, and is finally determined by the scoring of the analysis system. The reinforcement learning model is used only within the Monte Carlo tree search, but its effectiveness must still be guaranteed. Therefore, the analysis system of the invention is not merely a model-based simulation system that continuously generates simulation data and refines the model while the scheme is implemented; it also supports retraining the reinforcement learning model in the updated simulation environment.
The target Q-value calculation for reinforcement learning is modified accordingly.
if the divergence and the mean square error do not exceed the preset threshold range, the original reinforcement learning model is directly used for carrying out Monte Carlo tree search deduction in the corrected virtual world model, and the next optimal construction area is determined in all the areas. The invention continuously simulates and analyzes the resource construction and judges whether the construction is finished or the analysis is stopped manually.
The two judgment indices for deciding whether reinforcement learning retraining is needed are the divergence and the mean square error, comparing the original training-environment parameters $\theta$ with the fine-tuned parameters $\theta'$. Because the analysis system faces different scenarios and its parameters differ, the critical values of the retraining indices must be determined case by case. Experiments show that when the MSE is kept below 20%, the system retains efficient analysis capability without retraining, so the invention raises the tolerance value of the MSE accordingly.
In an embodiment of the present application, the progressive charging facility planning method further includes: presetting construction periods, where each construction period builds a preset number of charging facilities; and when the charging resources of the next construction period are built, adding a preset currency expansion coefficient into the reinforcement learning model, where the currency expansion coefficient is used for discounting returns between successive reinforcement learning steps.
Specifically, the analysis system of the present invention is a self-looping system oriented toward the effect of resource construction rather than simply toward built capacity. Progressive charge capacity construction is a long-timescale problem: the amount of construction specified at the start may be more or less than what urban development requires, so construction guided by improving social welfare is the more important goal. For the present system, trading off the long-term and short-term benefits of resource construction is essential, and the weighting of long-term versus short-term rewards in reinforcement learning is the main issue.
Based on the above analysis, the currency expansion coefficient is introduced. In deep reinforcement learning algorithms, a discount factor $\gamma$ is inherently included to balance current and future rewards and to ensure convergence of the learning algorithm; it must be considered at every action, because its primary effect is to discount the return between successive actions of the algorithm. Unlike the discount factor, the currency expansion coefficient $\kappa$ relates to the net present value of future revenue and mainly governs the discounting between resource construction periods; it need not be considered within one and the same construction period. To apply deep reinforcement learning to this long-term planning problem, both the discount factor and the currency expansion coefficient must be taken into account to ensure an accurate net present value (NPV). After $\kappa$ is added, the discounted return of the $t$-th resource construction is redefined accordingly, where $M_p$ denotes the number of charging stations planned to be built in planning stage $p$, $n_p$ the accumulated construction count within the stage, $r_t$ the return of the current period, and the remaining term the discounted future return.
As for the construction period: the target city has a construction target, so construction proceeds in multiple stages, each stage being one construction period. For example, if 100 stations need to be built, $t$ ranges over the 1st to 100th station constructions; construction is carried out over 5 periods of 20 stations each, and the 20 stations of one period belong to the same construction period.
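A sketch of a return that discounts within a trajectory by gamma and across construction periods by the currency expansion coefficient kappa, under the assumed reading that all steps in the same period share one expansion exponent.

```python
def discounted_return(rewards, gamma: float, kappa: float,
                      per_period: int = 20) -> float:
    """NPV-style return over a sequence of construction-step rewards.

    gamma discounts between successive RL steps; kappa discounts between
    construction periods (steps in the same period share one kappa exponent).
    per_period = 20 mirrors the 5-period, 20-station example above.
    """
    g = 0.0
    for k, r in enumerate(rewards):
        period = k // per_period          # construction period index of step k
        g += (gamma ** k) * (kappa ** period) * r
    return g
```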
The invention aims to improve the efficiency and reduce the cost of charging resource construction: the charging layout history data of a target city are integrated and divided into an N×M regional grid, on which user behavior modeling and simulation environment construction are carried out. On this basis, model-based reinforcement learning and Monte Carlo tree search deduction are used to accurately select charging resource construction locations. By comparing construction feedback between the real world and the virtual world, the virtual world model is continuously corrected and optimized, ensuring the adaptability and accuracy of decisions. The method not only effectively improves resource construction efficiency and optimizes the operating states of the power grid and traffic network, but also markedly improves urban social welfare by balancing short-term and long-term returns. Through this strategy of dynamic adaptation and continuous optimization, the method effectively resolves the shortage of data and the excessive dependence on a simulation environment in reinforcement learning. The invention can therefore effectively improve the effect of charging resource construction while reducing construction cost.
In a specific embodiment, as shown in fig. 2, the progressive charging facility planning method specifically includes:
a1, integrating charging layout historical data of a target city, and dividing grids into multiple areas;
a2, modeling user behaviors and building a simulation environment;
step A3, training a reinforcement learning model;
step A4, carrying out Monte Carlo tree search deduction B times on the trained model in the virtual world model, and selecting the area with the highest score for resource construction;
step A5, collecting feedback of real world resource construction, comparing the feedback of virtual world resource construction, and correcting a virtual world model;
step A6, judging whether to retrain the reinforcement learning model; if yes, returning to the step A2; if not, returning to the step A4;
and A7, continuously constructing and analyzing the resources until the construction is completed or the analysis is stopped manually.
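Steps A1 to A7 can be read as the control loop sketched below; every callable is a hypothetical stand-in for a module described above, not an API defined by the patent.

```python
from typing import Callable

def progressive_planning(
    deduce: Callable[[], int],          # A4: MCTS pick of the next region
    build: Callable[[int], dict],       # real-world construction + feedback
    correct: Callable[[dict], float],   # A5: world correction, returns drift
    retrain: Callable[[], None],        # A3 rerun in the updated simulation
    total_stations: int,
    drift_threshold: float,
) -> list:
    """Control loop of steps A4-A7: deduce, build, correct, maybe retrain."""
    plan = []
    for _ in range(total_stations):     # A7: until construction completes
        region = deduce()
        plan.append(region)
        drift = correct(build(region))
        if drift > drift_threshold:     # A6: divergence/MSE check
            retrain()
    return plan
```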
Fig. 3 is a schematic diagram of the geographic division: the axes represent longitude and latitude, the city boundary is outlined by a black curve, and $M = N = 15$. The 12 feature groups are: the number of POIs, the number of roads, the number of electric vehicles in each period, the population, the electric vehicle ownership, the existing charging capacity, the travel time, the waiting time, the distributable load, the grid-node energy balance statistics, the charging demand growth rate, and the base load growth rate. Each matrix $F_j$ has dimension $15 \times 15$, so the state tensor has dimension $12 \times 15 \times 15$. Meaningless geographic locations in the figure are filled with 0s.
In fig. 4, the original plan was to carry out 100 constructions over 10 years, with construction investment increased or decreased according to actual conditions. The resource construction thus has three scenarios: 100 charging resource constructions over 10 years proceed as expected; the 100 constructions are expected to exceed the charging demand, and 20 constructions are cut; and the 100 constructions are expected to fall short of the charging demand, and 20 constructions are added. Fig. 4 compares the effect of the four resource planning methods; in each scenario, the world evolves dynamically and is sampled 100 times. The results show that the composite score of the proposed method is 10.65% higher than the heuristic algorithm, 4.88% higher than classical reinforcement learning, and 3.99% higher than improved reinforcement learning. Regardless of the scenario, the present invention always achieves the best composite score.
Classical reinforcement learning, improved reinforcement learning, and the present invention adopt the same reinforcement learning framework: a CNN, Dueling Double DQN, and two fully connected layers. The improved reinforcement learning model enhances basic reinforcement learning by uniquely integrating a channel attention layer before the CNN and taking the currency expansion coefficient into account in the reward calculation. The present invention uses the improved reinforcement learning for exploration and exploitation, and uses MCTS for decision making.
Fig. 5 shows the charging resource construction results of two schemes in a dynamic scenario, and fig. 6 shows the distribution of environmental changes and charging resource construction results of the two schemes in that scenario. A gray background in fig. 5 indicates an area with low charging service capability, and circles mark the areas where charging resource construction is carried out. When investment is limited, some areas will inevitably have low charging service capability, so an effective strategy is needed to decide which areas should tolerate low service capability and which should gain charging capacity. As fig. 5 shows, the analysis system proposed by the invention not only produces fewer low-service-capability areas, but also strives to ensure that low service capability occurs in areas that electric vehicles seldom choose; these areas are characterized by small populations, sparse POIs, or poor traffic infrastructure. The larger a glyph in fig. 6, the more significant the change in the corresponding factor. Increased attraction is related to POIs, the number of roads, the existing charge capacity, the distributable load, the grid operating state, and the traffic network operating state, while increased electric vehicle ownership is related to population and the charging demand growth rate. The analysis system proposed by the invention not only shows fewer congested areas, but also ensures that few electric vehicles choose the congested areas.
In a specific embodiment, the analysis system provided by the invention improves the performance effect by 8.32% compared with a heuristic algorithm; in another specific embodiment, the analysis system provided by the invention improves the performance effect by 5.35% compared with the improved reinforcement learning.
Fig. 7 compares the service capability improvement delivered by the operator during charging resource construction under the two schemes in a given dynamic scene. Through the exploitation and exploration of Monte Carlo tree search, the proposed analysis system identifies the policy with the highest efficiency-cost ratio, sets aside reserve funds to cover potential costs arising from changes to the original scheme, and maximizes social benefit. In addition, its resource construction scheme lets citizens share, as widely as possible, in the social benefits brought by operators' resource construction.
Fig. 8 compares the overall resource construction effects of the four schemes under multi-scenario testing, and fig. 9 compares their specific resource construction effects.
The specific data changes are shown in Table 1:
TABLE 1
Table 1 lists the average values of each index after capacity expansion by the four methods, for a city with 180,000 electric vehicles in which 163,639 charging demands are randomly generated at a time. The reinforcement learning-based algorithms provide effective charging resource construction strategies under near-real conditions, improving traffic conditions and operator service capability within the budget constraint. Classical reinforcement learning and improved reinforcement learning both produce effective policies; the proposed system, however, not only achieves the highest score but also minimizes cost. Through the exploitation and exploration of Monte Carlo tree search, the proposed system identifies the policy with the highest efficiency-cost ratio, reserves funds against potential costs arising from changes to the original scheme, and achieves the greatest improvement in charging service capability with the least investment.
In addition, referring to fig. 10, the analysis system of the present invention includes a data integration and data processing module, a user behavior simulation and simulation environment module, a reinforcement learning module, a learnable virtual world module, and a real world detection module.
The data integration and data processing module divides the grid into N+M areas and collects data for each area: geographic data (longitude and latitude, population, POIs and electric vehicle ownership); traffic network data (road count, number of electric vehicles in each period, and average speed within the area); power grid data (transformer capacity, base load, and energy balance statistics of the grid nodes); the total charging travel time of electric vehicles issuing charging requests in each area and the charging queue waiting time of electric vehicles in each area; and the annual growth rates of electric vehicle ownership and base load in each area. The data are collated into 12-channel 'pictures' and supplied to the user behavior simulation and simulation environment module and the learnable virtual world module.
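For illustration only, such a 12-channel 'picture' could be assembled as below; the channel order, the per-channel normalization and the grid shape are assumptions, since the patent does not fix them:

```python
import numpy as np

# Illustrative channel layout; the patent fixes neither the order
# nor the exact feature assigned to each channel.
CHANNELS = [
    "population", "poi_count", "ev_ownership",
    "road_count", "ev_per_period", "avg_speed",
    "transformer_capacity", "base_load", "energy_balance",
    "charge_travel_time", "queue_wait_time", "growth_rate",
]

def build_picture(region_stats: dict[str, np.ndarray]) -> np.ndarray:
    """Stack per-area statistics into a 12-channel 'picture'.

    region_stats maps each channel name to a 2-D grid (H x W) of
    per-area values; missing channels default to zeros.
    """
    h, w = next(iter(region_stats.values())).shape
    picture = np.zeros((len(CHANNELS), h, w), dtype=np.float32)
    for i, name in enumerate(CHANNELS):
        grid = region_stats.get(name, np.zeros((h, w), dtype=np.float32))
        # Per-channel min-max normalization keeps features comparable.
        span = grid.max() - grid.min()
        picture[i] = (grid - grid.min()) / span if span > 0 else 0.0
    return picture
```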
The user behavior simulation and simulation environment module models user behavior and builds the simulation environment. In the simulation environment, the daily user behavior of each electric vehicle is simulated: issuing a charging request, selecting a charging station, driving the trip, and queuing and charging at the station. The module then evaluates the effect that charging resource construction in a given area has on the city-wide power grid and traffic network; a sketch of such a daily loop follows.
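A minimal sketch of one simulated day, and of a feedback signal built from the total travel and queuing times, is shown below. The patent states only that a preset feedback function combines these two totals across adjacent reinforcement learning rounds; the weighted difference used here, and all field names, are assumptions:

```python
import random

def simulate_day(vehicles, stations, rng=None):
    """One simulated day: each EV may issue a charging request,
    pick a station, drive there, then queue and charge."""
    rng = rng or random.Random(0)
    total_travel, total_wait = 0.0, 0.0
    for ev in vehicles:
        if rng.random() > ev["request_prob"]:   # no charging request today
            continue
        # Choose the station minimizing travel plus expected queue time.
        best = min(stations,
                   key=lambda s: s["travel_time"][ev["area"]] + s["queue_est"])
        total_travel += best["travel_time"][ev["area"]]
        total_wait += best["queue_est"]
        best["queue_est"] += ev["charge_time"] / best["n_chargers"]
    return total_travel, total_wait

def feedback(prev_totals, new_totals, w_travel=1.0, w_wait=1.0):
    """Assumed feedback: reward the reduction in total travel and
    waiting time between consecutive reinforcement learning rounds."""
    d_travel = prev_totals[0] - new_totals[0]
    d_wait = prev_totals[1] - new_totals[1]
    return w_travel * d_travel + w_wait * d_wait
```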
The reinforcement learning module is trained in the user behavior simulation and simulation environment module and, when interacting with the real environment, rehearses in the learnable virtual world module, where it performs the Monte Carlo tree search deduction.
The learnable virtual world module takes its data from the user behavior simulation and simulation environment module and from the real world detection module; its deduction algorithm is Monte Carlo tree search, sketched below.
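Purely as an illustration of the deduction described in claim 4, a compact, root-level version of the search might look as follows. The PUCT-style selection rule, the rollout count and the constants are assumptions; the patent specifies only that each edge stores a virtual value, an exploration count and the reinforcement-learning action value, and that the most-explored action is finally chosen:

```python
import math

class Edge:
    """Edge statistics kept by the search: virtual value W,
    exploration count N, and the RL model's action value Q_rl."""
    def __init__(self, q_rl: float):
        self.w, self.n, self.q_rl = 0.0, 0, q_rl

def select_action(edges: dict[int, Edge], c: float = 1.0) -> int:
    """PUCT-style selection (an assumed rule): balance the mean
    virtual value against the RL prior and the visit counts."""
    total = sum(e.n for e in edges.values()) + 1
    def score(a):
        e = edges[a]
        mean = e.w / e.n if e.n else 0.0
        return mean + c * e.q_rl * math.sqrt(total) / (1 + e.n)
    return max(edges, key=score)

def mcts_decide(root_edges, virtual_world, rollouts: int = 200):
    """Run rollouts in the learnable virtual world, back up the
    returns, then pick the most-explored action (per claim 4)."""
    for _ in range(rollouts):
        a = select_action(root_edges)
        # virtual_world.rollout is a hypothetical hook: a fast
        # deduction in the virtual world guided by the RL policy.
        value = virtual_world.rollout(a)
        root_edges[a].w += value
        root_edges[a].n += 1
    return max(root_edges, key=lambda a: root_edges[a].n)
```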
The real world detection module collects data from the real world by partition on a long time scale: population, POIs and electric vehicle ownership; traffic network data for each area (road count, number of electric vehicles in each period, and average speed within the area); power grid data for each area (transformer capacity, base load, and energy balance statistics of the grid nodes); the daily total charging travel time of electric vehicles issuing charging requests in each area and the charging queue waiting time of electric vehicles in each area; and the annual growth rates of electric vehicle ownership and base load in each area. In addition, it provides real feedback on resource construction in the city, including the operating states of the traffic network and the power grid.
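Claims 5 and 6 below describe correcting the virtual world model with this real feedback and retraining the reinforcement learning model when the divergence and mean square error between the initial and corrected model parameters exceed a preset threshold range. The following is a schematic version of that check; treating the parameter vectors as normalized histograms for the KL divergence, and the threshold values themselves, are assumptions for illustration:

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL divergence between two parameter vectors treated as
    normalized histograms (an illustrative simplification)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def needs_retraining(initial: np.ndarray, corrected: np.ndarray,
                     kl_max: float = 0.05, mse_max: float = 1e-3) -> bool:
    """Flag retraining when the corrected virtual world model has
    drifted too far from the initial one; thresholds are assumed."""
    mse = float(np.mean((initial - corrected) ** 2))
    kl = kl_divergence(np.abs(initial) + 1e-9, np.abs(corrected) + 1e-9)
    return kl > kl_max or mse > mse_max
```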
In an embodiment, as shown in fig. 11, based on the progressive charging facility planning method, the present invention further provides a progressive charging facility planning apparatus, including:
The training module 100 is configured to divide a target city into a plurality of regions, obtain charging layout history data corresponding to each region, and train according to all the charging layout history data to obtain a reinforcement learning model;
the deduction module 200 is configured to perform Monte Carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, and determine a current optimal construction area in all the areas;
the correction module 300 is configured to obtain real feedback data obtained after the charging facility is built according to the current optimal building area, and correct the initial virtual world model according to the real feedback data to obtain a corrected virtual world model;
a determining module 400, configured to determine a next optimal construction area in all the areas based on the modified virtual world model.
Fig. 12 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal may include:
memory 501, processor 502, and a computer program stored on memory 501 and executable on processor 502.
The processor 502 implements the progressive charging facility planning method provided in the above embodiment when executing the program.
Further, the terminal further includes:
a communication interface 503 for communication between the memory 501 and the processor 502.
Memory 501 for storing a computer program executable on processor 502.
The memory 501 may include high-speed RAM and may also include non-volatile memory, such as at least one magnetic disk memory.
If the memory 501, the processor 502, and the communication interface 503 are implemented independently, they may be connected to one another via a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the figure shows only one line, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on a chip, the memory 501, the processor 502, and the communication interface 503 may perform communication with each other through internal interfaces.
The processor 502 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a progressive charging facility planning method as above.
In the description of this specification, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, such as two, three, and so on, unless explicitly defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes implementations in which functions are executed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch and execute instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be captured electronically, for example by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.
In summary, the invention discloses a progressive charging facility planning method, apparatus, terminal and storage medium, the method comprising: dividing a target city into a plurality of areas, acquiring the charging layout history data corresponding to each area, and training a reinforcement learning model on all the charging layout history data; performing Monte Carlo tree search deduction with the reinforcement learning model in a pre-constructed initial virtual world model and determining the current optimal construction area among all the areas; acquiring the real feedback data obtained after charging facilities are built in the current optimal construction area, and correcting the initial virtual world model according to the real feedback data to obtain a corrected virtual world model; and determining the next optimal construction area among all the areas based on the corrected virtual world model. By selecting charging resource construction locations with model-based reinforcement learning and Monte Carlo tree search deduction, and by continuously correcting and optimizing the virtual world model through comparison of construction feedback between the real world and the virtual world, the invention ensures the rationality and accuracy of charging facility planning.
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (10)

1. A progressive charging facility planning method, the method comprising:
dividing a target city into a plurality of areas, acquiring charging layout history data corresponding to each area, and training according to all the charging layout history data to obtain a reinforcement learning model;
performing Monte Carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, and determining a current optimal construction area in all the areas;
acquiring real feedback data obtained after the charging facility is built according to the current optimal building area, and correcting the initial virtual world model according to the real feedback data to obtain a corrected virtual world model;
and determining the next optimal construction area in all the areas based on the modified virtual world model.
2. The progressive charging facility planning method according to claim 1, wherein the dividing the target city into a plurality of regions, obtaining charging layout history data corresponding to each region, and training according to all the charging layout history data to obtain a reinforcement learning model includes:
taking a city in which charging resources are to be built as the target city, and dividing the target city into a plurality of areas;
acquiring charging layout history data corresponding to each region, and integrating each charging layout history data into a corresponding picture according to a plurality of characteristics in each charging layout history data, wherein the picture is provided with a plurality of channels;
obtaining user behavior information according to all the charging layout historical data, constructing a simulation environment, and training a pre-constructed initial reinforcement learning model in the simulation environment according to the user behavior information and the pictures to obtain a trained reinforcement learning model;
wherein the charging layout history data includes: geographic data, traffic network data, power grid data, the total charging travel time of electric vehicles issuing charging requests, the charging queue waiting time of the electric vehicles, and the annual growth rates of electric vehicle ownership and base load; the simulation environment is used to simulate user behavior of electric vehicles and to evaluate the effect of charging resource construction in each area on the power grid and traffic network of the target city; and the user behavior information is a mapping relationship obtained by modeling user behavior with a pre-constructed probability model, the mapping relationship being the correspondence between generated charging demands and the charging demand distribution.
3. The progressive charging facility planning method of claim 2, wherein obtaining user behavior information from all the charging layout history data, and constructing a simulation environment, training a pre-constructed initial reinforcement learning model in the simulation environment according to the user behavior information and the pictures, and obtaining a trained reinforcement learning model, comprising:
obtaining user behavior information according to all the charging layout historical data, and constructing a simulation environment;
obtaining, in the simulation environment and according to the user behavior information, the total travel time and the total queuing waiting time of the electric vehicles corresponding to adjacent reinforcement learning rounds;
substituting the total running time and the total queuing waiting time into a preset feedback function calculation formula to obtain a feedback function;
respectively taking each picture as input data, taking a corresponding feedback function as a training target, and training a pre-constructed initial reinforcement learning model to obtain a trained reinforcement learning model;
wherein the initial reinforcement learning model includes an attention convolutional neural network for processing the picture data and a fully connected neural network for evaluating action values.
4. The progressive charging facility planning method of claim 1, wherein performing a monte carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, determining a current optimal construction area among all the areas, comprises:
pre-constructing an initial virtual world model by a bootstrapped dynamic ensemble learning algorithm, wherein the initial virtual world model is assembled from a plurality of learnable models and each learnable model is a probabilistic neural network;
the Monte Carlo tree search establishes nodes and edges based on the data change of the initial virtual world model, records the virtual value of the edges from the root node to the second node, the corresponding action exploration times and the action value estimated by the reinforcement learning model, and carries out rapid deduction through reinforcement learning selection actions;
updating the virtual value and the action exploration times of the traversed edges, wherein each edge keeps records of the virtual value and the action exploration times;
after the last search is completed, starting from the initial root node, accessing the action with the highest action search frequency, and taking the area corresponding to the action with the highest action search frequency as the current optimal construction area;
wherein, when the reinforcement learning model performs the Monte Carlo tree search in the initial virtual world model, each deduction uses one independent learnable model, and each batch of deductions shares all the learnable models.
5. The progressive charging facility planning method according to claim 1, wherein obtaining real feedback data obtained after charging facility construction according to the current optimal construction area, and correcting the initial virtual world model according to the real feedback data, obtaining a corrected virtual world model, comprises:
acquiring the real feedback data of each area obtained after charging facilities are built in the current optimal construction area, and acquiring the predicted change data, produced in advance by the initial virtual world model for each area, of the real-world changes caused by the charging facility construction;
and correcting the model parameters of the initial virtual world model according to the real feedback data and the prediction change data of each region to obtain a corrected virtual world model.
6. The progressive charging facility planning method of claim 5, wherein determining a next optimal construction area in all of the areas based on the modified virtual world model comprises:
acquiring the initial model parameters corresponding to the initial virtual world model and the corrected model parameters corresponding to the corrected virtual world model;
calculating the divergence and the mean square error between the initial model parameters and the corrected model parameters;
if the divergence and mean square error exceed a preset threshold range, retraining the reinforcement learning model based on the corrected virtual world model;
and carrying out Monte Carlo tree search deduction on the retrained reinforcement learning model in the corrected virtual world model, and determining the next optimal construction area in all the areas until the charging facility planning of the preset number is completed in the target city.
7. The progressive charging facility planning method of claim 1, further comprising:
presetting construction periods, wherein each construction period is used for constructing a preset number of charging facilities;
and when the charging resources of the next construction period are built, adding a preset generic expansion coefficient to the reinforcement learning model, wherein the generic expansion coefficient is used to discount returns between consecutive reinforcement learning rounds.
8. A progressive charging facility planning apparatus, the apparatus comprising:
The training module is used for dividing the target city into a plurality of areas, acquiring charging layout history data corresponding to each area, and training according to all the charging layout history data to obtain a reinforcement learning model;
the deduction module is used for carrying out Monte Carlo tree search deduction on the reinforcement learning model in a pre-constructed initial virtual world model, and determining the current optimal construction area in all the areas;
the correction module is used for acquiring real feedback data obtained after the charging facility construction is carried out according to the current optimal construction area, correcting the initial virtual world model according to the real feedback data, and obtaining a corrected virtual world model;
and the determining module is used for determining the next optimal construction area in all the areas based on the corrected virtual world model.
9. A terminal, comprising: a memory, a processor and a progressive charging facility planning program stored on the memory and operable on the processor, the progressive charging facility planning program when executed by the processor implementing the steps of the progressive charging facility planning method of any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program executable for implementing the steps of the progressive charging facility planning method according to any one of claims 1-7.
CN202410124489.7A 2024-01-30 Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium Active CN117669993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410124489.7A CN117669993B (en) 2024-01-30 Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410124489.7A CN117669993B (en) 2024-01-30 Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN117669993A true CN117669993A (en) 2024-03-08
CN117669993B CN117669993B (en) 2024-07-02


Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805321A (en) * 2017-05-02 2018-11-13 南京理工大学 A kind of electric automobile charging station planing method
CN109711630A (en) * 2018-12-28 2019-05-03 郑州大学 A kind of electric car fast charge station addressing constant volume method based on trip probability matrix
CN109840635A (en) * 2019-01-29 2019-06-04 三峡大学 Electric automobile charging station planing method based on voltage stability and charging service quality
CN111104732A (en) * 2019-12-03 2020-05-05 中国人民解放军国防科技大学 Intelligent planning method for mobile communication network based on deep reinforcement learning
WO2022077693A1 (en) * 2020-10-15 2022-04-21 中国科学院深圳先进技术研究院 Load prediction model training method and apparatus, storage medium, and device
CN112330025A (en) * 2020-11-06 2021-02-05 国网冀北电力有限公司张家口供电公司 Prediction method of space-time charging load for urban electric vehicle
CN112487622A (en) * 2020-11-23 2021-03-12 国网河北省电力有限公司经济技术研究院 Method and device for locating and sizing electric vehicle charging pile and terminal equipment
WO2022160705A1 (en) * 2021-01-26 2022-08-04 中国电力科学研究院有限公司 Method and apparatus for constructing dispatching model of integrated energy system, medium, and electronic device
CN113609746A (en) * 2021-05-11 2021-11-05 四川大学 Power distribution network planning method based on Monte Carlo tree search and reinforcement learning algorithm
CN113112097A (en) * 2021-05-12 2021-07-13 华北电力大学 Electric vehicle load prediction and charging facility layout optimization method
US20220396172A1 (en) * 2021-06-14 2022-12-15 Volta Charging, Llc Machine learning for optimization of power distribution to electric vehicle charging ports
CN114048920A (en) * 2021-11-24 2022-02-15 国家电网有限公司大数据中心 Site selection layout method, device, equipment and storage medium for charging facility construction
CN114169624A (en) * 2021-12-13 2022-03-11 南方电网科学研究院有限责任公司 Electric vehicle charging station planning method and device
CN114418300A (en) * 2021-12-16 2022-04-29 国网上海市电力公司 Multi-type electric vehicle charging facility planning method based on urban function partition and resident trip big data
CN114297809A (en) * 2021-12-20 2022-04-08 上海电机学院 Electric vehicle charging station site selection and volume fixing method
CN115952961A (en) * 2022-07-01 2023-04-11 国网上海市电力公司 Charging station and power distribution network configuration collaborative planning method
CN115063184A (en) * 2022-07-14 2022-09-16 兰州理工大学 Electric vehicle charging demand modeling method, system, medium, equipment and terminal
CN115344653A (en) * 2022-07-28 2022-11-15 国网湖北省电力有限公司电力科学研究院 Electric vehicle charging station site selection method based on user behaviors
CN116187541A (en) * 2023-01-03 2023-05-30 国网山东省电力公司泰安供电公司 Collaborative optimization construction method for electric vehicle charging facilities and power distribution network
CN116542003A (en) * 2023-05-08 2023-08-04 合肥工业大学 New energy charging station optimizing arrangement method based on reinforcement learning
CN116843115A (en) * 2023-05-18 2023-10-03 广西电网有限责任公司柳州供电局 Electric vehicle charging station site selection planning method and device
CN116843500A (en) * 2023-06-12 2023-10-03 中国南方电网有限责任公司 Charging station planning method, neural network model training method, device and equipment
CN116797002A (en) * 2023-08-17 2023-09-22 国网天津市电力公司培训中心 Electric vehicle charging station planning method, device and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117932519A (en) * 2024-03-15 2024-04-26 广东领卓能源科技有限公司 New energy automobile charging station monitored control system

Similar Documents

Publication Publication Date Title
CN110517492B (en) Traffic path recommendation method, system and device based on parallel ensemble learning
Lu et al. A bi-criterion dynamic user equilibrium traffic assignment model and solution algorithm for evaluating dynamic road pricing strategies
CN111260118B (en) Vehicle networking traffic flow prediction method based on quantum particle swarm optimization strategy
Zheng et al. A stochastic simulation-based optimization method for equitable and efficient network-wide signal timing under uncertainties
CN113362600B (en) Traffic state estimation method and system
CN115063184A (en) Electric vehicle charging demand modeling method, system, medium, equipment and terminal
Segredo et al. Optimising real-world traffic cycle programs by using evolutionary computation
Shamsi et al. Reinforcement learning for traffic light control with emphasis on emergency vehicles
Cui et al. Dynamic pricing for fast charging stations with deep reinforcement learning
CN117669993B (en) Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium
CN112330054A (en) Dynamic traveler problem solving method, system and storage medium based on decision tree
CN117669993A (en) Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium
Helmus et al. SEVA: A data driven model of electric vehicle charging behavior
Pourvaziri et al. Planning of electric vehicle charging stations: An integrated deep learning and queueing theory approach
CN115472023A (en) Intelligent traffic light control method and device based on deep reinforcement learning
CN115239004A (en) Charging parking lot location and volume optimization method based on charging demand prediction
CN114912669A (en) Public transport passenger flow combined graph neural network prediction method based on multi-source data
CN112508220A (en) Traffic flow prediction method and device
Xu Development and test of dynamic congestion pricing model
CN116662815B (en) Training method of time prediction model and related equipment
CN116882692B (en) Alcohol-based new energy filling point setting optimization method and system based on generation formula
CN116957166B (en) Tunnel traffic condition prediction method and system based on Hongmon system
CN112215520B (en) Multi-target fusion passing method and device, computer equipment and storage medium
CN115648973B (en) Improved DDPG reinforcement learning hybrid energy management method based on local sensitive hash
CN115275999B (en) Power distribution network optimal scheduling method considering electric automobile time-varying road impedance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant