CN111461500B - Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning - Google Patents

Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning Download PDF

Info

Publication number
CN111461500B
Authority
CN
China
Prior art keywords
electronic fence
determining
neural network
action
dqn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010172819.1A
Other languages
Chinese (zh)
Other versions
CN111461500A (en)
Inventor
冯强
贾露露
任羿
孙博
杨德真
王自力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010172819.1A priority Critical patent/CN111461500B/en
Publication of CN111461500A publication Critical patent/CN111461500A/en
Application granted granted Critical
Publication of CN111461500B publication Critical patent/CN111461500B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0645Rental transactions; Leasing transactions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Feedback Control In General (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a tide phenomenon control method for a shared bicycle system based on a dynamic electronic fence and reinforcement learning, addressing the supply-demand imbalance of shared bicycle systems. The method comprises the following steps: (1) determining the state information of the electronic fence group; (2) determining the scheduling action of the electronic fences; (3) determining the behaviors and interactions of the agents; (4) determining the current benefit obtained by taking action a; (5) determining the reinforcement learning environment of the electronic fence scheduling system; (6) determining the agent state transition rule based on the DQN neural network; (7) constructing a DQN neural network and performing forward calculation; (8) selecting each output action with a random exploration strategy; (9) training the DQN neural network model and updating its parameters; (10) judging whether training of the DQN neural network is finished; (11) inputting the initial time and initial state of the electronic fence group into the trained neural network to obtain the electronic fence control strategy.

Description

Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning
(I) technical field
The invention provides a tide phenomenon control method for a shared bicycle system based on dynamic electronic fences and reinforcement learning, aimed at the supply-demand imbalance of urban shared bicycle systems. Under the premise of a limited number of bicycles and dynamic electronic fences, the electronic fences are scaled by a reinforcement learning method and pedestrians are guided to park bicycles at reasonable positions before the tide phenomenon arrives. Aiming to improve customer satisfaction and bicycle utilization rate, the method establishes a Deep Q Network (DQN) based reinforcement learning model and performs optimal control and decision-making on the tide phenomenon of shared bicycles, so as to alleviate or resolve the tide phenomenon in the shared bicycle system. The method belongs to the field of intelligent transportation.
(II) background of the invention
In order to relieve urban traffic problems, an important means widely accepted at home and abroad is to implement new green travel modes and to build a low-carbon, environment-friendly traffic system. In China, the appearance of the "pile-free" (dockless) shared bicycle provides a brand-new solution to the above problems and is an important innovation on urban public bicycles, but the current development of shared bicycles still has some problems to be solved urgently: (1) the bicycle demand is not accurately evaluated during system construction, and blindly over-supplying bicycles causes waste, occupies excessive public resources, and increases the operating costs of enterprises; (2) the asymmetric demand of the shared bicycle system is not fully considered during system construction, producing a tide phenomenon: especially during peak periods, some areas have riders but no bicycles while in other areas bicycles sit idle with no riders.
For these problems, system builders and supervisors have taken various measures, such as regular manual vehicle dispatching to reduce the influence of the tide phenomenon, and setting electronic fences to avoid disorderly parking. However, the influence of these measures on the comprehensive benefits of the system, such as bicycle utilization rate and user satisfaction, lacks systematic and effective study. Addressing this need, the invention, based on dynamic electronic fences and a reinforcement learning method and aiming to improve customer satisfaction and bicycle utilization rate, constructs a shared bicycle system scheduling model that considers customer satisfaction and bicycle utilization, gives a multi-objective optimization algorithm for the system, and provides a new solution for the supply-demand imbalance of shared bicycle systems and for improving their comprehensive benefit.
(III) Disclosure of the invention
(1) Objects of the invention
The invention provides a tide control scheme for an electronic-fence-based shared bicycle system with reinforcement learning at its core. By standardizing and guiding where pedestrians park, the bicycle positions are automatically redistributed before the flow peak arrives, thereby alleviating the supply-demand imbalance caused by the tide phenomenon. By controlling the tide phenomenon in the shared bicycle system, the bicycle utilization rate and customer satisfaction can be improved with the same number of bicycles and electronic fences.
(2) Technical scheme
The invention relates to a tide control method for a shared bicycle system based on dynamic electronic fences and reinforcement learning. The method first analyzes the attributes and parameter systems of the two agents, bicycles and pedestrians, and defines the bicycle scheduling evaluation indicators (bicycle utilization rate, pedestrian satisfaction, and the models relating them). Then, by analyzing the agent types, interaction modes and so on, the evaluation process for bicycle utilization rate and pedestrian satisfaction is determined, forming an agent-based simulation modeling method. Next, the invention determines the goals of electronic fence scheduling and the algorithmic environment, analyzes how DQN is applied to the bicycle tide control problem (including the details of the reinforcement learning algorithm and its complete procedure), and on this basis determines the overall process of electronic-fence-based bicycle tide control. Finally, the proposed reinforcement learning algorithm and control strategy are verified through simulation experiments: the bicycle utilization rate and pedestrian satisfaction before and after implementing the control strategy are analyzed, and the feasibility and effectiveness of the method are evaluated and verified.
The method comprises the following steps:
step one, determining the state information of the electronic fence group.
Step two, determining the scaling of the electronic fence size as the scheduling action.
Step three, determining the behaviors and interactions of the agents.
Step four, determining the current benefit obtained by taking action a.
Step five, determining the reward function Q(s_t, a) to evaluate how good it is for the agent to take action a in a particular state s_t.
Step six, determining the agent state transition rule based on the DQN neural network, so that the agent state is updated automatically during reinforcement learning and the agent continuously interacts with the environment to form a closed loop.
Step seven, constructing a DQN neural network and performing forward calculation. This step is divided into the following substeps:
(1) determining input information for DQN neural networks
(2) Determining output information for DQN neural networks
(3) Determining DQN neural network structure
Step eight, selecting each output action with a random exploration strategy.
Step nine, training the DQN neural network model and updating its parameters.
Step ten, judging whether training of the DQN neural network is finished.
Step eleven, inputting the initial time and initial state of the electronic fence group into the trained neural network to obtain the electronic fence control strategy of the shared bicycle system.
Through the above steps, the optimal control strategy of the shared bicycle system before the tide phenomenon arrives can be obtained.
(IV) description of the drawings
FIG. 1 is an overall architecture of the present invention
FIG. 2 is a flow of agent behavior interaction between a pedestrian, an electronic fence, and a bicycle
FIG. 3 is a satisfaction degree calculation flow
FIG. 4 is a diagram of a neural network architecture
(V) detailed description of the preferred embodiments
The invention provides a tide control method for a shared bicycle system based on reinforcement learning. As a typical reinforcement learning algorithm, Deep Q Network (DQN) avoids establishing a complex and accurate mathematical model when solving intelligent optimization problems, and effectively schedules the sizes of the electronic fences in the shared bicycle system before the tide phenomenon of shared bicycles occurs, so that bicycles are guided to the electronic fence areas with high demand, thereby improving the utilization rate of shared bicycles and the satisfaction of pedestrians. To make the technical solution, features and advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings. The overall architecture of the invention is shown in FIG. 1, and the specific implementation steps are as follows:
step one, determining the state information of the electronic fence group.
State s: (s_1, s_2, …, s_i, t). The state mainly includes the electronic fence group state information s_i, i.e., the cumulative number of parked bicycles that each electronic fence involved in the dispatch gains (or loses) over the whole dispatch period. In addition, because the instant reward generated by executing an action during a state transition is time dependent, the state s should also include the current time t. Here the state s of the system is selected as the combination of these two parts, and the state s determines whether the reinforcement learning episode is finished. The cluster controller obtains the state information at each discrete time point as the decision basis; the related model calculation process and the corresponding states can be obtained through simulation software.
Step two, determining the scaling of the electronic fence size as the scheduling action.
Action a: (a_1, a_2, …, a_i). Here, scaling the electronic fence size is taken as the scheduling action, i.e., enlarging (or reducing) the parking range of each electronic fence at a specific time, where a_1 + a_2 + … + a_i = 0: since a_1 is the number of bicycles induced into a given fence per unit time period, these bicycles must come from the other fences, so a_1 = −(a_2 + a_3 + … + a_i). This choice of action is easy to understand and convenient to compute. The action set A is determined by the current state of the system.
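As an illustrative sketch outside the patent text, the zero-sum constraint on the scheduling action can be checked directly in code; the four candidate actions below are taken from the example given later in step six, and the function name is an assumption.

```python
# Illustrative sketch (assumption): candidate scheduling actions for three electronic fences.
# Each action is a tuple of per-fence parking-range changes that must sum to zero,
# because bicycles induced into one fence come from the other fences.

CANDIDATE_ACTIONS = [
    (+10, -5, -5),
    (+20, -10, -10),
    (+30, -15, -15),
    (+40, -20, -20),
]

def is_valid_action(action):
    """A scheduling action is valid only if the per-fence changes cancel out, i.e. a_1 = -(a_2 + ... + a_i)."""
    return sum(action) == 0

assert all(is_valid_action(a) for a in CANDIDATE_ACTIONS)
```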
Step three, determining the behaviors and interactions of the agents.
Agent interaction defines the rules that agent movements must follow; the interaction among pedestrians, bicycles and the environment is mainly reflected in the pedestrians' travel mode. In the shared bicycle system, pedestrian travel falls into two categories. The first is unrelated to the dispatching process: if a pedestrian does not accept the distance to the nearest bicycle, walking is chosen. The second is related to dispatching: when a pedestrian accepts the distance to a bicycle but is affected by dispatching, whether the pedestrian continues to ride depends on whether the current time is within the dispatching period, whether the pedestrian's destination belongs to the set of dispatched electronic fences, whether the target electronic fence's demand has already been met, and whether the pedestrian accepts the dispatching. The electronic fence sizes are scheduled during the period before the tide phenomenon occurs, and the behavior interaction flow among pedestrians, electronic fences and bicycles is shown in FIG. 2.
Step four, determining the current benefit obtained by taking action a.
The instant reward r(s_t, a) is the current benefit obtained by taking action a in state s_t; the state and the action together determine the value of the instant reward, forming an r(s_t, a) matrix. The dispatching goal is that, in the specific time period before the tide phenomenon arrives, the number of bicycles in certain areas reaches the required quantity, so that the bicycle riding rate, i.e., the average pedestrian satisfaction, is as high as possible. The average pedestrian satisfaction per unit time period during dispatching can be computed through simulation and used as r(s_t, a). Computing the average pedestrian satisfaction mainly requires counting the number of rides: whenever a pedestrian agent's state is "using a bicycle", num_cycling is incremented (num_cycling++), where num_cycling denotes the total number of rides. FIG. 3 shows the several situations in which a pedestrian may ride a bicycle; the calculation of the average pedestrian satisfaction mainly requires counting these rides. According to this analysis, riding first requires determining whether a pedestrian with riding demand can actually reach a bicycle, and the four possibilities are associated with one another.
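The counting of rides described above can be sketched as follows; this is only an illustration under assumed names (Pedestrian, wants_to_ride, "riding"), since in the patent the counts come from the agent-based simulation.

```python
# Illustrative sketch: the immediate reward r(s_t, a) taken as the average pedestrian
# satisfaction over one unit scheduling period, assuming satisfaction is the share of
# pedestrians with riding demand whose agent state became "riding".

from dataclasses import dataclass

@dataclass
class Pedestrian:
    state: str           # e.g. "walking", "waiting", "riding"
    wants_to_ride: bool  # the pedestrian had a riding demand in this period

def immediate_reward(pedestrians):
    demand = [p for p in pedestrians if p.wants_to_ride]
    if not demand:
        return 0.0
    num_cycling = sum(1 for p in demand if p.state == "riding")  # num_cycling++ per ride
    return num_cycling / len(demand)
```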
Step five, determining the reward function Q(s_t, a) to evaluate how good it is for the agent to take action a in a particular state s_t.
The reward function Q(s_t, a) evaluates how good it is for the agent to take action a in a particular state s_t, i.e., it is the action-utility function. Q(s_t, a) is the expected value of the sum of the discounted instant rewards r(s_t, a) over a series of actions, i.e., Q(s_t, a) = E[Σ_i γ^i · r_i(s_t, a)]. Solving for Q(s_t, a) with the reinforcement learning algorithm yields the scheduling scheme, since Q(s_t, a) instructs the agent to take the most favorable action so that the average pedestrian satisfaction and the average bicycle utilization rate finally reach their maximum.
Suppose three electronic fences A, B and C participate in scheduling and the initial state of the fence group is (0, 0, 0, 0). After the action (+10, −5, −5), i.e., A induces 10 bicycles while B and C each give up 5, the state becomes (+10, −5, −5, 1). The average pedestrian satisfaction in this stage is taken as the reward obtained by the agent for this action, e.g., r = 0.356. The instant rewards corresponding to the actions taken by the electronic fences in different states can be simulated through AnyLogic.
[Table: simulated instant rewards r(s_t, a) for each action in different states]
Step six, determining the agent state transition rule based on the DQN neural network, so that the agent state is updated automatically during reinforcement learning and the agent continuously interacts with the environment to form a closed loop.
Once the agent state transition rule is determined in the DQN neural network, the agent state can be updated automatically during reinforcement learning, so that the agent continuously interacts with the environment in a closed loop. Suppose the agent state is (in_num, out_num_1, out_num_2, t), where in_num is the cumulative number of bicycles induced from the target electronic fences into electronic fence A near the teaching building up to time t, and out_num_1 and out_num_2 are the cumulative numbers of bicycles that would have been parked, within the acceptance range, in the stadium electronic fences B and C from the start of the schedule to time t, with the amount entering A equal to the total amount leaving B and C. There are four scheduling actions, denoted as the set (a, b, c, d) and corresponding to (+10, −5, −5), (+20, −10, −10), (+30, −15, −15) and (+40, −20, −20) respectively, with units of bicycles.
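A minimal sketch of this state-transition rule, under the three-fence example above, could look as follows; the function and variable names are assumptions, and in practice the reward would come from the simulation software rather than the placeholder callback used here.

```python
# Illustrative sketch of the agent state transition for fences A (teaching building),
# B and C (stadium). State = (in_num, out_num_1, out_num_2, t).

ACTIONS = {
    "a": (+10, -5, -5),
    "b": (+20, -10, -10),
    "c": (+30, -15, -15),
    "d": (+40, -20, -20),
}

def step(state, action_key, simulate_reward):
    """Apply one scheduling action and advance time by one unit period."""
    in_num, out_num_1, out_num_2, t = state
    d_a, d_b, d_c = ACTIONS[action_key]
    # Bicycles induced into A are taken from B and C, so in_num = out_num_1 + out_num_2 holds.
    next_state = (in_num + d_a, out_num_1 - d_b, out_num_2 - d_c, t + 1)
    reward = simulate_reward(state, action_key)  # e.g. average satisfaction from the simulation
    return next_state, reward
```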
Step seven, constructing a DQN neural network and performing forward calculation.
A typical neuron consists of five parts: inputs, weights and biases, a summation unit, an activation function, and an output. The neural network structure used to store the value function is shown in FIG. 4.
(1) Determining input information for DQN neural networks
The input layer is (s_1, s_2, …, s_j, t), where t denotes the time; s_1 represents the cumulative number of dispatched bicycles of the target electronic fence at that moment, and s_2, …, s_j represent the cumulative numbers of dispatched bicycles of the other dispatched electronic fences.
(2) Determining output information for DQN neural networks
The output layer is (a_1, a_2, …, a_n); the dimension of the output layer is n, representing a total of n scheduling actions, where a_i represents the ith scheduling action, e.g., (+10, −5, −5).
(3) Determining structure of DQN neural network
The neural network has two hidden layers, i.e., depth 2. The dimension of the input layer is j + 1 and the dimension of the output layer is n. After the output layer selects the action corresponding to the appropriate Q value according to epsilon-greedy, the action a_k at time t + 1 can be determined. The neural network is a complex network formed by connecting a large number of simple neurons; the summation unit performs a weighted summation of the input signals, and the result is passed through the activation function to give the neuron's output. The output of the jth neuron of layer l is:

y_j^l = σ( Σ_k w_jk^l · x_k^(l−1) + b_j^l )

where w_jk^l is the weight connecting the kth neuron of layer l − 1 to the jth neuron of layer l (the input layer is layer 0, and here l = 2), x_k^(l−1) is the input from the kth neuron of the previous layer, b_j^l is the bias of the jth neuron of layer l, and σ is the activation function. Because a purely linear model does not have sufficient expressive power, the activation function introduces a nonlinear factor to handle nonlinear problems. Considering differentiability and monotonicity, typical activation functions are tanh, sigmoid and ReLU; ReLU converges quickly, is simple to compute and does not saturate easily, so ReLU is used here instead of sigmoid:

f(x) = max(0, x)

The weights w_jk^l and biases b_j^l are tunable; they reflect the behavioral characteristics of the neural network.
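As a hedged illustration of the network in FIG. 4, the value network can be sketched in PyTorch as below; the hidden-layer width of 64 and the use of PyTorch itself are assumptions not stated in the patent.

```python
# Illustrative sketch: a Q network with an input of dimension j + 1 (cumulative dispatch
# counts of j fences plus the time t), two hidden ReLU layers, and n outputs, one Q value
# per scheduling action.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, num_fences: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_fences + 1, hidden),  # j fence states + current time t
            nn.ReLU(),
            nn.Linear(hidden, hidden),          # second hidden layer (depth 2)
            nn.ReLU(),
            nn.Linear(hidden, num_actions),     # one Q value per scheduling action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```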
And step eight, selecting each output action by utilizing a random exploration strategy.
Randomness is present throughout the electronic-fence bicycle dispatching process: for example, a pedestrian's starting point and destination are uncertain, and whether the pedestrian chooses to ride is uncertain. An epsilon-greedy action selection strategy is therefore used in the DQN neural network; this random exploration strategy mitigates the problem that value-function-based algorithms such as DQN sometimes cannot obtain the optimal policy. An epsilon-greedy action includes the time and the size of each electronic fence at that time. A small epsilon value is set to prevent the algorithm from falling into a locally optimal solution, so that the agent maintains a certain degree of exploration while searching for the globally optimal solution. After the Q value list is obtained through the neural network, the action to take is selected according to the epsilon-greedy strategy: with probability 1 − epsilon, the next action of the electronic fence system is determined by the maximum Q value output by the value neural network (one Q value corresponds to one action); with probability epsilon the agent explores, i.e., an action is selected at random. After an action is taken and the next state is reached, action selection continues in the same way.
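A minimal sketch of this epsilon-greedy selection, assuming the QNetwork sketch above, is shown below.

```python
# Illustrative sketch: with probability epsilon a random scheduling action is explored,
# otherwise the action with the maximal Q value output by the value network is taken.

import random
import torch

def select_action(q_network, state, num_actions: int, epsilon: float = 0.1) -> int:
    if random.random() < epsilon:
        return random.randrange(num_actions)              # explore
    with torch.no_grad():
        q_values = q_network(torch.as_tensor(state, dtype=torch.float32))
    return int(torch.argmax(q_values).item())             # exploit
```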
And step nine, training the DQN neural network model and updating parameters.
(1) Back propagation of neural networks
Updating the neural network parameters involves back propagation. When the neural network is defined, each node is randomly assigned a weight and a bias. After one iteration, the deviation of the whole network is calculated from the produced result, and the weights are then adjusted according to the gradient of the cost function so that the deviation decreases in the next iteration. This process of adjusting the weights according to the gradient of the cost function is called back propagation. In back propagation the signal propagates backwards: the error propagates from the output layer through the hidden layers along the gradient of the cost function, accompanied by the adjustment of the weights.
The loss function of DQN is:

L(θ) = E[(TargetQ − Q(s, a; θ))²]

where θ denotes the network parameters, and the target value is:

TargetQ = r + γ · max_a′ Q(s′, a′; θ⁻)

with θ⁻ being the delayed (target-network) parameters. In the machine learning algorithm, the loss function of the model is first determined from the target value and the true value; then a gradient-based algorithm such as gradient descent (or a quasi-Newton method) is selected to reduce the loss function step by step and update the model parameters. That is, after the loss function L(θ) is determined, the parameter θ is updated by gradient descent:

θ ← θ − α · ∇_θ L(θ)

where α is the learning rate.
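One gradient-descent update of the loss L(θ) can be sketched as follows, assuming PyTorch, the QNetwork sketch above, and a minibatch already drawn from the experience pool described in the next substep; the discount factor value is an assumption.

```python
# Illustrative sketch: compute L(θ) = E[(TargetQ − Q(s, a; θ))²] on a minibatch and
# update θ by back propagation, with TargetQ computed from the delayed target network.

import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma: float = 0.99):
    states, actions, rewards, next_states, dones = batch  # minibatch tensors

    # Q(s, a; θ) for the actions that were actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # TargetQ = r + γ · max_a' Q(s', a'; θ⁻), using the delayed target parameters
    with torch.no_grad():
        target_q = rewards + gamma * target_net(next_states).max(dim=1).values * (1 - dones)

    loss = F.mse_loss(q_sa, target_q)  # L(θ)
    optimizer.zero_grad()
    loss.backward()                    # back-propagate the gradient of the loss
    optimizer.step()                   # θ ← θ − α · ∇_θ L(θ)
    return loss.item()
```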
(2) Construction of the experience pool and parameter updates between the Q-estimate network and the Q-target network
In addition, to reduce as far as possible the influence of the correlation between consecutive training samples on the convergence of the loss function, DQN establishes two neural networks with identical structure (input and output sizes, network depth) but different parameters: a Q-estimate network and a Q-target network. A delayed update technique, fixed Q-targets, is then used; it is a mechanism for breaking the correlation of training samples, whereby the parameters used by the Q-target network lag those of the Q-estimate network by a certain number of steps, while the Q-estimate network always uses the latest parameters. An experience pool (experience replay) is also introduced into the neural network: the data collected by the agent during simulation are stored in the experience pool, and once enough sample data have accumulated, a batch of data is drawn at random from these time-series samples, with the size determined by batch_size.
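The experience pool and the delayed (fixed Q-targets) update can be sketched as follows; the capacity and synchronization interval are assumptions for illustration only.

```python
# Illustrative sketch: transitions collected during simulation are stored in the experience
# pool and sampled at random (size batch_size) to break the correlation of consecutive
# samples; the target-network parameters are copied from the online network only every
# sync_every updates.

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)  # random draw, not time-ordered

    def __len__(self):
        return len(self.buffer)

def maybe_sync_target(update_step: int, q_net, target_net, sync_every: int = 100):
    """Delayed update: the target parameters lag the online parameters by sync_every steps."""
    if update_step % sync_every == 0:
        target_net.load_state_dict(q_net.state_dict())
```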
And step ten, judging whether the training of the DQN neural network is finished.
The number of episodes is set. During each training run, whether one episode of the DQN neural network has finished is judged by the model exit condition, i.e., whether the number of dispatched bicycles meets the dispatching demand or whether the number of dispatch operations has reached its upper limit. After the specified number of episodes has been completed and the loss function has fallen to a certain value, training of the DQN neural network is considered finished.
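The exit tests described in this step can be sketched as two simple predicates; the loss threshold is an assumption, since the patent only states that the loss must fall to a certain value.

```python
# Illustrative sketch: an episode ends when the dispatched bicycle count meets the demand
# or the number of dispatch operations reaches its upper limit; training ends after the
# specified number of episodes once the loss is low enough.

def episode_finished(dispatched_bicycles: int, demand: int,
                     dispatch_count: int, max_dispatches: int) -> bool:
    return dispatched_bicycles >= demand or dispatch_count >= max_dispatches

def training_finished(episode: int, num_episodes: int,
                      latest_loss: float, loss_threshold: float = 1e-3) -> bool:
    return episode >= num_episodes and latest_loss <= loss_threshold
```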
And step eleven, inputting the initial time and the initial state of the electronic fence group into the trained neural network, and acquiring the electronic fence control strategy in the shared bicycle system.
After training of the reinforcement-learning DQN neural network is completed, the initial time and initial state of the electronic fence group are input to the trained network. Combined with the judging condition, i.e., whether the number of dispatched bicycles or the number of electronic fence dispatch operations meets the requirement, a series of timed scheduling actions is obtained automatically, giving the whole electronic fence scheduling scheme. Suppose the scheduling actions (a, b, c, d) correspond to (+10, −5, −5), (+20, −10, −10), (+30, −15, −15) and (+40, −20, −20); given the rewards obtained by the DQN for taking different actions, shown in the table below, the resulting scheduling sequence is a → d → b → b → None (end of scheduling).
[Table: rewards obtained by the trained DQN for each scheduling action at each step of the example schedule]
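Reading out the scheduling scheme from the trained network can be sketched as a greedy rollout; step_fn and done_fn stand for wrappers around the transition and exit-condition sketches given earlier and, like the other names, are assumptions.

```python
# Illustrative sketch: feed the initial time and fence-group state to the trained Q network
# and repeatedly follow the maximal Q value until the exit condition holds, yielding a
# scheduling sequence such as a -> d -> b -> b (end of scheduling).

import torch

def extract_schedule(q_net, initial_state, action_keys, step_fn, done_fn):
    schedule, state = [], initial_state
    while not done_fn(state):
        with torch.no_grad():
            q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
        action_key = action_keys[int(torch.argmax(q_values).item())]
        schedule.append(action_key)
        state = step_fn(state, action_key)  # advance the fence-group state by one period
    return schedule
```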

Claims (1)

1. A tide phenomenon control method of a shared bicycle system based on dynamic electronic fence and reinforcement learning comprises the following steps:
step one, determining the state information of the electronic fence group: the state information s: (s_1, s_2, …, s_j, t) mainly includes the electronic fence group state information s_j, i.e., the cumulative number of bicycles by which each electronic fence involved in the dispatch increases or decreases its parked bicycles over the whole dispatch period, and the current time t;
step two, determining the size scaling of the electronic fence as a scheduling action;
step three, determining the behavior and interaction of the intelligent agent;
step four, determining the current benefit obtained by taking action a: (a_1, a_2, …, a_i), i.e., enlarging or reducing the parking area of each electronic fence at a specific moment, where a_i represents the scheduling action of the ith electronic fence;
step five, determining the reward function Q(s_t, a) to evaluate how good it is for the agent to take action a in a particular state s_t, where s_t is the current state information s_t = s: (s_1, s_2, …, s_j, t);
step six, determining the agent state transition rule based on the DQN neural network, so that the agent state is updated automatically during reinforcement learning and continuously interacts with the environment to form a closed loop;
step seven, constructing a DQN neural network and performing forward calculation, divided into the following substeps:
(1) determining the input information of the DQN neural network, the input information being the state information (s_1, s_2, …, s_j, t), where t denotes the time, s_1 represents the cumulative number of dispatched bicycles of the target electronic fence at that moment, and s_2, …, s_j represent the cumulative numbers of dispatched bicycles of the other dispatched electronic fences;
(2) determining the output information of the DQN neural network, the output information being the action a: (a_1, a_2, …, a_i), where a_i represents the scheduling action of the ith electronic fence and there are n electronic fences in total;
(3) determining a DQN neural network structure;
step eight, selecting each output action a by utilizing a random exploration strategy;
step nine, training a DQN neural network model and updating parameters;
step ten, judging whether the training of the DQN neural network is finished;
and step eleven, inputting the initial time and the initial state of the electronic fence group into the trained neural network, and acquiring the electronic fence control strategy in the shared bicycle system.
CN202010172819.1A 2020-03-12 2020-03-12 Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning Active CN111461500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010172819.1A CN111461500B (en) 2020-03-12 2020-03-12 Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010172819.1A CN111461500B (en) 2020-03-12 2020-03-12 Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning

Publications (2)

Publication Number Publication Date
CN111461500A CN111461500A (en) 2020-07-28
CN111461500B true CN111461500B (en) 2022-04-05

Family

ID=71684448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010172819.1A Active CN111461500B (en) 2020-03-12 2020-03-12 Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning

Country Status (1)

Country Link
CN (1) CN111461500B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348258B (en) * 2020-11-09 2022-09-20 合肥工业大学 Shared bicycle predictive scheduling method based on deep Q network
CN113095406B (en) * 2021-04-14 2022-04-26 国能智慧科技发展(江苏)有限公司 Electronic fence effective time period management and control method based on intelligent Internet of things
CN114897656B (en) * 2022-07-15 2022-11-25 深圳市城市交通规划设计研究中心股份有限公司 Shared bicycle tidal area parking dredging method, electronic equipment and storage medium
CN115879016B (en) * 2023-02-20 2023-05-16 中南大学 Prediction method for travel tide period of shared bicycle

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3872715A1 (en) * 2015-11-12 2021-09-01 Deepmind Technologies Limited Asynchronous deep reinforcement learning
CN105491124B (en) * 2015-12-03 2018-11-02 北京航空航天大学 Mobile vehicle distribution polymerization
CN109447573A (en) * 2018-10-09 2019-03-08 中国兵器装备集团上海电控研究所 The specification parking management system and method for internet car rental
TW202020473A (en) * 2018-11-27 2020-06-01 奇異平台股份有限公司 Electronic fence and electronic fence system

Also Published As

Publication number Publication date
CN111461500A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN111461500B (en) Shared bicycle system tide phenomenon control method based on dynamic electronic fence and reinforcement learning
CN110032782B (en) City-level intelligent traffic signal control system and method
CN112216124B (en) Traffic signal control method based on deep reinforcement learning
CN111696370B (en) Traffic light control method based on heuristic deep Q network
CN112669629B (en) Real-time traffic signal control method and device based on deep reinforcement learning
CN112700664A (en) Traffic signal timing optimization method based on deep reinforcement learning
CN103280114B (en) Signal lamp intelligent control method based on BP-PSO fuzzy neural network
CN110794842A (en) Reinforced learning path planning algorithm based on potential field
CN112364984A (en) Cooperative multi-agent reinforcement learning method
CN112365724A (en) Continuous intersection signal cooperative control method based on deep reinforcement learning
Lin et al. Traffic signal optimization based on fuzzy control and differential evolution algorithm
CN109558985A (en) A kind of bus passenger flow amount prediction technique based on BP neural network
CN114758497B (en) Adaptive parking lot variable entrance and exit control method, device and storage medium
CN111985619B (en) Urban single intersection control method based on short-time traffic flow prediction
CN112950251A (en) Reputation-based vehicle crowd sensing node reverse combination auction excitation optimization method
CN107087161A (en) The Forecasting Methodology of user experience quality based on multilayer neural network in video traffic
CN106781465A (en) A kind of road traffic Forecasting Methodology
CN109544913A (en) A kind of traffic lights dynamic timing algorithm based on depth Q e-learning
CN106781464A (en) A kind of congestion in road situation method of testing
Ahmad et al. Applications of evolutionary game theory in urban road transport network: A state of the art review
CN112950963A (en) Self-adaptive signal control optimization method for main branch intersection of city
CN113724507B (en) Traffic control and vehicle guidance cooperative method and system based on deep reinforcement learning
CN114572229A (en) Vehicle speed prediction method, device, medium and equipment based on graph neural network
Chentoufi et al. A hybrid particle swarm optimization and tabu search algorithm for adaptive traffic signal timing optimization
CN116071939B (en) Traffic signal control model building method and control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant