CN116485196B - Service area open control decision method and system based on reinforcement learning - Google Patents
Service area open control decision method and system based on reinforcement learning
- Publication number
- CN116485196B (application number CN202310380218.3A)
- Authority
- CN
- China
- Prior art keywords
- service area
- closing
- moment
- neural network
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a service area open control decision method and system based on reinforcement learning, belonging to the technical field of artificial intelligence. The method comprises the following steps: establishing a recurrent neural network model trained by reinforcement learning, the output layer of which predicts the number of vehicles of each type present around a service area at the next moment; and determining, based on those predicted numbers, the opening/closing strategy of the service area, and of the gas station in the service area, within a preset duration after the next moment. The invention applies a recurrent neural network to the open control decision of an expressway service area and, through reinforcement learning and structural customization, constructs an artificial-intelligence recognition mechanism that adapts to different service areas and determines the passing-vehicle information at the next moment from historical data, thereby giving service area managers sufficient reaction time for their decisions.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a service area open control decision method and system based on reinforcement learning.
Background
Artificial intelligence is a branch of computer science. Since the 1970s it has been regarded as one of the world's three frontier technologies (space technology, energy technology, artificial intelligence), and it is likewise considered one of the three frontier technologies of the twenty-first century (genetic engineering, nanoscience, artificial intelligence). Over the last three decades it has developed rapidly, found wide application across many disciplines, and achieved remarkable results, gradually becoming an independent branch that is self-contained in both theory and practice.
Neural networks are an important branch of artificial intelligence. An artificial neural network (Artificial Neural Network, ANN), also called a neural network (NN) or connection model, is an algorithmic mathematical model that mimics the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network depends on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes.
The recurrent neural network (Recurrent Neural Network, RNN) is a class of neural networks that takes sequence data as input, recurses along the evolution direction of the sequence, and connects all nodes (recurrent units) in a chain. The bidirectional recurrent neural network (Bidirectional RNN, Bi-RNN) and the long short-term memory network (Long Short-Term Memory, LSTM) are common recurrent neural networks.
Recurrent neural networks can be used for many kinds of fuzzy artificial-intelligence processing and can reach a certain accuracy. Nevertheless, in many application fields recurrent-neural-network solutions remain blank, so that application-scenario data cannot be organically fused with them; and even where such fusion has been attempted, the accuracy of artificial-intelligence recognition is low for lack of targeted research. For example, in expressway service areas in remote regions, operators are caught in the contradiction that staying open costs too much while closing fails to meet normal service demand, because the types and numbers of passing vehicles at each moment cannot be predicted, and recurrent neural networks have not been applied to this scenario.
Disclosure of Invention
In order to solve the above problems, the invention provides a service area open control decision method and system based on reinforcement learning, which apply a recurrent neural network to the open control decision of an expressway service area and, through reinforcement learning and structural customization, form an artificial-intelligence recognition mode that determines the passing-vehicle information at the next moment from historical data.
To this end, the invention comprises at least the following four key points:
(1) Using service area historical data to predict, by artificial intelligence, the types and numbers of passing vehicles near the service area at the next moment after the prediction moment, and determining from the prediction result the opening/closing strategy of the service area, and of the gas station in the service area, within a preset duration after the next moment;
(2) Selecting a recurrent neural network model to realize the artificial-intelligence prediction of passing-vehicle information near the service area at the next moment, and performing targeted reinforcement learning on it, wherein the farther the service area is from the nearest city, the longer the interval between two adjacent moments among the uniformly spaced moments in the historical data, thereby completing flexible customization of the model for different service areas;
(3) Determining the opening/closing strategy of the service area within a preset duration after the next moment by weighted calculation, wherein the more passengers a vehicle type typically carries, the larger the weight assigned to that type;
(4) Determining the opening/closing strategy of the gas station in the service area within a preset duration after the next moment by weighted calculation, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight assigned to that type.
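The two weighted calculations in points (3) and (4) can be sketched as follows. The vehicle classes, weight values and decision thresholds are illustrative assumptions, not values specified by the patent:

```python
# Illustrative sketch of the weighted open/close decisions in points (3) and (4).
# Vehicle classes, weights and thresholds are hypothetical examples.

# Predicted counts per vehicle class around the service area at the next moment.
predicted_counts = {"car": 40, "coach": 6, "truck": 12}

# Point (3): weight grows with the typical passenger count of the class.
passenger_weights = {"car": 4, "coach": 50, "truck": 2}

# Point (4): weight grows with the typical fuel-tank volume (litres) of the class.
tank_weights = {"car": 55, "coach": 300, "truck": 400}

def weighted_score(counts, weights):
    """Weighted sum of predicted vehicle counts."""
    return sum(counts[v] * weights[v] for v in counts)

def open_decision(score, threshold):
    """Open when the weighted demand score reaches the threshold."""
    return "open" if score >= threshold else "closed"

service_area_score = weighted_score(predicted_counts, passenger_weights)   # 484
gas_station_score = weighted_score(predicted_counts, tank_weights)         # 8800

service_area_policy = open_decision(service_area_score, threshold=300)
gas_station_policy = open_decision(gas_station_score, threshold=5000)
```

A larger predicted coach count pushes the service area toward opening (many passengers), while a larger truck count pushes the gas station toward opening (large tanks), matching the intent of the two weightings.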
According to a first aspect of the present invention, there is provided a service area open control decision method based on reinforcement learning, the method comprising:
establishing a recurrent neural network model, wherein each neuron in the hidden layer of the model receives input data with the same set time delay, the output data of the output layer is the passing-vehicle information present around the service area at a judgment moment, and each input data of the input layer is the passing-vehicle information present around the service area at one of a preset number of uniformly spaced moments before the judgment moment;
taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data of the recurrent neural network model, and taking the reciprocal of the number of vehicles of each type present around the service area at that historical judgment moment as the reward signal for reinforcement learning, so as to perform a reinforcement-learning operation on the model; completing the reinforcement learning with a fixed number of such pieces of learning data applied one after another, thereby obtaining the reinforcement-learned recurrent neural network model;
taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the model to obtain the output data of its output layer, namely the number of vehicles of each type present around the service area at the next moment, the next moment being separated from the current moment by the interval duration corresponding to the uniform spacing;
determining the opening/closing strategy of the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
determining the opening/closing strategy of the gas station in the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the passing-vehicle information present around the service area at the judgment moment comprises: the passing-vehicle information present around the service area at the judgment moment is the number of vehicles of each type present around the service area at the judgment moment;
wherein each input data of the input layer of the recurrent neural network model being the passing-vehicle information present around the service area at the uniformly spaced moments before the judgment moment comprises: each input data is the number of vehicles of each type present around the service area at one moment before the judgment moment;
wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
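A minimal sketch of the forward pass of the kind of recurrent model described above: the input is a sequence of per-class vehicle counts at uniformly spaced past moments, the output is an estimate of per-class counts at the next moment. The class count, sequence length, hidden size and the (untrained) random weights are all illustrative assumptions:

```python
import numpy as np

# Elman-style recurrent cell; sizes and random weights are illustrative.
rng = np.random.default_rng(0)

N_CLASSES = 3     # e.g. car / coach / truck
N_STEPS = 8       # preset number of uniformly spaced past moments
HIDDEN = 16

W_in = rng.normal(scale=0.1, size=(HIDDEN, N_CLASSES))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.1, size=(N_CLASSES, HIDDEN))

def predict_next_counts(sequence):
    """Run the recurrent cell over the past moments and read the output layer.

    sequence: array of shape (N_STEPS, N_CLASSES), counts per class per moment.
    Returns a non-negative estimate of per-class counts at the next moment.
    """
    h = np.zeros(HIDDEN)
    for x in sequence:                   # each past moment in order
        h = np.tanh(W_in @ x + W_h @ h)  # hidden state carries the history
    return np.maximum(W_out @ h, 0.0)    # counts cannot be negative

history = rng.integers(0, 50, size=(N_STEPS, N_CLASSES)).astype(float)
next_counts = predict_next_counts(history)
```

In practice the weights would be fitted (here, by the reinforcement-learning procedure the patent describes) rather than left random; the sketch only fixes the input/output shape of the model.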
According to a second aspect of the present invention, there is provided a service area open control decision system based on reinforcement learning, the system comprising:
a first modeling device for establishing a recurrent neural network model, wherein each neuron in the hidden layer of the model receives input data with the same set time delay, the output data of the output layer is the passing-vehicle information present around the service area at a judgment moment, and each input data of the input layer is the passing-vehicle information present around the service area at one of a preset number of uniformly spaced moments before the judgment moment;
a second modeling device, connected with the first modeling device, for taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data of the recurrent neural network model, taking the reciprocal of the number of vehicles of each type present around the service area at that historical judgment moment as the reward signal for reinforcement learning so as to perform a reinforcement-learning operation on the model, and completing the reinforcement learning with a fixed number of such pieces of learning data applied one after another, thereby obtaining the reinforcement-learned recurrent neural network model;
a data analysis device, connected with the second modeling device, for taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the model to obtain the output data of its output layer, namely the number of vehicles of each type present around the service area at the next moment, the next moment being separated from the current moment by the interval duration corresponding to the uniform spacing;
a first judging device, connected with the data analysis device, for determining the opening/closing strategy of the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
a second judging device, connected with the data analysis device, for determining the opening/closing strategy of the gas station in the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the passing-vehicle information present around the service area at the judgment moment comprises: the passing-vehicle information present around the service area at the judgment moment is the number of vehicles of each type present around the service area at the judgment moment;
wherein each input data of the input layer of the recurrent neural network model being the passing-vehicle information present around the service area at the uniformly spaced moments before the judgment moment comprises: each input data is the number of vehicles of each type present around the service area at one moment before the judgment moment;
wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
Drawings
Embodiments of the present invention will be described below with reference to the accompanying drawings, in which:
FIG. 1 is a technical flow diagram of a reinforcement learning-based service area open control decision method and system in accordance with the present invention.
Fig. 2 is an internal structure diagram of a service area open control decision system based on reinforcement learning according to embodiment 4 of the present invention.
Fig. 3 is an internal structure diagram of a service area open control decision system based on reinforcement learning according to embodiment 5 of the present invention.
Fig. 4 is an internal structure diagram of a service area open control decision system based on reinforcement learning according to embodiment 6 of the present invention.
Detailed Description
The recurrent neural network has memory, parameter sharing and Turing completeness, giving it certain advantages in learning the nonlinear characteristics of a sequence. Recurrent neural networks are applied in natural language processing (Natural Language Processing, NLP), for example speech recognition, language modeling and machine translation, and are also used for various time-series predictions. A recurrent neural network built together with a convolutional neural network (Convolutional Neural Network, CNN) can address computer-vision problems involving sequence input.
Reinforcement learning (Reinforcement Learning, RL), also known as re-excitation learning or evaluation learning, is one of the paradigms and methodologies of machine learning, used to describe and solve the problem of an agent learning a strategy that maximizes return or achieves a specific goal through its interaction with an environment. A common model for reinforcement learning is the standard Markov decision process (Markov Decision Process, MDP). According to the given conditions, reinforcement learning can be divided into model-based reinforcement learning (model-based RL) and model-free reinforcement learning (model-free RL), and into active reinforcement learning (active RL) and passive reinforcement learning (passive RL). Variants of reinforcement learning include inverse reinforcement learning, hierarchical reinforcement learning and reinforcement learning for partially observable systems. Algorithms for solving reinforcement learning problems fall into two classes: policy search algorithms and value function algorithms. Deep learning models may be used within reinforcement learning, forming deep reinforcement learning.
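As a minimal illustration of the value-function algorithm family mentioned above, the following sketch runs tabular Q-learning on a toy two-state, two-action MDP. The environment, rewards and hyper-parameters are invented for illustration only and are unrelated to the patent's own reward design:

```python
import random

# Tabular Q-learning on a hypothetical two-state MDP: taking action 1 in
# state 0 leads to state 1 and yields reward 1; everything else yields 0.
random.seed(0)

N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic toy transition: (next_state, reward)."""
    if state == 0 and action == 1:
        return 1, 1.0
    return 0, 0.0

for _ in range(500):                # episodes
    s = 0
    for _ in range(10):             # steps per episode
        if random.random() < EPSILON:
            a = random.randrange(N_ACTIONS)                       # explore
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])      # exploit
        s2, r = step(s, a)
        # Q-learning update: bootstrap from the best next-state value.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

best_action_in_state_0 = max(range(N_ACTIONS), key=lambda i: Q[0][i])
```

After training, the greedy policy in state 0 picks the rewarding action, which is exactly the "learn a strategy that maximizes return" behavior described above.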
At present, different strategies must be formulated for expressway service areas and their associated gas stations as to whether and when to open, because service areas located in different regions see different flows and types of passing vehicles, and even within the same region the flow and types of passing vehicles have a certain randomness and contingency, making it difficult for a service area operator to determine the opening strategies of the service area and its associated gas station.
In order to overcome these defects, the invention discloses a service area open control decision method and system based on reinforcement learning which, by introducing a structurally customized recurrent neural network model refined through reinforcement learning, completes intelligent analysis of the flow and types of passing vehicles at any moment from service area historical data, and thus adaptively determines the opening strategy and specific opening moment of the corresponding service area and its associated gas station, achieving a dynamic balance between reducing cost and meeting demand.
As shown in fig. 1, a technical flowchart of a service area open control decision method and system based on reinforcement learning according to the present invention is presented.
As shown in fig. 1, the service area open control decision method based on reinforcement learning of the present invention includes:
First, collecting the passing-vehicle information near the service area at the uniformly spaced moments before a judgment moment, including the number of vehicles of each type, taking the collected data as the input data of a recurrent neural network model and the passing-vehicle information near the service area at the judgment moment as its output data, thereby constructing a structurally customized recurrent neural network model;
Second, performing targeted reinforcement learning on the constructed structurally customized recurrent neural network model, so as to ensure the recognition accuracy of the reinforcement-learned model;
Third, taking the passing-vehicle information near the service area at the uniformly spaced moments before the next moment as the input data of the reinforcement-learned recurrent neural network model, and running it to obtain the key information for the opening/closing strategies of the service area and of the gas station in the service area within a preset duration after the next moment, namely the number of vehicles of each type passing near the service area at the next moment;
Finally, using this key information to execute the determined opening/closing strategies of the service area and of the gas station in the service area within the preset duration after the next moment, including whether to open and how much manpower and material resources to allocate.
The key point of the method is to apply the reinforcement-learned, structurally customized recurrent neural network model to the specific selection of expressway service area opening/closing strategies and resource allocation: the number of vehicles of each type passing near the service area at the next moment is predicted from historical data, so that first-hand data are obtained in advance, providing reaction time for opening or closing the service area and gas station and for the quantitative allocation of the service area's manpower and material resources.
The service area open control decision method and system based on reinforcement learning of the present invention will now be specifically described by way of examples.
Example 1
The service area open control decision method based on reinforcement learning provided by the embodiment 1 of the invention comprises the following steps:
establishing a circulating neural network model, wherein each neuron in a hidden layer of the circulating neural network model receives input data with the same set time delay, output data of an output layer of the circulating neural network model is past vehicle information existing around a judging moment service area, and each input data of the input layer of the circulating neural network model is the past vehicle information existing around the preset number and even interval of each moment service area before the judging moment;
taking past vehicle information existing around a service area at each time preset before a certain judgment time in history and uniformly spaced as one piece of learning data of the circulating neural network model, taking each reciprocal of each existing number corresponding to each vehicle existing around the service area at a certain judgment time in history as a reward signal for performing reinforcement learning on the circulating neural network model so as to realize reinforcement learning operation on the circulating neural network model, and completing reinforcement learning operation on the circulating neural network model by a plurality of pieces of learning data with fixed quantity in a time-sharing way, thereby obtaining a circulating neural network model after reinforcement learning;
Taking the past vehicle information existing around the service area at each time of a preset quantity and even interval before the next time as each input data of an input layer of the circulation neural network model after reinforcement learning, and operating the circulation neural network model after reinforcement learning to obtain the output data of an output layer thereof, namely, each existing quantity respectively corresponding to each vehicle existing around the service area at the next time, wherein the next time and the current time are separated by the interval duration corresponding to the even interval;
determining a closing and opening strategy of the service area in a preset time period after the next moment based on the respective existence quantity corresponding to various vehicles existing around the service area at the next moment;
determining a closing and opening strategy of a gas station in a service area within a preset duration after the next moment based on the respective existence quantity corresponding to various vehicles existing around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment being the respective presence counts of the various vehicle types present around the service area at the judgment moment;
wherein the respective input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data being the respective presence counts of the various vehicle types present around the service area at one moment before the judgment moment;
wherein the longer the distance from the service area to the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
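The training-and-prediction pipeline described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the network sizes, weights, and vehicle types are assumptions. It only shows how a preset number of uniformly spaced historical presence counts feed a recurrent hidden layer (each step applying the same fixed one-step delay) to predict the presence counts at the next moment, and how the reciprocal of each observed count would serve as the reward signal.

```python
import numpy as np

# Illustrative sketch (not the patented implementation): an Elman-style
# recurrent network mapping a preset number of uniformly spaced historical
# observations (presence counts per vehicle type around the service area)
# to predicted presence counts at the next moment. Sizes are assumptions.
K = 3   # vehicle types (e.g., car, bus, truck) -- assumed
T = 6   # preset number of uniformly spaced historical moments
H = 8   # hidden-layer neurons, each receiving input with the same set delay

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (H, K))
W_rec = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (K, H))

def predict_counts(history):
    """history: (T, K) past presence counts; returns (K,) predicted counts."""
    h = np.zeros(H)
    for x in history:                       # one step per uniformly spaced moment
        h = np.tanh(W_in @ x + W_rec @ h)   # same fixed one-step delay everywhere
    return np.maximum(W_out @ h, 0.0)       # presence counts cannot be negative

def reward_signal(observed_counts):
    """Reciprocal of each observed presence count, used as the reward signal."""
    return np.array([1.0 / c if c > 0 else 1.0 for c in observed_counts])

history = np.abs(rng.normal(10, 3, (T, K)))
pred = predict_counts(history)
print(pred.shape)                              # (3,)
print(reward_signal([4, 2, 10]).tolist())      # [0.25, 0.5, 0.1]
```

The actual weights would be adjusted by the reinforcement learning procedure; this sketch shows only the data flow from the uniformly spaced inputs to the per-type output counts.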
Embodiment 2
Compared with embodiment 1 of the present invention, the reinforcement learning-based service area opening control decision method according to embodiment 2 of the present invention further includes:
analyzing the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the amount of human resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
Embodiment 3
Compared with embodiment 1 of the present invention, the reinforcement learning-based service area opening control decision method according to embodiment 3 of the present invention further includes:
analyzing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
In any of the above embodiments 1-3, optionally, in the reinforcement learning-based service area opening control decision method:
the determining of the open/close strategy for the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
In any of the above embodiments 1-3, optionally, in the reinforcement learning-based service area opening control decision method:
the determining of the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the weighted calculation result, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
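The two weighted decisions above share one mechanism: a weighted sum of the predicted presence counts compared against a threshold, with passenger-capacity weights for the service-area decision and fuel-tank-volume weights for the gas-station decision. The sketch below is illustrative only; the vehicle types, weight values, and thresholds are assumptions, not values from this document.

```python
# Illustrative sketch of the weighted open/close decision. Weights grow with
# passenger capacity (service-area decision) or fuel-tank volume (gas-station
# decision); all numbers here are assumed for demonstration.
PASSENGER_WEIGHTS = {"car": 1.0, "bus": 10.0, "truck": 1.5}  # ~ passengers per vehicle
TANK_WEIGHTS = {"car": 1.0, "bus": 3.0, "truck": 5.0}        # ~ relative tank volume

def open_close_strategy(predicted_counts, weights, threshold):
    """Weighted sum of predicted presence counts; open iff it meets the threshold."""
    score = sum(weights[v] * n for v, n in predicted_counts.items())
    return "open" if score >= threshold else "close"

counts = {"car": 20, "bus": 2, "truck": 5}  # predicted counts at the next moment
print(open_close_strategy(counts, PASSENGER_WEIGHTS, threshold=30))  # open  (score 47.5)
print(open_close_strategy(counts, TANK_WEIGHTS, threshold=60))       # close (score 51.0)
```

With the same predicted counts, the service area and its gas station can thus receive different open/close decisions, which is why the method determines the two strategies separately.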
Embodiment 4
Fig. 2 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 4 of the present invention.
As shown in fig. 2, the reinforcement learning-based service area opening control decision system includes the following components:
a first modeling device, configured to build a recurrent neural network model, wherein each neuron in the hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the past vehicle information present around the service area at a judgment moment, and the respective input data of the input layer of the recurrent neural network model are the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
a second modeling device, connected to the first modeling device and configured to take the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and to take the reciprocal of each presence count of the various vehicle types present around the service area at that historical judgment moment as a reward signal for reinforcement learning of the recurrent neural network model, so as to perform the reinforcement learning operation on the recurrent neural network model, the reinforcement learning operation being completed over time with a fixed number of such pieces of learning data, thereby obtaining a reinforcement-learned recurrent neural network model;
a data analysis device, connected to the second modeling device and configured to take the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the next moment as the respective input data of the input layer of the reinforcement-learned recurrent neural network model, and to run the reinforcement-learned recurrent neural network model to obtain the output data of its output layer, namely the respective presence counts of the various vehicle types present around the service area at the next moment, wherein the next moment is separated from the current moment by the interval duration corresponding to the uniform spacing;
a first judging device, connected to the data analysis device and configured to determine an open/close strategy for the service area within a preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
a second judging device, connected to the data analysis device and configured to determine an open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment being the respective presence counts of the various vehicle types present around the service area at the judgment moment;
wherein the respective input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data being the respective presence counts of the various vehicle types present around the service area at one moment before the judgment moment;
wherein the longer the distance from the service area to the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
Embodiment 5
Fig. 3 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 5 of the present invention.
As shown in fig. 3, compared with embodiment 4 of the present invention, the reinforcement learning-based service area opening control decision system further includes:
a first allocation device, connected to the first judging device and the second judging device respectively, and configured to analyze the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the open/close strategy for the service area within the preset duration after the next moment determined by the first judging device and the open/close strategy for the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing of the amount of human resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
Embodiment 6
Fig. 4 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 6 of the present invention.
As shown in fig. 4, compared with embodiment 4 of the present invention, the reinforcement learning-based service area opening control decision system further includes:
a second allocation device, connected to the first judging device and the second judging device respectively, and configured to analyze the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the open/close strategy for the service area within the preset duration after the next moment determined by the first judging device and the open/close strategy for the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
In any of embodiments 4-6 above, optionally, in the reinforcement learning based service area opening control decision system:
the determining of the open/close strategy for the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
In any of embodiments 4-6 above, optionally, in the reinforcement learning based service area opening control decision system:
the determining of the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the weighted calculation result, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
In addition, an expressway service area is a place specially provided for passengers and drivers to stop and rest, equipped with facilities such as a parking lot, public toilets, a gas station, a vehicle repair station, catering services and a canteen, with service areas spaced on average about 50 kilometers apart. The service flow entering a service area is divided into people flow and traffic flow. The traffic flow is divided into vehicles that stop and vehicles that pass without stopping. The people flow is divided into types such as waiting for a vehicle, resting, using the toilet, shopping, dining, lodging and using electronic equipment.
The construction scale of the service area generally needs to accommodate future increases in traffic volume. Common highway service areas include a single-sided service area and a double-sided service area.
A single-sided service area is also called a centrally concentrated service area. Its layout principle is to arrange the service area on one side of the road and concentrate the various functional services in a single area; vehicles traveling in the opposite lanes enter the service area through an overpass or a tunnel. Single-sided service areas are less common because they have certain usage drawbacks compared with double-sided service areas. Two forms are common. The first is the large centralized service area, in which the external service facilities are set up on one side of the expressway while the refueling facilities are located on both sides; many European service areas of this form focus on shopping malls, entertainment facilities, accommodation, dining and the like. The second is the small single-sided service area, adopted mainly because of terrain: it has value in mountainous regions and places that cannot provide enough construction space, and where planning calls for small service areas and parking areas, a single-sided service area can play a good role to a certain extent. The northwest region has complex terrain, and in some special geographic environments a single-sided service area is the necessary choice; it occupies a small floor area, suits the usage characteristics of vehicles traveling in both directions, and can play a unique role in such special environments. In recent years, with the development of expressways, some landscape service areas where vehicles park to view scenery have appeared.
The double-sided service area layout is the most common. Its layout principle is to arrange service areas on both sides of the road, with identical service facilities and functional zones on each side. Since an expressway is a fully enclosed bidirectional roadway with a median strip, providing double-sided service areas allows vehicles traveling in different directions to enter and exit separately; meanwhile, the service areas on the two sides are connected through an overpass or a tunnel, which optimizes vehicle handling capacity and material allocation. Using a double-sided service area in a favorable geographic environment is more reasonable, meets the requirements of convenient, rapid and efficient use, and maximizes the commercial effect. The two sides of a double-sided service area can be designed in the same layout form or in different forms that borrow from and respond to the natural landscape environment.
As supporting service facilities developed by the expressway industry, expressway service areas are of great significance to the rapid development of expressways and the growth of planned mileage. The effective operation and high-quality service of a service area can better realize the social service value of the expressway, increase the economic benefit of the expressway investment company, provide employment opportunities and absorb surplus personnel. In addition, the opportunity of developing a service area can be used to obtain scarce land resources at low land cost.
Reinforcement learning regards learning as a process of trial and evaluation: the Agent selects an action to apply to the environment; upon receiving the action, the environment changes state and produces a reinforcement signal (reward or punishment) that is fed back to the Agent; the Agent then selects the next action according to the reinforcement signal and the current state of the environment, with the selection principle of increasing the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value but also the subsequent state of the environment and the final reinforcement value.
Reinforcement learning differs from supervised learning in connectionist learning mainly in that the reinforcement signal provided by the environment evaluates how good the generated action is (usually as a scalar signal), rather than telling the Agent how to generate the correct action. Since the external environment provides little information, the Agent must learn from its own experience. In this way, the Agent gains knowledge in an environment of action-and-evaluation and improves its action scheme to adapt to the environment.
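The agent-environment loop just described can be sketched as a toy example. The environment, the two states, the reward values, and the epsilon-greedy selection rule below are all illustrative assumptions; the sketch only demonstrates the stated principle that actions receiving positive reinforcement become more probable.

```python
import random

# Toy agent-environment loop: the agent picks an action, the environment
# returns a new state and a scalar reinforcement signal, and the agent
# raises its preference for actions that received positive reinforcement.
# States, rewards, and parameters are assumptions for illustration only.
ACTIONS = ["open", "close"]

def environment(state, action):
    """Toy environment: reward +1 for opening when busy or closing when quiet."""
    reward = 1.0 if (action == "open") == (state == "busy") else -1.0
    next_state = random.choice(["busy", "quiet"])
    return next_state, reward

prefs = {(s, a): 0.0 for s in ["busy", "quiet"] for a in ACTIONS}
random.seed(0)
state = "busy"
for _ in range(2000):
    # epsilon-greedy: mostly exploit current preferences, sometimes explore
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: prefs[(state, a)])
    state_next, reward = environment(state, action)
    prefs[(state, action)] += 0.1 * (reward - prefs[(state, action)])
    state = state_next

print(prefs[("busy", "open")] > prefs[("busy", "close")])    # True
print(prefs[("quiet", "close")] > prefs[("quiet", "open")])  # True
```

After enough trials the agent prefers the positively reinforced action in each state, mirroring the selection principle described above.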
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.
Claims (10)
1. A service area open control decision method based on reinforcement learning, the method comprising:
establishing a recurrent neural network model, wherein each neuron in the hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the past vehicle information present around the service area at a judgment moment, and the respective input data of the input layer of the recurrent neural network model are the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
taking the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and taking the reciprocal of each presence count of the various vehicle types present around the service area at that historical judgment moment as a reward signal for reinforcement learning of the recurrent neural network model, so as to perform the reinforcement learning operation on the recurrent neural network model, the reinforcement learning operation being completed over time with a fixed number of such pieces of learning data, thereby obtaining a reinforcement-learned recurrent neural network model;
taking the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the next moment as the respective input data of the input layer of the reinforcement-learned recurrent neural network model, and running the reinforcement-learned recurrent neural network model to obtain the output data of its output layer, namely the respective presence counts of the various vehicle types present around the service area at the next moment, wherein the next moment is separated from the current moment by the interval duration corresponding to the uniform spacing;
determining an open/close strategy for the service area within a preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
determining an open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment being the respective presence counts of the various vehicle types present around the service area at the judgment moment;
wherein the respective input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data being the respective presence counts of the various vehicle types present around the service area at one moment before the judgment moment;
wherein the longer the distance from the service area to the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
2. The reinforcement learning based service area open control decision method of claim 1, wherein the method further comprises:
analyzing the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the amount of human resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
3. The reinforcement learning based service area open control decision method of claim 1, wherein the method further comprises:
analyzing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
4. A reinforcement learning based service area open control decision method as claimed in any one of claims 1-3, wherein:
the determining of the open/close strategy for the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
5. A reinforcement learning based service area open control decision method as claimed in any one of claims 1-3, wherein:
the determining, based on the respective quantities of the various vehicle types present around the service area at the next moment, of the opening/closing strategy of the gas station in the service area within the preset duration after the next moment comprises: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the opening/closing strategy of the gas station in the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
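Claims 4 and 5 describe the same weighted-threshold decision applied with two different weight tables. A minimal sketch, in which the vehicle types, weight values, and threshold are all hypothetical — the claims require only that the weights grow with passenger capacity (service area) and with fuel-tank volume (gas station):

```python
# Hypothetical per-type weights. The claims fix only the monotonicity:
# more passengers per vehicle  -> larger service-area weight,
# larger fuel-tank volume      -> larger gas-station weight.
PASSENGER_WEIGHTS = {"car": 1.0, "van": 2.0, "coach": 10.0}
FUEL_TANK_WEIGHTS = {"car": 1.0, "van": 1.5, "coach": 4.0}

def open_close_strategy(counts: dict, weights: dict, threshold: float) -> str:
    """Weighted sum over the predicted per-type vehicle counts around the
    service area at the next moment; 'open' once the demand proxy clears
    a (hypothetical) threshold."""
    score = sum(weights.get(vehicle, 0.0) * n for vehicle, n in counts.items())
    return "open" if score >= threshold else "closed"

counts = {"car": 30, "van": 5, "coach": 2}   # predicted counts at next moment
area_strategy = open_close_strategy(counts, PASSENGER_WEIGHTS, threshold=25.0)
station_strategy = open_close_strategy(counts, FUEL_TANK_WEIGHTS, threshold=25.0)
```

The two decisions can diverge: a stream dominated by coaches pushes the passenger-weighted score up much faster than the fuel-tank-weighted score, so the service area may open while the gas station stays closed.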
6. A reinforcement learning based service area opening control decision making system, the system comprising:
the first modeling device is used for building a recurrent neural network model, wherein each neuron in a hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the historical vehicle information present around the service area at a judgment moment, and each input datum of the input layer of the recurrent neural network model is the historical vehicle information present around the service area at one of a preset number of evenly spaced moments before the judgment moment;
the second modeling device is connected to the first modeling device and is used for taking the historical vehicle information present around the service area at a preset number of evenly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and taking the reciprocal of each of the per-type vehicle quantities present around the service area at that historical judgment moment as the reward signal for reinforcement learning of the recurrent neural network model, so as to perform a reinforcement learning operation on the model; the reinforcement learning operation is completed over a fixed number of pieces of learning data processed at separate times, yielding the reinforcement-learned recurrent neural network model;
the data analysis device is connected to the second modeling device and is used for taking the historical vehicle information present around the service area at the preset number of evenly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the model to obtain the output data of its output layer, namely the respective quantities of the various vehicle types present around the service area at the next moment, wherein the next moment and the current moment are separated by the interval duration corresponding to the even spacing;
the first judging device is connected to the data analysis device and is used for determining the opening/closing strategy of the service area within a preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment;
the second judging device is connected to the data analysis device and is used for determining the opening/closing strategy of the gas station in the service area within the preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the historical vehicle information present around the service area at the judgment moment comprises: the historical vehicle information present around the service area at the judgment moment is the respective quantities of the various vehicle types present around the service area at the judgment moment;
wherein each input datum of the input layer of the recurrent neural network model being the historical vehicle information present around the service area at one of the preset number of evenly spaced moments before the judgment moment comprises: each input datum is the respective quantities of the various vehicle types present around the service area at one moment before the judgment moment;
and wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the evenly spaced moments.
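The pipeline of claim 6 — a recurrent model fed a preset number of evenly spaced past per-type count vectors, a reward signal equal to the reciprocal of the observed per-type counts, and a sampling interval that grows with the distance to the nearest city — can be sketched as follows. The network sizes, the random initialisation, and the linear distance-to-interval scaling are assumptions; the claim fixes only the interfaces, the reciprocal reward, and the monotone distance/interval relationship:

```python
import numpy as np

rng = np.random.default_rng(0)
N_TYPES, N_STEPS, HIDDEN = 3, 8, 16   # vehicle types, past moments, hidden units

# Hypothetical parameters of a simple Elman-style recurrent model. Claim 6
# specifies only the interface: N_STEPS evenly spaced past count vectors in,
# one predicted per-type count vector for the next moment out.
W_in = rng.normal(0.0, 0.1, (HIDDEN, N_TYPES))
W_rec = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))
W_out = rng.normal(0.0, 0.1, (N_TYPES, HIDDEN))

def predict_counts(past_counts: np.ndarray) -> np.ndarray:
    """past_counts: (N_STEPS, N_TYPES) -> predicted non-negative counts."""
    h = np.zeros(HIDDEN)
    for x in past_counts:              # same fixed delay between successive inputs
        h = np.tanh(W_in @ x + W_rec @ h)
    return np.maximum(W_out @ h, 0.0)  # vehicle counts cannot be negative

def reward(observed_counts: np.ndarray) -> np.ndarray:
    """Per claim 6, the reward signal is the reciprocal of each observed
    per-type vehicle count (guarded here against division by zero)."""
    return 1.0 / np.maximum(observed_counts, 1e-6)

def interval_minutes(distance_km: float, base: float = 10.0) -> float:
    """The farther the service area is from the nearest city, the longer the
    interval between the evenly spaced moments (this scaling is hypothetical)."""
    return base * (1.0 + distance_km / 50.0)
```

Note the effect of the reciprocal reward: moments with few surrounding vehicles yield large rewards, which biases the learned policy toward keeping quiet service areas closed.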
7. The reinforcement-learning-based service area open control decision system of claim 6, wherein said system further comprises:
the first allocation device is connected to the first judging device and the second judging device respectively, and is used for analysing the quantity of human resources that the service area should dispatch within the preset duration after the next moment, based on the opening/closing strategy of the service area within the preset duration after the next moment determined by the first judging device and the opening/closing strategy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analysing of the quantity of human resources that the service area should dispatch within the preset duration after the next moment comprises: when the opening/closing strategy of the service area is determined to be open, the analysed quantity of human resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed;
and when the opening/closing strategy of the gas station in the service area is determined to be open, the analysed quantity of human resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed.
8. The reinforcement-learning-based service area open control decision system of claim 6, wherein said system further comprises:
the second allocation device is connected to the first judging device and the second judging device respectively, and is used for analysing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the opening/closing strategy of the service area within the preset duration after the next moment determined by the first judging device and the opening/closing strategy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analysing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment comprises: when the opening/closing strategy of the service area is determined to be open, the analysed quantity of material resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed;
and when the opening/closing strategy of the gas station in the service area is determined to be open, the analysed quantity of material resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed.
9. A reinforcement learning based service area open control decision system as claimed in any one of claims 6 to 8, wherein:
the determining, based on the respective quantities of the various vehicle types present around the service area at the next moment, of the opening/closing strategy of the service area within the preset duration after the next moment comprises: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the opening/closing strategy of the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
10. A reinforcement learning based service area open control decision system as claimed in any one of claims 6 to 8, wherein:
the determining, based on the respective quantities of the various vehicle types present around the service area at the next moment, of the opening/closing strategy of the gas station in the service area within the preset duration after the next moment comprises: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the opening/closing strategy of the gas station in the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310380218.3A CN116485196B (en) | 2023-04-11 | 2023-04-11 | Service area open control decision method and system based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116485196A CN116485196A (en) | 2023-07-25 |
CN116485196B true CN116485196B (en) | 2023-11-14 |
Family
ID=87224450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310380218.3A Active CN116485196B (en) | 2023-04-11 | 2023-04-11 | Service area open control decision method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116485196B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002083391A (en) * | 2000-09-07 | 2002-03-22 | Matsushita Electric Ind Co Ltd | System of coping with congestion in service area and method of coping with congestion |
CN105225485A (en) * | 2015-10-09 | 2016-01-06 | 山东高速信息工程有限公司 | The monitoring method of a kind of Expressway Service service capacity, system and device |
WO2021042296A1 (en) * | 2019-09-04 | 2021-03-11 | 北京图森智途科技有限公司 | Method and system for solving requirement of hub service area |
CN112766751A (en) * | 2021-01-25 | 2021-05-07 | 云南交投集团经营开发有限公司 | Intelligent management method and system for high-speed service area |
CN113344254A (en) * | 2021-05-20 | 2021-09-03 | 山西省交通新技术发展有限公司 | Method for predicting traffic flow of expressway service area based on LSTM-LightGBM-KNN |
CN113362598A (en) * | 2021-06-04 | 2021-09-07 | 重庆高速公路路网管理有限公司 | Traffic flow prediction method for expressway service area |
CN113963544A (en) * | 2021-11-05 | 2022-01-21 | 贵州省通信产业服务有限公司 | Service area traffic flow prediction system |
CN114333333A (en) * | 2022-03-10 | 2022-04-12 | 四川高速公路建设开发集团有限公司 | Tidal type highway intelligent service area based on traffic flow prediction |
CN114418161A (en) * | 2021-11-24 | 2022-04-29 | 广东省城乡规划设计研究院有限责任公司 | Intelligent networking method and device for highway service area, electronic equipment and storage medium |
CN115497299A (en) * | 2022-11-14 | 2022-12-20 | 中科聚信信息技术(北京)有限公司 | ETC-based service area traffic flow prediction method and system and service area |
Non-Patent Citations (1)
Title |
---|
Research on the Design of Expressway Service Areas Based on the Smart Building Concept; Zhou Shiqin; Jushe (23); full text * |
Also Published As
Publication number | Publication date |
---|---|
CN116485196A (en) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jie et al. | A hybrid algorithm for time-dependent vehicle routing problem with soft time windows and stochastic factors | |
An et al. | Optimal scheduling of electric vehicle charging operations considering real-time traffic condition and travel distance | |
Kim et al. | Idle vehicle relocation strategy through deep learning for shared autonomous electric vehicle system optimization | |
Stopher et al. | Modelling Travel Demand: A Disaggregate Behavioral Approach Issues and Applications | |
Li et al. | Towards smart transportation system: A case study on the rebalancing problem of bike sharing system based on reinforcement learning | |
Cai et al. | A hybrid adaptive large neighborhood search and tabu search algorithm for the electric vehicle relocation problem | |
Kadri et al. | An integrated Petri net and GA-based approach for performance optimisation of bicycle sharing systems | |
Liu et al. | Electric transit network design by an improved artificial fish-swarm algorithm | |
Hou et al. | The effect of the dataset on evaluating urban traffic prediction | |
Embarak | Smart Cities New Paradigm Applications and Challenges | |
Kamel et al. | A modelling platform for optimizing time-dependent transit fares in large-scale multimodal networks | |
Sierpiński et al. | Platform to support the implementation of electromobility in smart cities based on ICT applications-concept for an electric travelling project. | |
Li et al. | A new fuzzy-based method for energy-aware resource allocation in vehicular cloud computing using a nature-inspired algorithm | |
Parezanović et al. | Evaluation of sustainable mobility measures using fuzzy COPRAS method | |
Zhang et al. | A public transport network design using a hidden Markov model and an optimization algorithm | |
CN116485196B (en) | Service area open control decision method and system based on reinforcement learning | |
Hachette et al. | Mobility Hubs, an Innovative Concept for Sustainable Urban Mobility? State of the Art and Guidelines from European Experiences | |
Kedia et al. | Transit shift response analysis through fuzzy rule based-choice model: a case study of Indian metropolitan city | |
Yu et al. | Optimization of urban bus operation frequency under common route condition with rail transit | |
Wang et al. | Human‐centric multimodal deep (HMD) traffic signal control | |
Chatterjee | Modelling the impacts of transport telematics: current limitations and future developments | |
CN111091286A (en) | Public bicycle scheduling model and solving method | |
Lejdel | A conceptual framework for modeling smart parking | |
Malone et al. | The scenario explorer for passenger transport: A strategic model for long-term travel demand forecasting | |
Ruiz et al. | Intelligent electric drive management for plug-in hybrid buses |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||