CN116485196B - Service area open control decision method and system based on reinforcement learning - Google Patents


Info

Publication number
CN116485196B
CN116485196B
Authority
CN
China
Prior art keywords
service area
closing
moment
neural network
preset
Prior art date
Legal status
Active
Application number
CN202310380218.3A
Other languages
Chinese (zh)
Other versions
CN116485196A (en)
Inventor
王笑
Current Assignee
Terminus Technology Group Co Ltd
Original Assignee
Terminus Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Terminus Technology Group Co Ltd
Priority to CN202310380218.3A
Publication of CN116485196A
Application granted
Publication of CN116485196B
Legal status: Active

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0637Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention relates to a reinforcement learning-based service area opening control decision method and system, belonging to the technical field of artificial intelligence. The method comprises the following steps: establishing a reinforcement-learned recurrent neural network model, the output layer of which predicts the number of vehicles of each type present around a service area at the next moment; and, based on those predicted per-type counts, determining an opening/closing strategy for the service area, and for the gas station in the service area, within a preset period after the next moment. The invention applies a recurrent neural network to opening control decisions for expressway service areas and, through reinforcement learning and structural customization, constructs an artificial-intelligence recognition mechanism, adaptable to different service areas, that determines passing-vehicle information at the next moment from historical data, thereby providing service area managers with sufficient reaction time.

Description

Service area open control decision method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a reinforcement learning-based service area opening control decision method and system.
Background
Artificial intelligence is a branch of computer science and has been regarded, since the 1970s, as one of the three leading technologies of the twentieth century (space technology, energy technology, artificial intelligence), and is likewise considered one of the three leading technologies of the twenty-first century (genetic engineering, nanoscience, artificial intelligence). This is because it has developed rapidly over the last three decades, has been widely applied across many disciplines with great success, and has grown into an independent branch that is self-contained in both theory and practice.
Neural networks are an important branch of artificial intelligence. Artificial neural networks (ANNs), also abbreviated as neural networks (NNs) or called connection models, are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed, parallel information processing. Such a network depends on the complexity of the system, and processes information by adjusting the interconnections among a large number of internal nodes.
The recurrent neural network (RNN) is a type of recursive neural network that takes sequence data as input, recurses along the evolution direction of the sequence, and connects all nodes (recurrent units) in a chain. The bidirectional recurrent neural network (Bidirectional RNN, Bi-RNN) and the long short-term memory network (Long Short-Term Memory, LSTM) are common recurrent neural networks.
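As a rough illustration (not part of the patent text), the recurrent state update shared by such networks can be sketched in a few lines; the weights and sizes below are arbitrary placeholders, not values from the invention:

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One Elman-style recurrent update: h' = tanh(W_xh*x + W_hh*h + b)."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(W_xh[j], x))
                      + sum(wh * hi for wh, hi in zip(W_hh[j], h))
                      + b[j])
            for j in range(len(b))]

# Toy setup: each input is a vector of per-type vehicle counts observed
# at one of the uniformly spaced past moments (placeholder weights).
W_xh = [[0.1, 0.0], [0.0, 0.1]]   # input-to-hidden weights
W_hh = [[0.5, 0.0], [0.0, 0.5]]   # hidden-to-hidden weights
b = [0.0, 0.0]
h = [0.0, 0.0]
for counts in [[3.0, 1.0], [4.0, 2.0], [5.0, 2.0]]:
    h = rnn_step(counts, h, W_xh, W_hh, b)  # state carries the sequence history
```

The chained hidden state is what lets the model condition its next-moment prediction on the whole history window rather than on a single observation.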
Recurrent neural networks can be used for many kinds of artificial-intelligence fuzzy processing and can reach a certain accuracy. Nevertheless, in many application fields there is still no recurrent-neural-network solution, so the data of an application scene cannot be organically fused with such a solution; and even where fusion is attempted, for lack of targeted research the accuracy of the artificial-intelligence recognition is not high. For example, for expressway service areas in remote regions, operators are always caught in the contradiction that staying open costs too much while closing fails to meet normal service demand, because the types and numbers of passing vehicles at each moment cannot be predicted, and the recurrent neural network has not been applied to this scene.
Disclosure of Invention
In order to solve the above problems, the invention provides a reinforcement learning-based service area opening control decision method and system, which apply a recurrent neural network to opening control decisions for expressway service areas and, through reinforcement learning and structural customization, form an artificial-intelligence recognition mode capable of determining passing-vehicle information at the next moment from historical data.
To this end, the present invention involves at least the following four key points:
(1) using service area historical data to make an artificial-intelligence prediction of the types and numbers of passing vehicles near the service area at the next moment after the prediction moment, and determining, based on the prediction result, the opening/closing strategy of the service area and of the gas station in the service area within a preset period after the next moment;
(2) selecting a recurrent neural network model to realize the artificial-intelligence prediction of passing-vehicle information near the service area at the next moment, and performing targeted reinforcement learning on it, wherein the farther the service area is from the nearest city, the longer the interval between two adjacent moments among the uniformly spaced moments in the historical data, thereby completing flexible model customization for different service areas;
(3) determining the opening/closing strategy of the service area within the preset period after the next moment by weighted calculation, wherein the more passengers a vehicle type carries, the larger the weight assigned to that type;
(4) determining the opening/closing strategy of the gas station in the service area within the preset period after the next moment by weighted calculation, wherein the larger the fuel tank volume of a vehicle type, the larger the weight assigned to that type.
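A minimal sketch of the weighted calculation described in points (3) and (4); the vehicle types, weight values, and thresholds are hypothetical stand-ins, since the patent fixes only the monotonic rule (more passengers, or a larger fuel tank, means a larger weight):

```python
def weighted_demand(counts, weights):
    """Weighted sum of predicted per-type vehicle counts."""
    return sum(n * weights[vtype] for vtype, n in counts.items())

# Hypothetical weights obeying the stated rules.
passenger_weight = {"car": 1.0, "truck": 1.5, "coach": 10.0}  # coach carries most passengers
tank_weight      = {"car": 1.0, "coach": 4.0, "truck": 6.0}   # truck has the largest tank

predicted = {"car": 12, "truck": 5, "coach": 2}               # counts at the next moment
open_service_area = weighted_demand(predicted, passenger_weight) >= 25.0  # assumed threshold
open_gas_station  = weighted_demand(predicted, tank_weight) >= 40.0       # assumed threshold
```

With these placeholder numbers the passenger-weighted demand is 39.5 and the tank-weighted demand is 50.0, so both facilities would be opened.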
According to a first aspect of the present invention, there is provided a reinforcement learning-based service area opening control decision method, the method comprising:
establishing a recurrent neural network model, wherein each neuron in a hidden layer of the model receives input data with the same set time delay, the output data of the output layer of the model is the passing-vehicle information present around the service area at a judgment moment, and each input data of the input layer of the model is the passing-vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
taking the passing-vehicle information present around the service area at a preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the model, taking the reciprocal of each per-type vehicle count observed around the service area at that historical judgment moment as the reward signal for reinforcement learning of the model, and completing the reinforcement learning of the model with a fixed number of such pieces of learning data applied one after another, so as to obtain a reinforcement-learned recurrent neural network model;
taking the passing-vehicle information present around the service area at a preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned model, and running the model to obtain the output data of its output layer, namely the number of vehicles of each type present around the service area at the next moment, the next moment being separated from the current moment by the interval duration corresponding to the uniform spacing;
determining an opening/closing strategy for the service area within a preset period after the next moment based on the predicted per-type vehicle counts around the service area at the next moment;
determining an opening/closing strategy for the gas station in the service area within the preset period after the next moment based on the same predicted per-type vehicle counts;
wherein the output data of the output layer being the passing-vehicle information present around the service area at the judgment moment comprises: that information being the number of vehicles of each type present around the service area at the judgment moment;
wherein each input data of the input layer being the passing-vehicle information present around the service area at each uniformly spaced moment before the judgment moment comprises: each input data being the number of vehicles of each type present around the service area at one such moment before the judgment moment;
and wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
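The reciprocal reward signal described above can be sketched as follows; this is one interpretation of the claim wording, with a small epsilon (an assumption, not in the patent) added to guard against zero counts:

```python
def reward_signal(observed_counts, eps=1e-6):
    """Per-type reward: reciprocal of each observed vehicle count at the judgment moment."""
    return {vtype: 1.0 / max(n, eps) for vtype, n in observed_counts.items()}

rewards = reward_signal({"car": 8, "truck": 4, "coach": 0})
```

Under this reading, moments with few vehicles of a given type yield large rewards for that type, so the learning signal is inversely scaled by the observed traffic.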
According to a second aspect of the present invention, there is provided a reinforcement learning-based service area opening control decision system, the system comprising:
a first modeling device for establishing a recurrent neural network model, wherein each neuron in a hidden layer of the model receives input data with the same set time delay, the output data of the output layer is the passing-vehicle information present around the service area at a judgment moment, and each input data of the input layer is the passing-vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
a second modeling device, connected with the first modeling device, for taking the passing-vehicle information present around the service area at a preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the model, taking the reciprocal of each per-type vehicle count observed around the service area at that historical judgment moment as the reward signal for reinforcement learning of the model, and completing the reinforcement learning of the model with a fixed number of such pieces of learning data applied one after another, so as to obtain a reinforcement-learned recurrent neural network model;
a data analysis device, connected with the second modeling device, for taking the passing-vehicle information present around the service area at a preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned model, and running the model to obtain the output data of its output layer, namely the number of vehicles of each type present around the service area at the next moment, the next moment being separated from the current moment by the interval duration corresponding to the uniform spacing;
a first judging device, connected with the data analysis device, for determining an opening/closing strategy for the service area within a preset period after the next moment based on the predicted per-type vehicle counts around the service area at the next moment;
a second judging device, connected with the data analysis device, for determining an opening/closing strategy for the gas station in the service area within the preset period after the next moment based on the same predicted per-type vehicle counts;
wherein the output data of the output layer being the passing-vehicle information present around the service area at the judgment moment comprises: that information being the number of vehicles of each type present around the service area at the judgment moment;
wherein each input data of the input layer being the passing-vehicle information present around the service area at each uniformly spaced moment before the judgment moment comprises: each input data being the number of vehicles of each type present around the service area at one such moment before the judgment moment;
and wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
Drawings
Embodiments of the present invention will be described below with reference to the accompanying drawings, in which:
FIG. 1 is a technical flowchart of the reinforcement learning-based service area opening control decision method and system according to the present invention.
Fig. 2 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 4 of the present invention.
Fig. 3 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 5 of the present invention.
Fig. 4 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 6 of the present invention.
Detailed Description
The recurrent neural network has memory, parameter sharing, and Turing completeness, so it has certain advantages in learning the nonlinear characteristics of a sequence. Recurrent neural networks are applied in natural language processing (NLP), for example speech recognition, language modeling, and machine translation, and are also used for various kinds of time-series prediction. A recurrent neural network built with convolutional neural networks (CNNs) introduced can address computer-vision problems involving sequence inputs.
Reinforcement learning (RL), also known as evaluative learning, is one of the paradigms and methodologies of machine learning, used to describe and solve the problem of an agent learning, through interaction with an environment, a strategy that maximizes return or achieves a specific goal. A common model for reinforcement learning is the standard Markov decision process (MDP). According to the given conditions, reinforcement learning can be divided into model-based RL and model-free RL, and into active RL and passive RL. Variants of reinforcement learning include inverse reinforcement learning, hierarchical reinforcement learning, and reinforcement learning for partially observable systems. Algorithms used to solve reinforcement learning problems can be categorized into two classes: policy search algorithms and value function algorithms. Deep learning models may be used within reinforcement learning, forming deep reinforcement learning.
Currently, expressway service areas and their associated gas stations require different strategies for whether, and when, to open. Because service areas are located in different regions, the flow and types of passing vehicles differ; and even within the same region, the flow and types of passing vehicles have a certain randomness and contingency, so it is difficult for a service area operator to determine the opening strategies of the service area and its associated gas station.
To overcome these defects, the invention discloses a reinforcement learning-based service area opening control decision method and system, which, by introducing a structurally customized, reinforcement-learned recurrent neural network model, complete intelligent analysis of the flow and types of passing vehicles at any moment based on service area historical data, so as to adaptively determine the opening strategy and the specific opening moment of the corresponding service area and associated gas station, thereby achieving a dynamic balance between reducing cost and meeting demand.
As shown in fig. 1, a technical flowchart of a service area open control decision method and system based on reinforcement learning according to the present invention is presented.
As shown in fig. 1, the service area open control decision method based on reinforcement learning of the present invention includes:
first, collecting the passing-vehicle information near the service area at each of the uniformly spaced moments before a judgment moment, including the number of vehicles of each type, taking the collected data as the input data of a recurrent neural network model and the passing-vehicle information near the service area at the judgment moment as its output data, and constructing a structurally customized recurrent neural network model;
second, performing targeted reinforcement learning on the constructed, structurally customized recurrent neural network model, so as to guarantee the recognition accuracy of the reinforcement-learned model;
third, feeding the passing-vehicle information near the service area at each of the uniformly spaced moments before the next moment as input data to the reinforcement-learned recurrent neural network model, and running it to obtain the key information for the opening/closing strategy of the service area, and of the gas station in the service area, within the preset period after the next moment, namely the number of vehicles of each type among the passing vehicles near the service area at the next moment;
and finally, using this key information to execute the determined opening/closing strategy of the service area, and of the gas station in the service area, within the preset period after the next moment, including whether the service area is opened and the quantities of human and material resources to be allocated.
The key point of the method is to use a structurally customized, reinforcement-learned recurrent neural network model for the specific selection of the expressway service area opening/closing strategy and resource allocation, predicting from historical data the number of vehicles of each type among the passing vehicles near the service area at the next moment, so that first-hand data are obtained in advance, providing reaction time for opening or closing the service area and the gas station and for the quantitative allocation of the service area's human and material resources.
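Pulling these steps together, the decision loop might look like the following skeleton; `predict_counts` stands in for the trained recurrent model, and every name and threshold here is a hypothetical placeholder rather than part of the patent:

```python
def decision_step(history, predict_counts, area_rule, station_rule):
    """history: per-type vehicle counts at uniformly spaced past moments."""
    counts = predict_counts(history)          # predicted counts at the next moment
    return area_rule(counts), station_rule(counts)

# Stub model standing in for the trained recurrent network: echo the last window.
decisions = decision_step(
    [{"car": 3}, {"car": 4}, {"car": 6}],
    predict_counts=lambda h: h[-1],
    area_rule=lambda c: c["car"] >= 5,        # hypothetical opening threshold
    station_rule=lambda c: c["car"] >= 10,    # hypothetical opening threshold
)
```

Because the prediction refers to the next moment rather than the current one, the operator gains a full sampling interval of lead time before the decision must take effect.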
The reinforcement learning-based service area opening control decision method and system of the present invention will now be specifically described by way of examples.
Example 1
The reinforcement learning-based service area opening control decision method provided by embodiment 1 of the invention comprises the following steps:
establishing a recurrent neural network model, wherein each neuron in a hidden layer of the model receives input data with the same set time delay, the output data of the output layer of the model is the passing-vehicle information present around the service area at a judgment moment, and each input data of the input layer of the model is the passing-vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
taking the passing-vehicle information present around the service area at a preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the model, taking the reciprocal of each per-type vehicle count observed around the service area at that historical judgment moment as the reward signal for reinforcement learning of the model, and completing the reinforcement learning of the model with a fixed number of such pieces of learning data applied one after another, so as to obtain a reinforcement-learned recurrent neural network model;
taking the passing-vehicle information present around the service area at a preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned model, and running the model to obtain the output data of its output layer, namely the number of vehicles of each type present around the service area at the next moment, the next moment being separated from the current moment by the interval duration corresponding to the uniform spacing;
determining an opening/closing strategy for the service area within a preset period after the next moment based on the predicted per-type vehicle counts around the service area at the next moment;
determining an opening/closing strategy for the gas station in the service area within the preset period after the next moment based on the same predicted per-type vehicle counts;
wherein the output data of the output layer being the passing-vehicle information present around the service area at the judgment moment comprises: that information being the number of vehicles of each type present around the service area at the judgment moment;
wherein each input data of the input layer being the passing-vehicle information present around the service area at each uniformly spaced moment before the judgment moment comprises: each input data being the number of vehicles of each type present around the service area at one such moment before the judgment moment;
and wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
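The distance-to-interval rule above fixes only monotonicity (farther from the nearest city, longer interval); a hypothetical linear instantiation, with entirely assumed constants, might be:

```python
def sampling_interval_minutes(distance_to_city_km, base=10.0, per_km=0.25):
    """Interval between adjacent sampling moments grows with distance to the nearest city."""
    return base + per_km * distance_to_city_km

remote = sampling_interval_minutes(200.0)   # remote service area: sparser sampling
urban = sampling_interval_minutes(20.0)     # near-city service area: denser sampling
```

A remote service area sees sparse, slowly varying traffic, so a longer interval still captures its dynamics while keeping the input window's time span commensurate across service areas.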
Example 2
Compared with embodiment 1, the reinforcement learning-based service area opening control decision method according to embodiment 2 of the present invention further comprises:
analyzing the quantity of human resources that the service area should dispatch within the preset period after the next moment, based on the determined opening/closing strategy of the service area and the determined opening/closing strategy of the gas station in the service area within that period;
wherein this analysis comprises: when the opening/closing strategy of the service area is determined to be open, the analyzed quantity of human resources that the service area should dispatch within the preset period after the next moment is larger than when it is determined to be closed;
and wherein it further comprises: when the opening/closing strategy of the gas station in the service area is determined to be open, the analyzed quantity of human resources that the service area should dispatch within the preset period after the next moment is larger than when it is determined to be closed.
Embodiment 3
Compared with embodiment 1 of the present invention, the reinforcement-learning-based service area open control decision method according to embodiment 3 of the present invention further includes:
analyzing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close policy of the service area within the preset duration after the next moment and the determined open/close policy of the gas station in the service area within the preset duration after the next moment;
wherein the analyzing includes: when the open/close policy of the service area is determined to be open, the analyzed quantity of material resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed;
wherein the analyzing further includes: when the open/close policy of the gas station in the service area is determined to be open, the analyzed quantity of material resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed.
In the reinforcement-learning-based service area open control decision method of any of the above embodiments 1-3, optionally:
the determining of the open/close policy of the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment includes: performing a weighted calculation on the respective numbers of the various types of vehicles present around the service area at the next moment, and determining the open/close policy of the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type corresponds to, the larger the weight value assigned to that vehicle type.
In the reinforcement-learning-based service area open control decision method of any of the above embodiments 1-3, optionally:
the determining of the open/close policy of the gas station in the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment includes: performing a weighted calculation on the respective numbers of the various types of vehicles present around the service area at the next moment, and determining the open/close policy of the gas station in the service area within the preset duration after the next moment based on the weighted calculation result, wherein the larger the fuel tank volume a vehicle type corresponds to, the larger the weight value assigned to that vehicle type.
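The two weighted decisions described above can be sketched as follows: one weight table reflects typical passenger counts per vehicle type (for the service-area policy) and another reflects typical fuel-tank volumes (for the gas-station policy); the weighted sum of the predicted vehicle counts is compared against a threshold. All weights, counts, and thresholds below are illustrative assumptions, not values given in this disclosure.

```python
# Weight per vehicle type for the service-area decision: more typical
# passengers -> larger weight (illustrative values).
PASSENGER_WEIGHTS = {"motorcycle": 1, "car": 4, "van": 8, "coach": 40}

# Weight per vehicle type for the gas-station decision: larger typical
# fuel-tank volume in litres -> larger weight (illustrative values).
TANK_WEIGHTS = {"motorcycle": 15, "car": 55, "van": 80, "coach": 300}


def weighted_score(counts, weights):
    """Weighted sum over the predicted per-type vehicle counts."""
    return sum(weights[t] * n for t, n in counts.items())


def open_close_policy(counts, weights, threshold):
    """Return 'open' when the weighted score reaches the threshold."""
    return "open" if weighted_score(counts, weights) >= threshold else "closed"


# Predicted numbers of each vehicle type present around the service area at
# the next moment (i.e. the model output), chosen here for illustration.
counts = {"motorcycle": 3, "car": 20, "van": 5, "coach": 2}

service_policy = open_close_policy(counts, PASSENGER_WEIGHTS, threshold=100)
station_policy = open_close_policy(counts, TANK_WEIGHTS, threshold=1000)
print(service_policy, station_policy)
```

With the illustrative numbers above, both weighted scores exceed their thresholds, so both policies come out as open; lowering the counts or raising the thresholds flips either decision independently.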
Embodiment 4
Fig. 2 is an internal structural diagram of a reinforcement-learning-based service area open control decision system according to embodiment 4 of the present invention.
As shown in Fig. 2, the reinforcement-learning-based service area open control decision system includes the following components:
a first modeling device, configured to build a recurrent neural network model, wherein each neuron in the hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the past vehicle information present around the service area at a judgment moment, and each input data of the input layer of the recurrent neural network model is the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
a second modeling device, connected with the first modeling device and configured to take the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and to take the reciprocals of the respective numbers of the various types of vehicles present around the service area at that historical judgment moment as the reward signals for performing reinforcement learning on the recurrent neural network model, so as to perform a reinforcement learning operation on the recurrent neural network model; the reinforcement learning operation on the recurrent neural network model is completed, piece by piece over time, with a fixed number of pieces of learning data, thereby obtaining a reinforcement-learned recurrent neural network model;
a data analysis device, connected with the second modeling device and configured to take the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and to run the reinforcement-learned recurrent neural network model to obtain the output data of its output layer, namely the respective numbers of the various types of vehicles present around the service area at the next moment, wherein the next moment is separated from the current moment by the interval duration corresponding to the uniform spacing;
a first judging device, connected with the data analysis device and configured to determine an open/close policy of the service area within a preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment;
a second judging device, connected with the data analysis device and configured to determine an open/close policy of a gas station in the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment is the respective numbers of the various types of vehicles present around the service area at the judgment moment;
wherein each input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data is the respective numbers of the various types of vehicles present around the service area at a certain moment before the judgment moment;
wherein the farther the service area is from the nearest city, the larger the interval duration between two adjacent moments among the uniformly spaced moments.
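The forward pass of the time-delay recurrent network built by the first modeling device can be sketched as follows: the per-type vehicle counts at the uniformly spaced past moments are fed in one moment at a time, each hidden neuron reusing its state with the same fixed delay, and the output is the predicted per-type counts at the next moment. The layer sizes, delay length, weight initialization, and sample values are illustrative assumptions; the reciprocal-count reward at the end only mirrors the reward signal of the second modeling device, not a full training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TYPES = 4    # number of vehicle types counted around the service area
N_STEPS = 6    # preset number of uniformly spaced past moments
N_HIDDEN = 16  # hidden-layer neurons, each with the same set time delay

# Input-to-hidden, hidden-to-hidden (the shared delay), hidden-to-output.
W_in = rng.normal(scale=0.1, size=(N_HIDDEN, N_TYPES))
W_h = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))
W_out = rng.normal(scale=0.1, size=(N_TYPES, N_HIDDEN))


def predict(past_counts):
    """past_counts: (N_STEPS, N_TYPES) per-type vehicle counts at the uniformly
    spaced past moments; returns predicted counts at the next moment."""
    h = np.zeros(N_HIDDEN)
    for x in past_counts:                # every hidden neuron reuses h with
        h = np.tanh(W_in @ x + W_h @ h)  # the same fixed one-step delay
    return np.maximum(W_out @ h, 0.0)    # vehicle counts cannot be negative


past = rng.integers(0, 30, size=(N_STEPS, N_TYPES)).astype(float)
pred = predict(past)

# Reward signal as described above: the reciprocal of each observed vehicle
# count at the historical judgment moment (clipped to avoid division by zero).
actual = np.array([5.0, 12.0, 3.0, 1.0])
reward = 1.0 / np.maximum(actual, 1.0)

print(pred.shape, reward)
```

The untrained output here is arbitrary; in the described system the weights would be adjusted over a fixed number of such learning pieces before the data analysis device uses the model for prediction.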
Embodiment 5
Fig. 3 is an internal structural diagram of a reinforcement-learning-based service area open control decision system according to embodiment 5 of the present invention.
As shown in Fig. 3, compared with embodiment 4 of the present invention, the reinforcement-learning-based service area open control decision system further includes:
a first allocation device, connected with the first judging device and the second judging device respectively, and configured to analyze the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the open/close policy of the service area within the preset duration after the next moment determined by the first judging device and the open/close policy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing includes: when the open/close policy of the service area is determined to be open, the analyzed amount of human resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed;
wherein the analyzing further includes: when the open/close policy of the gas station in the service area is determined to be open, the analyzed amount of human resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed.
Embodiment 6
Fig. 4 is an internal structural diagram of a reinforcement-learning-based service area open control decision system according to embodiment 6 of the present invention.
As shown in Fig. 4, compared with embodiment 4 of the present invention, the reinforcement-learning-based service area open control decision system further includes:
a second allocation device, connected with the first judging device and the second judging device respectively, and configured to analyze the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the open/close policy of the service area within the preset duration after the next moment determined by the first judging device and the open/close policy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing includes: when the open/close policy of the service area is determined to be open, the analyzed quantity of material resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed;
wherein the analyzing further includes: when the open/close policy of the gas station in the service area is determined to be open, the analyzed quantity of material resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed.
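The allocation rules of the two allocation devices reduce to a monotone mapping: an open policy implies dispatching more staff and supplies than a closed one. A minimal sketch, with all concrete amounts being illustrative assumptions rather than values given in this disclosure:

```python
# Illustrative resource amounts per policy; "open" always exceeds "closed",
# matching the rule stated above. The numbers themselves are assumptions.
BASE_STAFF = {"open": 12, "closed": 2}        # service-area staff
STATION_STAFF = {"open": 4, "closed": 1}      # gas-station staff
BASE_SUPPLIES = {"open": 100, "closed": 10}   # service-area material units
STATION_SUPPLIES = {"open": 50, "closed": 5}  # gas-station material units


def resources(service_policy, station_policy):
    """Human and material resources to dispatch for the coming period."""
    staff = BASE_STAFF[service_policy] + STATION_STAFF[station_policy]
    supplies = BASE_SUPPLIES[service_policy] + STATION_SUPPLIES[station_policy]
    return staff, supplies


print(resources("open", "open"))      # both facilities open
print(resources("closed", "closed"))  # both facilities closed
```

Any combination of the two policies yields at least as many resources as the fully closed case, which is the only property the embodiments actually require.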
In the reinforcement-learning-based service area open control decision system of any of the above embodiments 4-6, optionally:
the determining of the open/close policy of the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment includes: performing a weighted calculation on the respective numbers of the various types of vehicles present around the service area at the next moment, and determining the open/close policy of the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type corresponds to, the larger the weight value assigned to that vehicle type.
In the reinforcement-learning-based service area open control decision system of any of the above embodiments 4-6, optionally:
the determining of the open/close policy of the gas station in the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment includes: performing a weighted calculation on the respective numbers of the various types of vehicles present around the service area at the next moment, and determining the open/close policy of the gas station in the service area within the preset duration after the next moment based on the weighted calculation result, wherein the larger the fuel tank volume a vehicle type corresponds to, the larger the weight value assigned to that vehicle type.
In addition, an expressway service area is a place specially provided for passengers and drivers to stop and rest, offering facilities such as a parking lot, public toilets, a gas station, a vehicle repair station, catering outlets, and a canteen; service areas are spaced about 50 kilometers apart on average. The service flow entering a service area is divided into people flow and vehicle flow. Vehicle flow is divided into vehicles that stop and vehicles that pass without stopping. People flow is divided into different types such as waiting for vehicles, resting, using the toilet, shopping, dining, accommodation, and using electronic devices.
The construction scale of a service area generally needs to accommodate future growth in traffic volume. Common expressway service areas include single-sided service areas and double-sided service areas.
A single-sided service area is also called a centrally concentrated service area. Its layout principle is to arrange the service area on one side of the road and concentrate all functional services there; vehicles traveling in the opposite lanes enter the service area through an overpass or a tunnel. Single-sided service areas are less common because they have certain usage drawbacks compared with double-sided service areas. Two forms of single-sided service areas are common. One is the large centralized service area, in which external service facilities are set up on one side of the expressway while refueling facilities are located on both sides; Europe has many service areas of this form, which focus on shopping malls, entertainment facilities, accommodation, dining, and the like. The other is the small single-sided service area, adopted mainly because of terrain: it has value in mountainous areas and places that cannot provide enough construction space, and where planning calls for small service areas and parking areas, a single-sided layout can serve well to a certain extent. The northwest region has complex terrain, and a single-sided service area is a necessary choice in some special geographic environments: it occupies little land, suits the usage characteristics of vehicles traveling in both directions, and can play a unique role in such environments. In recent years, with the development of expressways, some landscape service areas, where vehicles park to view the scenery, have appeared.
The double-sided layout is the most common. Its principle is to arrange service areas on both sides of the road, with identical service facilities and functional zones on the two sides. An expressway is a fully enclosed bidirectional roadway with a median strip; double-sided service areas on such roads allow vehicles traveling in different directions to enter and exit separately, while the service areas on the two sides are connected by an overpass or a tunnel, which optimizes vehicle handling capacity and material allocation. Double-sided service areas used in favorable geographic environments are more reasonable, meet the requirements of convenient, fast, and efficient use, and maximize the commercial effect. Double-sided service areas within sight of each other can be designed in the same layout form or in different forms, borrowing from and corresponding to the natural landscape environment.
As supporting service facilities developed along with the expressway industry, expressway service areas are of great significance for the rapid development of expressways and the growth of planned mileage. The effective operation and high-quality service of a service area can better realize the social service value of the expressway, increase the economic benefit of the expressway investment company, provide employment opportunities, and absorb surplus personnel. In addition, the opportunity of developing service areas can be used to obtain scarce land resources at low land cost.
Reinforcement learning treats learning as a process of trial and evaluation: the Agent selects an action to apply to the environment; upon receiving the action, the state of the environment changes, and a reinforcement signal (reward or punishment) is generated and fed back to the Agent; the Agent then selects the next action according to the reinforcement signal and the current state of the environment, the selection principle being to increase the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value but also the subsequent state of the environment and the final reinforcement value.
Reinforcement learning differs from supervised learning in connectionist learning in that the reinforcement signal provided by the environment evaluates how good the generated action is (usually a scalar signal), rather than telling the Agent how to generate the correct action. Since the external environment provides little information, the Agent must learn from its own experience. In this way, the Agent gains knowledge in an environment that evaluates actions one by one, and modifies its action scheme to adapt to the environment.
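The trial-and-evaluation loop described above can be sketched with a two-action example: the Agent samples an action, the environment returns a scalar reinforcement signal, and the Agent raises its preference for actions that were rewarded. The environment's reward probabilities, the preference update rule, and the learning rate are illustrative assumptions.

```python
import random

random.seed(0)

ACTIONS = ["open", "close"]
REWARD_PROB = {"open": 0.8, "close": 0.2}  # hidden environment dynamics
prefs = {"open": 0.5, "close": 0.5}        # initial action preferences
LR = 0.05                                  # learning rate

for _ in range(500):
    # Sample an action in proportion to the current preferences.
    action = random.choices(ACTIONS, weights=[prefs[a] for a in ACTIONS])[0]
    # The environment evaluates the action with a scalar signal (reward 1 or 0)
    # instead of telling the Agent which action was correct.
    reward = 1.0 if random.random() < REWARD_PROB[action] else 0.0
    # Positive reinforcement raises this action's selection probability.
    prefs[action] += LR * (reward - prefs[action])

print(prefs)
```

After enough trials the preference for the more frequently rewarded action dominates, which is exactly the "increase the probability of positive reinforcement" principle stated above.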
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Enlightened by the present invention, those of ordinary skill in the art may derive many further forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. A reinforcement-learning-based service area open control decision method, the method comprising:
establishing a recurrent neural network model, wherein each neuron in the hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the past vehicle information present around the service area at a judgment moment, and each input data of the input layer of the recurrent neural network model is the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
taking the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and taking the reciprocals of the respective numbers of the various types of vehicles present around the service area at that historical judgment moment as the reward signals for performing reinforcement learning on the recurrent neural network model, so as to perform a reinforcement learning operation on the recurrent neural network model; completing, piece by piece over time, the reinforcement learning operation on the recurrent neural network model with a fixed number of pieces of learning data, thereby obtaining a reinforcement-learned recurrent neural network model;
taking the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the reinforcement-learned recurrent neural network model to obtain the output data of its output layer, namely the respective numbers of the various types of vehicles present around the service area at the next moment, wherein the next moment is separated from the current moment by the interval duration corresponding to the uniform spacing;
determining an open/close policy of the service area within a preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment;
determining an open/close policy of a gas station in the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment is the respective numbers of the various types of vehicles present around the service area at the judgment moment;
wherein each input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data is the respective numbers of the various types of vehicles present around the service area at a certain moment before the judgment moment;
wherein the farther the service area is from the nearest city, the larger the interval duration between two adjacent moments among the uniformly spaced moments.
2. The reinforcement-learning-based service area open control decision method according to claim 1, wherein the method further comprises:
analyzing the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close policy of the service area within the preset duration after the next moment and the determined open/close policy of the gas station in the service area within the preset duration after the next moment;
wherein the analyzing includes: when the open/close policy of the service area is determined to be open, the analyzed amount of human resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed;
wherein the analyzing further includes: when the open/close policy of the gas station in the service area is determined to be open, the analyzed amount of human resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed.
3. The reinforcement-learning-based service area open control decision method according to claim 1, wherein the method further comprises:
analyzing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close policy of the service area within the preset duration after the next moment and the determined open/close policy of the gas station in the service area within the preset duration after the next moment;
wherein the analyzing includes: when the open/close policy of the service area is determined to be open, the analyzed quantity of material resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed;
wherein the analyzing further includes: when the open/close policy of the gas station in the service area is determined to be open, the analyzed quantity of material resources that the service area should dispatch within the preset duration after the next moment is larger than when the policy is determined to be closed.
4. The reinforcement-learning-based service area open control decision method according to any one of claims 1-3, wherein:
the determining of the open/close policy of the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment includes: performing a weighted calculation on the respective numbers of the various types of vehicles present around the service area at the next moment, and determining the open/close policy of the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type corresponds to, the larger the weight value assigned to that vehicle type.
5. The reinforcement-learning-based service area open control decision method according to any one of claims 1-3, wherein:
the determining of the open/close policy of the gas station in the service area within the preset duration after the next moment based on the respective numbers of the various types of vehicles present around the service area at the next moment includes: performing a weighted calculation on the respective numbers of the various types of vehicles present around the service area at the next moment, and determining the open/close policy of the gas station in the service area within the preset duration after the next moment based on the weighted calculation result, wherein the larger the fuel tank volume a vehicle type corresponds to, the larger the weight value assigned to that vehicle type.
6. A reinforcement learning based service area opening control decision making system, the system comprising:
the first modeling device is used for building a cyclic neural network model, each neuron in a hidden layer of the cyclic neural network model receives input data with the same set time delay, output data of an output layer of the cyclic neural network model is past vehicle information existing around a judging moment service area, and each input data of the input layer of the cyclic neural network model is the past vehicle information existing around a preset number and even interval of each moment service area before the judging moment;
The second modeling device is connected with the first modeling device and is used for taking the past vehicle information existing around the service area at each moment which is preset before a certain judging moment in history and is uniformly spaced as one piece of learning data of the circulating neural network model, taking the respective reciprocal of each existing quantity corresponding to each vehicle existing around the service area at the certain judging moment in history as a reward signal for performing reinforcement learning on the circulating neural network model so as to realize reinforcement learning operation on the circulating neural network model, and completing reinforcement learning operation on the circulating neural network model by a plurality of pieces of learning data with fixed quantity in a time-sharing way, thereby obtaining the circulating neural network model after reinforcement learning;
the data analysis device is connected with the second modeling device and is used for taking the vehicle information present around the service area at each of a preset number of evenly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the reinforcement-learned recurrent neural network model to obtain the output data of its output layer, namely the respective quantities of the various vehicle types present around the service area at the next moment, wherein the next moment and the current moment are separated by the interval duration corresponding to the even spacing;
The first judging device is connected with the data analysis device and is used for determining the closing and opening strategy of the service area within a preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment;
the second judging device is connected with the data analysis device and is used for determining the closing and opening strategy of the gas station in the service area within a preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the vehicle information present around the service area at the judgment moment comprises: the vehicle information present around the service area at the judgment moment is the respective quantities of the various vehicle types present around the service area at the judgment moment;
wherein each input datum of the input layer of the recurrent neural network model being the vehicle information present around the service area at each of the preset number of evenly spaced moments before the judgment moment comprises: each input datum is the respective quantities of the various vehicle types present around the service area at one specific moment before the judgment moment;
The farther the service area is from its nearest city, the larger the value of the interval duration between two adjacent moments among the evenly spaced moments.
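The prediction step performed by the data analysis device amounts to a recurrent forward pass over the per-type counts observed at the evenly spaced past moments. A minimal sketch, with assumed layer sizes and untrained random weights (the patent specifies neither the architecture details nor any values):

```python
import numpy as np

# Minimal Elman-style recurrent forward pass matching the claimed setup:
# N evenly spaced past moments, each a vector of per-type vehicle counts,
# are fed in sequence; the output is the predicted counts at the next moment.
# All dimensions and weights below are illustrative assumptions.
N_MOMENTS, N_TYPES, HIDDEN = 6, 4, 16

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(HIDDEN, N_TYPES))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))     # hidden -> hidden (shared one-step delay)
W_out = rng.normal(scale=0.1, size=(N_TYPES, HIDDEN))  # hidden -> output

def predict_next_counts(past_counts: np.ndarray) -> np.ndarray:
    """past_counts: (N_MOMENTS, N_TYPES) counts at evenly spaced past moments."""
    h = np.zeros(HIDDEN)
    for x in past_counts:               # every hidden neuron sees the same time delay
        h = np.tanh(W_in @ x + W_h @ h)
    return np.maximum(W_out @ h, 0.0)   # predicted counts cannot be negative

past = rng.integers(0, 50, size=(N_MOMENTS, N_TYPES)).astype(float)
print(predict_next_counts(past).shape)  # one predicted quantity per vehicle type
```

In practice the weights would be fitted by the reinforcement learning procedure of the second modeling device rather than drawn at random.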
7. The reinforcement-learning-based service area open control decision system of claim 6, wherein said system further comprises:
the first allocation device is connected with the first judging device and the second judging device respectively, and is used for analyzing the quantity of human resources the service area should dispatch within the preset duration after the next moment, based on the closing and opening strategy of the service area within the preset duration after the next moment determined by the first judging device and the closing and opening strategy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing of the quantity of human resources the service area should dispatch within the preset duration after the next moment, based on the two closing and opening strategies determined by the first judging device and the second judging device, comprises: when the closing and opening strategy of the service area is determined to be open, the analysis yields a greater quantity of human resources to be dispatched by the service area within the preset duration after the next moment than when the strategy is determined to be closed;
the analyzing further comprises: when the closing and opening strategy of the gas station in the service area is determined to be open, the analysis yields a greater quantity of human resources to be dispatched by the service area within the preset duration after the next moment than when the strategy is determined to be closed.
8. The reinforcement-learning-based service area open control decision system of claim 6, wherein said system further comprises:
the second allocation device is connected with the first judging device and the second judging device respectively, and is used for analyzing the quantity of material resources the service area should dispatch within the preset duration after the next moment, based on the closing and opening strategy of the service area within the preset duration after the next moment determined by the first judging device and the closing and opening strategy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing of the quantity of material resources the service area should dispatch within the preset duration after the next moment, based on the two closing and opening strategies determined by the first judging device and the second judging device, comprises: when the closing and opening strategy of the service area is determined to be open, the analysis yields a greater quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when the strategy is determined to be closed;
the analyzing further comprises: when the closing and opening strategy of the gas station in the service area is determined to be open, the analysis yields a greater quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when the strategy is determined to be closed.
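The allocation logic of claims 7 and 8 reduces to a monotone rule: each "open" decision raises the dispatched quantities. A small sketch; the base levels and increments below are invented for illustration only:

```python
# Hypothetical staffing/material allocation driven by the two open/close
# decisions. Base levels and increments are illustrative assumptions; the
# patent only requires that "open" yields more resources than "closed".
def dispatch_plan(area_open: bool, station_open: bool) -> dict:
    staff, materials = 2, 10          # minimal caretaker levels when closed
    if area_open:                      # an open service area needs more of both
        staff += 8
        materials += 40
    if station_open:                   # an open gas station adds further demand
        staff += 3
        materials += 15
    return {"staff": staff, "materials": materials}

print(dispatch_plan(area_open=True, station_open=False))
```

Any concrete rule satisfying the claims must be monotone in both decisions, as this one is.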
9. A reinforcement learning based service area open control decision system as claimed in any one of claims 6 to 8, wherein:
The determining of the closing and opening strategy of the service area within the preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment comprises the following step: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the closing and opening strategy of the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the more passengers a vehicle type typically carries, the larger the weight assigned to that vehicle type.
10. A reinforcement learning based service area open control decision system as claimed in any one of claims 6 to 8, wherein:
the determining of the closing and opening strategy of the gas station in the service area within the preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment comprises the following step: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the closing and opening strategy of the gas station in the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the larger the fuel tank volume corresponding to a vehicle type, the larger the weight assigned to that vehicle type.
CN202310380218.3A 2023-04-11 2023-04-11 Service area open control decision method and system based on reinforcement learning Active CN116485196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310380218.3A CN116485196B (en) 2023-04-11 2023-04-11 Service area open control decision method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN116485196A (en) 2023-07-25
CN116485196B (en) 2023-11-14

Family

ID=87224450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310380218.3A Active CN116485196B (en) 2023-04-11 2023-04-11 Service area open control decision method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN116485196B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002083391A (en) * 2000-09-07 2002-03-22 Matsushita Electric Ind Co Ltd System of coping with congestion in service area and method of coping with congestion
CN105225485A (en) * 2015-10-09 2016-01-06 山东高速信息工程有限公司 The monitoring method of a kind of Expressway Service service capacity, system and device
WO2021042296A1 (en) * 2019-09-04 2021-03-11 北京图森智途科技有限公司 Method and system for solving requirement of hub service area
CN112766751A (en) * 2021-01-25 2021-05-07 云南交投集团经营开发有限公司 Intelligent management method and system for high-speed service area
CN113344254A (en) * 2021-05-20 2021-09-03 山西省交通新技术发展有限公司 Method for predicting traffic flow of expressway service area based on LSTM-LightGBM-KNN
CN113362598A (en) * 2021-06-04 2021-09-07 重庆高速公路路网管理有限公司 Traffic flow prediction method for expressway service area
CN113963544A (en) * 2021-11-05 2022-01-21 贵州省通信产业服务有限公司 Service area traffic flow prediction system
CN114333333A (en) * 2022-03-10 2022-04-12 四川高速公路建设开发集团有限公司 Tidal type highway intelligent service area based on traffic flow prediction
CN114418161A (en) * 2021-11-24 2022-04-29 广东省城乡规划设计研究院有限责任公司 Intelligent networking method and device for highway service area, electronic equipment and storage medium
CN115497299A (en) * 2022-11-14 2022-12-20 中科聚信信息技术(北京)有限公司 ETC-based service area traffic flow prediction method and system and service area

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on the design of expressway service areas based on the smart building concept; Zhou Shiqin; Jushe (23); full text *

Similar Documents

Publication Publication Date Title
Jie et al. A hybrid algorithm for time-dependent vehicle routing problem with soft time windows and stochastic factors
An et al. Optimal scheduling of electric vehicle charging operations considering real-time traffic condition and travel distance
Kim et al. Idle vehicle relocation strategy through deep learning for shared autonomous electric vehicle system optimization
Stopher et al. Modelling Travel Demand: A Disaggregate Behavioral Approach Issues and Applications
Li et al. Towards smart transportation system: A case study on the rebalancing problem of bike sharing system based on reinforcement learning
Cai et al. A hybrid adaptive large neighborhood search and tabu search algorithm for the electric vehicle relocation problem
Kadri et al. An integrated Petri net and GA-based approach for performance optimisation of bicycle sharing systems
Liu et al. Electric transit network design by an improved artificial fish-swarm algorithm
Hou et al. The effect of the dataset on evaluating urban traffic prediction
Embarak Smart Cities New Paradigm Applications and Challenges
Kamel et al. A modelling platform for optimizing time-dependent transit fares in large-scale multimodal networks
Sierpiński et al. Platform to support the implementation of electromobility in smart cities based on ICT applications-concept for an electric travelling project.
Li et al. A new fuzzy-based method for energy-aware resource allocation in vehicular cloud computing using a nature-inspired algorithm
Parezanović et al. Evaluation of sustainable mobility measures using fuzzy COPRAS method
Zhang et al. A public transport network design using a hidden Markov model and an optimization algorithm
CN116485196B (en) Service area open control decision method and system based on reinforcement learning
Hachette et al. Mobility Hubs, an Innovative Concept for Sustainable Urban Mobility? State of the Art and Guidelines from European Experiences
Kedia et al. Transit shift response analysis through fuzzy rule based-choice model: a case study of Indian metropolitan city
Yu et al. Optimization of urban bus operation frequency under common route condition with rail transit
Wang et al. Human‐centric multimodal deep (HMD) traffic signal control
Chatterjee Modelling the impacts of transport telematics: current limitations and future developments
CN111091286A (en) Public bicycle scheduling model and solving method
Lejdel A conceptual framework for modeling smart parking
Malone et al. The scenario explorer for passenger transport: A strategic model for long-term travel demand forecasting
Ruiz et al. Intelligent electric drive management for plug-in hybrid buses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant