CN116485196B - Service area open control decision method and system based on reinforcement learning - Google Patents
Service area open control decision method and system based on reinforcement learning
- Publication number
- CN116485196B (application number CN202310380218.3A)
- Authority
- CN
- China
- Prior art keywords
- service area
- closing
- moment
- neural network
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a service area open control decision method and system based on reinforcement learning, belonging to the technical field of artificial intelligence. The method comprises the following steps: establishing a recurrent neural network model trained by reinforcement learning, the output layer of which predicts the number of vehicles of each type present around a service area at the next moment; and determining, based on those predicted numbers, the opening/closing strategy of the service area, and of the gas station in the service area, within a preset duration after the next moment. The invention applies a recurrent neural network to the open control decision of an expressway service area and, through reinforcement learning and structural customization, constructs an artificial-intelligence recognition mechanism that adapts to different service areas and determines the passing-vehicle information at the next moment from historical data, thereby giving service area managers sufficient reaction time for their decisions.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a service area open control decision method and system based on reinforcement learning.
Background
Artificial intelligence is a branch of computer science. Since the 1970s it has been regarded as one of the world's three frontier technologies (space technology, energy technology, artificial intelligence), and it is likewise considered one of the three frontier technologies of the twenty-first century (genetic engineering, nanoscience, artificial intelligence). Over the last three decades it has developed rapidly, found wide application across many disciplines, and achieved remarkable results, gradually becoming an independent branch that is self-contained in both theory and practice.
Neural networks are an important branch of artificial intelligence. An artificial neural network (Artificial Neural Network, ANN), also called a neural network (NN) or connection model, is an algorithmic mathematical model that mimics the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network depends on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes.
The recurrent neural network (Recurrent Neural Network, RNN) is a class of neural networks that takes sequence data as input, recurses along the evolution direction of the sequence, and connects all nodes (recurrent units) in a chain. The bidirectional recurrent neural network (Bidirectional RNN, Bi-RNN) and the long short-term memory network (Long Short-Term Memory, LSTM) are common recurrent neural networks.
Recurrent neural networks can be used for many kinds of fuzzy artificial-intelligence processing and can reach a certain accuracy. Nevertheless, in many application fields recurrent-neural-network solutions remain blank, so that application-scenario data cannot be organically fused with them; and even where such fusion has been attempted, the accuracy of artificial-intelligence recognition is low for lack of targeted research. For example, in expressway service areas in remote regions, operators are caught in the contradiction that staying open costs too much while closing fails to meet normal service demand, because the types and numbers of passing vehicles at each moment cannot be predicted, and recurrent neural networks have not been applied to this scenario.
Disclosure of Invention
In order to solve the above problems, the invention provides a service area open control decision method and system based on reinforcement learning, which apply a recurrent neural network to the open control decision of an expressway service area and, through reinforcement learning and structural customization, form an artificial-intelligence recognition mode that determines the passing-vehicle information at the next moment from historical data.
To this end, the invention comprises at least the following four key points:
(1) Using service area historical data to predict, by artificial intelligence, the types and numbers of passing vehicles near the service area at the next moment after the prediction moment, and determining from the prediction result the opening/closing strategy of the service area, and of the gas station in the service area, within a preset duration after the next moment;
(2) Selecting a recurrent neural network model to realize the artificial-intelligence prediction of passing-vehicle information near the service area at the next moment, and performing targeted reinforcement learning on it, wherein the farther the service area is from the nearest city, the longer the interval between two adjacent moments among the uniformly spaced moments in the historical data, thereby completing flexible customization of the model for different service areas;
(3) Determining the opening/closing strategy of the service area within a preset duration after the next moment by weighted calculation, wherein the more passengers a vehicle type typically carries, the larger the weight assigned to that type;
(4) Determining the opening/closing strategy of the gas station in the service area within a preset duration after the next moment by weighted calculation, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight assigned to that type.
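The two weighted calculations in points (3) and (4) can be sketched as follows. The vehicle classes, weight values and decision thresholds are illustrative assumptions, not values specified by the patent:

```python
# Illustrative sketch of the weighted open/close decisions in points (3) and (4).
# Vehicle classes, weights and thresholds are hypothetical examples.

# Predicted counts per vehicle class around the service area at the next moment.
predicted_counts = {"car": 40, "coach": 6, "truck": 12}

# Point (3): weight grows with the typical passenger count of the class.
passenger_weights = {"car": 4, "coach": 50, "truck": 2}

# Point (4): weight grows with the typical fuel-tank volume (litres) of the class.
tank_weights = {"car": 55, "coach": 300, "truck": 400}

def weighted_score(counts, weights):
    """Weighted sum of predicted vehicle counts."""
    return sum(counts[v] * weights[v] for v in counts)

def open_decision(score, threshold):
    """Open when the weighted demand score reaches the threshold."""
    return "open" if score >= threshold else "closed"

service_area_score = weighted_score(predicted_counts, passenger_weights)   # 484
gas_station_score = weighted_score(predicted_counts, tank_weights)         # 8800

service_area_policy = open_decision(service_area_score, threshold=300)
gas_station_policy = open_decision(gas_station_score, threshold=5000)
```

A larger predicted coach count pushes the service area toward opening (many passengers), while a larger truck count pushes the gas station toward opening (large tanks), matching the intent of the two weightings.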
According to a first aspect of the present invention, there is provided a service area open control decision method based on reinforcement learning, the method comprising:
establishing a recurrent neural network model, wherein each neuron in the hidden layer of the model receives input data with the same set time delay, the output data of the output layer is the passing-vehicle information present around the service area at a judgment moment, and each input data of the input layer is the passing-vehicle information present around the service area at one of a preset number of uniformly spaced moments before the judgment moment;
taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data of the recurrent neural network model, and taking the reciprocal of the number of vehicles of each type present around the service area at that historical judgment moment as the reward signal for reinforcement learning, so as to perform a reinforcement-learning operation on the model; completing the reinforcement learning with a fixed number of such pieces of learning data applied one after another, thereby obtaining the reinforcement-learned recurrent neural network model;
taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the model to obtain the output data of its output layer, namely the number of vehicles of each type present around the service area at the next moment, the next moment being separated from the current moment by the interval duration corresponding to the uniform spacing;
determining the opening/closing strategy of the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
determining the opening/closing strategy of the gas station in the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the passing-vehicle information present around the service area at the judgment moment comprises: the passing-vehicle information present around the service area at the judgment moment is the number of vehicles of each type present around the service area at the judgment moment;
wherein each input data of the input layer of the recurrent neural network model being the passing-vehicle information present around the service area at the uniformly spaced moments before the judgment moment comprises: each input data is the number of vehicles of each type present around the service area at one moment before the judgment moment;
wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
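A minimal sketch of the forward pass of the kind of recurrent model described above: the input is a sequence of per-class vehicle counts at uniformly spaced past moments, the output is an estimate of per-class counts at the next moment. The class count, sequence length, hidden size and the (untrained) random weights are all illustrative assumptions:

```python
import numpy as np

# Elman-style recurrent cell; sizes and random weights are illustrative.
rng = np.random.default_rng(0)

N_CLASSES = 3     # e.g. car / coach / truck
N_STEPS = 8       # preset number of uniformly spaced past moments
HIDDEN = 16

W_in = rng.normal(scale=0.1, size=(HIDDEN, N_CLASSES))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.1, size=(N_CLASSES, HIDDEN))

def predict_next_counts(sequence):
    """Run the recurrent cell over the past moments and read the output layer.

    sequence: array of shape (N_STEPS, N_CLASSES), counts per class per moment.
    Returns a non-negative estimate of per-class counts at the next moment.
    """
    h = np.zeros(HIDDEN)
    for x in sequence:                   # each past moment in order
        h = np.tanh(W_in @ x + W_h @ h)  # hidden state carries the history
    return np.maximum(W_out @ h, 0.0)    # counts cannot be negative

history = rng.integers(0, 50, size=(N_STEPS, N_CLASSES)).astype(float)
next_counts = predict_next_counts(history)
```

In practice the weights would be fitted (here, by the reinforcement-learning procedure the patent describes) rather than left random; the sketch only fixes the input/output shape of the model.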
According to a second aspect of the present invention, there is provided a service area open control decision system based on reinforcement learning, the system comprising:
a first modeling device for establishing a recurrent neural network model, wherein each neuron in the hidden layer of the model receives input data with the same set time delay, the output data of the output layer is the passing-vehicle information present around the service area at a judgment moment, and each input data of the input layer is the passing-vehicle information present around the service area at one of a preset number of uniformly spaced moments before the judgment moment;
a second modeling device, connected with the first modeling device, for taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data of the recurrent neural network model, taking the reciprocal of the number of vehicles of each type present around the service area at that historical judgment moment as the reward signal for reinforcement learning so as to perform a reinforcement-learning operation on the model, and completing the reinforcement learning with a fixed number of such pieces of learning data applied one after another, thereby obtaining the reinforcement-learned recurrent neural network model;
a data analysis device, connected with the second modeling device, for taking the passing-vehicle information present around the service area at the preset number of uniformly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the model to obtain the output data of its output layer, namely the number of vehicles of each type present around the service area at the next moment, the next moment being separated from the current moment by the interval duration corresponding to the uniform spacing;
a first judging device, connected with the data analysis device, for determining the opening/closing strategy of the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
a second judging device, connected with the data analysis device, for determining the opening/closing strategy of the gas station in the service area within a preset duration after the next moment based on the number of vehicles of each type present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the passing-vehicle information present around the service area at the judgment moment comprises: the passing-vehicle information present around the service area at the judgment moment is the number of vehicles of each type present around the service area at the judgment moment;
wherein each input data of the input layer of the recurrent neural network model being the passing-vehicle information present around the service area at the uniformly spaced moments before the judgment moment comprises: each input data is the number of vehicles of each type present around the service area at one moment before the judgment moment;
wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
Drawings
Embodiments of the present invention will be described below with reference to the accompanying drawings, in which:
FIG. 1 is a technical flow diagram of a reinforcement learning-based service area open control decision method and system in accordance with the present invention.
Fig. 2 is an internal structure diagram of a service area open control decision system based on reinforcement learning according to embodiment 4 of the present invention.
Fig. 3 is an internal structure diagram of a service area open control decision system based on reinforcement learning according to embodiment 5 of the present invention.
Fig. 4 is an internal structure diagram of a service area open control decision system based on reinforcement learning according to embodiment 6 of the present invention.
Detailed Description
The recurrent neural network has memory, parameter sharing and Turing completeness, giving it certain advantages in learning the nonlinear characteristics of a sequence. Recurrent neural networks are applied in natural language processing (Natural Language Processing, NLP), for example speech recognition, language modeling and machine translation, and are also used for various time-series predictions. A recurrent neural network built together with a convolutional neural network (Convolutional Neural Network, CNN) can address computer-vision problems involving sequence input.
Reinforcement learning (Reinforcement Learning, RL), also known as re-excitation learning or evaluation learning, is one of the paradigms and methodologies of machine learning, used to describe and solve the problem of an agent learning a strategy that maximizes return or achieves a specific goal through its interaction with an environment. A common model for reinforcement learning is the standard Markov decision process (Markov Decision Process, MDP). According to the given conditions, reinforcement learning can be divided into model-based reinforcement learning (model-based RL) and model-free reinforcement learning (model-free RL), and into active reinforcement learning (active RL) and passive reinforcement learning (passive RL). Variants of reinforcement learning include inverse reinforcement learning, hierarchical reinforcement learning and reinforcement learning for partially observable systems. Algorithms for solving reinforcement learning problems fall into two classes: policy search algorithms and value function algorithms. Deep learning models may be used within reinforcement learning, forming deep reinforcement learning.
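As a minimal illustration of the value-function algorithm family mentioned above, the following sketch runs tabular Q-learning on a toy two-state, two-action MDP. The environment, rewards and hyper-parameters are invented for illustration only and are unrelated to the patent's own reward design:

```python
import random

# Tabular Q-learning on a hypothetical two-state MDP: taking action 1 in
# state 0 leads to state 1 and yields reward 1; everything else yields 0.
random.seed(0)

N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic toy transition: (next_state, reward)."""
    if state == 0 and action == 1:
        return 1, 1.0
    return 0, 0.0

for _ in range(500):                # episodes
    s = 0
    for _ in range(10):             # steps per episode
        if random.random() < EPSILON:
            a = random.randrange(N_ACTIONS)                       # explore
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])      # exploit
        s2, r = step(s, a)
        # Q-learning update: bootstrap from the best next-state value.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

best_action_in_state_0 = max(range(N_ACTIONS), key=lambda i: Q[0][i])
```

After training, the greedy policy in state 0 picks the rewarding action, which is exactly the "learn a strategy that maximizes return" behavior described above.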
At present, different strategies must be formulated for expressway service areas and their associated gas stations as to whether and when to open, because service areas located in different regions see different flows and types of passing vehicles, and even within the same region the flow and types of passing vehicles have a certain randomness and contingency, making it difficult for a service area operator to determine the opening strategies of the service area and its associated gas station.
In order to overcome these defects, the invention discloses a service area open control decision method and system based on reinforcement learning which, by introducing a structurally customized recurrent neural network model refined through reinforcement learning, completes intelligent analysis of the flow and types of passing vehicles at any moment from service area historical data, and thus adaptively determines the opening strategy and specific opening moment of the corresponding service area and its associated gas station, achieving a dynamic balance between reducing cost and meeting demand.
As shown in fig. 1, a technical flowchart of a service area open control decision method and system based on reinforcement learning according to the present invention is presented.
As shown in fig. 1, the service area open control decision method based on reinforcement learning of the present invention includes:
First, collecting the passing-vehicle information near the service area at the uniformly spaced moments before a judgment moment, including the number of vehicles of each type, taking the collected data as the input data of a recurrent neural network model and the passing-vehicle information near the service area at the judgment moment as its output data, thereby constructing a structurally customized recurrent neural network model;
Second, performing targeted reinforcement learning on the constructed structurally customized recurrent neural network model, so as to ensure the recognition accuracy of the reinforcement-learned model;
Third, taking the passing-vehicle information near the service area at the uniformly spaced moments before the next moment as the input data of the reinforcement-learned recurrent neural network model, and running it to obtain the key information for the opening/closing strategies of the service area and of the gas station in the service area within a preset duration after the next moment, namely the number of vehicles of each type passing near the service area at the next moment;
Finally, using this key information to execute the determined opening/closing strategies of the service area and of the gas station in the service area within the preset duration after the next moment, including whether to open and how much manpower and material resources to allocate.
The key point of the method is to apply the reinforcement-learned, structurally customized recurrent neural network model to the specific selection of expressway service area opening/closing strategies and resource allocation: the number of vehicles of each type passing near the service area at the next moment is predicted from historical data, so that first-hand data are obtained in advance, providing reaction time for opening or closing the service area and gas station and for the quantitative allocation of the service area's manpower and material resources.
The service area open control decision method and system based on reinforcement learning of the present invention will now be specifically described by way of examples.
Example 1
The service area open control decision method based on reinforcement learning provided by the embodiment 1 of the invention comprises the following steps:
establishing a circulating neural network model, wherein each neuron in a hidden layer of the circulating neural network model receives input data with the same set time delay, output data of an output layer of the circulating neural network model is past vehicle information existing around a judging moment service area, and each input data of the input layer of the circulating neural network model is the past vehicle information existing around the preset number and even interval of each moment service area before the judging moment;
taking past vehicle information existing around a service area at each time preset before a certain judgment time in history and uniformly spaced as one piece of learning data of the circulating neural network model, taking each reciprocal of each existing number corresponding to each vehicle existing around the service area at a certain judgment time in history as a reward signal for performing reinforcement learning on the circulating neural network model so as to realize reinforcement learning operation on the circulating neural network model, and completing reinforcement learning operation on the circulating neural network model by a plurality of pieces of learning data with fixed quantity in a time-sharing way, thereby obtaining a circulating neural network model after reinforcement learning;
Taking the past vehicle information existing around the service area at each time of a preset quantity and even interval before the next time as each input data of an input layer of the circulation neural network model after reinforcement learning, and operating the circulation neural network model after reinforcement learning to obtain the output data of an output layer thereof, namely, each existing quantity respectively corresponding to each vehicle existing around the service area at the next time, wherein the next time and the current time are separated by the interval duration corresponding to the even interval;
determining a closing and opening strategy of the service area in a preset time period after the next moment based on the respective existence quantity corresponding to various vehicles existing around the service area at the next moment;
determining a closing and opening strategy of a gas station in a service area within a preset duration after the next moment based on the respective existence quantity corresponding to various vehicles existing around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment being the respective presence counts of the various vehicle types present around the service area at the judgment moment;
wherein the respective input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data being the respective presence counts of the various vehicle types present around the service area at one moment before the judgment moment;
wherein the longer the distance from the service area to the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
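The training-and-prediction pipeline described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the network sizes, weights, and vehicle types are assumptions. It only shows how a preset number of uniformly spaced historical presence counts feed a recurrent hidden layer (each step applying the same fixed one-step delay) to predict the presence counts at the next moment, and how the reciprocal of each observed count would serve as the reward signal.

```python
import numpy as np

# Illustrative sketch (not the patented implementation): an Elman-style
# recurrent network mapping a preset number of uniformly spaced historical
# observations (presence counts per vehicle type around the service area)
# to predicted presence counts at the next moment. Sizes are assumptions.
K = 3   # vehicle types (e.g., car, bus, truck) -- assumed
T = 6   # preset number of uniformly spaced historical moments
H = 8   # hidden-layer neurons, each receiving input with the same set delay

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (H, K))
W_rec = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (K, H))

def predict_counts(history):
    """history: (T, K) past presence counts; returns (K,) predicted counts."""
    h = np.zeros(H)
    for x in history:                       # one step per uniformly spaced moment
        h = np.tanh(W_in @ x + W_rec @ h)   # same fixed one-step delay everywhere
    return np.maximum(W_out @ h, 0.0)       # presence counts cannot be negative

def reward_signal(observed_counts):
    """Reciprocal of each observed presence count, used as the reward signal."""
    return np.array([1.0 / c if c > 0 else 1.0 for c in observed_counts])

history = np.abs(rng.normal(10, 3, (T, K)))
pred = predict_counts(history)
print(pred.shape)                              # (3,)
print(reward_signal([4, 2, 10]).tolist())      # [0.25, 0.5, 0.1]
```

The actual weights would be adjusted by the reinforcement learning procedure; this sketch shows only the data flow from the uniformly spaced inputs to the per-type output counts.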
Embodiment 2
Compared with embodiment 1 of the present invention, the reinforcement learning-based service area opening control decision method according to embodiment 2 of the present invention further includes:
analyzing the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the amount of human resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
Embodiment 3
Compared with embodiment 1 of the present invention, the reinforcement learning-based service area opening control decision method according to embodiment 3 of the present invention further includes:
analyzing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
In any of the above embodiments 1-3, optionally, in the reinforcement learning-based service area opening control decision method:
the determining of the open/close strategy for the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
In any of the above embodiments 1-3, optionally, in the reinforcement learning-based service area opening control decision method:
the determining of the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the weighted calculation result, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
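The two weighted decisions above share one mechanism: a weighted sum of the predicted presence counts compared against a threshold, with passenger-capacity weights for the service-area decision and fuel-tank-volume weights for the gas-station decision. The sketch below is illustrative only; the vehicle types, weight values, and thresholds are assumptions, not values from this document.

```python
# Illustrative sketch of the weighted open/close decision. Weights grow with
# passenger capacity (service-area decision) or fuel-tank volume (gas-station
# decision); all numbers here are assumed for demonstration.
PASSENGER_WEIGHTS = {"car": 1.0, "bus": 10.0, "truck": 1.5}  # ~ passengers per vehicle
TANK_WEIGHTS = {"car": 1.0, "bus": 3.0, "truck": 5.0}        # ~ relative tank volume

def open_close_strategy(predicted_counts, weights, threshold):
    """Weighted sum of predicted presence counts; open iff it meets the threshold."""
    score = sum(weights[v] * n for v, n in predicted_counts.items())
    return "open" if score >= threshold else "close"

counts = {"car": 20, "bus": 2, "truck": 5}  # predicted counts at the next moment
print(open_close_strategy(counts, PASSENGER_WEIGHTS, threshold=30))  # open  (score 47.5)
print(open_close_strategy(counts, TANK_WEIGHTS, threshold=60))       # close (score 51.0)
```

With the same predicted counts, the service area and its gas station can thus receive different open/close decisions, which is why the method determines the two strategies separately.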
Embodiment 4
Fig. 2 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 4 of the present invention.
As shown in fig. 2, the reinforcement learning-based service area opening control decision system includes the following components:
a first modeling device, configured to build a recurrent neural network model, wherein each neuron in the hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the past vehicle information present around the service area at a judgment moment, and the respective input data of the input layer of the recurrent neural network model are the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
a second modeling device, connected to the first modeling device and configured to take the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and to take the reciprocal of each presence count of the various vehicle types present around the service area at that historical judgment moment as a reward signal for reinforcement learning of the recurrent neural network model, so as to perform the reinforcement learning operation on the recurrent neural network model, the reinforcement learning operation being completed over time with a fixed number of such pieces of learning data, thereby obtaining a reinforcement-learned recurrent neural network model;
a data analysis device, connected to the second modeling device and configured to take the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the next moment as the respective input data of the input layer of the reinforcement-learned recurrent neural network model, and to run the reinforcement-learned recurrent neural network model to obtain the output data of its output layer, namely the respective presence counts of the various vehicle types present around the service area at the next moment, wherein the next moment is separated from the current moment by the interval duration corresponding to the uniform spacing;
a first judging device, connected to the data analysis device and configured to determine an open/close strategy for the service area within a preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
a second judging device, connected to the data analysis device and configured to determine an open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment being the respective presence counts of the various vehicle types present around the service area at the judgment moment;
wherein the respective input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data being the respective presence counts of the various vehicle types present around the service area at one moment before the judgment moment;
wherein the longer the distance from the service area to the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
Embodiment 5
Fig. 3 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 5 of the present invention.
As shown in fig. 3, compared with embodiment 4 of the present invention, the reinforcement learning-based service area opening control decision system further includes:
a first allocation device, connected to the first judging device and the second judging device respectively, and configured to analyze the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the open/close strategy for the service area within the preset duration after the next moment determined by the first judging device and the open/close strategy for the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing of the amount of human resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
Embodiment 6
Fig. 4 is an internal structural diagram of the reinforcement learning-based service area opening control decision system according to embodiment 6 of the present invention.
As shown in fig. 4, compared with embodiment 4 of the present invention, the reinforcement learning-based service area opening control decision system further includes:
a second allocation device, connected to the first judging device and the second judging device respectively, and configured to analyze the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the open/close strategy for the service area within the preset duration after the next moment determined by the first judging device and the open/close strategy for the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analyzing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
In any of embodiments 4-6 above, optionally, in the reinforcement learning based service area opening control decision system:
the determining of the open/close strategy for the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
In any of embodiments 4-6 above, optionally, in the reinforcement learning based service area opening control decision system:
the determining of the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the gas station in the service area within the preset duration after the next moment based on the weighted calculation result, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
In addition, an expressway service area is a place specially provided for passengers and drivers to stop and rest, equipped with facilities such as a parking lot, public toilets, a gas station, a vehicle repair station, catering services and a canteen, with service areas spaced on average about 50 kilometers apart. The service flow entering a service area is divided into people flow and traffic flow. The traffic flow is divided into vehicles that stop and vehicles that pass without stopping. The people flow is divided into types such as waiting for a vehicle, resting, using the toilet, shopping, dining, lodging and using electronic equipment.
The construction scale of the service area generally needs to accommodate future increases in traffic volume. Common highway service areas include a single-sided service area and a double-sided service area.
A single-sided service area is also called a centrally concentrated service area. Its layout principle is to arrange the service area on one side of the road and concentrate the various functional services in a single area; vehicles traveling in the opposite lanes enter the service area through an overpass or a tunnel. Single-sided service areas are less common because they have certain usage drawbacks compared with double-sided service areas. Two forms are common. The first is the large centralized service area, in which the external service facilities are set up on one side of the expressway while the refueling facilities are located on both sides; many European service areas of this form focus on shopping malls, entertainment facilities, accommodation, dining and the like. The second is the small single-sided service area, adopted mainly because of terrain: it has value in mountainous regions and places that cannot provide enough construction space, and where planning calls for small service areas and parking areas, a single-sided service area can play a good role to a certain extent. The northwest region has complex terrain, and in some special geographic environments a single-sided service area is the necessary choice; it occupies a small floor area, suits the usage characteristics of vehicles traveling in both directions, and can play a unique role in such special environments. In recent years, with the development of expressways, some landscape service areas where vehicles park to view scenery have appeared.
The double-sided service area layout is the most common. Its layout principle is to arrange service areas on both sides of the road, with identical service facilities and functional zones on each side. Since an expressway is a fully enclosed bidirectional roadway with a median strip, providing double-sided service areas allows vehicles traveling in different directions to enter and exit separately; meanwhile, the service areas on the two sides are connected through an overpass or a tunnel, which optimizes vehicle handling capacity and material allocation. Using a double-sided service area in a favorable geographic environment is more reasonable, meets the requirements of convenient, rapid and efficient use, and maximizes the commercial effect. The two sides of a double-sided service area can be designed in the same layout form or in different forms that borrow from and respond to the natural landscape environment.
As supporting service facilities developed by the expressway industry, expressway service areas are of great significance to the rapid development of expressways and the growth of planned mileage. The effective operation and high-quality service of a service area can better realize the social service value of the expressway, increase the economic benefit of the expressway investment company, provide employment opportunities and absorb surplus personnel. In addition, the opportunity of developing a service area can be used to obtain scarce land resources at low land cost.
Reinforcement learning regards learning as a process of trial and evaluation: the Agent selects an action to apply to the environment; upon receiving the action, the environment changes state and produces a reinforcement signal (reward or punishment) that is fed back to the Agent; the Agent then selects the next action according to the reinforcement signal and the current state of the environment, with the selection principle of increasing the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value but also the subsequent state of the environment and the final reinforcement value.
Reinforcement learning differs from supervised learning in connectionist learning mainly in that the reinforcement signal provided by the environment evaluates how good the generated action is (usually as a scalar signal), rather than telling the Agent how to generate the correct action. Since the external environment provides little information, the Agent must learn from its own experience. In this way, the Agent gains knowledge in an environment of action-and-evaluation and improves its action scheme to adapt to the environment.
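The agent-environment loop just described can be sketched as a toy example. The environment, the two states, the reward values, and the epsilon-greedy selection rule below are all illustrative assumptions; the sketch only demonstrates the stated principle that actions receiving positive reinforcement become more probable.

```python
import random

# Toy agent-environment loop: the agent picks an action, the environment
# returns a new state and a scalar reinforcement signal, and the agent
# raises its preference for actions that received positive reinforcement.
# States, rewards, and parameters are assumptions for illustration only.
ACTIONS = ["open", "close"]

def environment(state, action):
    """Toy environment: reward +1 for opening when busy or closing when quiet."""
    reward = 1.0 if (action == "open") == (state == "busy") else -1.0
    next_state = random.choice(["busy", "quiet"])
    return next_state, reward

prefs = {(s, a): 0.0 for s in ["busy", "quiet"] for a in ACTIONS}
random.seed(0)
state = "busy"
for _ in range(2000):
    # epsilon-greedy: mostly exploit current preferences, sometimes explore
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: prefs[(state, a)])
    state_next, reward = environment(state, action)
    prefs[(state, action)] += 0.1 * (reward - prefs[(state, action)])
    state = state_next

print(prefs[("busy", "open")] > prefs[("busy", "close")])    # True
print(prefs[("quiet", "close")] > prefs[("quiet", "open")])  # True
```

After enough trials the agent prefers the positively reinforced action in each state, mirroring the selection principle described above.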
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.
Claims (10)
1. A service area open control decision method based on reinforcement learning, the method comprising:
establishing a recurrent neural network model, wherein each neuron in the hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the past vehicle information present around the service area at a judgment moment, and the respective input data of the input layer of the recurrent neural network model are the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the judgment moment;
taking the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and taking the reciprocal of each presence count of the various vehicle types present around the service area at that historical judgment moment as a reward signal for reinforcement learning of the recurrent neural network model, so as to perform the reinforcement learning operation on the recurrent neural network model, the reinforcement learning operation being completed over time with a fixed number of such pieces of learning data, thereby obtaining a reinforcement-learned recurrent neural network model;
taking the past vehicle information present around the service area at each of a preset number of uniformly spaced moments before the next moment as the respective input data of the input layer of the reinforcement-learned recurrent neural network model, and running the reinforcement-learned recurrent neural network model to obtain the output data of its output layer, namely the respective presence counts of the various vehicle types present around the service area at the next moment, wherein the next moment is separated from the current moment by the interval duration corresponding to the uniform spacing;
determining an open/close strategy for the service area within a preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
determining an open/close strategy for the gas station in the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the past vehicle information present around the service area at the judgment moment includes: the past vehicle information present around the service area at the judgment moment being the respective presence counts of the various vehicle types present around the service area at the judgment moment;
wherein the respective input data of the input layer of the recurrent neural network model being the past vehicle information present around the service area at each of the preset number of uniformly spaced moments before the judgment moment includes: each input data being the respective presence counts of the various vehicle types present around the service area at one moment before the judgment moment;
wherein the longer the distance from the service area to the nearest city, the larger the value of the interval duration between two adjacent moments among the uniformly spaced moments.
2. The reinforcement learning based service area open control decision method of claim 1, wherein the method further comprises:
analyzing the amount of human resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the amount of human resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger amount of human resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
3. The reinforcement learning based service area open control decision method of claim 1, wherein the method further comprises:
analyzing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the determined open/close strategy for the service area within the preset duration after the next moment and the determined open/close strategy for the gas station in the service area within the preset duration after the next moment;
wherein the analyzing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment includes: when the open/close strategy for the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed;
and the analyzing further includes: when the open/close strategy for the gas station in the service area is determined to be open, the analysis yields a larger quantity of material resources to be dispatched by the service area within the preset duration after the next moment than when that strategy is determined to be closed.
4. A reinforcement learning based service area open control decision method as claimed in any one of claims 1-3, wherein:
the determining of the open/close strategy for the service area within the preset duration after the next moment based on the respective presence counts of the various vehicle types present around the service area at the next moment includes: performing a weighted calculation on the respective presence counts of the various vehicle types present around the service area at the next moment, and determining the open/close strategy for the service area within the preset duration after the next moment based on the weighted calculation result, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
5. A reinforcement learning based service area open control decision method as claimed in any one of claims 1-3, wherein:
the determining, based on the respective quantities of the various vehicle types present around the service area at the next moment, of the opening/closing strategy of the gas station in the service area within the preset duration after the next moment comprises: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the opening/closing strategy of the gas station in the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
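Claims 4 and 5 describe the same weighted-threshold decision applied with two different weight tables. A minimal sketch, in which the vehicle types, weight values, and threshold are all hypothetical — the claims require only that the weights grow with passenger capacity (service area) and with fuel-tank volume (gas station):

```python
# Hypothetical per-type weights. The claims fix only the monotonicity:
# more passengers per vehicle  -> larger service-area weight,
# larger fuel-tank volume      -> larger gas-station weight.
PASSENGER_WEIGHTS = {"car": 1.0, "van": 2.0, "coach": 10.0}
FUEL_TANK_WEIGHTS = {"car": 1.0, "van": 1.5, "coach": 4.0}

def open_close_strategy(counts: dict, weights: dict, threshold: float) -> str:
    """Weighted sum over the predicted per-type vehicle counts around the
    service area at the next moment; 'open' once the demand proxy clears
    a (hypothetical) threshold."""
    score = sum(weights.get(vehicle, 0.0) * n for vehicle, n in counts.items())
    return "open" if score >= threshold else "closed"

counts = {"car": 30, "van": 5, "coach": 2}   # predicted counts at next moment
area_strategy = open_close_strategy(counts, PASSENGER_WEIGHTS, threshold=25.0)
station_strategy = open_close_strategy(counts, FUEL_TANK_WEIGHTS, threshold=25.0)
```

The two decisions can diverge: a stream dominated by coaches pushes the passenger-weighted score up much faster than the fuel-tank-weighted score, so the service area may open while the gas station stays closed.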
6. A reinforcement learning based service area opening control decision making system, the system comprising:
the first modeling device is used for building a recurrent neural network model, wherein each neuron in a hidden layer of the recurrent neural network model receives input data with the same set time delay, the output data of the output layer of the recurrent neural network model is the historical vehicle information present around the service area at a judgment moment, and each input datum of the input layer of the recurrent neural network model is the historical vehicle information present around the service area at one of a preset number of evenly spaced moments before the judgment moment;
the second modeling device is connected to the first modeling device and is used for taking the historical vehicle information present around the service area at a preset number of evenly spaced moments before a certain historical judgment moment as one piece of learning data for the recurrent neural network model, and taking the reciprocal of each of the per-type vehicle quantities present around the service area at that historical judgment moment as the reward signal for reinforcement learning of the recurrent neural network model, so as to perform a reinforcement learning operation on the model; the reinforcement learning operation is completed over a fixed number of pieces of learning data processed at separate times, yielding the reinforcement-learned recurrent neural network model;
the data analysis device is connected to the second modeling device and is used for taking the historical vehicle information present around the service area at the preset number of evenly spaced moments before the next moment as the input data of the input layer of the reinforcement-learned recurrent neural network model, and running the model to obtain the output data of its output layer, namely the respective quantities of the various vehicle types present around the service area at the next moment, wherein the next moment and the current moment are separated by the interval duration corresponding to the even spacing;
the first judging device is connected to the data analysis device and is used for determining the opening/closing strategy of the service area within a preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment;
the second judging device is connected to the data analysis device and is used for determining the opening/closing strategy of the gas station in the service area within the preset duration after the next moment based on the respective quantities of the various vehicle types present around the service area at the next moment;
wherein the output data of the output layer of the recurrent neural network model being the historical vehicle information present around the service area at the judgment moment comprises: the historical vehicle information present around the service area at the judgment moment is the respective quantities of the various vehicle types present around the service area at the judgment moment;
wherein each input datum of the input layer of the recurrent neural network model being the historical vehicle information present around the service area at one of the preset number of evenly spaced moments before the judgment moment comprises: each input datum is the respective quantities of the various vehicle types present around the service area at one moment before the judgment moment;
and wherein the farther the service area is from the nearest city, the larger the value of the interval duration between two adjacent moments among the evenly spaced moments.
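The pipeline of claim 6 — a recurrent model fed a preset number of evenly spaced past per-type count vectors, a reward signal equal to the reciprocal of the observed per-type counts, and a sampling interval that grows with the distance to the nearest city — can be sketched as follows. The network sizes, the random initialisation, and the linear distance-to-interval scaling are assumptions; the claim fixes only the interfaces, the reciprocal reward, and the monotone distance/interval relationship:

```python
import numpy as np

rng = np.random.default_rng(0)
N_TYPES, N_STEPS, HIDDEN = 3, 8, 16   # vehicle types, past moments, hidden units

# Hypothetical parameters of a simple Elman-style recurrent model. Claim 6
# specifies only the interface: N_STEPS evenly spaced past count vectors in,
# one predicted per-type count vector for the next moment out.
W_in = rng.normal(0.0, 0.1, (HIDDEN, N_TYPES))
W_rec = rng.normal(0.0, 0.1, (HIDDEN, HIDDEN))
W_out = rng.normal(0.0, 0.1, (N_TYPES, HIDDEN))

def predict_counts(past_counts: np.ndarray) -> np.ndarray:
    """past_counts: (N_STEPS, N_TYPES) -> predicted non-negative counts."""
    h = np.zeros(HIDDEN)
    for x in past_counts:              # same fixed delay between successive inputs
        h = np.tanh(W_in @ x + W_rec @ h)
    return np.maximum(W_out @ h, 0.0)  # vehicle counts cannot be negative

def reward(observed_counts: np.ndarray) -> np.ndarray:
    """Per claim 6, the reward signal is the reciprocal of each observed
    per-type vehicle count (guarded here against division by zero)."""
    return 1.0 / np.maximum(observed_counts, 1e-6)

def interval_minutes(distance_km: float, base: float = 10.0) -> float:
    """The farther the service area is from the nearest city, the longer the
    interval between the evenly spaced moments (this scaling is hypothetical)."""
    return base * (1.0 + distance_km / 50.0)
```

Note the effect of the reciprocal reward: moments with few surrounding vehicles yield large rewards, which biases the learned policy toward keeping quiet service areas closed.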
7. The reinforcement-learning-based service area open control decision system of claim 6, wherein said system further comprises:
the first allocation device is connected to the first judging device and the second judging device respectively, and is used for analysing the quantity of human resources that the service area should dispatch within the preset duration after the next moment, based on the opening/closing strategy of the service area within the preset duration after the next moment determined by the first judging device and the opening/closing strategy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analysing of the quantity of human resources that the service area should dispatch within the preset duration after the next moment comprises: when the opening/closing strategy of the service area is determined to be open, the analysed quantity of human resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed;
and when the opening/closing strategy of the gas station in the service area is determined to be open, the analysed quantity of human resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed.
8. The reinforcement-learning-based service area open control decision system of claim 6, wherein said system further comprises:
the second allocation device is connected to the first judging device and the second judging device respectively, and is used for analysing the quantity of material resources that the service area should dispatch within the preset duration after the next moment, based on the opening/closing strategy of the service area within the preset duration after the next moment determined by the first judging device and the opening/closing strategy of the gas station in the service area within the preset duration after the next moment determined by the second judging device;
wherein the analysing of the quantity of material resources that the service area should dispatch within the preset duration after the next moment comprises: when the opening/closing strategy of the service area is determined to be open, the analysed quantity of material resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed;
and when the opening/closing strategy of the gas station in the service area is determined to be open, the analysed quantity of material resources that the service area should dispatch within the preset duration after the next moment is greater than when the strategy is determined to be closed.
9. A reinforcement learning based service area open control decision system as claimed in any one of claims 6 to 8, wherein:
the determining, based on the respective quantities of the various vehicle types present around the service area at the next moment, of the opening/closing strategy of the service area within the preset duration after the next moment comprises: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the opening/closing strategy of the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the more passengers a vehicle type carries, the larger the weight value assigned to that vehicle type.
10. A reinforcement learning based service area open control decision system as claimed in any one of claims 6 to 8, wherein:
the determining, based on the respective quantities of the various vehicle types present around the service area at the next moment, of the opening/closing strategy of the gas station in the service area within the preset duration after the next moment comprises: performing a weighted calculation on the respective quantities of the various vehicle types present around the service area at the next moment, and determining the opening/closing strategy of the gas station in the service area within the preset duration after the next moment based on the result of the weighted calculation, wherein the larger the fuel-tank volume of a vehicle type, the larger the weight value assigned to that vehicle type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310380218.3A CN116485196B (en) | 2023-04-11 | 2023-04-11 | Service area open control decision method and system based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116485196A CN116485196A (en) | 2023-07-25 |
CN116485196B true CN116485196B (en) | 2023-11-14 |
Family
ID=87224450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310380218.3A Active CN116485196B (en) | 2023-04-11 | 2023-04-11 | Service area open control decision method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116485196B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002083391A (en) * | 2000-09-07 | 2002-03-22 | Matsushita Electric Ind Co Ltd | System of coping with congestion in service area and method of coping with congestion |
CN105225485A (en) * | 2015-10-09 | 2016-01-06 | 山东高速信息工程有限公司 | The monitoring method of a kind of Expressway Service service capacity, system and device |
WO2021042296A1 (en) * | 2019-09-04 | 2021-03-11 | 北京图森智途科技有限公司 | Method and system for solving requirement of hub service area |
CN112766751A (en) * | 2021-01-25 | 2021-05-07 | 云南交投集团经营开发有限公司 | Intelligent management method and system for high-speed service area |
CN113344254A (en) * | 2021-05-20 | 2021-09-03 | 山西省交通新技术发展有限公司 | Method for predicting traffic flow of expressway service area based on LSTM-LightGBM-KNN |
CN113362598A (en) * | 2021-06-04 | 2021-09-07 | 重庆高速公路路网管理有限公司 | Traffic flow prediction method for expressway service area |
CN113963544A (en) * | 2021-11-05 | 2022-01-21 | 贵州省通信产业服务有限公司 | Service area traffic flow prediction system |
CN114333333A (en) * | 2022-03-10 | 2022-04-12 | 四川高速公路建设开发集团有限公司 | Tidal type highway intelligent service area based on traffic flow prediction |
CN114418161A (en) * | 2021-11-24 | 2022-04-29 | 广东省城乡规划设计研究院有限责任公司 | Intelligent networking method and device for highway service area, electronic equipment and storage medium |
CN115497299A (en) * | 2022-11-14 | 2022-12-20 | 中科聚信信息技术(北京)有限公司 | ETC-based service area traffic flow prediction method and system and service area |
Non-Patent Citations (1)
Title |
---|
Research on the Design of Expressway Service Areas Based on the Smart Building Concept; Zhou Shiqin; Jushe (23); full text * |
Also Published As
Publication number | Publication date |
---|---|
CN116485196A (en) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jie et al. | A hybrid algorithm for time-dependent vehicle routing problem with soft time windows and stochastic factors | |
An et al. | Optimal scheduling of electric vehicle charging operations considering real-time traffic condition and travel distance | |
Kim et al. | Idle vehicle relocation strategy through deep learning for shared autonomous electric vehicle system optimization | |
Stopher et al. | Modelling Travel Demand: A Disaggregate Behavioral Approach Issues and Applications | |
Li et al. | Towards smart transportation system: A case study on the rebalancing problem of bike sharing system based on reinforcement learning | |
Cai et al. | A hybrid adaptive large neighborhood search and tabu search algorithm for the electric vehicle relocation problem | |
Kadri et al. | An integrated Petri net and GA-based approach for performance optimisation of bicycle sharing systems | |
Liu et al. | Electric transit network design by an improved artificial fish-swarm algorithm | |
Hou et al. | The effect of the dataset on evaluating urban traffic prediction | |
Embarak | Smart Cities New Paradigm Applications and Challenges | |
Kamel et al. | A modelling platform for optimizing time-dependent transit fares in large-scale multimodal networks | |
Sierpiński et al. | Platform to support the implementation of electromobility in smart cities based on ICT applications-concept for an electric travelling project. | |
Li et al. | A new fuzzy-based method for energy-aware resource allocation in vehicular cloud computing using a nature-inspired algorithm | |
Parezanović et al. | Evaluation of sustainable mobility measures using fuzzy COPRAS method | |
Zhang et al. | A public transport network design using a hidden Markov model and an optimization algorithm | |
CN116485196B (en) | Service area open control decision method and system based on reinforcement learning | |
Hachette et al. | Mobility Hubs, an Innovative Concept for Sustainable Urban Mobility? State of the Art and Guidelines from European Experiences | |
Kedia et al. | Transit shift response analysis through fuzzy rule based-choice model: a case study of Indian metropolitan city | |
Yu et al. | Optimization of urban bus operation frequency under common route condition with rail transit | |
Wang et al. | Human‐centric multimodal deep (HMD) traffic signal control | |
Chatterjee | Modelling the impacts of transport telematics: current limitations and future developments | |
CN111091286A (en) | Public bicycle scheduling model and solving method | |
Lejdel | A conceptual framework for modeling smart parking | |
Malone et al. | The scenario explorer for passenger transport: A strategic model for long-term travel demand forecasting | |
Ruiz et al. | Intelligent electric drive management for plug-in hybrid buses |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||