CN117079479A - Traffic signal control method and device based on spatio-temporal prediction and successor reinforcement learning - Google Patents

Traffic signal control method and device based on spatio-temporal prediction and successor reinforcement learning

Info

Publication number
CN117079479A
CN117079479A (application CN202311344089.9A)
Authority
CN
China
Prior art date
Legal status
Granted
Application number
CN202311344089.9A
Other languages
Chinese (zh)
Other versions
CN117079479B (en)
Inventor
王永恒
王乐乐
巫英才
李炳强
王超
邵彬
陈卫
周春来
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311344089.9A
Publication of CN117079479A
Application granted
Publication of CN117079479B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0137 Measuring and analyzing of parameters relative to traffic conditions for specific applications
    • G08G1/0145 Measuring and analyzing of parameters relative to traffic conditions for specific applications for active traffic flow control
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

This specification discloses a traffic signal control method and device based on spatio-temporal prediction and successor reinforcement learning. By introducing explicit traffic state prediction based on spatio-temporal characteristics, using an LSTM network to predict future microscopic states from temporal correlations and a GAT network to predict them from spatial correlations, and letting the agent make its decision from both the current and the predicted states, the spatio-temporal correlations in traffic data can be fully exploited and the throughput of the road network improved. Meanwhile, by combining successor features with deep reinforcement learning, the estimated reward of a task is separated from its expected features, so that a traffic light control task can be transferred more conveniently, the training of the traffic light control model is accelerated, and the accuracy and intelligence of traffic signal control are improved.

Description

Traffic signal control method and device based on spatio-temporal prediction and successor reinforcement learning
Technical Field
The present disclosure relates to the fields of computing and traffic signal control, and in particular to a traffic signal control method and apparatus based on spatio-temporal prediction and successor reinforcement learning.
Background
Traffic signals at most existing urban intersections operate on a traditional fixed-timing control strategy: the switching time and duration of each signal phase are fixed, and the actual traffic conditions are not taken into account. A traffic signal working in such a fixed mode therefore cannot adapt to changes in traffic flow, which hinders the traffic efficiency of urban roads.
Therefore, how to design an intelligent traffic signal control method that alleviates traffic congestion has become a pressing problem.
Disclosure of Invention
The present disclosure provides a traffic signal control method and apparatus based on spatio-temporal prediction and successor reinforcement learning, to at least partially solve the above problems in the prior art.
The technical scheme adopted in the specification is as follows:
this specification provides a traffic signal control method based on spatio-temporal prediction and successor reinforcement learning, comprising:
determining state data of a preset intersection in a current state, wherein the state data comprises: basic road condition state, time sequence road condition state and space road condition state;
inputting the basic road condition state into a basic feature extraction network to obtain a basic road condition feature; inputting the time-series road condition state into a time-series prediction network to obtain a road condition feature at the next moment as a first road condition feature; and inputting the spatial road condition state into a spatial prediction network to obtain a road condition feature at the next moment as a second road condition feature;
splicing the basic road condition feature, the first road condition feature and the second road condition feature, and inputting the spliced feature together with a preset action into a preset prediction model to obtain a predicted value corresponding to the current state;
And training the prediction model, the basic feature extraction network, the time-series prediction network and the spatial prediction network according to the predicted value, so that traffic signals can be controlled through the trained prediction model, basic feature extraction network, time-series prediction network and spatial prediction network.
Optionally, before training the prediction model, the basic feature extraction network, the time-series prediction network, and the spatial prediction network, the method further comprises:
inputting the basic road condition feature into a preset restoration network to obtain restored state data;
and training the prediction model, the basic feature extraction network, the time-series prediction network and the spatial prediction network specifically comprises:
training the prediction model, the basic feature extraction network, the time-series prediction network and the spatial prediction network according to the predicted value, with the optimization targets of minimizing the difference between the restored state data and the basic road condition state, the difference between the first road condition feature and the basic road condition feature of the next state, and the difference between the second road condition feature and the basic road condition feature of the next state.
Optionally, the basic feature extraction network is a multi-layer perceptron, the time-series prediction network is a long short-term memory (LSTM) network, and the spatial prediction network is a graph attention network (GAT).
Optionally, the basic road condition state includes the waiting-queue length of each lane of the preset intersection in the current state; the time-series road condition state includes the waiting-queue lengths of each lane of the preset intersection over a preset period before the current state; and the spatial road condition state includes the traffic states of the preset intersection and of its neighboring intersections.
Optionally, before determining the state data of the preset intersection in the current state, the method further includes:
modeling an intersection with the traffic simulation software SUMO, wherein the intersection is formed by four approaches from the east, south, west and north, a traffic signal is arranged at the intersection, and each road is a bidirectional 6-lane road: along the driving direction, the left lane is a left-turn lane, the middle lane is a through lane, and the right lane is a through-plus-right-turn lane. The control phases of the traffic signal are: north-south through, north-south left turn, east-west through, and east-west left turn. A yellow signal is inserted as a transition between the four phase switches, so that vehicles can pass through the intersection safely;
The determining of the state data of the preset intersection in the current state specifically comprises the following steps:
modeling the intersection through the traffic simulation software SUMO to obtain state data of the preset intersection in the current state.
Optionally, the preset actions include: a north-south through green light, a north-south left-turn green light, an east-west through green light, and an east-west left-turn green light.
Optionally, before training the prediction model, the basic feature extraction network, the time-series prediction network, and the spatial prediction network, the method further comprises:
determining the reward obtained after taking a preset action according to the difference between the waiting-queue lengths of each lane of the preset intersection in the current state and in the next state;
training the prediction model, the basic feature extraction network, the time sequence prediction network and the space prediction network, wherein the method specifically comprises the following steps of:
training the prediction model, the basic feature extraction network, the time-series prediction network and the spatial prediction network according to the predicted value and the reward, with the optimization targets of minimizing the difference between the first road condition feature and the basic road condition feature of the next state and minimizing the difference between the second road condition feature and the basic road condition feature of the next state.
This specification provides a traffic signal control device based on spatio-temporal prediction and successor reinforcement learning, comprising:
the determining module is used for determining state data of the preset intersection in the current state, and the state data comprise: basic road condition state, time sequence road condition state and space road condition state;
a feature extraction module, configured to input the basic road condition state into a basic feature extraction network to obtain a basic road condition feature, input the time-series road condition state into a time-series prediction network to obtain a road condition feature at the next moment as a first road condition feature, and input the spatial road condition state into a spatial prediction network to obtain a road condition feature at the next moment as a second road condition feature;
a prediction module, configured to splice the basic road condition feature, the first road condition feature and the second road condition feature, and input the spliced feature together with a preset action into a preset prediction model to obtain a predicted value corresponding to the current state;
a training module, configured to train the prediction model, the basic feature extraction network, the time-series prediction network and the spatial prediction network according to the predicted value, with minimizing the difference between the first road condition feature and the basic road condition feature of the next state as an optimization target, so that traffic signals can be controlled through the trained prediction model and networks.
This specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the traffic signal control method based on spatio-temporal prediction and successor reinforcement learning described above.
This specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the traffic signal control method based on spatio-temporal prediction and successor reinforcement learning described above when executing the program.
At least one of the technical schemes adopted in this specification can achieve the following beneficial effects:
as can be seen from the above traffic signal control method based on spatio-temporal prediction and successor reinforcement learning, the state data of a preset intersection in the current state can be determined, where the state data includes a basic road condition state, a time-series road condition state and a spatial road condition state. The basic road condition state is input into a basic feature extraction network to obtain a basic road condition feature; the time-series road condition state is input into a time-series prediction network to obtain the road condition feature at the next moment as a first road condition feature; and the spatial road condition state is input into a spatial prediction network to obtain the road condition feature at the next moment as a second road condition feature. The basic, first and second road condition features are then spliced, and the spliced feature, together with a preset action, is input into a preset prediction model to obtain a predicted value corresponding to the current state. With minimizing the difference between the first road condition feature and the basic road condition feature of the next state as an optimization target, the prediction model, the basic feature extraction network, the time-series prediction network and the spatial prediction network are trained according to the predicted value, so that traffic signals can be controlled through the trained prediction model and networks.
From the above it can be seen that, in the traffic signal control scene, the method not only adopts a reinforcement-learning training scheme but also, because model losses in the prior art do not consider the relation between the features and the model output, adds to the training loss the prediction of road condition features from time-series data and from spatial data, thereby improving the accuracy and intelligence of traffic signal control to a certain extent.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain exemplary embodiments of the present specification together with their description, and are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a flow chart of the traffic signal control method based on spatio-temporal prediction and successor reinforcement learning provided in this specification;
FIG. 2 is a schematic diagram of the network for traffic light control provided in this specification;
FIG. 3 is a flowchart of the detailed steps of constructing a road network model, training the model, and controlling traffic lights through the trained model, provided in this specification;
FIG. 4 is a schematic diagram of the traffic signal control device based on spatio-temporal prediction and successor reinforcement learning provided in this specification;
FIG. 5 is a schematic diagram of the electronic device corresponding to FIG. 1 provided in this specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of the traffic signal control method based on spatio-temporal prediction and successor reinforcement learning provided in this specification, which specifically includes the following steps:
s100: determining state data of a preset intersection in a current state, wherein the state data comprises: basic road condition state, time sequence road condition state and space road condition state.
S102: inputting the basic road condition state into a basic feature extraction network to obtain basic road condition features, inputting the time sequence road condition state into a time sequence prediction network to obtain first road condition features at the next moment, and inputting the space road condition state into a space prediction network to obtain second road condition features at the next moment.
S104: and splicing the basic road condition features, the first road condition features and the second road condition features, and inputting the spliced features and the selected actions into a prediction model to obtain a prediction value corresponding to the current state.
S106: and training the prediction model, the basic feature extraction network, the time sequence prediction network and the space prediction network according to the prediction value quantity so as to control the traffic signal through the trained prediction model.
In this specification, reinforcement learning models may be trained to control traffic signals.
Specifically, state data of the preset intersection in the current state may be determined, where the state data may include: basic road condition state, time sequence road condition state and space road condition state.
The basic road condition state may include the waiting-queue length of each lane of the preset intersection in the current state. Specifically, it may be represented by the following expression:

$s_t^{base} = [q_t^1, q_t^2, \dots, q_t^N]$

where $t$ denotes the current moment, $q_t^i$ is the queue length of the $i$-th entrance lane, and $N$ is the number of entrance lanes of the intersection.
The time-series road condition state includes the waiting-queue lengths of each lane of the preset intersection over a preset period before the current state, and may be represented by the following expression:

$s_t^{time} = [s_{t-L+1}^{base}, \dots, s_{t-1}^{base}, s_t^{base}]$

where $L$ denotes the sequence length. Historical traffic state information is thus preserved for modeling the temporal dependence.
The spatial road condition state includes the traffic states of the preset intersection and of its neighboring intersections, and may be represented by the following expression:

$s_t^{space} = [s_t^{base}, s_t^{E}, s_t^{W}, s_t^{S}, s_t^{N}]$

where $s_t^{E}$, $s_t^{W}$, $s_t^{S}$ and $s_t^{N}$ denote the traffic states of the east, west, south and north neighboring intersections of the preset intersection at moment $t$. Traffic state information of the neighboring intersections is preserved for modeling the spatial dependence.
It should be noted that the intersection mentioned above may be a four-way crossroads.
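As an illustration only, the three state representations above can be assembled as follows. The container names (`StateBuffer`, `space_state`) and the use of plain Python lists are our own assumptions, not part of the patent:

```python
from collections import deque

def basic_state(queue_lengths):
    # s_t^base: waiting-queue length of each entrance lane at time t
    return list(queue_lengths)

class StateBuffer:
    """Keeps the last L basic states to form the time-series state s_t^time."""
    def __init__(self, seq_len):
        self.buf = deque(maxlen=seq_len)

    def push(self, s):
        self.buf.append(list(s))

    def time_state(self):
        # oldest-to-newest sequence of basic states
        return list(self.buf)

def space_state(center, east, west, south, north):
    # s_t^space: states of the intersection and its four neighbours
    return [list(center), list(east), list(west), list(south), list(north)]
```

The `deque(maxlen=L)` automatically discards states older than the preset period, matching the fixed-length history described above.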
Then, the basic road condition state can be input into the basic feature extraction network to obtain the basic road condition feature; the time-series road condition state is input into the time-series prediction network to obtain the road condition feature at the next moment as the first road condition feature; and the spatial road condition state is input into the spatial prediction network to obtain the road condition feature at the next moment as the second road condition feature. The structure of the whole network of this specification is shown in Fig. 2.
Fig. 2 is a schematic structural diagram of a network for traffic light control provided in the present specification.
The basic feature extraction network is a multi-layer perceptron: the basic road condition state $s_t^{base}$ is input into the multi-layer perceptron $f_{mlp}$ to obtain the basic road condition feature $e_t^{base} = f_{mlp}(s_t^{base})$. The time-series prediction network is a long short-term memory (LSTM) network: the time-series road condition state is input into the LSTM to obtain the road condition state embedding at the next moment, i.e. the first road condition feature $e_{t+1}^{time} = f_{lstm}(s_t^{time})$. The spatial prediction network is a graph attention network (GAT): the spatial road condition state is input into the GAT $f_{gat}$, which models the spatial correlation to obtain the road condition state embedding at the next moment, i.e. the second road condition feature $e_{t+1}^{space} = f_{gat}(s_t^{space})$.
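For intuition, a single attention head of the kind used in a graph attention network can be sketched as below. This is a minimal NumPy illustration with assumed dimensions and an assumed tanh output nonlinearity, not the patent's actual network:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def gat_layer(H, W, a, adj):
    """One single-head GAT layer (illustrative).
    H: (n, f_in) node features, one node per intersection;
    W: (f_in, f_out) shared linear map; a: (2*f_out,) attention vector;
    adj: (n, n) 0/1 adjacency with self-loops included."""
    Z = H @ W                                    # projected node features
    n = Z.shape[0]
    e = np.full((n, n), -np.inf)                 # -inf masks non-neighbours
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                e[i, j] = leaky_relu(a @ np.concatenate([Z[i], Z[j]]))
    # row-wise softmax over neighbours (masked entries become 0)
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)
    return np.tanh(att @ Z)                      # aggregated neighbour features
```

Each intersection thus aggregates its neighbours' states with learned attention weights, which is the mechanism the spatial prediction network relies on.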
Then, the basic road condition feature, the first road condition feature and the second road condition feature can be spliced, and the spliced feature, together with a preset action, is input into the prediction model to obtain the predicted value corresponding to the current state. Specifically, $e_t^{base}$, $e_{t+1}^{time}$ and $e_{t+1}^{space}$ are concatenated to obtain the spliced feature as the final state embedding $e_t = [e_t^{base} \,\|\, e_{t+1}^{time} \,\|\, e_{t+1}^{space}]$.
There are 4 preset actions: a north-south through green light (a1), a north-south left-turn green light (a2), an east-west through green light (a3) and an east-west left-turn green light (a4). Each green phase has a fixed duration of 10 seconds, and a 3-second yellow phase is executed between two action switches. The agent selects an action $a_t$ at the beginning of each time step and obtains a new state $s_{t+1}$ after its execution.
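As a sketch of this timing scheme (the function and phase names are illustrative, not from the patent):

```python
GREEN_S, YELLOW_S = 10, 3  # fixed green and yellow durations in seconds
ACTIONS = ["NS_through", "NS_left", "EW_through", "EW_left"]  # a1..a4

def phase_plan(prev_action, action):
    """Return the (phase, duration) steps executed in one decision interval:
    a 3 s yellow transition when the phase changes, then a 10 s green."""
    plan = []
    if prev_action is not None and prev_action != action:
        plan.append(("yellow", YELLOW_S))
    plan.append((ACTIONS[action], GREEN_S))
    return plan
```

Keeping the same action simply extends the current green phase without a yellow transition.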
Reward function design: the reward of an action is defined as the difference of the total vehicle queue length of the intersection between consecutive states:

$r_t = \sum_{i=1}^{N} q_t^i - \sum_{i=1}^{N} q_{t+1}^i$

where $\sum_{i=1}^{N} q_t^i$ denotes the sum of the queue lengths of all lanes of the intersection at moment $t$, and $\sum_{i=1}^{N} q_{t+1}^i$ denotes the sum of the queue lengths of all lanes in the next state of the intersection.
That is, the reward obtained after taking a preset action can be determined from the difference between the waiting-queue lengths of each lane of the preset intersection in the current state and in the next state. The preset action taken here refers to the one preset action selected from the available preset actions in the current state.
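A minimal sketch of this reward (the function name is our own):

```python
def reward(queues_t, queues_t1):
    """r_t = sum of queue lengths at t minus sum at t+1;
    positive when the chosen action shrinks the total queue."""
    return sum(queues_t) - sum(queues_t1)
```

For example, if the total queue drops from 5 to 2 vehicles, the agent receives a reward of 3.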
Besides the feature extraction networks described above, the model of this specification (the successor reinforcement learning model) includes a reward prediction branch, a state reconstruction branch and a successor branch.
When training a neural network model, features are usually fed directly into the network to obtain a result, and the network is trained on that result; this resembles a black box, so the role of the individual features inside the network is not reflected in the loss.
To address this problem, the present application adds losses on these features to the training of the whole network: the loss takes as optimization targets minimizing the difference between the first road condition feature and the basic road condition feature of the next state, and minimizing the difference between the second road condition feature and the basic road condition feature of the next state.
State reconstruction is also performed: the basic road condition feature can be input into a preset restoration network to obtain restored state data, and a training target of minimizing the difference between the restored state data and the basic road condition state is added to the loss of the whole network.
Specifically, the state reconstruction branch reconstructs the current traffic state from the spliced feature $e_t$ through a deep neural network $f_{dec}$, i.e., $\hat{s}_t = f_{dec}(e_t)$. Meanwhile, spatio-temporal constraints are introduced into the reconstruction loss to ensure the correctness of the spatio-temporal embeddings:

$L_{rec} = \|\hat{s}_t - s_t^{base}\|^2 + \|e_t^{time} - e_t^{base}\|^2 + \|e_t^{space} - e_t^{base}\|^2$

The first term ensures that the learned feature $e_t$ can reconstruct $s_t^{base}$. The second term ensures that the road condition feature of the next state predicted from the time-series road condition state in the previous state, $e_t^{time}$, approximates the current state embedding (i.e. the basic road condition feature) $e_t^{base}$; that is, the model accurately learns time-correlated characteristics to predict the road condition state embedding at the next moment. The third term ensures that the road condition feature of the next state predicted from the spatial road condition state in the previous state, $e_t^{space}$, approximates the current state embedding $e_t^{base}$; that is, the model accurately learns space-correlated characteristics to predict the road condition state embedding at the next moment.
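The three-part reconstruction loss described above can be sketched numerically as follows; this is a NumPy illustration under our reconstructed notation, not the patent's exact implementation:

```python
import numpy as np

def recon_loss(s_hat, s_base, e_time_pred, e_space_pred, e_base_now):
    """Sum of three squared-error terms:
    ||s_hat - s^base||^2                reconstruction of the raw state,
    ||e^time_pred - e^base_now||^2      temporal prediction constraint
                                        (e^time_pred was predicted from the previous state),
    ||e^space_pred - e^base_now||^2     spatial prediction constraint."""
    return (np.sum((s_hat - s_base) ** 2)
            + np.sum((e_time_pred - e_base_now) ** 2)
            + np.sum((e_space_pred - e_base_now) ** 2))
```

All three terms vanish only when the decoder reconstructs the state exactly and both predicted embeddings match the current basic embedding.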
In this specification, traffic signal prediction is performed by training a successor reinforcement learning model (a Deep Successor Representation, DSR, model). The basic, first and second road condition features are spliced, and the spliced feature together with a preset action is input into the preset prediction model to obtain the predicted value corresponding to the current state. For each preset action, the prediction model yields the predicted value (Q value) of taking that action in the current state; in the training stage, the actual reward obtained by taking the action is given by the reward function.
By training the successor reinforcement learning model (which includes the feature extraction networks and the prediction model, along with some other structures), traffic signals can be controlled through the trained model.
A detailed process flow diagram of the method provided in this specification is shown in fig. 3.
Fig. 3 is a flowchart of detailed steps for constructing a road network model, training the model, and controlling traffic lights through the trained model provided in the present specification.
As can be seen from fig. 3, in this specification the road network model may be built in a simulated manner (the road network model here refers to modeling the intersection with the traffic simulation software SUMO, so as to obtain the state data of the preset intersection in each state, including the state data of the preset intersection in the current state), yielding the simulated basic road condition state, time-series road condition state and spatial road condition state.
Then the state space, action space and reward function of the reinforcement-learning-based traffic light control model can be designed, and the DSR model is trained to obtain the traffic light control model (comprising the prediction model, the basic feature extraction network, the time-series prediction network and the spatial prediction network) for controlling the traffic lights.
The reward-function prediction branch takes the final state embedding φ(s) (the post-splice features) and, based on a linear network with weight w, predicts the reward for the Q network.

The reward predicted in this branch by the reward function R(s) is:

R(s) ≈ φ(s)ᵀ · w

wherein φ(s) is the state embedding and w is the weight of the reward function.

The reconstruction branch reconstructs the state from the learnable features, ŝ = g(φ(s)), so that the features φ(s) serve as an embedding of the state s. The Q value is accordingly rewritten in terms of the subsequent (successor) features ψ(s, a):

Q(s, a) = ψ(s, a)ᵀ · w

The subsequent features may be calculated using the Bellman equation:

ψ(s, a) = φ(s) + γ · E[ψ(s_{t+1}, a_{t+1})]

From the above formula, a DQN-style network can be used to learn the subsequent features: the subsequent branch takes the final state embedding φ(s) and, based on a multilayer-perceptron neural network u_α, predicts the subsequent features of the Q network.

Based on the subsequent branch, we get:

ψ(s, a) ≈ u_α(φ(s), a)

The feature mapping φ with its parameters θ, the branch parameters α, and the weight w of the reward function are learned using a gradient descent algorithm. The loss function is:

L = E[(R − φ(s)ᵀ w)²] + E[(ŝ − s)²] + E[(φ_time(s) − φ(s′))²] + E[(φ_space(s) − φ(s′))²] + E[(φ(s) + γ · ψ(s′, a*) − ψ(s, a))²]

where, in the subsequent feature mapping, a* = argmax_a ψ(s′, a)ᵀ w.
The first part ensures that the learned reward-function weights can regress the reward; the second part ensures that the learned features φ(s) can reconstruct the state s; the predicted state embedding from the time-sequence branch can approximate the next-state embedding, that is, the model accurately learns the temporal correlation characteristics to predict the state embedding at the next moment; and the predicted state embedding from the spatial branch can likewise approximate the next-state embedding, that is, the model accurately learns the spatial correlation characteristics. It should be noted that these parts can be learned separately, that is, they can be cross-trained during training.
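As an illustrative numerical sketch of these loss terms (the equal term weighting and all helper names are assumptions for the sketch, not the patent's exact formulation):

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(r, phi_s, w, s, s_hat, phi_next, phi_time_pred, phi_space_pred):
    """Unweighted sum of the constraints described above: reward regression,
    state reconstruction, temporal prediction and spatial prediction."""
    reward_term = (r - sum(p * wi for p, wi in zip(phi_s, w))) ** 2
    recon_term = mse(s, s_hat)
    time_term = mse(phi_time_pred, phi_next)
    space_term = mse(phi_space_pred, phi_next)
    return reward_term + recon_term + time_term + space_term

# A perfect fit drives every term, and hence the total loss, to zero.
loss = total_loss(r=1.0, phi_s=[1.0, 1.0], w=[0.5, 0.5],
                  s=[2.0, 2.0], s_hat=[2.0, 2.0],
                  phi_next=[1.0, 0.0],
                  phi_time_pred=[1.0, 0.0], phi_space_pred=[1.0, 0.0])
```

The separability noted above corresponds to optimizing subsets of these terms in alternation rather than always descending the full sum.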
Finally, a decision may be made based on the Q value of the action:

a_t = argmax_a Q(s_t, a) = argmax_a ψ(s_t, a)ᵀ · w
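A minimal numeric sketch of the successor-feature machinery above (the discount factor, the vectors and the helper names are illustrative assumptions):

```python
GAMMA = 0.9  # discount factor, chosen arbitrarily for the sketch

def td_target(phi, psi_next, gamma=GAMMA):
    """Bellman target for the subsequent features: phi(s) + gamma * psi(s', a*)."""
    return [p + gamma * q for p, q in zip(phi, psi_next)]

def q_value(psi, w):
    """Q(s, a) = psi(s, a) . w -- the quantity the final decision maximizes."""
    return sum(p * wi for p, wi in zip(psi, w))

phi = [1.0, 0.0]          # state embedding phi(s)
w = [0.5, 1.0]            # learned reward-function weights
psi_next = [2.0, 1.0]     # subsequent features of the greedy next action
psi = td_target(phi, psi_next)   # [1.0 + 0.9*2.0, 0.0 + 0.9*1.0]
q = q_value(psi, w)
```

This shows why the decomposition is useful: the reward weights w can be re-fit without relearning ψ, since Q is just their inner product.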
Then, the DSR model may be trained;
A traffic signal lamp control agent model is constructed, wherein the agent observes the traffic environment and records the waiting queue length (basic road condition state), the time state sequence (time-sequence road condition state), the space state sequence (spatial road condition state) and the signal lamp phase state information of the intersection. The waiting queue length, the time state sequence and the space state sequence serve as the input of the control network model;
An action with a larger Q value is selected using an ε-greedy strategy (epsilon-greedy algorithm) to decide the action at the next moment and guide the formulation of the signal switching strategy of the signal lamp. After the signal switching is completed, the traffic environment enters a new state and rewards the agent's behavior, and the agent observes the environment information again to complete learning and decision-making.
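The ε-greedy selection described here can be sketched as follows (the Q values and exploration rates are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action (exploration);
    otherwise pick the action with the largest Q value (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

qs = [0.1, 0.7, 0.3]
greedy_action = epsilon_greedy(qs, epsilon=0.0)   # epsilon = 0 -> always exploit
random_action = epsilon_greedy(qs, epsilon=1.0)   # epsilon = 1 -> always explore
```

In practice ε is typically annealed from a high value toward a small one as training progresses, so early episodes explore and later episodes exploit.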
Information (St, At, Rt, St+1) obtained from the interaction with the environment at each time step is collected using an experience replay mechanism. A suitable optimizer is selected and hyperparameters are set; small batches of samples are uniformly sampled from the experience pool as training data to train and update the parameters of the DRL agent and the prediction neural network. If all prediction models in the deep reinforcement learning model are judged to have reached a convergence state, training of the deep reinforcement learning model is judged to be complete; the trained deep reinforcement learning model is stored and defined as the traffic signal lamp control model.
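The experience replay mechanism can be sketched as a fixed-capacity pool with uniform mini-batch sampling (the capacity and batch size below are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience pool storing (s_t, a_t, r_t, s_{t+1}) tuples;
    old transitions are evicted automatically once capacity is reached."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive transitions before training.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.push([t], t % 4, -float(t), [t + 1])
batch = buf.sample(4)
```

Each sampled batch is what the text calls the "small batches of samples" used to update the agent and the prediction network.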
Finally, signal lamp control can be performed through the model after training is completed;
The execution subject of this embodiment may be a traffic light control system, which may be deployed at a client side or a server side and has the trained traffic light control model built in. When controlling the traffic light, traffic data of a target traffic intersection can be acquired in real time and input into the traffic light control model to obtain a traffic control strategy of the target traffic intersection, and traffic light control of the target traffic intersection is executed according to that strategy.
According to the above method, a reinforcement learning training mode is adopted in the traffic signal lamp control scene. In addition, since the model loss in the prior art does not consider the relation between the features and the model output result, this relation is added to the model training loss: the road condition features are predicted from time-sequence data and from spatial data, which improves the accuracy and intelligence of traffic signal lamp control to a certain extent.
The above is the traffic signal control method for subsequent reinforcement learning of spatiotemporal prediction provided for one or more embodiments of the present specification. Based on the same idea, the present specification also provides a traffic signal control apparatus for subsequent reinforcement learning of spatiotemporal prediction, as shown in fig. 4.
FIG. 4 is a schematic diagram of a traffic signal control apparatus for subsequent reinforcement learning of spatiotemporal prediction, including:
the determining module 401 is configured to determine state data of a preset intersection in a current state, where the state data includes: basic road condition state, time sequence road condition state and space road condition state;
the feature extraction module 402 is configured to input the basic road condition state into a basic feature extraction network to obtain a basic road condition feature, input the time-sequence road condition state into a time-sequence prediction network to obtain a road condition feature at the next moment as a first road condition feature, and input the spatial road condition state into the spatial prediction network to obtain a road condition feature at the next moment as a second road condition feature;
the prediction module 403 is configured to splice the basic road condition feature, the first road condition feature and the second road condition feature, and input the spliced feature and a preset action into a preset prediction model to obtain a prediction value corresponding to the current state;
the training module 404 is configured to take minimizing the difference between the first road condition feature and the basic road condition feature of the next state and minimizing the difference between the second road condition feature and the basic road condition feature of the next state as an optimization objective, and to train the prediction model, the basic feature extraction network, the time-sequence prediction network and the spatial prediction network according to the predicted value, so as to perform traffic signal control through the trained prediction model, basic feature extraction network, time-sequence prediction network and spatial prediction network.
Optionally, before the training module 404 trains the prediction model, the basic feature extraction network, the time sequence prediction network, and the spatial prediction network, the prediction module 403 is further configured to input the basic road condition features into a preset restoration network to obtain restoration state data;
the training module 404 is specifically configured to minimize a difference between the restored state data and the basic road condition state, a difference between the first road condition feature and the basic road condition feature of the next state, and a difference between the second road condition feature and the basic road condition feature of the next state as an optimization objective, and train the prediction model, the basic feature extraction network, the time sequence prediction network, and the spatial prediction network according to the prediction value.
Optionally, the basic feature extraction network is a multi-layer perceptron, the time sequence prediction network is a long-short-term memory network, and the spatial prediction network is a graph attention network.
Optionally, the basic road condition state includes the length of the lane waiting queue of each lane of the preset intersection in the current state, the time-sequence road condition state includes the length of the lane waiting queue of each lane of the preset intersection in a preset time before the current state, and the space road condition state includes: the traffic states of the preset intersection and the neighbor intersections of the preset intersection.
Optionally, before determining the state data of the preset intersection in the current state, the device further includes:
the modeling module 405 is configured to model an intersection through the traffic simulation software SUMO, wherein the road intersection consists of four roads from the east, south, west and north, a traffic signal lamp is arranged at the intersection, each road is a bidirectional 6-lane road, and along the driving direction of the vehicle the left lane is a left-turn lane, the middle lane is a straight-through lane, and the right lane is a straight-through plus right-turn lane; the control signals of the traffic signal lamp are respectively: north-south straight, north-south left turn, east-west straight, and east-west left turn. A yellow signal lamp is designed for the transition between the four phase switches, so that vehicles can safely pass through the intersection;
the determining module 401 is specifically configured to model the intersection through the traffic simulation software SUMO, so as to obtain state data for determining that the preset intersection is in the current state.
Optionally, the preset actions include: the green light is turned to the north-south direction, the green light is turned to the south-north direction and the green light is turned to the east-west direction.
Optionally, before training the prediction model, the basic feature extraction network, the time sequence prediction network, and the spatial prediction network, the training module 404 is further configured to determine a reward obtained after taking a preset action according to a difference between a length of a lane waiting queue of each lane of the preset intersection in a current state and a length of a lane waiting queue of each lane of the preset intersection in a next state;
The training module 404 is specifically configured to train the prediction model, the basic feature extraction network, the time sequence prediction network, and the spatial prediction network with a view to minimizing a difference between the first road condition feature and the basic road condition feature of the next state, minimizing a difference between the second road condition feature and the basic road condition feature of the next state as an optimization target, and according to the prediction value and the reward.
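The queue-length reward described above can be sketched directly (the lane counts below are illustrative):

```python
def reward(queues_now, queues_next):
    """Reward after a preset action = total lane waiting-queue length in the
    current state minus that in the next state; positive when congestion falls."""
    return sum(queues_now) - sum(queues_next)

r = reward(queues_now=[4, 2, 3], queues_next=[2, 1, 3])  # 9 - 6 = 3
```

An action that lengthens the queues thus yields a negative reward, steering the agent toward phase switches that clear waiting vehicles.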
The present specification also provides a computer-readable storage medium storing a computer program operable to perform the traffic signal control method of subsequent reinforcement learning of spatiotemporal prediction described above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 5. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as illustrated in fig. 5, although other hardware required by other services may be included. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to realize the traffic signal control method of the subsequent reinforcement learning of the space-time prediction.
Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor or switch) or an improvement in software (an improvement to the method flow). However, with the development of technology, many improvements of method flows today can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code before compiling must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A method of traffic signal control for subsequent reinforcement learning of spatiotemporal prediction, comprising:
determining state data of a preset intersection in a current state, wherein the state data comprises: basic road condition state, time sequence road condition state and space road condition state;
inputting the basic road condition state into a basic feature extraction network to obtain basic road condition features, inputting the time sequence road condition state into a time sequence prediction network to obtain a road condition feature at the next moment as a first road condition feature, and inputting the space road condition state into the space prediction network to obtain a road condition feature at the next moment as a second road condition feature;
splicing the basic road condition characteristics, the first road condition characteristics and the second road condition characteristics, and inputting the spliced characteristics and preset actions into a preset prediction model to obtain a prediction value corresponding to the current state;
and training the prediction model, the basic feature extraction network, the time sequence prediction network and the spatial prediction network according to the predicted value, so as to control traffic signals through the trained prediction model, the trained basic feature extraction network, the trained time sequence prediction network and the trained spatial prediction network.
2. The method of claim 1, wherein prior to training the predictive model, the base feature extraction network, the temporal prediction network, and the spatial prediction network, the method further comprises:
inputting the basic road condition characteristics into a preset restoration network to obtain restoration state data;
training the prediction model, the basic feature extraction network, the time sequence prediction network and the space prediction network, wherein the method specifically comprises the following steps of:
and taking minimizing the difference between the restored state data and the basic road condition state, the difference between the first road condition feature and the basic road condition feature of the next state, and the difference between the second road condition feature and the basic road condition feature of the next state as an optimization target, training the prediction model, the basic feature extraction network, the time sequence prediction network and the spatial prediction network according to the predicted value.
3. The method of claim 1, wherein the basic feature extraction network is a multi-layer perceptron, the time sequence prediction network is a long-short-term memory network, and the spatial prediction network is a graph attention network.
4. The method of claim 1, wherein the basic road condition state comprises the length of the lane waiting queue of each lane of the preset intersection in the current state, the time-series road condition state comprises the length of the lane waiting queue of each lane of the preset intersection in a preset time before the current state, and the space road condition state comprises: the traffic states of the preset intersection and the neighbor intersections of the preset intersection.
5. The method of claim 1, wherein prior to determining the status data of the preset intersection in the current state, the method further comprises:
modeling an intersection by using the traffic simulation software SUMO, wherein the road intersection consists of four roads from the east, south, west and north, a traffic signal lamp is arranged at the intersection, each road is a bidirectional 6-lane road, and along the driving direction of the vehicle the left lane is a left-turn lane, the middle lane is a straight-through lane, and the right lane is a straight-through plus right-turn lane; the control signals of the traffic signal lamp are respectively: north-south straight, north-south left turn, east-west straight, and east-west left turn; a yellow signal lamp is designed for the transition among the four phase switches, so that vehicles can safely pass through the intersection;
the determining of the state data of the preset intersection in the current state specifically comprises the following steps:
modeling the intersection through the traffic simulation software SUMO to obtain state data of the preset intersection in the current state.
6. The method of claim 1, wherein the preset action comprises: the green light is turned to the north-south direction, the green light is turned to the south-north direction and the green light is turned to the east-west direction.
7. The method of claim 1, wherein prior to training the predictive model, the base feature extraction network, the temporal prediction network, and the spatial prediction network, the method further comprises:
determining rewards obtained after taking preset actions according to the difference between the length of the lane waiting queue of each lane of the preset intersection in the current state and the length of the lane waiting queue of each lane of the preset intersection in the next state;
training the prediction model, the basic feature extraction network, the time sequence prediction network and the space prediction network, wherein the method specifically comprises the following steps of:
and taking minimizing the difference between the first road condition feature and the basic road condition feature of the next state and minimizing the difference between the second road condition feature and the basic road condition feature of the next state as an optimization target, training the prediction model, the basic feature extraction network, the time sequence prediction network and the spatial prediction network according to the predicted value and the rewards.
8. A traffic signal control apparatus for subsequent reinforcement learning of spatiotemporal prediction, comprising:
the determining module is used for determining state data of the preset intersection in the current state, and the state data comprise: basic road condition state, time sequence road condition state and space road condition state;
The feature extraction module is used for inputting the basic road condition state into a basic feature extraction network to obtain basic road condition features, inputting the time sequence road condition state into a time sequence prediction network to obtain a road condition feature at the next moment as a first road condition feature, and inputting the space road condition state into the space prediction network to obtain a road condition feature at the next moment as a second road condition feature;
the prediction module is used for splicing the basic road condition features, the first road condition features and the second road condition features, and inputting the spliced features and preset actions into a preset prediction model to obtain a predicted value corresponding to the current state;
the training module is used for taking the minimized difference between the first road condition characteristic and the basic road condition characteristic of the next state as an optimization target, and training the prediction model, the basic characteristic extraction network, the time sequence prediction network and the spatial prediction network according to the prediction value quantity so as to control traffic signals through the trained prediction model, the trained basic characteristic extraction network, the trained time sequence prediction network and the trained spatial prediction network.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-7 when executing the program.
CN202311344089.9A 2023-10-17 2023-10-17 Traffic signal control method and device for subsequent reinforcement learning of space-time prediction Active CN117079479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311344089.9A CN117079479B (en) 2023-10-17 2023-10-17 Traffic signal control method and device for subsequent reinforcement learning of space-time prediction

Publications (2)

Publication Number Publication Date
CN117079479A true CN117079479A (en) 2023-11-17
CN117079479B CN117079479B (en) 2024-01-16

Family

ID=88717679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311344089.9A Active CN117079479B (en) 2023-10-17 2023-10-17 Traffic signal control method and device for subsequent reinforcement learning of space-time prediction

Country Status (1)

Country Link
CN (1) CN117079479B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611859A (en) * 2020-04-21 2020-09-01 河北工业大学 Gait recognition method based on GRU
CN113223303A (en) * 2021-05-18 2021-08-06 华录易云科技有限公司 Road traffic double-core signal control machine and control system with same
CN113963555A (en) * 2021-10-12 2022-01-21 南京航空航天大学 Deep reinforcement learning traffic signal control method combined with state prediction
CN114038212A (en) * 2021-10-19 2022-02-11 南京航空航天大学 Signal lamp control method based on two-stage attention mechanism and deep reinforcement learning
US20220398921A1 (en) * 2021-06-14 2022-12-15 The Governing Council Of The University Of Toronto Method and system for traffic signal control with a learned model
CN116453343A (en) * 2023-04-27 2023-07-18 云控智行(上海)汽车科技有限公司 Intelligent traffic signal control optimization algorithm, software and system based on flow prediction in intelligent networking environment
CN116776135A (en) * 2023-08-24 2023-09-19 之江实验室 Physical field data prediction method and device based on neural network model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XU Jianmin et al.: "Research on Adaptive Traffic Signal Control Based on Deep Reinforcement Learning", Journal of Chongqing Jiaotong University (Natural Science), vol. 41, no. 8 *
LI Chungui; ZHOU Jianhe; SUN Ziguang; WANG Meng; ZHANG Zengfang: "Traffic Signal Control Based on Multi-Agent Team Reinforcement Learning", Journal of Guangxi University of Technology, no. 02 *

Similar Documents

Publication Publication Date Title
CN111208838B (en) Control method and device of unmanned equipment
CN110929431B (en) Training method and device for vehicle driving decision model
CN110488821B (en) Method and device for determining unmanned vehicle motion strategy
CN112306059B (en) Training method, control method and device for control model
CN111076739B (en) Path planning method and device
CN114194211B (en) Automatic driving method and device, electronic equipment and storage medium
CN111238523A (en) Method and device for predicting motion trail
CN116432778B (en) Data processing method and device, storage medium and electronic equipment
CN111522245B (en) Method and device for controlling unmanned equipment
CN111062372A (en) Method and device for predicting obstacle track
CN112629550A (en) Method and device for predicting obstacle trajectory and training model
CN116304720A (en) Cost model training method and device, storage medium and electronic equipment
CN116453343A (en) Intelligent traffic signal control optimization algorithm, software and system based on flow prediction in intelligent networking environment
CN114547972A (en) Dynamic model construction method and device, storage medium and electronic equipment
CN117079479B (en) Traffic signal control method and device for subsequent reinforcement learning of space-time prediction
CN114153207B (en) Control method and control device of unmanned equipment
CN111123957B (en) Method and device for planning track
CN110895406B (en) Method and device for testing unmanned equipment based on interferent track planning
CN114019971B (en) Unmanned equipment control method and device, storage medium and electronic equipment
CN112925331B (en) Unmanned equipment control method and device, storage medium and electronic equipment
CN115047864A (en) Model training method, unmanned equipment control method and device
CN114280960A (en) Automatic driving simulation method and device, storage medium and electronic equipment
CN114120273A (en) Model training method and device
CN114019981B (en) Track planning method and device for unmanned equipment
CN112925210A (en) Method and device for model training and unmanned equipment control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant