CN113553934B - Ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning - Google Patents


Info

Publication number
CN113553934B
Authority
CN
China
Prior art keywords
vehicle
decision
reinforcement learning
information
environment
Prior art date
Legal status
Active
Application number
CN202110811357.8A
Other languages
Chinese (zh)
Other versions
CN113553934A (en)
Inventor
王刚
张禹瑄
徐谦
胡玮通
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority claimed from CN202110811357.8A
Publication of CN113553934A
Application granted
Publication of CN113553934B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

A ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning. In the method, a deep reinforcement learning decision network analyzes the acquired vehicle information and environment information to obtain distinct feature expressions, then analyzes the environment feature expression and makes an intelligent decision; a scoring module judges and scores the current driving state from the driver's driving feature expression, recording the current score and the number of times the current driving state has terminated; an experience pool evaluates whether the current driving state has terminated and stores the driving environment state, score, decision result, and termination state as one experience; and several experiences are randomly extracted to adjust the parameters of the deep reinforcement learning decision network, yielding a deep network model that integrates environment perception and intelligent decision-making. The invention achieves direct inference from environment to decision with this integrated model, solving the current inability to make intelligent decisions for ground unmanned vehicles under complex road environment conditions.

Description

Ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of vehicle control, in particular to a ground unmanned vehicle intelligent decision method and system based on deep reinforcement learning.
Background
In traditional environment-perception and intelligent decision-making methods, measuring the environment with multiple sensors, making effective use of the measurements, and rapidly fusing the multi-source data has always been a difficult problem. Existing intelligent decision systems build a knowledge base from a large volume of driving rules, historical driving environments, and decision-making experience, and cannot make accurate intelligent decisions on the complex, unstructured road-surface environment information encountered in real situations. Consequently, with limited manpower and under the complex road-surface conditions of real driving environments, the driving state of the vehicle cannot be decided accurately and intelligently.
Disclosure of Invention
The invention provides a ground unmanned vehicle intelligent decision method and system based on deep reinforcement learning, aiming to solve the problem that existing environment-perception and intelligent decision-making methods cannot achieve intelligent decisions for ground unmanned vehicles under complex road environment conditions.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning comprises the following steps:
s1, collecting vehicle information and surrounding environment information of a vehicle;
s2, the deep reinforcement learning decision network analyzes and calculates the vehicle information and the environment information obtained in the step S1 to obtain driving characteristic expression and vehicle environment characteristic expression of a driver;
s3, the deep reinforcement learning decision network analyzes and calculates the vehicle environment characteristic expression, and gives an intelligent decision result to the current driving environment;
s4, analyzing and calculating the driving characteristic expression of the driver by a scoring module, judging and scoring the current driving state of the vehicle, and recording the current scoring score and the current driving state termination times;
s5, storing the driving characteristic expression of the driver, the intelligent decision result, the score of the current vehicle state and whether the current driving state is terminated or not as an experience in an experience pool module;
s6, extracting a plurality of experiences in the experience pool module, carrying out back propagation on the deep reinforcement learning decision network, and adjusting the decision network parameters until the deep reinforcement learning decision network converges.
Further, in the step S1, vehicle information and environmental information around the vehicle are collected through a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a laser radar sensor and a sound sensor mounted on the unmanned vehicle; and meanwhile, the current signal information of a steering lamp, a brake and an accelerator of the vehicle is collected.
Further, acquiring current vehicle position information through the GNSS positioning system; acquiring environmental image information around the vehicle through the camera sensor; acquiring environment three-dimensional point cloud information through the millimeter wave radar and the laser radar sensor, and obtaining the distance, the relative speed and the relative azimuth of the obstacle and the vehicle, and the length, the width and the volume of the obstacle; and acquiring sound information through the sound sensor.
Further, the intelligent decision result in step S3 includes a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn of the vehicle, and the longitudinal decision comprises acceleration, deceleration, and uniform speed of the vehicle.
Further, in the step S4, the scoring module determines and scores the current driving state of the vehicle according to the position of the current vehicle in the lane, the distance between the current vehicle and the adjacent vehicle, the distance between the current vehicle and the preceding vehicle, and the distance information between the current vehicle and the preceding obstacle.
Further, the deep reinforcement learning decision network includes a deep convolutional neural network for processing image information and environmental three-dimensional point cloud information, and a deep recurrent neural network for processing sound information.
Further, the network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams; the two fully connected streams are located after the convolutional layers; each fully connected stream consists of at least one fully connected layer; the number of neurons in the last fully connected layer equals the corresponding number of lateral and longitudinal decision types; the convolutional layers form the deep convolutional neural network, and the fully connected streams form the deep recurrent neural network.
Further, the fully connected layers perform nonlinear computation on the input vehicle environment feature expression, extract its overall features, and make the intelligent decision on those features.
Further, the intelligent decisions comprise a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn, and the longitudinal decision comprises vehicle acceleration, deceleration, and uniform speed.
Further, in the step S4, the scoring module analyzes and calculates the driving characteristic expression of the driver, specifically, the scoring module judges whether the current vehicle state is terminated according to the turn signal, the brake signal and the accelerator signal in the driving characteristic expression of the driver.
Further, the current driving state is judged to be terminated when at least one of the following occurs within the decision threshold time: the intelligent decision result is lane keeping but a turn light is lit; the intelligent decision result is lane changing but no turn light is lit; the intelligent decision result is turning but the turn light in the corresponding direction is not lit; the intelligent decision result is deceleration but an accelerator signal is present; or the intelligent decision result is acceleration but a brake signal is present.
Further, in step S6, several experiences are extracted from the experience pool module, the label (target) values and loss values required by the deep reinforcement learning decision network for back-propagation and stochastic gradient descent are calculated, and the resulting label and loss values are then used to update the deep reinforcement learning decision network.
In another aspect of the present invention, a ground unmanned vehicle intelligent decision system based on deep reinforcement learning is provided, comprising:
the information acquisition device is used for acquiring vehicle information and environment information around the vehicle;
a vehicle-mounted server;
the CAN bus is used for realizing data communication between the information acquisition device and the vehicle-mounted server;
the system comprises a deep reinforcement learning decision network, a scoring module and an experience pool module which are integrated in a vehicle-mounted server; the deep reinforcement learning decision network is used for analyzing and calculating collected vehicle information and surrounding environment information of a vehicle to generate driving characteristic expression of a driver and vehicle environment characteristic expression, and analyzing and calculating the vehicle environment characteristic expression to give an intelligent decision result to the current driving environment; the scoring module analyzes and calculates the driving characteristic expression of the driver, judges and scores the current driving state, and records the score of the current score and the termination times of the current driving state; the experience pool module is used for storing experiences of signal information including driving characteristic expression of a driver, intelligent decision results, scores of current vehicle states and whether the current driving states are terminated.
Further, the information acquisition device comprises a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a laser radar sensor and a sound sensor which are arranged on the unmanned vehicle.
Further, acquiring current vehicle position information through the GNSS positioning system; acquiring environmental image information around the vehicle through the camera sensor; acquiring environment three-dimensional point cloud information through the millimeter wave radar and the laser radar sensor, and obtaining the distance, the relative speed and the relative azimuth of the obstacle and the vehicle, and the length, the width and the volume of the obstacle; and acquiring sound information through the sound sensor.
Further, the intelligent decision result comprises a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn of the vehicle, and the longitudinal decision comprises acceleration, deceleration, and uniform speed of the vehicle.
Further, the scoring module judges and scores the current driving state of the vehicle according to the position of the current vehicle in the lane, the distance between the current vehicle and the adjacent vehicle, the distance between the current vehicle and the front vehicle and the distance information between the current vehicle and the front obstacle.
Further, the deep reinforcement learning decision network includes a deep convolutional neural network for processing image information and environmental three-dimensional point cloud information, and a deep recurrent neural network for processing sound information.
Further, the network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams; the two fully connected streams are located after the convolutional layers; each fully connected stream consists of at least one fully connected layer; the number of neurons in the last fully connected layer equals the corresponding number of lateral and longitudinal decision types; the convolutional layers form the deep convolutional neural network, and the fully connected streams form the deep recurrent neural network.
Further, the fully connected layers perform nonlinear computation on the input vehicle environment feature expression, extract its overall features, and make the intelligent decision on those features.
Further, the intelligent decisions comprise a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn, and the longitudinal decision comprises vehicle acceleration, deceleration, and uniform speed.
Further, the scoring module analyzes and calculates the driving characteristic expression of the driver, and particularly, the scoring module judges whether the current vehicle state is terminated according to the steering lamp, the brake and the accelerator signals in the driving characteristic expression of the driver.
Further, the current driving state is judged to be terminated when at least one of the following occurs within the decision threshold time: the intelligent decision result is lane keeping but a turn light is lit; the intelligent decision result is lane changing but no turn light is lit; the intelligent decision result is turning but the turn light in the corresponding direction is not lit; the intelligent decision result is deceleration but an accelerator signal is present; or the intelligent decision result is acceleration but a brake signal is present.
Further, several experiences are extracted from the experience pool module, the label (target) values and loss values required by the deep reinforcement learning decision network for back-propagation and stochastic gradient descent are calculated, and the resulting label and loss values are then used to update the deep reinforcement learning decision network.
In summary, the invention provides a ground unmanned vehicle intelligent decision method and a ground unmanned vehicle intelligent decision system based on deep reinforcement learning, wherein the method comprises the following steps: the deep reinforcement learning decision network analyzes and calculates the acquired vehicle information and environment information to obtain different feature expressions, and analyzes the environment feature expressions to make intelligent decisions; the scoring module judges and scores the current driving state by using the driving characteristic expression of the driver, and records the current scoring score and the current driving state termination times; the experience pool evaluates the termination state of the current driving state and stores the driving environment state, the score, the decision result and the termination state as experience; and randomly extracting a plurality of experiences to carry out parameter adjustment on the deep reinforcement learning decision network, so as to obtain a deep network model integrating environment perception and intelligent decision.
Compared with the prior art, the invention has the following advantages:
1. In the invention, once the deep reinforcement learning decision network converges, integrated environment-perception and intelligent-decision prediction runs in real time, simplifying the computation pipeline, reducing the computational load, and guaranteeing real-time performance; the whole training process does not disturb the driver, who completes the training of the network simply by driving the vehicle correctly.
2. In the invention, the deep reinforcement learning decision network computes directly on the extracted driving environment information without developing an associated knowledge-base system. This reduces manpower while avoiding misaligned or impossible decisions caused by unstructured road environment information; the network is strongly robust to unstructured environment information and can make accurate decisions in any driving environment.
3. In a real driving environment, using the driving environment information acquired by the camera, radar, and other sensors, the position and attitude information of the vehicle, and the driver's operation information, an unsupervised training mode yields a deep reinforcement learning decision network model able to perform the vehicle's intelligent decision function in the real environment.
Drawings
Fig. 1 is a schematic structural diagram of an intelligent decision-making system of a ground unmanned vehicle based on deep reinforcement learning.
Fig. 2 is a schematic flow chart of an intelligent decision-making method of a ground unmanned vehicle based on deep reinforcement learning.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention provides a ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning. It mainly applies deep reinforcement learning technology: vehicle environment information is acquired by sensors in the driving environment; the current driving state is automatically evaluated and scored from sensor data collected in the real environment; a deep reinforcement learning decision network performs environment perception and intelligent decision-making for the current vehicle; and an experience pool module is used to update the network's parameters, thereby realizing intelligent decision-making for the ground unmanned vehicle.
As shown in fig. 1, the ground unmanned vehicle intelligent decision system based on deep reinforcement learning mainly comprises: a CAN bus, a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a lidar sensor, a sound sensor, the deep reinforcement learning decision network, the experience pool module, the scoring module, and a vehicle-mounted server.
The deep reinforcement learning decision network, the experience pool module, and the scoring module are integrated in the vehicle-mounted server, which uses a high-performance CPU and a large memory to run the scoring module's automatic scoring, store the experience pool module, and train the deep reinforcement learning decision network. The GNSS positioning system, camera sensor, millimeter wave radar sensor, lidar sensor, and sound sensor are mounted on the vehicle; all of them are connected to the vehicle-mounted server through the CAN bus and transmit their data over it.
The GNSS positioning system, the camera sensor, the millimeter wave radar sensor, the laser radar sensor and the sound sensor are used for acquiring vehicle information and surrounding environment information of the vehicle.
The deep reinforcement learning decision network fuses the acquired vehicle information and surrounding environment information, extracts data features such as image features and environmental three-dimensional point cloud features, and finally generates one pair of environment feature expressions: the driver's driving feature expression and the vehicle environment feature expression. Each time, this pair is routed separately: the driver's driving feature expression is passed to the scoring module, which analyzes it, judges and scores the current driving state, and records the current score and the number of terminations of the current driving state; the vehicle environment feature expression, which excludes the vehicle's current turn-light, brake, and accelerator signals, is passed to the deep reinforcement learning decision network for analysis and intelligent decision-making, so as to train the network.
The intelligent decision result is divided into two parts: a lateral decision and a longitudinal decision. Lateral decisions comprise lane keeping, lane changing, left turn, and right turn; longitudinal decisions comprise vehicle acceleration, deceleration, and uniform speed.
The deep reinforcement learning decision network can extract deep features from image data and possesses strong autonomous learning capability and highly nonlinear mapping. Even with complex road-surface environment information and little manpower, the network can be trained in an unsupervised manner with automatic background scoring, accurately judge the current driving environment, and make decisions that keep the vehicle safe.
The scoring module judges whether the current vehicle state has terminated according to the turn-light, brake, and accelerator signals in the driver's driving feature expression. Within the decision threshold time: when the intelligent decision result is lane keeping but a turn light is lit, the current driving state is judged terminated; when the result is lane changing but no turn light is lit, it is judged terminated; when the result is turning but the turn light in the corresponding direction is not lit, it is judged terminated; when the result is deceleration but an accelerator signal is present, it is judged terminated; and when the result is acceleration but a brake signal is present, it is judged terminated.
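The termination rules above can be written as a single predicate. The sketch below is a minimal illustration; the decision labels and signal field names are assumptions made for this sketch, not identifiers from the patent.

```python
def driving_state_terminated(decision, signals, threshold_elapsed=True):
    """Judge termination of the current driving state.
    decision: (lateral, longitudinal) pair; signals: observed driver signals,
    where "turn_light" is None, "left", or "right" (illustrative names)."""
    if not threshold_elapsed:
        return False  # rules only apply within the decision threshold time
    lateral, longitudinal = decision
    turn = signals.get("turn_light")
    if lateral == "keep_lane" and turn is not None:
        return True   # lane keeping decided, but a turn light is lit
    if lateral == "change_lane" and turn is None:
        return True   # lane change decided, but no turn light is lit
    if lateral == "turn_left" and turn != "left":
        return True   # turn decided, but matching turn light is not lit
    if lateral == "turn_right" and turn != "right":
        return True
    if longitudinal == "decelerate" and signals.get("throttle", False):
        return True   # deceleration decided, but accelerator signal present
    if longitudinal == "accelerate" and signals.get("brake", False):
        return True   # acceleration decided, but brake signal present
    return False
```

For example, a lane-keeping decision with the left turn light lit terminates the state, while a lane-change decision with the same signal does not.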
The experience pool module receives the current driver driving feature expression, the intelligent decision result of the deep reinforcement learning decision network, the scoring module's score for the current vehicle state, and whether the current driving state has terminated. These four items are stored in the experience pool module as one experience. After a large amount of experience has accumulated, each time an intelligent decision is executed a portion of the experiences in the pool is randomly extracted to calculate the label (target) values and loss values required by the decision network for back-propagation and stochastic gradient descent, and the network parameters are adjusted until the network converges. After convergence, during prediction the network perceives and judges the current driving environment and makes intelligent decisions about the current driving state, completing the environment-perception and intelligent-decision functions of autonomous driving in an integrated way, with no knowledge-base or inference system required anywhere in the process.
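The label and loss computation described above follows the usual temporal-difference pattern of deep reinforcement learning. The patent does not give the formulas, so the sketch below assumes a standard Q-learning-style target with an assumed discount factor; it shows only how the stored (score, termination) pair turns into a label and a squared loss.

```python
import random

GAMMA = 0.99  # discount factor: an assumption, not stated in the patent

def td_target(score, terminated, next_q_values, gamma=GAMMA):
    # Label value: the score alone if the driving state terminated,
    # otherwise the score plus the discounted best next-state value.
    return score if terminated else score + gamma * max(next_q_values)

def td_loss(q_estimate, target):
    # Squared error between the network's estimate and the label value,
    # minimized by stochastic gradient descent via back-propagation.
    return (q_estimate - target) ** 2

def sample_batch(pool, batch_size):
    # Random extraction of a portion of the experiences in the pool.
    return random.sample(list(pool), min(batch_size, len(pool)))
```

A terminated experience contributes only its score as the label; an unterminated one bootstraps from the next state's action values.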
In the invention, the system automatically analyzes and scores the current driving state, and the training process does not need manual marking or intervention any more, thereby realizing the unsupervised deep reinforcement learning decision network training process.
As shown in FIG. 2, the ground unmanned vehicle intelligent decision method based on deep reinforcement learning provided by the invention includes training of the deep learning neural network and the deep reinforcement learning network model, and is mainly realized by the following steps:
step S1, acquiring vehicle information and surrounding environment information of a vehicle through a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a laser radar sensor and a sound sensor which are mounted on the vehicle, for example, acquiring current vehicle position information through the GNSS positioning system, acquiring surrounding environment image information of the vehicle through the camera sensor, acquiring environment three-dimensional point cloud information through the millimeter wave radar and the laser radar sensor, obtaining distance, relative speed and relative azimuth of an obstacle and the vehicle, acquiring sound information through the sound sensor, and meanwhile, collecting current turn light, brake and accelerator signal information of the vehicle when a driver drives the vehicle, and transmitting the information to a deep reinforcement learning decision network.
In step S2, the deep reinforcement learning decision network applies a deep learning algorithm to fuse, analyze, and compute the input vehicle information and surrounding environment information, finally generating one pair of environment feature expressions: the driver's driving feature expression and the vehicle environment feature expression.
The network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams. The two fully connected streams are located after the convolutional layers. Each fully connected stream consists of at least one fully connected layer, and the number of neurons in the last fully connected layer equals the corresponding number of lateral and longitudinal decision types.
The deep reinforcement learning decision network (algorithm) mainly comprises: a deep convolutional neural network (corresponding to the convolutional layers) for processing image information and environmental three-dimensional point cloud information, and a deep recurrent neural network (corresponding to the fully connected streams) for processing sound information. The deep convolutional neural network extracts feature information from the image information and the environmental three-dimensional point cloud information; it is pretrained in a manner similar to adversarial-network training, giving it strong robustness to noise interference in the data.
For example, the configuration parameters of a typical three-layer deep convolutional neural network are: all three layers are convolutional, where the first layer has 32 8×8 convolution kernels with a stride of 4; the second layer has 64 4×4 convolution kernels with a stride of 2; and the third layer has 64 3×3 convolution kernels with a stride of 1. The surrounding-environment image information acquired by the camera sensor passes through this three-layer deep convolutional neural network to obtain the driving-image feature expression at the current moment, which is then input together with the other vehicle environment feature expressions into the deep reinforcement learning decision network for analysis and calculation.
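The feature-map sizes produced by the three layers above follow the standard convolution arithmetic; the sketch below traces them through an assumed (not stated in the patent) 84×84 input image:

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a conv layer: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# The three layers described above (32@8x8/4, 64@4x4/2, 64@3x3/1)
# applied to an assumed 84x84 camera frame:
s = 84
for k, stride in [(8, 4), (4, 2), (3, 1)]:
    s = conv_out(s, k, stride)
# s is now 7: a 7x7x64 feature map (3136 values) would feed the
# fully connected decision streams
```

With a different camera resolution the same arithmetic gives the flattened feature size that the first fully connected layer must accept.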
The deep recurrent neural network uses long short-term memory (LSTM) units to process the input sound-information stream, continuously extracting features of the current sound state.
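The LSTM computation over the sound stream can be sketched as follows; weight shapes, dimensions and the random toy data are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gate order in z is [input, forget, cell, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b                  # (4n,) pre-activations
    i = 1 / (1 + np.exp(-z[:n]))           # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))        # forget gate
    g = np.tanh(z[2*n:3*n])                # candidate cell state
    o = 1 / (1 + np.exp(-z[3*n:]))         # output gate
    c_new = f * c + i * g                  # updated cell (long-term) memory
    h_new = o * np.tanh(c_new)             # updated hidden (short-term) state
    return h_new, c_new

# Run over a toy stream of 20 sound-feature frames
rng = np.random.default_rng(0)
d, n = 8, 16                               # input dim, hidden dim (assumed)
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(20, d)):
    h, c = lstm_step(x, h, c, W, U, b)
# h is the running feature expression of the current sound state
```

The hidden state `h` after the last frame is what a sound-feature branch would contribute to the fused feature expression.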
The driver's driving feature expression is transmitted to the scoring module; the vehicle environment feature expression is transmitted to the deep reinforcement learning decision network; and the current driver driving feature expression is also transferred to the experience pool module.
Step S3: the deep reinforcement learning decision network analyzes the received vehicle environment feature expression. Its network structure consists of at least one convolutional layer and two fully connected streams; each fully connected stream consists of at least one fully connected layer, and the number of neurons in the last fully connected layer equals the corresponding number of transverse and longitudinal decision types. The fully connected layers perform nonlinear calculation on the input vehicle environment feature expression, extract its overall features, and make a decision on them. The last fully connected layer of the network is divided into two branches, representing the transverse decision and the longitudinal decision respectively: the transverse-decision fully connected layer has 4 neurons, corresponding to lane keeping, lane changing, left turning and right turning; the longitudinal-decision fully connected layer has 3 neurons, corresponding to acceleration, deceleration and uniform speed. After calculating on the vehicle environment feature expression, the deep reinforcement learning decision network makes a decision on the current driving environment in the form of a {transverse decision, longitudinal decision} tuple.
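The two-branch output head described in step S3 can be sketched as two linear layers over a shared feature vector, with the decision tuple formed by taking the argmax of each branch; weights, feature size and action names are illustrative assumptions:

```python
import numpy as np

LATERAL = ["lane_keep", "lane_change", "turn_left", "turn_right"]  # 4 neurons
LONGITUDINAL = ["accelerate", "decelerate", "hold_speed"]          # 3 neurons

def decide(features, W_lat, b_lat, W_lon, b_lon):
    """Two fully connected output branches over a shared feature vector;
    returns the {transverse decision, longitudinal decision} tuple."""
    lat_q = W_lat @ features + b_lat     # 4 transverse action values
    lon_q = W_lon @ features + b_lon     # 3 longitudinal action values
    return (LATERAL[int(np.argmax(lat_q))],
            LONGITUDINAL[int(np.argmax(lon_q))])

# Toy usage with random weights over an assumed 64-dim feature expression
rng = np.random.default_rng(1)
feat = rng.normal(size=64)
lat, lon = decide(feat,
                  rng.normal(size=(4, 64)), np.zeros(4),
                  rng.normal(size=(3, 64)), np.zeros(3))
```

Because the branches are independent, the network can combine any transverse choice with any longitudinal one, e.g. {lane_change, decelerate}.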
Step S4: the scoring module analyzes and calculates the driver's driving feature expression, judges and scores the current driving state of the vehicle according to information such as the position of the vehicle in its lane, the distance to adjacent vehicles and the distance to obstacles ahead, and records the current score and whether the current driving state has terminated. This mainly involves algorithms for lane-line detection, target recognition and the like. The scoring module gives its score automatically according to the current driving state; the driver is not required to perform any operation and does not interact with the scoring module. The score, the termination flag of the current driving state, and the intelligent decision result obtained in step S3 are sent to the experience pool module to await the next calculation.
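A scoring rule of the kind step S4 describes might look like the toy sketch below; all thresholds and penalty weights are hypothetical, chosen only to illustrate how lane position and gaps could be turned into a score plus a termination flag:

```python
def score_state(lane_offset_m, gap_ahead_m, gap_side_m,
                max_offset=1.0, safe_gap=10.0, safe_side=1.5):
    """Toy scoring rule (all thresholds hypothetical): start from a
    perfect score and subtract penalties for lane deviation and unsafe
    gaps; a zero gap ahead (collision) terminates the driving state."""
    score = 1.0
    score -= min(abs(lane_offset_m) / max_offset, 1.0) * 0.5  # lane deviation
    if gap_ahead_m < safe_gap:
        score -= 0.3                     # too close to the obstacle ahead
    if gap_side_m < safe_side:
        score -= 0.2                     # too close to an adjacent vehicle
    terminated = gap_ahead_m <= 0.0      # collision ends the episode
    return round(score, 3), terminated
```

The (score, terminated) pair is exactly what the experience pool in step S5 stores alongside the state and the decision.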
Step S5: the experience pool module receives the driver's current driving feature expression, the intelligent decision result of the deep reinforcement learning decision network, the scoring module's score for the current vehicle state, and the flag indicating whether the current driving state has terminated. These four items are stored in the experience pool module as one experience. After a certain amount of experience has accumulated, at each subsequent moment a mini-batch of experiences is randomly drawn from the experience pool module and used for back-propagation through the deep reinforcement learning decision network, adjusting the network parameters until the network converges.
The invention is highly robust to scene, illumination and weather changes, and is particularly suitable for intelligent decision-making of ground unmanned vehicles in complex road environments; it can achieve an extremely low accident rate in real-environment decision-making while guaranteeing decision accuracy. Because a deep reinforcement learning decision network is adopted, the system has a very high prediction speed and can fully meet the requirements of intelligent decision-making under actual road conditions.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are to be regarded as falling within the scope of the present invention.

Claims (6)

1. The ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning is characterized by comprising the following steps of:
s1, collecting vehicle information and surrounding environment information of a vehicle;
s2, the deep reinforcement learning decision network analyzes and calculates the vehicle information and the environment information obtained in the step S1, obtains the driving characteristic expression of the driver through the vehicle information, and obtains the vehicle environment characteristic expression through the environment information;
s3, the deep reinforcement learning decision network analyzes and calculates the vehicle environment characteristic expression, and gives intelligent decision results including transverse decisions and longitudinal decisions to the current driving environment; the transverse decision comprises lane keeping, lane changing, left turning and right turning of the vehicle, and the longitudinal decision comprises acceleration, deceleration and uniform speed of the vehicle;
s4, analyzing and calculating the driving characteristic expression of the driver by a scoring module, judging and scoring the current driving state of the vehicle, and recording the current scoring score and the current driving state termination times;
s5, storing the driving characteristic expression of the driver, the intelligent decision result, the score of the current vehicle state and whether the current driving state is terminated or not as an experience in an experience pool module;
s6, extracting a plurality of experiences in the experience pool module, carrying out counter-propagation on a deep reinforcement learning decision network, and adjusting decision network parameters until the deep reinforcement learning decision network converges, wherein the deep reinforcement learning decision network comprises a deep convolution neural network for processing image information and environment three-dimensional point cloud information and a deep circulation neural network for processing sound information, and the network structure of the deep reinforcement learning decision network consists of at least one convolution layer and two fully connected streams; two fully connected streams are located after the convolutional layer; each full connection flow is composed of at least one full connection layer; the number of neurons of the last layer of the full-connection layer is the same as the corresponding number of transverse decision types and longitudinal decision types; the convolutional layer is formed as a deep convolutional neural network and the fully-connected stream is formed as a deep convolutional neural network.
2. The ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning according to claim 1, wherein in step S4 the scoring module analyzes and calculates the driver's driving feature expression; specifically, the scoring module judges whether the current vehicle state has terminated according to the turn-signal, brake and accelerator signals in the driver's driving feature expression.
3. The ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning according to claim 2, wherein the current vehicle state is judged to have terminated when at least one of the following occurs: the intelligent decision result is lane keeping and a turn signal remains lit within the decision threshold time; the intelligent decision result is lane changing and the vehicle's turn signal is not lit within the decision threshold time; the intelligent decision result is turning and the vehicle's turn signal for the corresponding direction is not lit within the decision threshold time; the intelligent decision result is deceleration and an accelerator signal is present within the decision threshold time; or the intelligent decision result is acceleration and a brake signal is present within the decision threshold time.
4. A ground unmanned vehicle intelligent decision-making system based on deep reinforcement learning, characterized by comprising: an information acquisition device for collecting vehicle information and surrounding-environment information; a vehicle-mounted server for on-board high-performance computation; a CAN bus for data communication between the information acquisition device and the vehicle-mounted server; and a deep reinforcement learning decision network, a scoring module and an experience pool module integrated in the vehicle-mounted server; wherein the deep reinforcement learning decision network analyzes and calculates the collected vehicle information and surrounding-environment information to generate the driver's driving feature expression and the vehicle environment feature expression, and analyzes the vehicle environment feature expression to give an intelligent decision result for the current driving environment comprising a transverse decision and a longitudinal decision; specifically: collecting vehicle information and surrounding-environment information of the vehicle; analyzing and calculating the obtained vehicle information and environment information, obtaining the driver's driving feature expression from the vehicle information and the vehicle environment feature expression from the environment information; and analyzing the vehicle environment feature expression to give the intelligent decision result comprising a transverse decision and a longitudinal decision, where the transverse decision comprises lane keeping, lane changing, left turning and right turning of the vehicle, and the longitudinal decision comprises acceleration, deceleration and uniform speed of the vehicle; the scoring module analyzes and calculates the driver's driving feature expression, judges and scores the current driving state, and records the current score and the current driving state termination times; the driver's driving feature expression, the intelligent decision result, the score for the current vehicle state and whether the current driving state has terminated are stored together as one experience in the experience pool module, which is used to store experiences comprising this signal information; a plurality of experiences are extracted from the experience pool module for back-propagation through the deep reinforcement learning decision network, adjusting the decision network parameters until the deep reinforcement learning decision network converges; the deep reinforcement learning decision network comprises a deep convolutional neural network for processing image information and three-dimensional environment point-cloud information and a deep recurrent neural network for processing sound information; the network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams; the two fully connected streams are located after the convolutional layer; each fully connected stream consists of at least one fully connected layer; the number of neurons in the last fully connected layer equals the corresponding number of transverse and longitudinal decision types; the convolutional layer is formed as the deep convolutional neural network and the fully connected stream is formed as the deep recurrent neural network.
5. The ground unmanned vehicle intelligent decision-making system based on deep reinforcement learning according to claim 4, wherein the scoring module analyzes and calculates the driver's driving feature expression; specifically, the scoring module judges whether the current vehicle state has terminated according to the turn-signal, brake and accelerator signals in the driver's driving feature expression.
6. The ground unmanned vehicle intelligent decision-making system based on deep reinforcement learning according to claim 5, wherein the current vehicle state is judged to have terminated when at least one of the following occurs: the intelligent decision result is lane keeping and a turn signal remains lit within the decision threshold time; the intelligent decision result is lane changing and the vehicle's turn signal is not lit within the decision threshold time; the intelligent decision result is turning and the vehicle's turn signal for the corresponding direction is not lit within the decision threshold time; the intelligent decision result is deceleration and an accelerator signal is present within the decision threshold time; or the intelligent decision result is acceleration and a brake signal is present within the decision threshold time.
CN202110811357.8A 2021-07-19 2021-07-19 Ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning Active CN113553934B (en)


Publications (2)

Publication Number Publication Date
CN113553934A CN113553934A (en) 2021-10-26
CN113553934B true CN113553934B (en) 2024-02-20

Family

ID=78103382


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108594804A (en) * 2018-03-12 2018-09-28 苏州大学 Automatic ride control method based on depth Q distribution via internet trolleies
CN109213148A (en) * 2018-08-03 2019-01-15 东南大学 It is a kind of based on deeply study vehicle low speed with decision-making technique of speeding
WO2020056875A1 (en) * 2018-09-20 2020-03-26 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning
CN111605565A (en) * 2020-05-08 2020-09-01 昆山小眼探索信息科技有限公司 Automatic driving behavior decision method based on deep reinforcement learning
CN112068549A (en) * 2020-08-07 2020-12-11 哈尔滨工业大学 Unmanned system cluster control method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190054374A (en) * 2017-11-13 2019-05-22 한국전자통신연구원 Autonomous drive learning apparatus and method using drive experience information


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Driverless Car: Autonomous Driving Using Deep Reinforcement Learning In Urban Environment;Abdur R. Fayjie 等;《2018 15th International Conference on Ubiquitous Robots》;第896-901页 *
Research on Intelligent Decision Control of Unmanned Driving Based on Deep Reinforcement Learning Methods; Chen Chao; China Master's Theses Full-text Database, Engineering Science and Technology II (No. 7); pp. 1-51 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant