CN113553934B - Ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning - Google Patents


Info

Publication number
CN113553934B
Authority
CN
China
Prior art keywords
vehicle
decision
reinforcement learning
information
environment
Prior art date
Legal status
Active
Application number
CN202110811357.8A
Other languages
Chinese (zh)
Other versions
CN113553934A (en)
Inventor
王刚
张禹瑄
徐谦
胡玮通
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority claimed from CN202110811357.8A
Publication of CN113553934A
Application granted
Publication of CN113553934B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

A ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning. In the method, a deep reinforcement learning decision network analyzes the acquired vehicle information and environment information to obtain distinct feature expressions, then analyzes the environment feature expression and makes an intelligent decision; a scoring module judges and scores the current driving state from the driver's driving feature expression, recording the current score and the number of times the current driving state has terminated; an experience pool evaluates whether the current driving state has terminated and stores the driving environment state, score, decision result, and termination state as one experience; and several experiences are randomly extracted to adjust the parameters of the deep reinforcement learning decision network, yielding a deep network model that integrates environment perception and intelligent decision-making. The invention achieves direct inference from environment to decision with this integrated model, solving the current inability to make intelligent decisions for ground unmanned vehicles under complex road environment conditions.

Description

Ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of vehicle control, in particular to a ground unmanned vehicle intelligent decision method and system based on deep reinforcement learning.
Background
In traditional environment-perception and intelligent decision-making methods, measuring the environment with multiple sensors, making effective use of the measurements, and rapidly fusing the multi-source data has always been a difficult problem. Existing intelligent decision systems build a knowledge base from a large volume of driving rules, historical driving environments, and decision-making experience, and cannot make accurate intelligent decisions on the complex, unstructured road-surface environment information encountered in real situations. Consequently, with limited manpower and under the complex road-surface conditions of real driving environments, the driving state of the vehicle cannot be decided accurately and intelligently.
Disclosure of Invention
The invention provides a ground unmanned vehicle intelligent decision method and system based on deep reinforcement learning, aiming to solve the problem that existing environment-perception and intelligent decision-making methods cannot achieve intelligent decisions for ground unmanned vehicles under complex road environment conditions.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning comprises the following steps:
s1, collecting vehicle information and surrounding environment information of a vehicle;
s2, the deep reinforcement learning decision network analyzes and calculates the vehicle information and the environment information obtained in the step S1 to obtain driving characteristic expression and vehicle environment characteristic expression of a driver;
s3, the deep reinforcement learning decision network analyzes and calculates the vehicle environment characteristic expression, and gives an intelligent decision result to the current driving environment;
s4, analyzing and calculating the driving characteristic expression of the driver by a scoring module, judging and scoring the current driving state of the vehicle, and recording the current scoring score and the current driving state termination times;
s5, storing the driving characteristic expression of the driver, the intelligent decision result, the score of the current vehicle state and whether the current driving state is terminated or not as an experience in an experience pool module;
s6, extracting a plurality of experiences in the experience pool module, carrying out back propagation on the deep reinforcement learning decision network, and adjusting the decision network parameters until the deep reinforcement learning decision network converges.
Further, in the step S1, vehicle information and environmental information around the vehicle are collected through a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a laser radar sensor and a sound sensor mounted on the unmanned vehicle; and meanwhile, the current signal information of a steering lamp, a brake and an accelerator of the vehicle is collected.
Further, acquiring current vehicle position information through the GNSS positioning system; acquiring environmental image information around the vehicle through the camera sensor; acquiring environment three-dimensional point cloud information through the millimeter wave radar and the laser radar sensor, and obtaining the distance, the relative speed and the relative azimuth of the obstacle and the vehicle, and the length, the width and the volume of the obstacle; and acquiring sound information through the sound sensor.
Further, the intelligent decision result in step S3 includes a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn of the vehicle, and the longitudinal decision comprises acceleration, deceleration, and uniform speed of the vehicle.
Further, in the step S4, the scoring module determines and scores the current driving state of the vehicle according to the position of the current vehicle in the lane, the distance between the current vehicle and the adjacent vehicle, the distance between the current vehicle and the preceding vehicle, and the distance information between the current vehicle and the preceding obstacle.
Further, the deep reinforcement learning decision network includes a deep convolutional neural network for processing image information and environmental three-dimensional point cloud information, and a deep recurrent neural network for processing sound information.
Further, the network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams; the two fully connected streams are located after the convolutional layers; each fully connected stream consists of at least one fully connected layer; the number of neurons in the last fully connected layer equals the corresponding number of lateral and longitudinal decision types; the convolutional layers form the deep convolutional neural network, and the fully connected streams form the deep recurrent neural network.
Further, the fully connected layers perform nonlinear computation on the input vehicle environment feature expression, extract its overall features, and make the intelligent decision on those features.
Further, the intelligent decisions comprise a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn, and the longitudinal decision comprises vehicle acceleration, deceleration, and uniform speed.
Further, in the step S4, the scoring module analyzes and calculates the driving characteristic expression of the driver, specifically, the scoring module judges whether the current vehicle state is terminated according to the turn signal, the brake signal and the accelerator signal in the driving characteristic expression of the driver.
Further, the current driving state is judged to be terminated when at least one of the following occurs within the decision threshold time: the intelligent decision result is lane keeping but a turn light is lit; the intelligent decision result is lane changing but no turn light is lit; the intelligent decision result is turning but the turn light in the corresponding direction is not lit; the intelligent decision result is deceleration but an accelerator signal is present; or the intelligent decision result is acceleration but a brake signal is present.
Further, in step S6, several experiences are extracted from the experience pool module, the label (target) values and loss values required by the deep reinforcement learning decision network for back-propagation and stochastic gradient descent are calculated, and the resulting label and loss values are then used to update the deep reinforcement learning decision network.
In another aspect of the present invention, a ground unmanned vehicle intelligent decision system based on deep reinforcement learning is provided, comprising:
the information acquisition device is used for acquiring vehicle information and environment information around the vehicle;
a vehicle-mounted server;
the CAN bus is used for realizing data communication between the information acquisition device and the vehicle-mounted server;
the system comprises a deep reinforcement learning decision network, a scoring module and an experience pool module which are integrated in a vehicle-mounted server; the deep reinforcement learning decision network is used for analyzing and calculating collected vehicle information and surrounding environment information of a vehicle to generate driving characteristic expression of a driver and vehicle environment characteristic expression, and analyzing and calculating the vehicle environment characteristic expression to give an intelligent decision result to the current driving environment; the scoring module analyzes and calculates the driving characteristic expression of the driver, judges and scores the current driving state, and records the score of the current score and the termination times of the current driving state; the experience pool module is used for storing experiences of signal information including driving characteristic expression of a driver, intelligent decision results, scores of current vehicle states and whether the current driving states are terminated.
Further, the information acquisition device comprises a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a laser radar sensor and a sound sensor which are arranged on the unmanned vehicle.
Further, acquiring current vehicle position information through the GNSS positioning system; acquiring environmental image information around the vehicle through the camera sensor; acquiring environment three-dimensional point cloud information through the millimeter wave radar and the laser radar sensor, and obtaining the distance, the relative speed and the relative azimuth of the obstacle and the vehicle, and the length, the width and the volume of the obstacle; and acquiring sound information through the sound sensor.
Further, the intelligent decision result comprises a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn of the vehicle, and the longitudinal decision comprises acceleration, deceleration, and uniform speed of the vehicle.
Further, the scoring module judges and scores the current driving state of the vehicle according to the position of the current vehicle in the lane, the distance between the current vehicle and the adjacent vehicle, the distance between the current vehicle and the front vehicle and the distance information between the current vehicle and the front obstacle.
Further, the deep reinforcement learning decision network includes a deep convolutional neural network for processing image information and environmental three-dimensional point cloud information, and a deep recurrent neural network for processing sound information.
Further, the network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams; the two fully connected streams are located after the convolutional layers; each fully connected stream consists of at least one fully connected layer; the number of neurons in the last fully connected layer equals the corresponding number of lateral and longitudinal decision types; the convolutional layers form the deep convolutional neural network, and the fully connected streams form the deep recurrent neural network.
Further, the fully connected layers perform nonlinear computation on the input vehicle environment feature expression, extract its overall features, and make the intelligent decision on those features.
Further, the intelligent decisions comprise a lateral decision and a longitudinal decision; the lateral decision comprises lane keeping, lane changing, left turn, and right turn, and the longitudinal decision comprises vehicle acceleration, deceleration, and uniform speed.
Further, the scoring module analyzes and calculates the driving characteristic expression of the driver, and particularly, the scoring module judges whether the current vehicle state is terminated according to the steering lamp, the brake and the accelerator signals in the driving characteristic expression of the driver.
Further, the current driving state is judged to be terminated when at least one of the following occurs within the decision threshold time: the intelligent decision result is lane keeping but a turn light is lit; the intelligent decision result is lane changing but no turn light is lit; the intelligent decision result is turning but the turn light in the corresponding direction is not lit; the intelligent decision result is deceleration but an accelerator signal is present; or the intelligent decision result is acceleration but a brake signal is present.
Further, several experiences are extracted from the experience pool module, the label (target) values and loss values required by the deep reinforcement learning decision network for back-propagation and stochastic gradient descent are calculated, and the resulting label and loss values are then used to update the deep reinforcement learning decision network.
In summary, the invention provides a ground unmanned vehicle intelligent decision method and a ground unmanned vehicle intelligent decision system based on deep reinforcement learning, wherein the method comprises the following steps: the deep reinforcement learning decision network analyzes and calculates the acquired vehicle information and environment information to obtain different feature expressions, and analyzes the environment feature expressions to make intelligent decisions; the scoring module judges and scores the current driving state by using the driving characteristic expression of the driver, and records the current scoring score and the current driving state termination times; the experience pool evaluates the termination state of the current driving state and stores the driving environment state, the score, the decision result and the termination state as experience; and randomly extracting a plurality of experiences to carry out parameter adjustment on the deep reinforcement learning decision network, so as to obtain a deep network model integrating environment perception and intelligent decision.
Compared with the prior art, the invention has the following advantages:
1. In the invention, once the deep reinforcement learning decision network converges, integrated environment-perception and intelligent-decision prediction runs in real time, simplifying the computation pipeline, reducing the computational load, and guaranteeing real-time performance; the whole training process does not disturb the driver, who completes the training of the network simply by driving the vehicle correctly.
2. In the invention, the deep reinforcement learning decision network computes directly on the extracted driving environment information without developing an associated knowledge-base system. This reduces manpower while avoiding misaligned or impossible decisions caused by unstructured road environment information; the network is strongly robust to unstructured environment information and can make accurate decisions in any driving environment.
3. In a real driving environment, using the driving environment information acquired by the camera, radar, and other sensors, the position and attitude information of the vehicle, and the driver's operation information, an unsupervised training mode yields a deep reinforcement learning decision network model able to perform the vehicle's intelligent decision function in the real environment.
Drawings
Fig. 1 is a schematic structural diagram of an intelligent decision-making system of a ground unmanned vehicle based on deep reinforcement learning.
Fig. 2 is a schematic flow chart of an intelligent decision-making method of a ground unmanned vehicle based on deep reinforcement learning.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention provides a ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning. It mainly applies deep reinforcement learning technology: vehicle environment information is acquired by sensors in the driving environment; the current driving state is automatically evaluated and scored from sensor data collected in the real environment; a deep reinforcement learning decision network performs environment perception and intelligent decision-making for the current vehicle; and an experience pool module is used to update the network's parameters, thereby realizing intelligent decision-making for the ground unmanned vehicle.
As shown in fig. 1, the ground unmanned vehicle intelligent decision system based on deep reinforcement learning mainly comprises: a CAN bus, a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a lidar sensor, a sound sensor, the deep reinforcement learning decision network, the experience pool module, the scoring module, and a vehicle-mounted server.
The deep reinforcement learning decision network, the experience pool module, and the scoring module are integrated in the vehicle-mounted server, which uses a high-performance CPU and a large memory to run the scoring module's automatic scoring, store the experience pool module, and train the deep reinforcement learning decision network. The GNSS positioning system, camera sensor, millimeter wave radar sensor, lidar sensor, and sound sensor are mounted on the vehicle; all of them are connected to the vehicle-mounted server through the CAN bus and transmit their data over it.
The GNSS positioning system, the camera sensor, the millimeter wave radar sensor, the laser radar sensor and the sound sensor are used for acquiring vehicle information and surrounding environment information of the vehicle.
The deep reinforcement learning decision network fuses the acquired vehicle information and surrounding environment information, extracts data features such as image features and environmental three-dimensional point cloud features, and finally generates one pair of environment feature expressions: the driver's driving feature expression and the vehicle environment feature expression. Each time, this pair is routed separately: the driver's driving feature expression is passed to the scoring module, which analyzes it, judges and scores the current driving state, and records the current score and the number of terminations of the current driving state; the vehicle environment feature expression, which excludes the vehicle's current turn-light, brake, and accelerator signals, is passed to the deep reinforcement learning decision network for analysis and intelligent decision-making, so as to train the network.
The intelligent decision result is divided into two parts: a lateral decision and a longitudinal decision. Lateral decisions comprise lane keeping, lane changing, left turn, and right turn; longitudinal decisions comprise vehicle acceleration, deceleration, and uniform speed.
The deep reinforcement learning decision network can extract deep features from image data and possesses strong autonomous learning capability and highly nonlinear mapping. Even with complex road-surface environment information and little manpower, the network can be trained in an unsupervised manner with automatic background scoring, accurately judge the current driving environment, and make decisions that keep the vehicle safe.
The scoring module judges whether the current vehicle state has terminated according to the turn-light, brake, and accelerator signals in the driver's driving feature expression. Within the decision threshold time: when the intelligent decision result is lane keeping but a turn light is lit, the current driving state is judged terminated; when the result is lane changing but no turn light is lit, it is judged terminated; when the result is turning but the turn light in the corresponding direction is not lit, it is judged terminated; when the result is deceleration but an accelerator signal is present, it is judged terminated; and when the result is acceleration but a brake signal is present, it is judged terminated.
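The termination rules above can be written as a single predicate. The sketch below is a minimal illustration; the decision labels and signal field names are assumptions made for this sketch, not identifiers from the patent.

```python
def driving_state_terminated(decision, signals, threshold_elapsed=True):
    """Judge termination of the current driving state.
    decision: (lateral, longitudinal) pair; signals: observed driver signals,
    where "turn_light" is None, "left", or "right" (illustrative names)."""
    if not threshold_elapsed:
        return False  # rules only apply within the decision threshold time
    lateral, longitudinal = decision
    turn = signals.get("turn_light")
    if lateral == "keep_lane" and turn is not None:
        return True   # lane keeping decided, but a turn light is lit
    if lateral == "change_lane" and turn is None:
        return True   # lane change decided, but no turn light is lit
    if lateral == "turn_left" and turn != "left":
        return True   # turn decided, but matching turn light is not lit
    if lateral == "turn_right" and turn != "right":
        return True
    if longitudinal == "decelerate" and signals.get("throttle", False):
        return True   # deceleration decided, but accelerator signal present
    if longitudinal == "accelerate" and signals.get("brake", False):
        return True   # acceleration decided, but brake signal present
    return False
```

For example, a lane-keeping decision with the left turn light lit terminates the state, while a lane-change decision with the same signal does not.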
The experience pool module receives the current driver driving feature expression, the intelligent decision result of the deep reinforcement learning decision network, the scoring module's score for the current vehicle state, and whether the current driving state has terminated. These four items are stored in the experience pool module as one experience. After a large amount of experience has accumulated, each time an intelligent decision is executed a portion of the experiences in the pool is randomly extracted to calculate the label (target) values and loss values required by the decision network for back-propagation and stochastic gradient descent, and the network parameters are adjusted until the network converges. After convergence, during prediction the network perceives and judges the current driving environment and makes intelligent decisions about the current driving state, completing the environment-perception and intelligent-decision functions of autonomous driving in an integrated way, with no knowledge-base or inference system required anywhere in the process.
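The label and loss computation described above follows the usual temporal-difference pattern of deep reinforcement learning. The patent does not give the formulas, so the sketch below assumes a standard Q-learning-style target with an assumed discount factor; it shows only how the stored (score, termination) pair turns into a label and a squared loss.

```python
import random

GAMMA = 0.99  # discount factor: an assumption, not stated in the patent

def td_target(score, terminated, next_q_values, gamma=GAMMA):
    # Label value: the score alone if the driving state terminated,
    # otherwise the score plus the discounted best next-state value.
    return score if terminated else score + gamma * max(next_q_values)

def td_loss(q_estimate, target):
    # Squared error between the network's estimate and the label value,
    # minimized by stochastic gradient descent via back-propagation.
    return (q_estimate - target) ** 2

def sample_batch(pool, batch_size):
    # Random extraction of a portion of the experiences in the pool.
    return random.sample(list(pool), min(batch_size, len(pool)))
```

A terminated experience contributes only its score as the label; an unterminated one bootstraps from the next state's action values.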
In the invention, the system automatically analyzes and scores the current driving state, and the training process does not need manual marking or intervention any more, thereby realizing the unsupervised deep reinforcement learning decision network training process.
As shown in FIG. 2, the ground unmanned vehicle intelligent decision method based on deep reinforcement learning provided by the invention includes training of the deep learning neural network and the deep reinforcement learning network model, and is mainly realized by the following steps:
step S1, acquiring vehicle information and surrounding environment information of a vehicle through a GNSS positioning system, a camera sensor, a millimeter wave radar sensor, a laser radar sensor and a sound sensor which are mounted on the vehicle, for example, acquiring current vehicle position information through the GNSS positioning system, acquiring surrounding environment image information of the vehicle through the camera sensor, acquiring environment three-dimensional point cloud information through the millimeter wave radar and the laser radar sensor, obtaining distance, relative speed and relative azimuth of an obstacle and the vehicle, acquiring sound information through the sound sensor, and meanwhile, collecting current turn light, brake and accelerator signal information of the vehicle when a driver drives the vehicle, and transmitting the information to a deep reinforcement learning decision network.
In step S2, the deep reinforcement learning decision network applies a deep learning algorithm to fuse, analyze, and compute the input vehicle information and surrounding environment information, finally generating one pair of environment feature expressions: the driver's driving feature expression and the vehicle environment feature expression.
The network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams. The two fully connected streams are located after the convolutional layers. Each fully connected stream consists of at least one fully connected layer, and the number of neurons in the last fully connected layer equals the corresponding number of lateral and longitudinal decision types.
The deep reinforcement learning decision network (algorithm) mainly comprises: a deep convolutional neural network (corresponding to the convolutional layers) for processing image information and environmental three-dimensional point cloud information, and a deep recurrent neural network (corresponding to the fully connected streams) for processing sound information. The deep convolutional neural network extracts feature information from the image information and the environmental three-dimensional point cloud information; it is pretrained in a manner similar to adversarial-network training, giving it strong robustness to noise interference in the data.
For example, the configuration parameters of a typical three-layer deep convolutional neural network are: all three layers are convolutional, where the first layer has 32 8×8 convolution kernels with a stride of 4; the second layer has 64 4×4 convolution kernels with a stride of 2; and the third layer has 64 3×3 convolution kernels with a stride of 1. The surrounding-environment image information acquired by the camera sensor passes through this three-layer deep convolutional neural network to obtain the driving-image feature expression at the current moment, which is then input together with the other vehicle environment feature expressions into the deep reinforcement learning decision network for analysis and calculation.
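The feature-map sizes produced by the three layers above follow the standard convolution arithmetic; the sketch below traces them through an assumed (not stated in the patent) 84×84 input image:

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a conv layer: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# The three layers described above (32@8x8/4, 64@4x4/2, 64@3x3/1)
# applied to an assumed 84x84 camera frame:
s = 84
for k, stride in [(8, 4), (4, 2), (3, 1)]:
    s = conv_out(s, k, stride)
# s is now 7: a 7x7x64 feature map (3136 values) would feed the
# fully connected decision streams
```

With a different camera resolution the same arithmetic gives the flattened feature size that the first fully connected layer must accept.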
The deep recurrent neural network uses long short-term memory (LSTM) units to process the input sound-information stream, continuously extracting features of the current sound state.
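The LSTM computation over the sound stream can be sketched as follows; weight shapes, dimensions and the random toy data are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gate order in z is [input, forget, cell, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b                  # (4n,) pre-activations
    i = 1 / (1 + np.exp(-z[:n]))           # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))        # forget gate
    g = np.tanh(z[2*n:3*n])                # candidate cell state
    o = 1 / (1 + np.exp(-z[3*n:]))         # output gate
    c_new = f * c + i * g                  # updated cell (long-term) memory
    h_new = o * np.tanh(c_new)             # updated hidden (short-term) state
    return h_new, c_new

# Run over a toy stream of 20 sound-feature frames
rng = np.random.default_rng(0)
d, n = 8, 16                               # input dim, hidden dim (assumed)
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(20, d)):
    h, c = lstm_step(x, h, c, W, U, b)
# h is the running feature expression of the current sound state
```

The hidden state `h` after the last frame is what a sound-feature branch would contribute to the fused feature expression.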
The driver's driving feature expression is transmitted to the scoring module; the vehicle environment feature expression is transmitted to the deep reinforcement learning decision network; and the current driver driving feature expression is also transferred to the experience pool module.
Step S3: the deep reinforcement learning decision network analyzes the received vehicle environment feature expression. Its network structure consists of at least one convolutional layer and two fully connected streams; each fully connected stream consists of at least one fully connected layer, and the number of neurons in the last fully connected layer equals the corresponding number of transverse and longitudinal decision types. The fully connected layers perform nonlinear calculation on the input vehicle environment feature expression, extract its overall features, and make a decision on them. The last fully connected layer of the network is divided into two branches, representing the transverse decision and the longitudinal decision respectively: the transverse-decision fully connected layer has 4 neurons, corresponding to lane keeping, lane changing, left turning and right turning; the longitudinal-decision fully connected layer has 3 neurons, corresponding to acceleration, deceleration and uniform speed. After calculating on the vehicle environment feature expression, the deep reinforcement learning decision network makes a decision on the current driving environment in the form of a {transverse decision, longitudinal decision} tuple.
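The two-branch output head described in step S3 can be sketched as two linear layers over a shared feature vector, with the decision tuple formed by taking the argmax of each branch; weights, feature size and action names are illustrative assumptions:

```python
import numpy as np

LATERAL = ["lane_keep", "lane_change", "turn_left", "turn_right"]  # 4 neurons
LONGITUDINAL = ["accelerate", "decelerate", "hold_speed"]          # 3 neurons

def decide(features, W_lat, b_lat, W_lon, b_lon):
    """Two fully connected output branches over a shared feature vector;
    returns the {transverse decision, longitudinal decision} tuple."""
    lat_q = W_lat @ features + b_lat     # 4 transverse action values
    lon_q = W_lon @ features + b_lon     # 3 longitudinal action values
    return (LATERAL[int(np.argmax(lat_q))],
            LONGITUDINAL[int(np.argmax(lon_q))])

# Toy usage with random weights over an assumed 64-dim feature expression
rng = np.random.default_rng(1)
feat = rng.normal(size=64)
lat, lon = decide(feat,
                  rng.normal(size=(4, 64)), np.zeros(4),
                  rng.normal(size=(3, 64)), np.zeros(3))
```

Because the branches are independent, the network can combine any transverse choice with any longitudinal one, e.g. {lane_change, decelerate}.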
Step S4: the scoring module analyzes and calculates the driver's driving feature expression, judges and scores the current driving state of the vehicle according to information such as the position of the vehicle in its lane, the distance to adjacent vehicles and the distance to obstacles ahead, and records the current score and whether the current driving state has terminated. This mainly involves algorithms for lane-line detection, target recognition and the like. The scoring module gives its score automatically according to the current driving state; the driver is not required to perform any operation and does not interact with the scoring module. The score, the termination flag of the current driving state, and the intelligent decision result obtained in step S3 are sent to the experience pool module to await the next calculation.
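A scoring rule of the kind step S4 describes might look like the toy sketch below; all thresholds and penalty weights are hypothetical, chosen only to illustrate how lane position and gaps could be turned into a score plus a termination flag:

```python
def score_state(lane_offset_m, gap_ahead_m, gap_side_m,
                max_offset=1.0, safe_gap=10.0, safe_side=1.5):
    """Toy scoring rule (all thresholds hypothetical): start from a
    perfect score and subtract penalties for lane deviation and unsafe
    gaps; a zero gap ahead (collision) terminates the driving state."""
    score = 1.0
    score -= min(abs(lane_offset_m) / max_offset, 1.0) * 0.5  # lane deviation
    if gap_ahead_m < safe_gap:
        score -= 0.3                     # too close to the obstacle ahead
    if gap_side_m < safe_side:
        score -= 0.2                     # too close to an adjacent vehicle
    terminated = gap_ahead_m <= 0.0      # collision ends the episode
    return round(score, 3), terminated
```

The (score, terminated) pair is exactly what the experience pool in step S5 stores alongside the state and the decision.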
Step S5: the experience pool module receives the driver's current driving feature expression, the intelligent decision result of the deep reinforcement learning decision network, the scoring module's score for the current vehicle state, and the flag indicating whether the current driving state has terminated. These four items are stored in the experience pool module as one experience. After a certain amount of experience has accumulated, at each subsequent moment a mini-batch of experiences is randomly drawn from the experience pool module and used for back-propagation through the deep reinforcement learning decision network, adjusting the network parameters until the network converges.
The invention is highly robust to scene, illumination and weather changes, and is particularly suitable for intelligent decision-making of ground unmanned vehicles in complex road environments; it can achieve an extremely low accident rate in real-environment decision-making while guaranteeing decision accuracy. Because a deep reinforcement learning decision network is adopted, the system has a very high prediction speed and can fully meet the requirements of intelligent decision-making under actual road conditions.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are to be regarded as falling within the scope of the present invention.

Claims (6)

1. The ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning is characterized by comprising the following steps of:
s1, collecting vehicle information and surrounding environment information of a vehicle;
s2, the deep reinforcement learning decision network analyzes and calculates the vehicle information and the environment information obtained in the step S1, obtains the driving characteristic expression of the driver through the vehicle information, and obtains the vehicle environment characteristic expression through the environment information;
s3, the deep reinforcement learning decision network analyzes and calculates the vehicle environment characteristic expression, and gives intelligent decision results including transverse decisions and longitudinal decisions to the current driving environment; the transverse decision comprises lane keeping, lane changing, left turning and right turning of the vehicle, and the longitudinal decision comprises acceleration, deceleration and uniform speed of the vehicle;
s4, analyzing and calculating the driving characteristic expression of the driver by a scoring module, judging and scoring the current driving state of the vehicle, and recording the current scoring score and the current driving state termination times;
s5, storing the driving characteristic expression of the driver, the intelligent decision result, the score of the current vehicle state and whether the current driving state is terminated or not as an experience in an experience pool module;
s6, extracting a plurality of experiences in the experience pool module, carrying out counter-propagation on a deep reinforcement learning decision network, and adjusting decision network parameters until the deep reinforcement learning decision network converges, wherein the deep reinforcement learning decision network comprises a deep convolution neural network for processing image information and environment three-dimensional point cloud information and a deep circulation neural network for processing sound information, and the network structure of the deep reinforcement learning decision network consists of at least one convolution layer and two fully connected streams; two fully connected streams are located after the convolutional layer; each full connection flow is composed of at least one full connection layer; the number of neurons of the last layer of the full-connection layer is the same as the corresponding number of transverse decision types and longitudinal decision types; the convolutional layer is formed as a deep convolutional neural network and the fully-connected stream is formed as a deep convolutional neural network.
2. The ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning according to claim 1, wherein in step S4 the scoring module analyzes and calculates the driver's driving feature expression; specifically, the scoring module judges whether the current vehicle state has terminated according to the turn-signal, brake and accelerator signals in the driver's driving feature expression.
3. The ground unmanned vehicle intelligent decision-making method based on deep reinforcement learning according to claim 2, wherein the current vehicle state is judged to have terminated when at least one of the following occurs: the intelligent decision result is lane keeping and a turn signal remains lit within the decision threshold time; the intelligent decision result is lane changing and the vehicle's turn signal is not lit within the decision threshold time; the intelligent decision result is turning and the vehicle's turn signal for the corresponding direction is not lit within the decision threshold time; the intelligent decision result is deceleration and an accelerator signal is present within the decision threshold time; or the intelligent decision result is acceleration and a brake signal is present within the decision threshold time.
4. A ground unmanned vehicle intelligent decision-making system based on deep reinforcement learning, characterized by comprising: an information acquisition device for collecting vehicle information and surrounding-environment information; a vehicle-mounted server for on-board high-performance computation; a CAN bus for data communication between the information acquisition device and the vehicle-mounted server; and a deep reinforcement learning decision network, a scoring module and an experience pool module integrated in the vehicle-mounted server; wherein the deep reinforcement learning decision network analyzes and calculates the collected vehicle information and surrounding-environment information to generate the driver's driving feature expression and the vehicle environment feature expression, and analyzes the vehicle environment feature expression to give an intelligent decision result for the current driving environment comprising a transverse decision and a longitudinal decision; specifically: collecting vehicle information and surrounding-environment information of the vehicle; analyzing and calculating the obtained vehicle information and environment information, obtaining the driver's driving feature expression from the vehicle information and the vehicle environment feature expression from the environment information; and analyzing the vehicle environment feature expression to give the intelligent decision result comprising a transverse decision and a longitudinal decision, where the transverse decision comprises lane keeping, lane changing, left turning and right turning of the vehicle, and the longitudinal decision comprises acceleration, deceleration and uniform speed of the vehicle; the scoring module analyzes and calculates the driver's driving feature expression, judges and scores the current driving state, and records the current score and the current driving state termination times; the driver's driving feature expression, the intelligent decision result, the score for the current vehicle state and whether the current driving state has terminated are stored together as one experience in the experience pool module, which is used to store experiences comprising this signal information; a plurality of experiences are extracted from the experience pool module for back-propagation through the deep reinforcement learning decision network, adjusting the decision network parameters until the deep reinforcement learning decision network converges; the deep reinforcement learning decision network comprises a deep convolutional neural network for processing image information and three-dimensional environment point-cloud information and a deep recurrent neural network for processing sound information; the network structure of the deep reinforcement learning decision network consists of at least one convolutional layer and two fully connected streams; the two fully connected streams are located after the convolutional layer; each fully connected stream consists of at least one fully connected layer; the number of neurons in the last fully connected layer equals the corresponding number of transverse and longitudinal decision types; the convolutional layer is formed as the deep convolutional neural network and the fully connected stream is formed as the deep recurrent neural network.
5. The ground unmanned vehicle intelligent decision-making system based on deep reinforcement learning according to claim 4, wherein the scoring module analyzes and calculates the driver's driving feature expression; specifically, the scoring module judges whether the current vehicle state has terminated according to the turn-signal, brake and accelerator signals in the driver's driving feature expression.
6. The ground unmanned vehicle intelligent decision-making system based on deep reinforcement learning according to claim 5, wherein the current vehicle state is judged to have terminated when at least one of the following occurs: the intelligent decision result is lane keeping and a turn signal remains lit within the decision threshold time; the intelligent decision result is lane changing and the vehicle's turn signal is not lit within the decision threshold time; the intelligent decision result is turning and the vehicle's turn signal for the corresponding direction is not lit within the decision threshold time; the intelligent decision result is deceleration and an accelerator signal is present within the decision threshold time; or the intelligent decision result is acceleration and a brake signal is present within the decision threshold time.
CN202110811357.8A 2021-07-19 2021-07-19 Ground unmanned vehicle intelligent decision-making method and system based on deep reinforcement learning Active CN113553934B (en)


Publications (2)

Publication Number Publication Date
CN113553934A CN113553934A (en) 2021-10-26
CN113553934B true CN113553934B (en) 2024-02-20

Family

ID=78103382


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108594804A (en) * 2018-03-12 2018-09-28 苏州大学 Automatic ride control method based on depth Q distribution via internet trolleies
CN109213148A (en) * 2018-08-03 2019-01-15 东南大学 It is a kind of based on deeply study vehicle low speed with decision-making technique of speeding
WO2020056875A1 (en) * 2018-09-20 2020-03-26 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning
CN111605565A (en) * 2020-05-08 2020-09-01 昆山小眼探索信息科技有限公司 Automatic driving behavior decision method based on deep reinforcement learning
CN112068549A (en) * 2020-08-07 2020-12-11 哈尔滨工业大学 Unmanned system cluster control method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190054374A (en) * 2017-11-13 2019-05-22 한국전자통신연구원 Autonomous drive learning apparatus and method using drive experience information


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Driverless Car: Autonomous Driving Using Deep Reinforcement Learning In Urban Environment;Abdur R. Fayjie 等;《2018 15th International Conference on Ubiquitous Robots》;第896-901页 *
Research on Intelligent Decision Control of Unmanned Driving Based on Deep Reinforcement Learning Methods; Chen Chao; China Master's Theses Full-text Database, Engineering Science and Technology II (No. 7); pp. 1-51 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant