CN115140091A - Automatic driving decision method, device, vehicle and storage medium - Google Patents

Automatic driving decision method, device, vehicle and storage medium

Info

Publication number
CN115140091A
CN115140091A (application number CN202210753584.4A)
Authority
CN
China
Prior art keywords
data
neural network
network model
decision
target vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210753584.4A
Other languages
Chinese (zh)
Inventor
王艺蒙
吕颖
高延熹
韩佳琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202210753584.4A priority Critical patent/CN115140091A/en
Publication of CN115140091A publication Critical patent/CN115140091A/en
Pending legal-status Critical Current

Classifications

    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W2520/00 Input parameters relating to overall vehicle dynamics
    • B60W2552/00 Input parameters relating to infrastructure
    • B60W2552/50 Barriers

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an automatic driving decision method, an automatic driving decision device, a vehicle and a storage medium. The method comprises the following steps: acquiring driving environment data of a target vehicle, wherein the driving environment data comprises vehicle data and obstacle data of the target vehicle; analyzing the driving environment data by using a deep neural network model to obtain a decision action, wherein the deep neural network model is constructed and trained on the basis of a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments; and controlling the target vehicle to execute the decision action. The method solves the technical problem that an unreasonably designed return function in a decision model leads to unreasonable decision results being output by the model.

Description

Automatic driving decision method, device, vehicle and storage medium
Technical Field
The invention relates to the technical field of automatic driving, in particular to an automatic driving decision method, an automatic driving decision device, a vehicle and a storage medium.
Background
The automotive industry is undergoing profound change, and the development and application of automatic driving technologies are advancing rapidly. In a typical automatic driving technical scheme, sensor data are processed by a perception module, a decision module outputs an instruction for the unmanned vehicle, a planning module plans a specific driving trajectory and outputs it to a control module, and the control module performs steering wheel and acceleration/deceleration control of the unmanned vehicle.
At present, excellent automatic driving decision models based on deep reinforcement learning combine the perception capability of deep learning with the decision capability of reinforcement learning. Within the overall framework of reinforcement learning, the agent requires neither human expert experience nor hand-crafted coding; it relies entirely on its own learning and on interaction signals from the environment, which can be realized through deep reinforcement learning. However, when these decision models are designed, the collected information is not comprehensive enough, which leads to an unreasonable design of the return function in the model and, in turn, to unreasonable decision results.
Disclosure of Invention
The embodiment of the invention provides an automatic driving decision method, an automatic driving decision device, a vehicle and a storage medium, and aims to at least solve the technical problem that a return function in a decision model is unreasonable in design, so that a decision result output by the model is unreasonable.
According to an aspect of an embodiment of the present invention, there is provided an automatic driving decision method, including:
acquiring driving environment data of a target vehicle, wherein the driving environment data comprises vehicle data and obstacle data of the target vehicle; analyzing the driving environment data by using a deep neural network model to obtain a decision action, wherein the deep neural network model is constructed and trained on the basis of a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments; and controlling the target vehicle to execute the decision-making action.
Optionally, the constructing and training of the deep neural network model based on the reward function and the deep reinforcement learning algorithm includes: calculating to obtain an accumulated return value according to the value of the return function; constructing an action value function according to the accumulated return value and a deep reinforcement learning algorithm; constructing a loss function according to the action value function; constructing an initial deep neural network model according to the loss function and the action value function; and training the initial deep neural network model according to a preset sample and a loss function to obtain the deep neural network model.
Optionally, constructing the loss function according to the action value function includes: setting the depth network with the weight as a preset value as a function approximator of an action value function; and constructing a loss function according to the action value function and the function approximator.
Optionally, the training the initial deep neural network model according to the preset sample and the loss function to obtain the deep neural network model includes: training the initial deep neural network model according to a preset sample to obtain a first output result; putting the first output result into a preset buffer queue; selecting a first output result from a preset buffer queue as a training sample by adopting a uniform random sampling method; and training the initial deep neural network model by adopting the training sample to obtain the deep neural network model.
Optionally, constructing the initial deep neural network model according to the loss function and the action value function includes: constructing an input layer according to the driving environment data, wherein the input layer comprises a first state space and a second state space, the first state space is used for inputting vehicle data of a target vehicle, and the second state space is used for inputting obstacle data; the method comprises the steps that a first data extraction layer and a second data extraction layer are built according to driving environment data, wherein the first data extraction layer is connected with a first state space, the second data extraction layer is connected with a second state space, the first extraction layer is used for carrying out data extraction on vehicle data of a target vehicle to obtain first extraction data, and the second extraction layer is used for carrying out data extraction on obstacle data to obtain second extraction data; constructing a first fusion layer and a second fusion layer according to the driving environment data, wherein the first fusion layer is connected with the first data extraction layer and the second data extraction layer and is used for fusing the first extracted data and the second extracted data to obtain first fusion data, and the second fusion layer is connected with the first data extraction layer, the second data extraction layer and the first fusion layer and is used for fusing the first fusion data, the first extracted data and the second extracted data to obtain second fusion data; and constructing an output layer based on the loss function and the action value function, wherein the output layer is connected with the second data fusion layer and is used for making a decision and outputting a decision action according to the second fusion data.
Optionally, the vehicle data of the target vehicle includes at least a speed of the target vehicle and a distance between a geometric center of the target vehicle and a center line of the current lane, and the obstacle data includes at least a distance between the obstacle and the target vehicle and a speed of the obstacle.
Optionally, the decision action comprises at least one of: acceleration action, deceleration action, holding speed, left lane change action and right lane change action.
According to an embodiment of the present invention, there is also provided an automatic driving decision device, including:
the acquisition module is used for acquiring driving environment data of the target vehicle, wherein the driving environment data comprises vehicle data and obstacle data of the target vehicle; the decision module is used for analyzing the driving environment data by utilizing the deep neural network model to obtain decision actions, wherein the deep neural network model is constructed and trained on the basis of a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments; and the control module is used for controlling the target vehicle to execute the decision-making action.
Optionally, the decision module is further configured to construct and train a deep neural network model based on a reward function and a deep reinforcement learning algorithm, including: calculating to obtain an accumulated return value according to the value of the return function; constructing an action value function according to the accumulated return value and a deep reinforcement learning algorithm; constructing a loss function according to the action value function; constructing an initial deep neural network model according to the loss function and the action value function; and training the initial deep neural network model according to a preset sample and a loss function to obtain the deep neural network model.
Optionally, the decision module is further configured to construct a loss function according to the action value function, including: setting the depth network with the weight as a preset value as a function approximator of an action value function; and constructing a loss function according to the action value function and the function approximator.
Optionally, the training of the initial deep neural network model according to the preset sample and the loss function by the decision module to obtain the deep neural network model includes: training the initial deep neural network model according to a preset sample to obtain a first output result; putting the first output result into a preset buffer queue; selecting a first output result from a preset buffer queue as a training sample by adopting a uniform random sampling method; and training the initial deep neural network model by adopting the training sample to obtain the deep neural network model.
Optionally, the decision module is further configured to construct an initial deep neural network model according to the loss function and the action value function, including: constructing an input layer according to the driving environment data, wherein the input layer comprises a first state space and a second state space, the first state space is used for inputting vehicle data of a target vehicle, and the second state space is used for inputting obstacle data; the method comprises the steps that a first data extraction layer and a second data extraction layer are built according to driving environment data, wherein the first data extraction layer is connected with a first state space, the second data extraction layer is connected with a second state space, the first extraction layer is used for carrying out data extraction on vehicle data of a target vehicle to obtain first extraction data, and the second extraction layer is used for carrying out data extraction on obstacle data to obtain second extraction data; constructing a first fusion layer and a second fusion layer according to the driving environment data, wherein the first fusion layer is connected with the first data extraction layer and the second data extraction layer and is used for fusing the first extracted data and the second extracted data to obtain first fusion data, and the second fusion layer is connected with the first data extraction layer, the second data extraction layer and the first fusion layer and is used for fusing the first fusion data, the first extracted data and the second extracted data to obtain second fusion data; and constructing an output layer based on the loss function and the action value function, wherein the output layer is connected with the second data fusion layer and is used for making a decision and outputting a decision action according to the second fusion data.
Optionally, the vehicle data of the target vehicle collected by the collecting module at least includes a speed of the target vehicle and a distance between a geometric center of the target vehicle and a current lane center line, and the obstacle data at least includes a distance between the obstacle and the target vehicle and a speed of the obstacle.
Optionally, the decision-making action performed by the control module comprises at least one of: acceleration action, deceleration action, holding speed, left lane change action and right lane change action.
There is also provided, in accordance with an embodiment of the present invention, a vehicle, including a memory having a computer program stored therein and a processor configured to execute the computer program to perform the automated driving decision method of any one of the above.
There is further provided, according to an embodiment of the present invention, a non-volatile storage medium having a computer program stored therein, wherein the computer program is configured to execute the automatic driving decision method in any one of the above when run on a computer or a processor.
In the embodiment of the invention, driving environment data of a target vehicle are first collected, the driving environment data comprising vehicle data and obstacle data of the target vehicle; the driving environment data are then analyzed by using a deep neural network model to obtain a decision action, wherein the deep neural network model is constructed and trained on the basis of a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments; finally, the target vehicle is controlled to execute the decision action. With this method, a return function based on the driving speed and the driving displacement of the target vehicle at at least two moments is constructed, and guiding the training of the target neural network model with this return function makes the decision result output by the model more reasonable, thereby solving the technical problem that an unreasonably designed return function in a decision model leads to unreasonable decision results being output by the model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of an automated driving decision method according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of an initial deep neural network model according to one embodiment of the present invention;
fig. 3 is a block diagram of an automatic driving decision device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, there is provided an embodiment of an automated driving decision method, it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than presented herein.
The method embodiments may be performed in an electronic device, similar control device or system, comprising a memory and a processor. Taking an electronic device as an example, the electronic device may include one or more processors and memory for storing data. Optionally, the electronic apparatus may further include a communication device for a communication function and a display device. It is understood by those skilled in the art that the above structural description is only illustrative and not restrictive on the structure of the electronic device. For example, the electronic device may also include more or fewer components than described above, or have a different configuration than described above.
A processor may include one or more processing units. For example, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processing (DSP) chip, a Microcontroller Unit (MCU), a Field-Programmable Gate Array (FPGA), a Neural Network Processor (NPU), a Tensor Processing Unit (TPU), an Artificial Intelligence (AI) processor, and the like. The different processing units may be separate components or may be integrated in one or more processors. In some examples, the electronic device may also include one or more processors.
The memory may be configured to store a computer program, for example, a computer program corresponding to the automatic driving decision method in the embodiment of the present invention, and the processor may implement the automatic driving decision method by operating the computer program stored in the memory. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, which may be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Communication devices are used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the communication device includes a Network Interface Controller (NIC) that may be connected to other network devices via a base station to communicate with the internet. In one example, the communication device may be a Radio Frequency (RF) module for communicating with the internet by wireless means.
The display device may be, for example, a touch screen type Liquid Crystal Display (LCD) and a touch display (also referred to as a "touch screen" or "touch display screen"). The liquid crystal display may enable a user to interact with a user interface of the mobile terminal. In some embodiments, the mobile terminal has a Graphical User Interface (GUI) with which a user can interact by touching finger contacts and/or gestures on a touch-sensitive surface, where the man-machine interaction function optionally includes the following interactions: executable instructions for creating web pages, drawing, word processing, making electronic documents, games, video conferencing, instant messaging, emailing, call interfacing, playing digital video, playing digital music, and/or web browsing, etc., for performing the above-described human-computer interaction functions, are configured/stored in one or more processor-executable computer program products or readable storage media.
Fig. 1 is an automatic driving decision method according to an embodiment of the present invention, as shown in fig. 1, the method includes the steps of:
and S101, acquiring driving environment data of the target vehicle.
The driving environment data comprises vehicle data and obstacle data of the target vehicle.
In the automatic driving decision process of the target vehicle, the driving environment information needs to be collected in real time. The target vehicle can make automated driving decisions based on the vehicle data of the target vehicle and the obstacle data surrounding the target vehicle.
Optionally, the vehicle data at least include the speed of the target vehicle and the distance between the geometric center of the target vehicle and the center line of the current lane, and the obstacle data at least include the distance between the obstacle and the target vehicle and the speed of the obstacle.
It should be noted that the obstacle data may be obstacle data in eight directions around the target vehicle. An obstacle may be another vehicle that is driving or a stationary object. The eight-direction obstacle data include data of the nearest obstacle in front of the current lane, data of the nearest obstacle behind the current lane, data of the nearest obstacle in the left lane, data of the nearest obstacle in the right lane, the distance of the nearest obstacle in front of the left lane, the distance of the nearest obstacle behind the left lane, the distance of the nearest obstacle in front of the right lane, and the distance of the nearest obstacle behind the right lane.
And S102, analyzing the driving environment data by using the deep neural network model to obtain a decision action.
The deep neural network model is constructed and trained based on a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments.
For example, the two moments may be time t and time t-1; the driving speeds are then the driving speed at time t and the driving speed at time t-1, and the driving displacements are the total mileage traveled by the vehicle at time t and the total mileage traveled by the vehicle at time t-1. The return function is used to guide the training of the deep neural network model; because the return value is influenced by both the driving speed and the driving displacement, the decision actions output by the trained deep neural network model are more accurate.
And step S103, controlling the target vehicle to execute a decision-making action.
And after the deep neural network model outputs the decision-making action, the control module controls the target vehicle to execute the output decision-making action.
Optionally, the decision action comprises at least one of: acceleration action, deceleration action, holding speed, left lane change action and right lane change action.
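By way of illustration, the five decision actions can be encoded as discrete indices for the output of the deep neural network model. The following Python sketch is illustrative only; the enum and its member names are assumptions of this sketch, not part of the disclosure.

```python
from enum import IntEnum

class DecisionAction(IntEnum):
    """Hypothetical index encoding of the five discrete decision actions."""
    ACCELERATE = 0         # acceleration action
    DECELERATE = 1         # deceleration action
    KEEP_SPEED = 2         # holding speed
    LANE_CHANGE_LEFT = 3   # left lane change action
    LANE_CHANGE_RIGHT = 4  # right lane change action

# Example: the index selected by the model is mapped back to a maneuver,
# e.g. DecisionAction(3) -> DecisionAction.LANE_CHANGE_LEFT
```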
According to the above steps, driving environment data of the target vehicle are first collected, the driving environment data comprising vehicle data and obstacle data of the target vehicle; the driving environment data are then analyzed by the deep neural network model to obtain a decision action, wherein the deep neural network model is constructed and trained on the basis of a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments; finally, the target vehicle is controlled to execute the decision action. With this method, a return function based on the driving speed and the driving displacement of the target vehicle at at least two moments is constructed, and guiding the training of the target neural network model with this return function makes the decision result output by the model more reasonable, thereby solving the technical problem that an unreasonably designed return function in a decision model leads to unreasonable decision results being output by the model.
Optionally, the construction and training of the deep neural network model based on the reward function and the deep reinforcement learning algorithm may include the following steps:
step S102a, calculating an accumulated return value according to the value of the return function.
The return function is constructed based on the running speed, the running displacement and a collision penalty function of the target vehicle at at least two moments, and is used for evaluating the value of the target vehicle's transition from the previous state to the current state in the automatic driving decision process. The state is the image information collected by the vehicle. The return function is expressed as follows:
r_t = (d_t - d_{t-1}) + r_col + (v_t - v_{t-1})
where d_t is the total distance traveled by the target vehicle at time t, d_{t-1} is the total distance traveled by the target vehicle at time t-1, v_t is the speed of the target vehicle at time t, and v_{t-1} is the speed of the target vehicle at time t-1.
The collision penalty function r_col is given by a piecewise expression, shown as an image in the original publication, that assigns a penalty when the target vehicle collides.
the accumulated return value is obtained by calculating an accumulated return function and is used for evaluating the quality of decision actions in the automatic driving process, and the function expression of the accumulated return function is as follows:
Figure BDA0003721776710000082
wherein R is t Is the accumulated return value at time t, r t+1 The function value is the return function value at the moment t +1, gamma is a discount factor, the value range of the discount factor is [0, 1], and L is a preset time length.
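As a minimal sketch of the return computation described above, the following Python code assumes that the total mileage d and speed v are sampled at consecutive time steps and that the collision penalty is a fixed negative constant applied only when a collision occurs; the constant, the discount factor value and the function names are assumptions of this sketch, not taken from the disclosure.

```python
def collision_penalty(collided: bool, penalty: float = -10.0) -> float:
    """Assumed form of r_col: a negative penalty on collision, zero otherwise."""
    return penalty if collided else 0.0

def step_reward(d_t: float, d_prev: float, v_t: float, v_prev: float,
                collided: bool) -> float:
    """r_t = (d_t - d_{t-1}) + r_col + (v_t - v_{t-1})."""
    return (d_t - d_prev) + collision_penalty(collided) + (v_t - v_prev)

def accumulated_return(rewards: list, gamma: float = 0.9) -> float:
    """Discounted accumulation of return values over a preset horizon L
    (rewards holds r_{t+1}, r_{t+2}, ... for L steps)."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```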
And step S102b, constructing an action value function according to the accumulated return value and the depth reinforcement learning algorithm.
The deep reinforcement learning algorithm is a deep Q network algorithm, the action value function is used for expressing the value of the current decision action, and the function expression of the action value function is as follows:
Q*(s, a) = E[R + γ·max_{a'} Q*(s', a') | s, a]
The function represents the action value when the target vehicle is in state s and takes decision action a, where R is the accumulated return value over the state transition, γ is a discount factor whose value range is [0, 1], s' is the next state of the target vehicle, and a' is the decision action executed in the next state of the target vehicle.
And step S102c, constructing a loss function according to the action value function.
The loss function is used for representing the difference degree between the prediction and the actual data, and further measuring the quality of the model.
Optionally, the step S102c of constructing the loss function according to the action value function may include the following steps:
step S102c1, the depth network with the weight of the preset value is set as a function approximator of the action value function.
Specifically, a Q neural network whose weights take a preset value θ is adopted as the function approximator of the action value function, i.e. Q(s, a; θ) ≈ Q*(s, a).
And step S102c2, constructing a loss function according to the action value function and the function approximator.
The expression for the loss function is given as an image in the original publication; in the standard deep Q-network form implied by the surrounding description, it reads:
L_i(θ_i) = E[(r + γ·max_{a'} Q(s', a'; θ_i^-) - Q(s, a; θ_i))^2]
and step S102d, constructing an initial deep neural network model according to the loss function and the action value function.
And step S102e, training the initial deep neural network model according to a preset sample and a loss function to obtain the deep neural network model.
The preset sample is quintuple data of the form e_t = (s_t, a_t, r_t, s_{t+1}, F), where s_t is the state of the target vehicle and the obstacles at time t, a_t is the decision action performed by the target vehicle at time t, r_t is the accumulated return function value for the target vehicle at time t, s_{t+1} is the next state of the target vehicle and the obstacles after time t, and F is a flag bit indicating whether s_{t+1} is the final state of the task decision sequence for the target vehicle.
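For illustration, the quintuple can be held in a named tuple; the field names below are assumptions of this sketch.

```python
from collections import namedtuple

# e_t = (s_t, a_t, r_t, s_{t+1}, F): state, decision action, return value,
# next state, and the flag bit F marking whether s_{t+1} ends the decision sequence.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])
```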
During training, the preset samples are input and, at each iteration i, the parameters θ_i in the loss function are adjusted to train the initial deep neural network model, with the error reduced by stochastic gradient descent. θ_i^- denotes the network parameters used to compute the target at iteration i. In order to make the learning process more stable, a fixed step number C is set, and every C steps the target parameters θ_i^- are replaced with the latest network parameters θ_i.
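A PyTorch-style sketch of one training iteration as described above is given below, assuming the standard deep Q-network update with a periodically refreshed target network; the optimizer, the batch handling and the hyperparameter values are assumptions of this sketch, not taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.9):
    """One stochastic-gradient step on
    L_i(θ_i) = E[(r + γ·max_a' Q(s', a'; θ_i^-) - Q(s, a; θ_i))^2]."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)  # Q(s, a; θ_i)
    with torch.no_grad():  # target computed with the fixed parameters θ_i^-
        target = rewards + gamma * target_net(next_states).max(dim=1).values * (1 - dones)
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Every C steps the target parameters θ_i^- are refreshed with the latest θ_i:
# target_net.load_state_dict(q_net.state_dict())
```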
Optionally, the step S102e of training the initial deep neural network model according to the preset sample and the loss function to obtain the deep neural network model may include the following steps:
step S102e1, training the initial deep neural network model according to a preset sample to obtain a first output result.
Step S102e2, putting the first output result into a preset buffer queue.
And step S102e3, selecting a first output result from a preset buffer queue as a training sample by adopting a uniform random sampling method.
And S102e4, training the initial deep neural network model by adopting the training sample to obtain the deep neural network model.
According to steps S102e1, S102e2, S102e3 and S102e4, the data generated during training are placed into a preset buffer queue, and training data are then drawn from the preset buffer queue by a uniform random sampling method. Using the drawn training data as samples to train the initial deep neural network model breaks the correlation between successive data, so that the trained deep neural network model is more stable.
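An illustrative sketch of the preset buffer queue and the uniform random sampling step follows; the queue capacity and batch size are assumptions of this sketch.

```python
import random
from collections import deque

class ReplayBuffer:
    """Preset buffer queue; uniform random sampling breaks the correlation
    between consecutive training data."""
    def __init__(self, capacity: int = 100_000):
        self.queue = deque(maxlen=capacity)

    def push(self, transition):
        # store the first output result e_t produced during training
        self.queue.append(transition)

    def sample(self, batch_size: int = 32):
        # uniform random sampling of training samples from the buffer queue
        return random.sample(list(self.queue), batch_size)
```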
Referring to fig. 2, optionally, the constructing of the initial deep neural network model according to the loss function and the action value function in step S102d may include the following steps:
and step S102d1, constructing an input layer according to the driving environment data.
The input layer includes a first state space for inputting vehicle data of the target vehicle and a second state space for inputting obstacle data.
The first state space is expressed as:
s_e = [v_e, d_e]
where v_e is the speed of the target vehicle and d_e is the distance between the geometric center point of the target vehicle and the center line of the lane. The second state space is expressed as:
s_sur = [d_f, v_f, d_b, v_b, d_lf, v_lf, d_lb, v_lb, d_rf, v_rf, d_rb, v_rb]
where d_f and v_f are the distance and speed of the nearest vehicle in front of the current lane, d_b and v_b are those of the nearest vehicle behind the current lane, d_lf and v_lf are those of the nearest vehicle in front of the left lane, d_lb and v_lb are those of the nearest vehicle behind the left lane, d_rf and v_rf are those of the nearest vehicle in front of the right lane, and d_rb and v_rb are those of the nearest vehicle behind the right lane.
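For illustration, the two state spaces can be assembled as fixed-length vectors in the order given above; the data structures feeding this sketch are assumptions, not part of the disclosure.

```python
import numpy as np

def build_states(ego: dict, obstacles: dict):
    """ego: {"v_e": speed, "d_e": offset from the lane center line}.
    obstacles: keys f, b, lf, lb, rf, rb mapping to (distance, speed) tuples."""
    s_e = np.array([ego["v_e"], ego["d_e"]], dtype=np.float32)
    order = ["f", "b", "lf", "lb", "rf", "rb"]
    # [d_f, v_f, d_b, v_b, d_lf, v_lf, d_lb, v_lb, d_rf, v_rf, d_rb, v_rb]
    s_sur = np.array([x for key in order for x in obstacles[key]], dtype=np.float32)
    return s_e, s_sur
```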
And step S102d2, constructing a first data extraction layer and a second data extraction layer according to the driving environment data.
The first data extraction layer is connected with the first state space, the second data extraction layer is connected with the second state space, the first extraction layer is used for carrying out data extraction on vehicle data of a target vehicle to obtain first extraction data, and the second extraction layer is used for carrying out data extraction on obstacle data to obtain second extraction data.
And step S102d3, constructing a first fusion layer and a second fusion layer according to the driving environment data.
The first fusion layer is connected with the first data extraction layer and the second data extraction layer and used for fusing the first extracted data and the second extracted data to obtain first fused data, and the second fusion layer is connected with the first data extraction layer, the second data extraction layer and the first fusion layer and used for fusing the first fused data, the first extracted data and the second extracted data to obtain second fused data.
Step S102d4, an output layer is constructed based on the loss function and the action value function.
And the output layer is connected with the second data fusion layer and used for making a decision according to the second fusion data and outputting a decision action.
The first data extraction layer, the second data extraction layer, the first fusion layer, the second fusion layer and the output layer are all fully connected layers. The output of the output layer is the action decision result, with output dimension [1, 1].
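A PyTorch sketch of the layered structure described above follows: two extraction branches, two fusion layers and an output layer, all fully connected. The hidden sizes and activation function are assumptions, and the output is shown as one value per decision action (the [1, 1] action decision result then corresponds to taking the index of the largest value); neither detail is taken from the disclosure.

```python
import torch
import torch.nn as nn

class DecisionNet(nn.Module):
    def __init__(self, ego_dim=2, obstacle_dim=12, hidden=64, n_actions=5):
        super().__init__()
        self.extract_ego = nn.Linear(ego_dim, hidden)       # first data extraction layer
        self.extract_obs = nn.Linear(obstacle_dim, hidden)  # second data extraction layer
        self.fuse1 = nn.Linear(2 * hidden, hidden)          # first fusion layer
        self.fuse2 = nn.Linear(3 * hidden, hidden)          # second fusion layer
        self.output = nn.Linear(hidden, n_actions)          # output layer

    def forward(self, s_e, s_sur):
        e1 = torch.relu(self.extract_ego(s_e))    # first extracted data
        e2 = torch.relu(self.extract_obs(s_sur))  # second extracted data
        f1 = torch.relu(self.fuse1(torch.cat([e1, e2], dim=-1)))      # first fusion data
        f2 = torch.relu(self.fuse2(torch.cat([f1, e1, e2], dim=-1)))  # second fusion data
        return self.output(f2)  # values per decision action; argmax gives the decision
```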
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, an automatic driving decision device is further provided, and the device is used to implement the above embodiments and preferred embodiments, which have already been described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of an automatic driving decision device 200 according to an embodiment of the present invention. As shown in fig. 3, the automatic driving decision device 200 includes: an acquisition module 201, used for acquiring driving environment data of a target vehicle, the driving environment data comprising vehicle data and obstacle data of the target vehicle; a decision module 202, used for analyzing the driving environment data by using a deep neural network model to obtain a decision action, wherein the deep neural network model is constructed and trained based on a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments; and a control module 203, used for controlling the target vehicle to execute the decision action.
Optionally, the decision module 202 is further configured to construct and train a deep neural network model based on a reward function and a deep reinforcement learning algorithm, including: calculating to obtain an accumulated return value according to the value of the return function; constructing an action value function according to the accumulated return value and a deep reinforcement learning algorithm; constructing a loss function according to the action value function; constructing an initial deep neural network model according to the loss function and the action value function; and training the initial deep neural network model according to a preset sample and a loss function to obtain the deep neural network model.
Optionally, the decision module 202 is further configured to construct a loss function according to the action value function, including: setting the depth network with the weight as a preset value as a function approximator of an action value function; and constructing a loss function according to the action value function and the function approximator.
Optionally, the decision module 202 is further configured to train the initial deep neural network model according to a preset sample and a loss function to obtain a deep neural network model, where the training includes: training the initial deep neural network model according to a preset sample to obtain a first output result; putting the first output result into a preset buffer queue; selecting a first output result from a preset buffer queue as a training sample by adopting a uniform random sampling method; and training the initial deep neural network model by adopting the training sample to obtain the deep neural network model.
Optionally, the decision module 202 is further configured to construct an initial deep neural network model according to the loss function and the action value function, including: constructing an input layer according to the driving environment data, wherein the input layer comprises a first state space and a second state space, the first state space is used for inputting vehicle data of a target vehicle, and the second state space is used for inputting obstacle data; the method comprises the steps that a first data extraction layer and a second data extraction layer are built according to driving environment data, wherein the first data extraction layer is connected with a first state space, the second data extraction layer is connected with a second state space, the first extraction layer is used for carrying out data extraction on vehicle data of a target vehicle to obtain first extraction data, and the second extraction layer is used for carrying out data extraction on obstacle data to obtain second extraction data; constructing a first fusion layer and a second fusion layer according to the driving environment data, wherein the first fusion layer is connected with the first data extraction layer and the second data extraction layer and is used for fusing the first extracted data and the second extracted data to obtain first fusion data, and the second fusion layer is connected with the first data extraction layer, the second data extraction layer and the first fusion layer and is used for fusing the first fusion data, the first extracted data and the second extracted data to obtain second fusion data; and constructing an output layer based on the loss function and the action value function, wherein the output layer is connected with the second data fusion layer and is used for making a decision and outputting a decision action according to the second fusion data.
Optionally, the vehicle data of the target vehicle collected by the collecting module 201 at least includes the speed of the target vehicle and the distance between the geometric center of the target vehicle and the center line of the current lane, and the obstacle data at least includes the distance between the obstacle and the target vehicle and the speed of the obstacle.
Optionally, the decision-making action performed by the control module 203 comprises at least one of: acceleration action, deceleration action, holding speed, left lane change action and right lane change action.
Embodiments of the present invention also provide a vehicle comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps in the embodiments of the automated driving decision method described above.
Alternatively, in this embodiment, the processor in the vehicle may be configured to run a computer program to perform the steps of:
and S101, acquiring driving environment data of the target vehicle.
And S102, analyzing the driving environment data by using a deep neural network model to obtain a decision action.
And step S103, controlling the target vehicle to execute a decision-making action.
Optionally, for a specific example in this embodiment, reference may be made to the examples described in the above embodiment and optional implementation, and this embodiment is not described herein again.
Embodiments of the present invention also provide a non-volatile storage medium having a computer program stored therein, wherein the computer program is configured to, when run on a computer or a processor, perform the steps in the embodiments of the automatic driving decision method described above.
Alternatively, in the present embodiment, the above-mentioned nonvolatile storage medium may be configured to store a computer program for executing the steps of:
and S101, collecting driving environment data of the target vehicle.
And S102, analyzing the driving environment data by using a deep neural network model to obtain a decision action.
And step S103, controlling the target vehicle to execute a decision-making action.
Optionally, in this embodiment, the nonvolatile storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and amendments can be made without departing from the principle of the present invention, and these modifications and amendments should also be considered as the protection scope of the present invention.

Claims (10)

1. An automated driving decision method, comprising:
acquiring driving environment data of a target vehicle, wherein the driving environment data comprises vehicle data and obstacle data of the target vehicle;
analyzing the driving environment data by using a deep neural network model to obtain a decision action, wherein the deep neural network model is constructed and trained based on a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments;
controlling the target vehicle to perform the decision-making action.
2. The automated driving decision method of claim 1, wherein the deep neural network model is constructed and trained based on a reward function and a deep reinforcement learning algorithm, comprising:
calculating to obtain an accumulated return value according to the value of the return function;
constructing an action value function according to the accumulated return value and the deep reinforcement learning algorithm;
constructing a loss function according to the action value function;
constructing an initial deep neural network model according to the loss function and the action value function;
and training the initial deep neural network model according to a preset sample and the loss function to obtain the deep neural network model.
3. The automated driving decision method of claim 2, wherein the constructing a loss function from the action value function comprises:
setting the depth network with the weight as a preset value as a function approximator of the action value function;
and constructing the loss function according to the action value function and the function approximator.
4. The automated driving decision method of claim 2, wherein the training the initial deep neural network model according to a preset sample and the loss function to obtain the deep neural network model comprises:
training the initial deep neural network model according to the preset sample to obtain a first output result;
putting the first output result into a preset buffer queue;
selecting the first output result from the preset buffer queue as a training sample by adopting a uniform random sampling method;
and training the initial deep neural network model by adopting the training sample to obtain the deep neural network model.
5. The automated driving decision method of claim 2, wherein the constructing an initial deep neural network model from the loss function and the action value function comprises:
constructing an input layer according to the driving environment data, wherein the input layer comprises a first state space and a second state space, the first state space is used for inputting vehicle data of the target vehicle, and the second state space is used for inputting the obstacle data;
constructing a first data extraction layer and a second data extraction layer according to the driving environment data, wherein the first data extraction layer is connected with the first state space, the second data extraction layer is connected with the second state space, the first extraction layer is used for carrying out data extraction on vehicle data of the target vehicle to obtain first extraction data, and the second extraction layer is used for carrying out data extraction on the obstacle data to obtain second extraction data;
constructing a first fusion layer and a second fusion layer according to the driving environment data, wherein the first fusion layer is connected with the first data extraction layer and the second data extraction layer and is used for fusing the first extraction data and the second extraction data to obtain first fusion data, and the second fusion layer is connected with the first data extraction layer, the second data extraction layer and the first fusion layer and is used for fusing the first fusion data, the first extraction data and the second extraction data to obtain second fusion data;
and constructing an output layer based on the loss function and the action value function, wherein the output layer is connected with the second data fusion layer and is used for making a decision according to the second fusion data and outputting the decision action.
6. The automated driving decision method of claim 1, wherein the vehicle data of the target vehicle comprises at least a speed of the target vehicle and a distance of a geometric center of the target vehicle from a current lane center line, and the obstacle data comprises at least a distance between an obstacle and the target vehicle and a speed of the obstacle.
7. The automated driving decision method of claim 1, wherein the decision action comprises at least one of: acceleration action, deceleration action, holding speed, left lane change action and right lane change action.
8. An automated driving decision device, comprising:
the system comprises an acquisition module, a storage module and a display module, wherein the acquisition module is used for acquiring driving environment data of a target vehicle, and the driving environment data comprises vehicle data and obstacle data of the target vehicle;
the decision module is used for analyzing the driving environment data by utilizing a deep neural network model to obtain a decision action, wherein the deep neural network model is constructed and trained based on a return function and a deep reinforcement learning algorithm, and the return function is used for training the deep neural network model according to the driving speed and the driving displacement of the target vehicle at at least two moments;
a control module to control the target vehicle to perform the decision-making action.
9. A vehicle comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the automated driving decision method of any one of claims 1 to 7.
10. A non-volatile storage medium, having stored thereon a computer program, wherein the computer program is arranged to, when run on a computer or processor, perform the automated driving decision method of any of the preceding claims 1 to 7.
CN202210753584.4A 2022-06-29 2022-06-29 Automatic driving decision method, device, vehicle and storage medium Pending CN115140091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210753584.4A CN115140091A (en) 2022-06-29 2022-06-29 Automatic driving decision method, device, vehicle and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210753584.4A CN115140091A (en) 2022-06-29 2022-06-29 Automatic driving decision method, device, vehicle and storage medium

Publications (1)

Publication Number Publication Date
CN115140091A true CN115140091A (en) 2022-10-04

Family

ID=83410624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210753584.4A Pending CN115140091A (en) 2022-06-29 2022-06-29 Automatic driving decision method, device, vehicle and storage medium

Country Status (1)

Country Link
CN (1) CN115140091A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116331206A (en) * 2023-04-06 2023-06-27 上海交通大学 Decision method and system for intelligent automobile safe driving
CN116331206B (en) * 2023-04-06 2023-10-20 上海交通大学 Decision method and system for intelligent automobile safe driving


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination