CN116382150A - Remote driving method and device based on deep reinforcement learning decision system and electronic equipment


Info

Publication number
CN116382150A
Authority
CN
China
Prior art keywords
vehicle
control instruction
data
driving behavior
reinforcement learning
Prior art date
Legal status
Pending
Application number
CN202310106346.9A
Other languages
Chinese (zh)
Inventor
李永伟
霍向
吴新开
Current Assignee
Beijing Lobby Technology Co ltd
Original Assignee
Beijing Lobby Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lobby Technology Co ltd
Priority to CN202310106346.9A
Publication of CN116382150A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 - Programme-control systems
    • G05B19/02 - Programme-control systems electric
    • G05B19/04 - Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042 - Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423 - Input/output
    • G05B2219/00 - Program-control systems
    • G05B2219/20 - Pc systems
    • G05B2219/23 - Pc programming
    • G05B2219/23051 - Remote control, enter program remote, detachable programmer

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a remote driving scheme based on a deep reinforcement learning decision system, belonging to the technical field of intelligent hardware. The scheme comprises the following steps: receiving vehicle perception data and vehicle position data uploaded by a vehicle; receiving a manual control instruction sent by a remote control cockpit; inputting the vehicle perception data, the vehicle position data and map data into a preset deep reinforcement learning decision system to obtain a server control instruction for the vehicle; fusing the manual control instruction and the server control instruction to generate a target control instruction for the vehicle; and sending the target control instruction to the vehicle-mounted domain controller so that the vehicle-mounted domain controller controls the vehicle according to it. The scheme reduces dependence on the remote driver, improves the reliability of the target control instruction used to control the vehicle, and thereby raises the safety factor of automated driving.

Description

Remote driving method and device based on deep reinforcement learning decision system and electronic equipment
Technical Field
The invention relates to the technical field of intelligent hardware, in particular to a remote driving method and device based on a deep reinforcement learning decision system and electronic equipment.
Background
Remote driving of a vehicle refers to a driver remotely controlling the running of the vehicle based on road images captured by its camera. Through remote control, the driver can perform conventional operations such as acceleration, deceleration, turning, gear shifting and braking, making remote driving a mode in which the person and the vehicle are separated.
Current remote driving systems depend heavily on the remote driver: the running of the vehicle is controlled entirely by the remote driver's operations, so the driver must participate throughout. On the one hand, this easily causes driver fatigue; on the other hand, since any operator error under fully manual control directly creates a safety hazard, the risk factor is high.
Disclosure of Invention
The embodiments of the invention aim to provide a remote driving method and device based on a deep reinforcement learning decision system, and electronic equipment, which can solve the problems of driver fatigue and high risk factor in existing schemes that control the vehicle entirely through the remote driver's operations.
In order to solve the technical problems, the invention provides the following technical scheme:
the embodiment of the invention provides a remote driving method based on a deep reinforcement learning decision system, wherein the method comprises the following steps:
receiving vehicle sensing data and vehicle position data uploaded by a vehicle;
receiving a manual control instruction sent by a remote control cockpit, wherein the manual control instruction is generated by a remote operator remotely operating the vehicle according to the vehicle perception data and the vehicle position data received by the remote control cockpit;
inputting the vehicle perception data, the vehicle position data and the map data into a preset deep reinforcement learning decision system to obtain a server control instruction of the vehicle;
fusing the manual control instruction and the server control instruction to generate a target control instruction of the vehicle;
and sending the target control instruction to a vehicle-mounted domain controller so that the vehicle-mounted domain controller controls the vehicle according to the target control instruction.
Optionally, the preset deep reinforcement learning decision system is generated by training in the following manner:
collecting operation environment data and driving behavior data of a worker when driving according to a target driving task;
building a corresponding simulation system according to the running environment of the vehicle;
for each simulation system, issuing a target driving task to the vehicle in the simulation system to obtain autonomous driving behavior data of the vehicle, traffic rule violation information in the environment and collision information with environmental obstacles;
determining a reward and punishment mechanism according to the autonomous driving behavior data of the vehicle, the driving behavior data of a worker in a corresponding vehicle running environment, traffic rule violation information in the environment and collision information with environmental obstacles;
training the deep reinforcement learning network according to each simulation system and the corresponding reward and punishment mechanism until a preset convergence condition is met, and obtaining the deep reinforcement learning decision system.
Optionally, the step of determining the reward and punishment mechanism according to the autonomous driving behavior data of the vehicle, the driving behavior data of the staff in the corresponding vehicle running environment, the traffic rule violation information in the environment and the collision information with the environmental obstacle comprises the following steps:
analyzing and obtaining a journey deviation parameter and a driving behavior deviation parameter according to the autonomous driving behavior data of the vehicle and the driving behavior data of the staff;
determining a travel deviation degree according to the travel deviation parameter;
determining driving behavior deviation degree according to the driving behavior deviation parameter;
calculating a violation punishment parameter value according to traffic rule violation information in the environment;
calculating a collision penalty parameter value according to the environmental obstacle collision information;
substituting the travel deviation degree, the driving behavior deviation degree, the violation punishment parameter value and the collision punishment parameter value into a preset reward and punishment function to obtain the reward and punishment mechanism.
Optionally, the step of determining the degree of travel deviation according to the travel deviation parameter includes:
substituting the speed of the vehicle during autonomous driving, the included angle between the vehicle and the target travel axis and the relative position of the vehicle and the target travel axis into a preset travel deviation degree expression to obtain the travel deviation degree of the vehicle.
Optionally, the step of determining the driving behavior deviation degree according to the driving behavior deviation parameter includes:
calculating the corresponding differences between the steering wheel angle, the accelerator and the brake in the driving behavior data of the staff and the steering wheel angle, the accelerator and the brake in the autonomous driving behavior data of the vehicle, and determining the sum of the calculated differences as the driving behavior deviation degree of the vehicle.
Optionally, the step of calculating the value of the violation punishment parameter according to the traffic rule violation information in the environment includes:
substituting the relative position of the vehicle and the lane edge line, the vehicle running speed and the maximum value and the minimum value of the vehicle speed preset by the system into a preset violation punishment expression to obtain the vehicle violation punishment parameter value.
Optionally, the step of fusing the manual manipulation instruction and the server manipulation instruction to generate a target manipulation instruction of the vehicle includes:
constructing a fusion coefficient matrix of the manual control instruction and the server control instruction;
constructing a manual control instruction matrix and a server control instruction matrix;
the safety evaluation value, the high-efficiency evaluation value and the energy-saving evaluation value of the control instruction are represented by the fusion coefficient matrix, the manual control instruction matrix and the server control instruction matrix;
and inputting the control instruction safety evaluation value, the control instruction high-efficiency evaluation value and the control instruction energy-saving evaluation value into a preset fusion expression, and then solving the fusion expression to obtain a target control instruction of the vehicle.
The embodiment of the invention also provides a remote driving device based on the deep reinforcement learning decision system, wherein the device comprises:
the first receiving module is used for receiving vehicle perception data and vehicle position data uploaded by a vehicle;
the second receiving module is used for receiving a manual control instruction sent by a remote control cockpit, wherein the manual control instruction is generated by a remote operator remotely operating the vehicle according to the vehicle perception data and the vehicle position data received by the remote control cockpit;
the input module is used for inputting the vehicle perception data, the vehicle position data and the map data into a preset deep reinforcement learning decision system to obtain a server control instruction of the vehicle;
the fusion module is used for fusing the manual control instruction and the server control instruction to generate a target control instruction of the vehicle;
and the sending module is used for sending the target control instruction to the vehicle-mounted domain controller so that the vehicle-mounted domain controller controls the vehicle according to the target control instruction.
Optionally, the preset deep reinforcement learning decision system is generated by a generating module, and the generating module includes:
the first sub-module is used for collecting operation environment data and driving behavior data of a worker when driving according to a target driving task;
the second sub-module is used for building a corresponding simulation system according to the vehicle running environment;
the third sub-module is used for issuing, for each simulation system, a target driving task to the vehicle in the simulation system to obtain vehicle autonomous driving behavior data, traffic rule violation information in the environment and collision information with environmental obstacles;
a fourth sub-module, configured to determine a reward and punishment mechanism according to the autonomous driving behavior data of the vehicle, driving behavior data of a worker in a corresponding vehicle running environment, traffic rule violation information in the environment, and collision information with an environmental obstacle;
and the fifth sub-module is used for training the deep reinforcement learning network according to each simulation system and the corresponding reward and punishment mechanism until the preset convergence condition is met, and obtaining the deep reinforcement learning decision system.
Optionally, the fourth sub-module includes:
the first unit is used for analyzing and obtaining a journey deviation parameter and a driving behavior deviation parameter according to the autonomous driving behavior data of the vehicle and the driving behavior data of the staff;
the second unit is used for determining the travel deviation degree according to the travel deviation parameter;
a third unit for determining a driving behavior deviation degree according to the driving behavior deviation parameter;
a fourth unit for calculating a violation penalty parameter value according to the traffic rule violation information in the environment;
a fifth unit for calculating a collision penalty parameter value according to the environmental obstacle collision information;
and a sixth unit, configured to substitute the travel deviation degree, the driving behavior deviation degree, the violation punishment parameter value and the collision punishment parameter value into a preset reward and punishment function to obtain the reward and punishment mechanism.
Optionally, the second unit is specifically configured to:
substituting the speed of the vehicle during autonomous driving, the included angle between the vehicle and the target travel axis and the relative position of the vehicle and the target travel axis into a preset travel deviation degree expression to obtain the travel deviation degree of the vehicle.
Optionally, the third unit is specifically configured to:
calculating the corresponding differences between the steering wheel angle, the accelerator and the brake in the driving behavior data of the staff and the steering wheel angle, the accelerator and the brake in the autonomous driving behavior data of the vehicle, and determining the sum of the calculated differences as the driving behavior deviation degree of the vehicle.
Optionally, the fourth unit is specifically configured to:
substituting the relative position of the vehicle and the lane edge line, the vehicle running speed and the maximum value and the minimum value of the vehicle speed preset by the system into a preset violation punishment expression to obtain the vehicle violation punishment parameter value.
Optionally, the fusion module includes:
a sixth sub-module, configured to construct a fusion coefficient matrix of the manual control instruction and the server control instruction;
a seventh sub-module, configured to construct a manual control instruction matrix and a server control instruction matrix;
the eighth sub-module is used for representing the control instruction safety evaluation value, the control instruction high-efficiency evaluation value and the control instruction energy-saving evaluation value through the fusion coefficient matrix, the manual control instruction matrix and the server control instruction matrix;
and the ninth submodule is used for inputting the control instruction safety evaluation value, the control instruction high-efficiency evaluation value and the control instruction energy-saving evaluation value into a preset fusion expression and then solving the fusion expression to obtain a target control instruction of the vehicle.
The embodiment of the invention provides electronic equipment, which comprises a processor, a memory and a program or instructions stored on the memory and capable of running on the processor, wherein the program or instructions realize the steps of any remote driving method based on a deep reinforcement learning decision system when being executed by the processor.
The embodiment of the invention provides a readable storage medium, wherein a program or instructions are stored on the readable storage medium, and the program or instructions realize the steps of any remote driving method based on a deep reinforcement learning decision system when being executed by a processor.
The remote driving scheme based on the deep reinforcement learning decision system provided by the embodiments of the invention receives vehicle perception data and vehicle position data uploaded by a vehicle; receives a manual control instruction sent by a remote control cockpit; inputs the vehicle perception data, vehicle position data and map data into a preset deep reinforcement learning decision system to obtain a server control instruction for the vehicle; fuses the manual control instruction and the server control instruction to generate a target control instruction for the vehicle; and sends the target control instruction to the vehicle-mounted domain controller so that the vehicle-mounted domain controller controls the vehicle accordingly. By fusing the manual control instruction and the server control instruction to generate the vehicle's target control instruction, the scheme, on the one hand, reduces dependence on manual control, allowing the remote driver to take short rests when tired; on the other hand, since the target control instruction is generated by fusing the control instructions of both branches, it is more reliable than a manual control instruction generated by the remote driver alone, which improves the safety factor.
Drawings
FIG. 1 is a flow chart showing the steps of a method of remote driving based on a deep reinforcement learning decision system in accordance with an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating interactions of devices between remote driving systems according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating a remote driving device based on a deep reinforcement learning decision system according to an embodiment of the present application;
fig. 4 is a block diagram showing a structure of an electronic device according to an embodiment of the present application.
Detailed Description
To make the technical problems to be solved, the technical solutions and the advantages of the present application clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
The remote driving scheme based on the deep reinforcement learning decision system provided by the embodiments of the application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
As shown in fig. 1, the remote driving method based on the deep reinforcement learning decision system according to the embodiment of the application includes the following steps:
step 101: vehicle sensing data and vehicle position data uploaded by the vehicle are received.
The remote driving method based on the deep reinforcement learning decision system is applied to electronic equipment, and the electronic equipment can be a remote control server with an analysis function. The storage medium in the electronic equipment stores a remote driving optimization program based on the deep reinforcement learning decision system, and the processor of the electronic equipment runs the program in the storage medium to execute a remote driving method flow based on the deep reinforcement learning decision system.
The vehicle sensing data are acquired by a camera, a laser radar, a millimeter wave radar and other sensors of the vehicle, the vehicle position data are acquired by a positioning system of the vehicle, and the vehicle sensing data and the vehicle position data are transmitted back to the remote control cockpit and the remote control server in real time.
The remote driving method based on the deep reinforcement learning decision system is executed by a remote driving system; an exemplary interaction diagram of the devices in the remote driving system is shown in fig. 2. As shown in fig. 2, the system includes a vehicle (provided with vehicle sensors and a vehicle positioning system), a remote control cockpit, a remote control server, and a vehicle-mounted domain controller. The vehicle sensors and the vehicle positioning system upload the acquired vehicle perception data and vehicle position data to the remote control cockpit and the remote control server. The remote driver generates a manual control instruction for the vehicle according to the vehicle perception data and vehicle position data received by the remote control cockpit, and the remote control cockpit sends the manual control instruction to the remote control server. The remote control server generates a server control instruction according to the received vehicle perception data and vehicle position data, fuses the manual control instruction and the server control instruction into an optimized control instruction, namely the target control instruction, and sends it to the vehicle-mounted domain controller, which controls the vehicle according to the target control instruction.
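For illustration only, the following Python sketch traces one control cycle of this interaction. The names used (ControlCommand, transport, decide, fuse) are hypothetical placeholders introduced here; the patent does not define a concrete software interface.

```python
from dataclasses import dataclass

@dataclass
class ControlCommand:
    steer: float     # assumed normalized range (-1, 1), per the description below
    throttle: float  # assumed normalized range (0, 1)
    brake: float     # assumed normalized range (0, 1)

def server_cycle(transport, decision_system, fuse, map_data):
    # Vehicle sensors and positioning system upload perception and position data
    perception, position = transport.receive_vehicle_data()
    # The remote control cockpit forwards the driver's manual control instruction
    manual_cmd = transport.receive_manual_command()
    # The deep reinforcement learning decision system outputs a server control instruction
    server_cmd = decision_system.decide(perception, position, map_data)
    # The two instructions are fused into the target control instruction
    target_cmd = fuse(manual_cmd, server_cmd)
    # The target control instruction is sent to the vehicle-mounted domain controller
    transport.send_to_domain_controller(target_cmd)
```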
In the embodiments of the application, the remote driving method based on the deep reinforcement learning decision system is described in detail from the perspective of the remote control server, including but not limited to the specific implementations of generating the server control instruction and of fusing instructions to generate the target control instruction.
Step 102: and receiving a manual control instruction sent by the remote control cockpit.
The manual control instruction is generated by a remote operator remotely operating the vehicle according to the vehicle perception data and vehicle position data received by the remote control cockpit. In actual implementation, a background remote driver operates the acceleration/deceleration controller, turning controller and braking controller in the remote control cockpit according to the acquired vehicle perception data to generate a manual control instruction, and the remote control cockpit sends the generated manual control instruction to the remote control server.
Step 103: and inputting the vehicle perception data, the vehicle position data and the map data into a preset deep reinforcement learning decision system to obtain a server control instruction of the vehicle.
The remote control server is preset with a deep reinforcement learning decision system, and takes vehicle perception data, vehicle position data and map data as inputs of the deep reinforcement learning decision system, and outputs the input are server control instructions.
The deep reinforcement learning decision system is created and trained in advance, and is applied once it meets the convergence criteria. For its training process, refer to the related description in the optional embodiments below; it is not repeated here.
Step 104: and fusing the manual control instruction and the server control instruction to generate a target control instruction of the vehicle.
Optionally, the remote control server may fuse the manual control instruction and the server control instruction to generate the target control instruction of the vehicle in the following manner:
firstly, constructing a fusion coefficient matrix of a manual control instruction and a server control instruction;
the fusion coefficient matrix of the manual manipulation instruction and the server manipulation instruction may be denoted as C.
Secondly, constructing a manual control instruction matrix and a server control instruction matrix;
the manual manipulation instruction matrix may be represented as X 1 The manual control instruction matrix comprises an acceleration instruction, a braking instruction and a steering instruction. The server manipulation instruction matrix may be represented as X 2 The server control instruction matrix comprises an acceleration instruction and a control instructionA move command and a steer command.
Thirdly, representing a control instruction safety evaluation value, a control instruction high-efficiency evaluation value and a control instruction energy-saving evaluation value through a fusion coefficient matrix, a manual control instruction matrix and a server control instruction matrix;
the control instruction safety evaluation value can pass through a function f 1 (C,X 1 ,X 2 ) A representation; the high-efficiency evaluation value of the control instruction can pass through a function f 2 (C,X 1 ,X 2 ) A representation; the energy-saving evaluation value of the control instruction can pass through a function f 3 (C,X 1 ,X 2 ) And (3) representing.
And finally, inputting the control instruction safety evaluation value, the control instruction high-efficiency evaluation value and the control instruction energy-saving evaluation value into a preset fusion expression, and then solving the fusion expression to obtain a target control instruction of the vehicle.
The preset fusion expression may be set as: min[f_1(C, X_1, X_2) + f_2(C, X_1, X_2) + f_3(C, X_1, X_2)]. After C, X_1 and X_2 are defined, solving the fusion expression yields the target control instruction of the vehicle. The target control instruction includes control commands for acceleration, deceleration, turning, braking and the like.
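As an illustration of solving the fusion expression, the sketch below minimizes f_1 + f_2 + f_3 over a diagonal fusion coefficient matrix C with scipy. The concrete forms of f_1, f_2 and f_3 are not disclosed in the application, so the quadratic penalties used here are assumptions chosen only to make the example concrete; in practice they would encode the safety, efficiency and energy-consumption models actually used by the remote control server.

```python
import numpy as np
from scipy.optimize import minimize

X1 = np.array([0.4, 0.0, -0.1])  # manual control instruction: [throttle, brake, steer]
X2 = np.array([0.3, 0.0, -0.2])  # server control instruction from the decision system

def fused(c):
    C = np.diag(c)  # fusion coefficient matrix, restricted to diagonal form here
    return C @ X1 + (np.eye(3) - C) @ X2

def f1(c):  # safety evaluation: assumed penalty for deviating from the server command
    return float(np.sum((fused(c) - X2) ** 2))

def f2(c):  # efficiency evaluation: assumed penalty on sluggish progress
    x = fused(c)
    return (1.0 - x[0]) ** 2 + x[1] ** 2

def f3(c):  # energy-saving evaluation: assumed penalty on total actuation effort
    return float(np.sum(fused(c) ** 2))

# Solve min[f1 + f2 + f3]; each fusion coefficient is kept within [0, 1]
res = minimize(lambda c: f1(c) + f2(c) + f3(c),
               x0=np.full(3, 0.5), bounds=[(0.0, 1.0)] * 3)
target_instruction = fused(res.x)  # the vehicle's target control instruction
```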
In this optional manner of generating the vehicle's target control instruction, the final target control instruction is optimized by fusing the control instructions of both the human operator and the remote control server, so that controlling the vehicle according to the generated target control instruction is safer, more efficient and more energy-saving than controlling it according to the manual control instruction alone.
Step 105: and sending the target control instruction to the vehicle-mounted domain controller so that the vehicle-mounted domain controller controls the vehicle according to the target control instruction.
And after receiving the target control instruction sent by the remote control server, the vehicle-mounted domain controller controls the vehicle to accelerate, decelerate, turn, brake and the like according to the target control instruction.
The remote driving scheme based on the deep reinforcement learning decision system provided by the embodiments of the invention receives vehicle perception data and vehicle position data uploaded by a vehicle; receives a manual control instruction sent by a remote control cockpit; inputs the vehicle perception data, vehicle position data and map data into a preset deep reinforcement learning decision system to obtain a server control instruction for the vehicle; fuses the manual control instruction and the server control instruction to generate a target control instruction for the vehicle; and sends the target control instruction to the vehicle-mounted domain controller so that the vehicle-mounted domain controller controls the vehicle accordingly. By fusing the manual control instruction and the server control instruction to generate the vehicle's target control instruction, the scheme, on the one hand, reduces dependence on manual control, allowing the remote driver to take short rests when tired; on the other hand, since the target control instruction is generated by fusing the control instructions of both branches, it is more reliable than a manual control instruction generated by the remote driver alone, which improves the safety factor.
In an alternative embodiment, the preset deep reinforcement learning decision system is generated by training in the following manner, and comprises the following steps:
step one: collecting operation environment data and driving behavior data of a worker when driving according to a target driving task;
the method comprises the steps that a worker drives a vehicle carrying sensors such as a camera, a laser radar, a millimeter wave radar and the like in a set running environment, the sensors carried on the vehicle collect running environment data, and driving behavior data such as acceleration and deceleration, turning and braking of the worker are collected in real time.
Step two: building a corresponding simulation system according to the running environment of the vehicle;
the deep reinforcement learning decision system is built with simulation systems corresponding to various vehicle running environments, and each simulation system can correspond to one or more vehicle running environments.
Step three: for each simulation system, issuing a target driving task to the vehicle in the simulation system to obtain autonomous driving behavior data of the vehicle, traffic rule violation information in the environment and collision information with environmental obstacles;
step four: determining a reward and punishment mechanism according to autonomous driving behavior data of the vehicle, driving behavior data of a worker in a corresponding vehicle running environment, traffic rule violation information in the environment and collision information with environmental obstacles;
in the alternative embodiment, a corresponding simulation system is built according to the running environment of the vehicle, a target driving task is issued to the vehicle in the simulation system, the vehicle autonomously generates a control command of the vehicle, and a corresponding reward and punishment mechanism is given out according to the deviation degree from a target journey, the deviation degree from the driving behavior of a worker, the violation condition of the traffic rule in the environment, the collision condition of an environmental obstacle and the like.
Step five: training the deep reinforcement learning network according to each simulation system and the corresponding reward and punishment mechanism until a preset convergence condition is met, and obtaining the deep reinforcement learning decision system.
Steps one to four correspond to the data acquisition stage: given a target driving task, a human driver performs accelerator, brake and steering-wheel operations through a force-feedback steering wheel suite, controlling the vehicle in a virtual driving environment, while the system collects environment data and records driving action data, forming driving action-environment state data pairs.
After the driving action-environment state data pairs are generated, they are used to train the deep reinforcement learning network: the environment state data serve as input, driving action data are output, and the Euclidean distance between the network's predicted output and the ground-truth value serves as the loss function. After training converges and the preset convergence condition is met, the resulting network weights are taken as the weights of the deep reinforcement learning network, yielding the deep reinforcement learning decision system.
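A minimal PyTorch sketch of this pretraining step: environment state in, driving action out, with the Euclidean distance between the network's prediction and the recorded human action as the loss. The network architecture and the 4-dimensional state / 3-dimensional action layout are assumptions for illustration.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 3  # [V, theta, d, e] -> [steer, throttle, brake]
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, 64), nn.ReLU(),
                       nn.Linear(64, action_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_step(states, actions):
    """states/actions: one batch of driving action-environment state data pairs."""
    pred = policy(states)
    # Euclidean distance between predicted and recorded actions as the loss
    loss = torch.linalg.norm(pred - actions, dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```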
The remote control server generates the server control instruction by means of the reward and punishment mechanism of the deep reinforcement learning decision system, which ensures the reliability of the server control instruction output by the system.
In an alternative embodiment, the step of determining the reward and punishment mechanism based on the vehicle autonomous driving behavior data, the corresponding vehicle operating environment operator driving behavior data, the intra-environment traffic rule violation information, and the collision information with the environmental obstacle may comprise the sub-steps of:
the method comprises the following substeps: analyzing and obtaining a journey deviation parameter and a driving behavior deviation parameter according to the autonomous driving behavior data of the vehicle and the driving behavior data of the staff;
the trip deviation parameter and the driving behavior deviation parameter include: a vehicle state space related parameter and a vehicle motion space related parameter. Wherein the vehicle state space related parameters include: vehicle speed V, angle between vehicle and target travel axis
Figure BDA0004074985020000101
A relative position d of the vehicle and the target travel axis, a relative position e of the vehicle and the lane edge line; the vehicle motion space related parameters include: steering wheel angle a, throttle b, brake c.
Sub-step two: determining a travel deviation degree according to the travel deviation parameter;
in the actual implementation process, when the travel deviation degree is determined according to the travel deviation parameter, the speed of the vehicle during autonomous driving, the included angle between the vehicle and the target travel axis and the relative position of the vehicle and the target travel axis can be substituted into a preset travel deviation degree expression to obtain the travel deviation degree of the vehicle.
The travel deviation degree expression for R_1 is given as an image in the original filing and is not reproduced here; it is a function of the vehicle speed V, the included angle θ and the relative position d.
this bonus function is desirable to maximize the axial speed of the vehicle, minimize the lateral speed of the vehicle, and to drive the vehicle along the target travel axis.
Sub-step three: determining the driving behavior deviation degree according to the driving behavior deviation parameter;
In the actual implementation process, when the driving behavior deviation degree is determined according to the driving behavior deviation parameter, the corresponding differences between the steering wheel angle, accelerator and brake in the worker's driving behavior data and the steering wheel angle, accelerator and brake in the vehicle's autonomous driving behavior data can be calculated, and the sum of the calculated differences is determined as the vehicle's driving behavior deviation degree.
In the actual implementation, the vehicle driving behavior deviation degree may be calculated by the following driving behavior deviation degree expression:
R_2 = |a - a_0| + |b - b_0| + |c - c_0|

wherein a_0, b_0 and c_0 respectively represent the steering wheel angle, throttle and brake in the worker's driving behavior data, and a, b and c respectively represent the steering wheel angle, throttle and brake in the vehicle's autonomous driving behavior data.
After normalization, the steering wheel angle output range is (-1, 1): an output value of -1 indicates the vehicle's maximum right turn, and 1 indicates its maximum left turn.

The throttle is normalized to (0, 1): an output value of 0 indicates no acceleration, i.e. the throttle is not pressed, and 1 indicates full acceleration.

The brake is normalized to (0, 1): 0 indicates no braking, and 1 indicates maximum braking force.
Sub-step four: calculating a violation punishment parameter value according to traffic rule violation information in the environment;
in the actual implementation process, when the violation punishment parameter value is calculated according to the traffic rule violation information in the environment, the relative position of the vehicle and the lane edge line when the vehicle is autonomously driven, the vehicle running speed and the maximum value and the minimum value of the vehicle speed preset by the system can be substituted into a preset violation punishment expression to obtain the vehicle violation punishment parameter value.
The vehicle violation penalty parameter values may be characterized in terms of the following violation penalty expressions:
R_3 = V·e - min{0, V_max - V} - min{0, V - V_min}

where a positive value of the relative position e indicates that the vehicle is within the lane and a negative value indicates that it is outside the lane, and V_max and V_min are respectively the maximum and minimum vehicle speeds preset by the system.
Sub-step five: calculating a collision penalty parameter value according to the environmental obstacle collision information;
regarding the collision penalty, if the vehicle collides with a vehicle, a pedestrian, a bicycle, and other traffic facilities, the penalty is given. An exemplary collision penalty expression may be as follows:
R 4 =-collision_flag
wherein, the collision_flag is a collision detection flag, the initial value is 0, and the value increases with the increase of the number of collisions.
Sub-step six: substituting the travel deviation degree, the driving behavior deviation degree, the violation punishment parameter value and the collision punishment parameter value into a preset reward and punishment function to obtain the reward and punishment mechanism.
An exemplary reward and punishment function is designed as follows:

R = αR_1 + βR_2 + δR_3 + εR_4

wherein α, β, δ and ε are system parameters; their specific values can be flexibly set by those skilled in the art according to the importance of the reward or penalty in each dimension, and this application does not specifically limit them.
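Combining the four terms, the sketch below computes the overall reward-penalty value R. R_2, R_3 and R_4 follow the expressions given above; the exact expression for R_1 appears only as an image in the original filing, so the form used here (rewarding axial speed while penalizing lateral speed and axis offset, as described above) is an assumed reconstruction.

```python
import math

def reward(v, theta, d, e, a, b, c, a0, b0, c0,
           v_max, v_min, collision_flag,
           alpha=1.0, beta=1.0, delta=1.0, epsilon=1.0):
    # R1: travel deviation degree (assumed reconstruction; the original formula is an image)
    r1 = v * math.cos(theta) - v * abs(math.sin(theta)) - v * abs(d)
    # R2: driving behavior deviation degree versus the worker's recorded actions
    r2 = abs(a - a0) + abs(b - b0) + abs(c - c0)
    # R3: violation penalty; e > 0 means inside the lane, e < 0 outside
    r3 = v * e - min(0.0, v_max - v) - min(0.0, v - v_min)
    # R4: collision penalty driven by the accumulated collision flag
    r4 = -collision_flag
    # Weighted combination with the system parameters alpha, beta, delta, epsilon
    return alpha * r1 + beta * r2 + delta * r3 + epsilon * r4
```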
This optional way of determining the reward and punishment mechanism integrates four dimensions (the vehicle's travel deviation degree, driving behavior deviation degree, violation penalty and collision penalty) to determine the final mechanism, and the resulting reward and punishment mechanism is more accurate and comprehensive.
Fig. 3 is a block diagram of a remote driving device based on a deep reinforcement learning decision system according to an embodiment of the present application.
The remote driving device based on the deep reinforcement learning decision system provided by the embodiment of the application comprises the following functional modules:
a first receiving module 301, configured to receive vehicle sensing data and vehicle position data uploaded by a vehicle;
the second receiving module 302 is configured to receive a manual control instruction sent by a remote control cockpit, where the manual control instruction is generated by remotely operating the vehicle by a remote operator according to the vehicle perception data and the vehicle position data received by the remote control cockpit;
the input module 303 is configured to input the vehicle sensing data, the vehicle position data, and the map data to a preset deep reinforcement learning decision system, so as to obtain a server control instruction of the vehicle;
the fusion module 304 is configured to fuse the manual control instruction and the server control instruction, and generate a target control instruction of the vehicle;
and the sending module 305 is configured to send the target manipulation instruction to a vehicle-mounted domain controller, so that the vehicle-mounted domain controller controls the vehicle according to the target manipulation instruction.
Optionally, the preset deep reinforcement learning decision system is generated by a generating module, and the generating module includes:
the first sub-module is used for collecting operation environment data and driving behavior data of a worker when driving according to a target driving task;
the second sub-module is used for building a corresponding simulation system according to the vehicle running environment;
the third sub-module is used for issuing, for each simulation system, a target driving task to the vehicle in the simulation system to obtain vehicle autonomous driving behavior data, traffic rule violation information in the environment and collision information with environmental obstacles;
a fourth sub-module, configured to determine a reward and punishment mechanism according to the autonomous driving behavior data of the vehicle, driving behavior data of a worker in a corresponding vehicle running environment, traffic rule violation information in the environment, and collision information with an environmental obstacle;
and the fifth sub-module is used for training the deep reinforcement learning network according to each simulation system and the corresponding reward and punishment mechanism until the preset convergence condition is met, and obtaining the deep reinforcement learning decision system.
Optionally, the fourth sub-module includes:
the first unit is used for analyzing and obtaining a journey deviation parameter and a driving behavior deviation parameter according to the autonomous driving behavior data of the vehicle and the driving behavior data of the staff;
the second unit is used for determining the travel deviation degree according to the travel deviation parameter;
a third unit for determining a driving behavior deviation degree according to the driving behavior deviation parameter;
a fourth unit for calculating a violation penalty parameter value according to the traffic rule violation information in the environment;
a fifth unit for calculating a collision penalty parameter value according to the environmental obstacle collision information;
and a sixth unit, configured to substitute the travel deviation degree, the driving behavior deviation degree, the violation punishment parameter value and the collision punishment parameter value into a preset reward and punishment function to obtain the reward and punishment mechanism.
Optionally, the second unit is specifically configured to:
substituting the speed of the vehicle during autonomous driving, the included angle between the vehicle and the target travel axis and the relative position of the vehicle and the target travel axis into a preset travel deviation degree expression to obtain the travel deviation degree of the vehicle.
Optionally, the third unit is specifically configured to:
calculating the corresponding differences between the steering wheel angle, the accelerator and the brake in the driving behavior data of the staff and the steering wheel angle, the accelerator and the brake in the autonomous driving behavior data of the vehicle, and determining the sum of the calculated differences as the driving behavior deviation degree of the vehicle.
Optionally, the fourth unit is specifically configured to:
substituting the relative position of the vehicle and the lane edge line, the vehicle running speed and the maximum value and the minimum value of the vehicle speed preset by the system into a preset violation punishment expression to obtain the vehicle violation punishment parameter value.
Optionally, the fusion module includes:
a sixth sub-module, configured to construct a fusion coefficient matrix of the manual control instruction and the server control instruction;
a seventh sub-module, configured to construct a manual control instruction matrix and a server control instruction matrix;
the eighth sub-module is used for representing the control instruction safety evaluation value, the control instruction high-efficiency evaluation value and the control instruction energy-saving evaluation value through the fusion coefficient matrix, the manual control instruction matrix and the server control instruction matrix;
and the ninth submodule is used for inputting the control instruction safety evaluation value, the control instruction high-efficiency evaluation value and the control instruction energy-saving evaluation value into a preset fusion expression and then solving the fusion expression to obtain a target control instruction of the vehicle.
In the remote driving device based on the deep reinforcement learning decision system provided by the embodiments of the application, the manual control instruction and the server control instruction are fused to generate the vehicle's target control instruction. On the one hand, this reduces dependence on manual control, allowing the remote driver to take short rests when tired; on the other hand, since the target control instruction is generated by fusing the control instructions of both branches, it is more reliable than a manual control instruction generated by the remote driver alone, which improves the safety factor.
The remote driving device based on the deep reinforcement learning decision system shown in fig. 3 in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a server. The remote driving device based on the deep reinforcement learning decision system shown in fig. 3 in the embodiment of the present application may be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The remote driving device based on the deep reinforcement learning decision system shown in fig. 3 provided in this embodiment of the present application can implement each process implemented by the method embodiment of fig. 1, and in order to avoid repetition, a description is omitted here.
Optionally, as shown in fig. 4, the embodiment of the present application further provides an electronic device 400, including a processor 401, a memory 402, and a program or an instruction stored in the memory 402 and capable of being executed on the processor 401, where the program or the instruction is executed by the processor 401 to implement each process of the foregoing embodiment of the remote driving method based on the deep reinforcement learning decision system, and the process may achieve the same technical effect, and will not be repeated herein.
It should be noted that the electronic device in the embodiment of the present application includes the server described above.
The embodiment of the application further provides a readable storage medium, on which a program or an instruction is stored, where the program or the instruction, when executed by a processor, implements each process of the foregoing embodiment of the remote driving method based on the deep reinforcement learning decision system, and can achieve the same technical effect, so that repetition is avoided, and no further description is given here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running a program or an instruction, implementing each process of the remote driving method embodiment based on the deep reinforcement learning decision system, and achieving the same technical effect, so as to avoid repetition, and no redundant description is provided herein.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (10)

1. A method of remote driving based on a deep reinforcement learning decision system, the method comprising:
receiving vehicle sensing data and vehicle position data uploaded by a vehicle;
receiving a manual control instruction sent by a remote control cockpit, wherein the manual control instruction is generated by a remote operator remotely operating the vehicle according to the vehicle perception data and the vehicle position data received by the remote control cockpit;
inputting the vehicle perception data, the vehicle position data and the map data into a preset deep reinforcement learning decision system to obtain a server control instruction of the vehicle;
fusing the manual control instruction and the server control instruction to generate a target control instruction of the vehicle;
and sending the target control instruction to a vehicle-mounted domain controller so that the vehicle-mounted domain controller controls the vehicle according to the target control instruction.
2. The method of claim 1, wherein the pre-set deep reinforcement learning decision system is generated by training in the following manner:
collecting operation environment data and driving behavior data of a worker when driving according to a target driving task;
building a corresponding simulation system according to the running environment of the vehicle;
for each simulation system, issuing a target driving task to the vehicle in the simulation system to obtain autonomous driving behavior data of the vehicle, traffic rule violation information in the environment and collision information with environmental obstacles;
determining a reward and punishment mechanism according to the autonomous driving behavior data of the vehicle, the driving behavior data of a worker in a corresponding vehicle running environment, traffic rule violation information in the environment and collision information with environmental obstacles;
training the deep reinforcement learning network according to each simulation system and the corresponding reward and punishment mechanism until a preset convergence condition is met, and obtaining the deep reinforcement learning decision system.
3. The method of claim 2, wherein determining the reward and punishment mechanism based on the vehicle autonomous driving behavior data, the corresponding vehicle operating environment operator driving behavior data, the intra-environment traffic rule violation information, and the collision information with the environmental obstacle comprises:
analyzing and obtaining a journey deviation parameter and a driving behavior deviation parameter according to the autonomous driving behavior data of the vehicle and the driving behavior data of the staff;
determining a travel deviation degree according to the travel deviation parameter;
determining driving behavior deviation degree according to the driving behavior deviation parameter;
calculating a violation punishment parameter value according to traffic rule violation information in the environment;
calculating a collision penalty parameter value according to the environmental obstacle collision information;
substituting the travel deviation degree, the driving behavior deviation degree, the violation punishment parameter value and the collision punishment parameter value into a preset reward and punishment function to obtain the reward and punishment mechanism.
4. A method according to claim 3, wherein the step of determining the degree of travel deviation from the travel deviation parameter comprises:
substituting the speed of the vehicle during autonomous driving, the included angle between the vehicle and the target travel axis and the relative position of the vehicle and the target travel axis into a preset travel deviation degree expression to obtain the travel deviation degree of the vehicle.
5. A method according to claim 3, wherein the step of determining the degree of deviation of driving behaviour in dependence on the driving behaviour deviation parameter comprises:
calculating the corresponding differences between the steering wheel angle, the accelerator and the brake in the driving behavior data of the staff and the steering wheel angle, the accelerator and the brake in the autonomous driving behavior data of the vehicle, and determining the sum of the calculated differences as the driving behavior deviation degree of the vehicle.
6. A method according to claim 3, wherein the step of calculating a violation penalty parameter value in dependence of traffic rule violation information within the environment comprises:
substituting the relative position of the vehicle and the lane edge line, the vehicle running speed and the maximum value and the minimum value of the vehicle speed preset by the system into a preset violation punishment expression to obtain the vehicle violation punishment parameter value.
7. The method of claim 1, wherein the step of generating the target manipulation instruction of the vehicle by fusing the manual manipulation instruction and the server manipulation instruction comprises:
constructing a fusion coefficient matrix of the manual control instruction and the server control instruction;
constructing a manual control instruction matrix and a server control instruction matrix;
the safety evaluation value, the high-efficiency evaluation value and the energy-saving evaluation value of the control instruction are represented by the fusion coefficient matrix, the manual control instruction matrix and the server control instruction matrix;
and inputting the control instruction safety evaluation value, the control instruction high-efficiency evaluation value and the control instruction energy-saving evaluation value into a preset fusion expression, and then solving the fusion expression to obtain a target control instruction of the vehicle.
8. A remote driving apparatus based on a deep reinforcement learning decision system, the apparatus comprising:
a first receiving module configured to receive vehicle perception data and vehicle position data uploaded by a vehicle;
a second receiving module configured to receive a manual control instruction sent by a remote control cockpit, wherein the manual control instruction is generated by a remote operator remotely operating the vehicle according to the vehicle perception data and the vehicle position data received by the remote control cockpit;
an input module configured to input the vehicle perception data, the vehicle position data and map data into a preset deep reinforcement learning decision system to obtain a server control instruction for the vehicle;
a fusion module configured to fuse the manual control instruction and the server control instruction to generate a target control instruction of the vehicle;
and a sending module configured to send the target control instruction to a vehicle-mounted domain controller, so that the vehicle-mounted domain controller controls the vehicle according to the target control instruction.
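For illustration only: a minimal Python skeleton of the claimed apparatus, with module boundaries taken from claim 8 and every name and signature hypothetical; the claim defines functional modules, not a concrete software interface.

```python
class RemoteDrivingDevice:
    """Skeleton mirroring the module boundaries of claim 8."""

    def receive_vehicle_data(self, perception, position):
        """First receiving module: accept data uploaded by the vehicle."""
        self.perception, self.position = perception, position

    def receive_manual_instruction(self, manual):
        """Second receiving module: accept the cockpit's manual instruction."""
        self.manual = manual

    def run_decision_system(self, decision_system, map_data):
        """Input module: obtain the server control instruction."""
        self.server = decision_system(self.perception, self.position, map_data)

    def fuse(self, fuse_fn):
        """Fusion module: produce the target control instruction."""
        self.target = fuse_fn(self.manual, self.server)

    def send(self, domain_controller):
        """Sending module: hand the target instruction to the
        vehicle-mounted domain controller for execution."""
        domain_controller.execute(self.target)
```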
9. The apparatus of claim 8, wherein the preset deep reinforcement learning decision system is generated by a generation module comprising:
a first sub-module configured to collect vehicle operating environment data and operator driving behavior data while an operator drives according to a target driving task;
a second sub-module configured to build a corresponding simulation system according to each vehicle operating environment;
a third sub-module configured to issue, for each simulation system, the target driving task to the vehicle in the simulation system to obtain vehicle autonomous driving behavior data, intra-environment traffic rule violation information and collision information with environmental obstacles;
a fourth sub-module configured to determine a reward and penalty mechanism according to the vehicle autonomous driving behavior data, the operator driving behavior data for the corresponding vehicle operating environment, the intra-environment traffic rule violation information, and the collision information with environmental obstacles;
and a fifth sub-module configured to train a deep reinforcement learning network according to each simulation system and its corresponding reward and penalty mechanism until a preset convergence condition is met, to obtain the deep reinforcement learning decision system.
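For illustration only: a sketch of the claimed generation flow as a training loop; the simulator API and the moving-average convergence test stand in for the undisclosed preset convergence condition.

```python
import numpy as np

def train_decision_system(simulators, reward_fns, agent,
                          max_episodes=10_000, tol=1e-3):
    """Hypothetical training loop for claim 9.

    simulators -- one simulation system per vehicle operating environment
    reward_fns -- the matching claim-3 reward and penalty mechanisms
    agent      -- a deep RL agent exposing act() and learn()
    """
    returns = []
    for _ in range(max_episodes):
        for sim, reward_fn in zip(simulators, reward_fns):
            state, done, total = sim.reset(), False, 0.0
            while not done:
                action = agent.act(state)
                next_state, info, done = sim.step(action)  # assumed API
                reward = reward_fn(info)  # claim-3 reward and penalty value
                agent.learn(state, action, reward, next_state, done)
                state, total = next_state, total + reward
            returns.append(total)
        # Assumed convergence condition: average episode return stabilizes.
        if len(returns) >= 200 and abs(
                np.mean(returns[-100:]) - np.mean(returns[-200:-100])) < tol:
            break
    return agent
```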
10. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the remote driving method based on a deep reinforcement learning decision system according to any one of claims 1-7.
CN202310106346.9A 2023-02-13 2023-02-13 Remote driving method and device based on deep reinforcement learning decision system and electronic equipment Pending CN116382150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310106346.9A CN116382150A (en) 2023-02-13 2023-02-13 Remote driving method and device based on deep reinforcement learning decision system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310106346.9A CN116382150A (en) 2023-02-13 2023-02-13 Remote driving method and device based on deep reinforcement learning decision system and electronic equipment

Publications (1)

Publication Number Publication Date
CN116382150A true CN116382150A (en) 2023-07-04

Family

ID=86962294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310106346.9A Pending CN116382150A (en) 2023-02-13 2023-02-13 Remote driving method and device based on deep reinforcement learning decision system and electronic equipment

Country Status (1)

Country Link
CN (1) CN116382150A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116822659A (en) * 2023-08-31 2023-09-29 浪潮(北京)电子信息产业有限公司 Automatic driving motor skill learning method, system, equipment and computer medium
CN116822659B (en) * 2023-08-31 2024-01-23 浪潮(北京)电子信息产业有限公司 Automatic driving motor skill learning method, system, equipment and computer medium

Similar Documents

Publication Publication Date Title
CN114326705B (en) Object interaction prediction system and method for autonomous vehicles
CN112389427B (en) Vehicle track optimization method and device, electronic equipment and storage medium
Naranjo et al. Using fuzzy logic in automated vehicle control
Llorca et al. Autonomous pedestrian collision avoidance using a fuzzy steering controller
US11167795B2 (en) Method and system for determining driver intention in semi-autonomous vehicle steering
Kehtarnavaz et al. A transportable neural-network approach to autonomous vehicle following
Sukthankar Situation awareness for tactical driving
Naranjo et al. Cooperative Throttle and Brake Fuzzy Control for ACC+Stop&Go Maneuvers
CN111289978A (en) Method and system for making decision on unmanned driving behavior of vehicle
CN110568841A (en) Automatic driving decision method and system
CN112896169B (en) Intelligent driving multi-mode control system and method
CN110764507A (en) Artificial intelligence automatic driving system for reinforcement learning and information fusion
CN113791619B (en) Airport automatic driving tractor dispatching navigation system and method
CN112068574A (en) Control method and system for unmanned vehicle in dynamic complex environment
CN111873975B (en) Control method, device, system, equipment and medium for electronic parking brake
CN113076897A (en) Game dynamic driving safety measurement and control method and regulation and control terminal of intelligent networked automobile
CN116382150A (en) Remote driving method and device based on deep reinforcement learning decision system and electronic equipment
US20230111354A1 (en) Method and system for determining a mover model for motion forecasting in autonomous vehicle control
CN113954858A (en) Method for planning vehicle driving route and intelligent automobile
CN114363862A (en) Intelligent information conversion system and method for serving cooperative automatic driving of vehicle and road
DE112021005427T5 (en) SYSTEM FOR PREDICTING THE FUTURE STATE OF AN AUTONOMOUS VEHICLE
CN116605188B (en) Automatic emergency braking control system for electric vehicle-two-wheel vehicle
Garcia et al. The VILMA intelligent vehicle: an architectural design for cooperative control between driver and automated system
JP6796679B2 (en) Vehicle control system and method, and driving support server
Fernandez et al. Autopia architecture for automatic driving and maneuvering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination