CN113593035A

CN113593035A - Motion control decision generation method and device, electronic equipment and storage medium

Info

Publication number: CN113593035A
Application number: CN202110778925.9A
Authority: CN
Inventors: 刘永进; 韩义恒; 赵旺; 詹昊哲
Original assignee: Tsinghua University
Current assignee: Beijing Jinrui Technology Co ltd
Priority date: 2021-07-09
Filing date: 2021-07-09
Publication date: 2021-11-02

Abstract

The invention provides a motion control decision generation method, a motion control decision generation device, electronic equipment and a storage medium, wherein the motion control decision generation method comprises the following steps: determining a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and RGB image information of the target area acquired by a camera; determining target point cloud information based on second point cloud information projected into a three-dimensional space by a depth prediction map, and obtaining third point cloud information according to the target point cloud information and the first point cloud information; and determining target multivariate state data according to the third point cloud information, and inputting the target multivariate state data into a multi-stage trained deep reinforcement learning control decision model to obtain a target motion control decision. The method provided by the invention can effectively overcome the plane limitation existing in radar detection, efficiently acquire the target motion control decision and realize better moving obstacle avoidance effect.

Description

Motion control decision generation method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of computer vision, in particular to a motion control decision generation method and device, electronic equipment and a storage medium.

Background

With the rapid development of artificial intelligence and computer vision technology, intelligent robots have been widely used in restaurants, shopping malls, banks, and other large service areas. The obstacle avoidance technology is used as a key ring for the application of the intelligent robot, and the development of the obstacle avoidance technology arouses high attention in the industry. The traditional robot obstacle avoidance means mainly comprises an ultrasonic sensor obstacle avoidance means, an infrared sensor obstacle avoidance means, a laser radar obstacle avoidance means and the like.

The obstacle avoidance method based on the laser radar is limited by cost, power consumption, simulation difficulty and the like, and a single-line radar is often used. However, the single-line radar can only detect a certain fixed plane, and if the robot has a high height, the obstacle avoidance effect cannot be achieved only by detecting the certain plane. For example, when the radar height setting is low, a single line radar cannot detect an obstacle at a high position; when the radar height setting is high, the single line radar cannot detect obstacles close to the ground, so that unfavorable motion control decisions are generated, and the condition of collision is caused.

Therefore, in the prior art, the control decision effect of moving obstacle avoidance is not good due to inaccurate obstacle detection.

Disclosure of Invention

The invention provides a motion control decision generation method and device, electronic equipment and a storage medium, which are used for solving the problem of poor control decision effect of moving obstacle avoidance caused by inaccurate obstacle detection.

The invention provides a motion control decision generation method, which comprises the following steps:

determining a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and the RGB image information of the target area acquired by a camera;

determining target point cloud information based on second point cloud information projected into a three-dimensional space by the depth prediction map, and obtaining third point cloud information according to the target point cloud information and the first point cloud information;

and determining target multivariate state data according to the third point cloud information, and inputting the target multivariate state data into a multi-stage trained deep reinforcement learning control decision model to obtain a target motion control decision.

According to the motion control decision generation method provided by the invention, before the determining the target point cloud information, the method further comprises the following steps:

calculating the first height of each pixel point in the world coordinate system according to the depth information of each pixel point in the depth prediction image;

processing the depth prediction image according to the first height of each pixel point and a preset height threshold value to obtain a filtered target depth image;

determining the second point cloud information based on the target depth map;

wherein the preset height threshold is determined based on the height of the ground in the world coordinate system and the height of the camera in the world coordinate system.

According to the motion control decision generation method provided by the invention, the determining of the target point cloud information based on the second point cloud information projected into the three-dimensional space by the depth prediction map comprises the following steps:

grouping the second point cloud information based on the coordinate information of the second point cloud information in a world coordinate system to obtain a plurality of groups of fourth point cloud information;

and determining target point cloud information according to the distance between each group of the fourth point cloud information and the camera.

According to the motion control decision generation method provided by the invention, the obtaining of the third point cloud information according to the target point cloud information and the first point cloud information comprises the following steps:

converting the target point cloud information into a radar coordinate system to obtain fifth point cloud information corresponding to the target point cloud information;

updating the first point cloud information according to the fifth point cloud information to obtain third point cloud information;

wherein the radar coordinate system is determined based on the first point cloud information.

According to the motion control decision generation method provided by the present invention, before inputting the target multivariate state data into the multi-stage trained deep reinforcement learning control decision model, the method further comprises:

taking a multi-element state data sample containing radar point cloud sample data as a group of training samples to obtain a plurality of groups of training samples;

and performing multi-stage training on the deep reinforcement learning control decision model by using the multiple groups of training samples, and stopping training when a preset convergence condition is met to obtain the deep reinforcement learning control decision model after the multi-stage training.

According to the motion control decision generation method provided by the invention, the multi-stage training of the deep reinforcement learning control decision model by using the multiple groups of training samples comprises the following steps:

for any group of training samples, inputting the training samples into a deep reinforcement learning control decision model for first-stage training, and obtaining a first profit reward value corresponding to the training samples by using a first price function; under the condition that the first profit reward value is converged, stopping the first-stage training to obtain a first deep reinforcement learning control decision model;

for any group of training samples, inputting the training samples into the first deep reinforcement learning control decision model for second-stage training, and obtaining a second profit reward value corresponding to the training samples by using a second value function; stopping the second-stage training under the condition that the second profit reward value is converged to obtain a second deep reinforcement learning control decision model;

for any group of training samples, inputting the training samples into the second deep reinforcement learning control decision model for third-stage training, and obtaining a third profit reward value corresponding to the training samples by using a third valence function; and under the condition that the third profit reward value is converged, stopping the third-stage training to obtain a deep reinforcement learning control decision model after the multi-stage training.

According to the motion control decision generation method provided by the invention, the first cost function is determined according to a preset Euclidean distance reduction term, a collision penalty term and a rotation angle penalty term;

the second price function is determined according to the first price function and a preset global reward and punishment term;

the third valence function is determined according to the first valence function, a preset maximum time step and an average time step of the second stage training.

The present invention also provides a motion control decision generating device, including:

the prediction module is used for determining a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and the RGB image information of the target area acquired by a camera;

the updating module is used for determining target point cloud information based on second point cloud information projected to a three-dimensional space by the depth prediction map and obtaining third point cloud information according to the target point cloud information and the first point cloud information;

and the decision module is used for determining target multivariate state data according to the third point cloud information, inputting the target multivariate state data into a multi-stage trained deep reinforcement learning control decision model, and obtaining a target motion control decision.

The invention also provides an obstacle avoidance robot, which comprises an obstacle avoidance robot body, wherein an obstacle avoidance processor, a monocular camera and a single line radar are arranged in the obstacle avoidance robot body, the monocular camera is arranged at the top of the obstacle avoidance robot body, the single line radar is arranged on the obstacle avoidance robot body, and the scanning field of the single line radar is at least partially overlapped with the shooting range of the monocular camera; the method further comprises a memory and a program or instructions stored on the memory and executable on the obstacle avoidance processor, wherein the program or instructions, when executed by the obstacle avoidance processor, perform the steps of any of the motion control decision generation methods described above.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the motion control decision generation method as any one of the above.

According to the motion control decision generation method, the motion control decision generation device, the electronic equipment and the storage medium, a depth prediction map corresponding to RGB image information is obtained according to first point cloud information of a target area acquired by a laser radar and RGB image information of the target area acquired by a camera, then the distance from the ground to the point closest to the camera in the height range of the top end of the camera is determined from the depth prediction map, the distance is used for updating data of the laser radar, the updated radar data contains point cloud data of obstacles which cannot be detected by the radar, and therefore plane limitation of radar detection is overcome; according to the updated radar data, a multi-stage value function training depth reinforcement learning control decision model is used, the result of each training is transmitted to each time step through the value function, overall consideration is achieved, the total time step length is transmitted to each time step through the value function, optimization and improvement on speed are achieved, target motion control decisions are obtained efficiently, a better moving obstacle avoidance effect is achieved, and the practicability is high.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a motion control decision making method provided by the present invention;

fig. 2 is a schematic structural front view of the obstacle avoidance robot provided by the invention;

fig. 3 is a schematic structural side view of an obstacle avoidance robot provided by the present invention;

FIG. 4 is a schematic overall flow chart of a motion control decision making method provided by the present invention;

FIG. 5 is a schematic diagram of a motion control decision making apparatus provided by the present invention;

fig. 6 is a schematic structural diagram of the obstacle avoidance robot provided by the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The motion control decision making method, apparatus, electronic device and storage medium of the present invention are described below with reference to fig. 1 to 6.

Fig. 1 is a schematic flow chart of a motion control decision generation method provided by the present invention, as shown in fig. 1, including:

step S110, determining a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and the RGB image information of the target area acquired by a camera;

specifically, the target area described in the present invention refers to the same area in the same environment as the laser radar scan and the camera shoot.

The first point cloud information described in the invention refers to point cloud information obtained by scanning a target area through a laser radar.

In the embodiment of the invention, the laser radar can adopt a single-line radar or a multi-line radar, the camera can adopt a monocular camera or a multi-line camera, the monocular radar or the multi-line radar can be arranged at the lower half part of the motion equipment to detect the obstacle close to the ground part, meanwhile, the monocular camera or the multi-line camera can be arranged at the top of the motion equipment to obtain a better camera visual field and detect the obstacle at a high position, the visual field direction of the camera is arranged to be vertical to the gravity direction, radar data and RGB image data of a target area are collected, and then calibration can be carried out through the radar data and the RGB image data to obtain first point cloud information and RGB image information.

In the embodiment of the invention, the motion device may be an obstacle avoidance robot, an automobile or other devices capable of realizing intelligent movement.

Further, calibrating the camera and the laser radar based on the relative position and angle of the camera and the laser radar and the camera internal parameters, and aiming at the original point cloud information obtained after the radar scans the target area, calibrating each point P of the original point cloud information in a radar coordinate system_lBoth using the camera's internal reference matrix R and the relative position of the camera to the radar T, P_lConverting into new point P under camera coordinate_cI.e. by

P_c＝RP_l+T；

And then, points exceeding the RGB image range of the camera in the radar original point cloud information can be filtered, calibration of radar data and RGB image data is completed, and the first point cloud information of the radar and the RGB image information of the camera are obtained.

It should be noted that, in the prior art, the problem that the model cannot be well applied to the real scene from the simulation scene exists only by the mobile obstacle avoidance scheme based on the visual information and the depth reinforcement learning model, so that the obstacle avoidance success rate in the real scene is greatly reduced.

Because the single-line radar can only detect a certain fixed plane, the obstacle avoidance can not be realized only by a mobile obstacle avoidance scheme based on single-line radar information and depth reinforcement learning when the moving equipment provided with the single-line radar has higher height.

Due to the fact that the use cost of the single-line radar and the monocular camera is relatively low, the single-line radar and the monocular camera can save large cost. Therefore, the invention designs a robot structure of an obstacle avoidance method combining a single-line radar and a monocular camera. In the embodiment of the invention, the single-line radar and the monocular camera are carried on the obstacle avoidance robot, so that the single-line radar and the monocular camera can simultaneously acquire data, the robot can perform depth calculation according to the data acquired by the single-line radar and the monocular camera to obtain a motion control decision, and a better moving obstacle avoidance effect can be realized.

Fig. 2 is a schematic structural front view of the obstacle avoidance robot according to the present invention, as shown in fig. 2, the obstacle avoidance robot mainly includes a monocular camera, a single line radar and a mobile chassis, the single line radar is installed at a lower half portion of the robot to detect an obstacle near a ground portion, the monocular camera is installed at a top portion of the robot to obtain a better camera view and detect an obstacle at a high position, and a view direction of the camera is set to be perpendicular to a gravity direction.

Fig. 3 is a schematic structural side view of the obstacle avoidance robot provided by the present invention, and as shown in fig. 3, a monocular camera is disposed right in front of the top of the robot to obtain a better camera view.

The depth prediction graph described by the invention refers to an RGB image with prediction depth information obtained based on a trained image depth prediction network according to first point cloud information and RGB image information.

Because the radar data obtains relatively accurate depth information, the radar data and the camera RGB image are simultaneously used as input data, namely the first point cloud information and the RGB image information are input into a trained depth prediction network, so that an image depth estimation result which is more accurate than that of single RGB image data can be obtained, and a required depth prediction image can be obtained.

The depth prediction network is used for performing depth estimation on the calibrated radar point cloud information and the camera RGB image information to obtain a depth prediction map corresponding to the RGB image. The depth prediction network may be obtained based on an existing depth neural network, which may be a depth convolutional neural network, or may be another depth neural network used for obtaining a required depth prediction map, and is not specifically limited in the present invention.

In the embodiment of the present invention, the depth prediction network may adopt a depth convolutional neural network, which is composed of two parts, namely, an encoder and a decoder, wherein the encoder is composed of a convolution module of 3 × 3, and the decoder is composed of a transposed convolution module of 4 cores of 3 × 3. The learning rate of the network starts at 0.01, decreases by 20 percent every 5 epochs (epochs), the batch size (batch size) is set to 8, and a total of 15 epochs are trained. And inputting the first point cloud information and the RGB image information into the depth convolution neural network to obtain a depth prediction map corresponding to the RGB image.

In the embodiment of the invention, after the first point cloud information and the RGB image information are obtained, the depth data of the laser radar can correspond to the pixel data in the RGB image through the first point cloud information and the RGB image information, and the first point cloud information and the RGB image information are input into the trained image depth prediction network, so that the depth prediction image corresponding to the RGB image is obtained.

Step S120, determining target point cloud information based on second point cloud information projected to a three-dimensional space by the depth prediction map, and obtaining third point cloud information according to the target point cloud information and the first point cloud information;

specifically, the second point cloud in the present invention is a point cloud in a range from a ground height to a camera height, which is obtained by projecting the depth prediction map into point cloud information of a three-dimensional space. Since the camera is often arranged at the top of the moving equipment, obstacles influencing the passing of the moving equipment can be fully considered through the second point cloud.

The target point cloud information described in the invention refers to point cloud information formed by points which are closest to the camera in each grouped point cloud obtained after the second point cloud is grouped. Because road condition information and the like in the moving process can change constantly and obstacles which move constantly can exist, the motion decision is always required to be updated constantly in the application, and the obstacle closest to the camera is a factor which needs to be considered preferentially, so that the basis of the current decision is generated, and the target point cloud information closest to the camera needs to be considered fully.

In the invention, because the point cloud information scanned by the radar is limited by a fixed plane, the radar cannot acquire the information of the obstacles beyond the height of the scanning fixed plane, and the point cloud data of the obstacles which cannot be detected by the radar can be embodied in the radar data by updating the first point cloud information according to the target point cloud information by virtue of the target point cloud information obtained by the depth prediction map corresponding to the RGB image, thereby effectively overcoming the plane limitation of the radar detection.

And step S130, determining target multivariate state data according to the third point cloud information, and inputting the target multivariate state data into a multi-stage trained deep reinforcement learning control decision model to obtain a target motion control decision.

Specifically, the target multivariate state data described in the present invention refers to multivariate state data including radar third point cloud information, and in an embodiment of the present invention, the target multivariate state data may specifically include the third point cloud information, the current speed of the moving device, the current position of the moving device, and the target position.

Furthermore, the current speed of the sports equipment, the current position of the sports equipment and the target position can be obtained through the existing monitoring and positioning technical means, so that the target multivariate state data can be further determined according to the third point cloud information, and then the target multivariate state data is input into the deep reinforcement learning control decision model after multi-stage training, and the target motion control decision can be obtained.

In the embodiment of the invention, the multi-stage trained depth reinforcement learning control decision model is a model obtained by sequentially carrying out multi-stage training on multi-element state data samples containing radar point cloud sample data, is used for carrying out depth processing on input target multi-element state data and outputting a target motion control decision; the training samples are composed of a plurality of groups of multi-element state data samples, and the multi-element state data samples are real-time data samples in the training process and comprise point cloud sample data acquired by a radar in real time, the real-time speed of the moving equipment, the real-time position of the moving equipment and the target position.

In the embodiment of the invention, in the process of multi-stage training of the deep reinforcement learning control decision model, each training stage corresponds to a cost function, each training stage updates the profit reward value corresponding to each stage according to the corresponding cost function, the model training of each stage is completed through the condition of convergence of the profit reward value, and after the training of the current stage is completed, the next stage is started to train until the training of all stages is completed.

The method of the invention trains the deep reinforcement learning control decision model by using the multi-stage value function, so that the simulation result of each time can be transferred to each time step by the value function in the training process, thereby realizing the global consideration; meanwhile, the total time step length can be transferred to each time step through a value function, and optimization and improvement on algorithm efficiency are achieved.

In the embodiment of the invention, based on the target motion control decision, the motion equipment can make an accurate motion control strategy, so that the front obstacle is effectively avoided.

The method provided by the embodiment of the invention can be particularly applied to the technical fields of robot obstacle avoidance or automatic driving and the like, and can bring a good moving obstacle avoidance effect for robot obstacle avoidance or automatic driving.

According to the method, a depth prediction map corresponding to an RGB image is obtained according to first point cloud information of a target area obtained by a laser radar and RGB image information of the target area obtained by a camera, then the distance from the ground to the point closest to the camera in the height range of the top end of the camera is determined from the depth prediction map, and the data of the laser radar is updated by using the distance, so that the updated radar data contains point cloud data of obstacles which cannot be detected by the radar, and the plane limitation of radar detection is overcome; according to the updated radar data, a multi-stage value function training depth reinforcement learning control decision model is used, the result of each training is transmitted to each time step through the value function, overall consideration is achieved, the total time step length is transmitted to each time step through the value function, optimization and improvement on speed are achieved, target motion control decisions are obtained efficiently, a better moving obstacle avoidance effect is achieved, and the practicability is high.

Optionally, before the determining the target point cloud information, further comprising:

determining the second point cloud information based on the target depth map;

Specifically, the depth information described in the present invention refers to distance information between each point in the scene of the target area and the camera.

The first height described in the invention refers to the height of each pixel point in the depth prediction map corresponding to the depth information in the world coordinate system.

The preset height threshold described in the invention refers to a pixel point filtering range threshold determined based on the height of the ground in the world coordinate system and the height of the camera in the world coordinate system, and is used for filtering all pixel points in the depth prediction image within the range of the upper and lower preset height thresholds of the ground height and simultaneously filtering all pixel points in the depth prediction image above the height of the camera.

It should be noted that the camera in the embodiment of the present invention is disposed on the top of the sports apparatus to obtain a better camera view and detect an obstacle at a high radar level. For example, when the motion device is a robot, the camera is placed on top of the robot; when the sports apparatus is an automobile, the camera can be placed on the roof with a good view.

The target depth map described by the invention refers to a depth map obtained by filtering pixel points of a depth prediction map and only reserving all pixel points in the interval of the ground height and the camera height.

Further, according to the depthAnd mapping the depth information of all pixel points in the image, and calculating the height of the depth information corresponding to the world coordinate system. For a depth prediction map M e R with size (u, v)^u*vFor any pixel (x, y) sampled in the depth prediction map, the height H in the world coordinate system corresponding to the depth information d of the pixel can use the focus parameter f of the camera_yTo calculate, i.e.

Next, points in the depth prediction map at ground level and points above the camera level are all filtered out. Here, the present invention defaults to a sporting equipment that will only collide with objects above the ground. Setting a filtering range threshold epsilon under the condition that the height of the camera under a world coordinate system is h, filtering out pixel points in the range of upper epsilon and lower epsilon of the ground height, and enabling pixel points M (x) in the following condition range to be satisfied₀,y₀) Is equal to 0, i.e

|H+h|<ε；

Further, filtering out points at ground height in the depth prediction map, and filtering out points above the camera height according to the height of the camera in the world coordinate system, thereby obtaining the filtered target depth prediction map.

After the filtered target depth prediction image is obtained, the target depth prediction image is projected into a three-dimensional space, point cloud information in the range from the ground height to the camera height can be obtained, and second point cloud information can be obtained.

According to the method provided by the embodiment of the invention, the height of all pixel point depth information in the depth prediction image under the world coordinate system is calculated, and according to the preset filtering range threshold, the pixel points which are not needed in the calculating process can be effectively filtered, so that the calculating load of the algorithm can be reduced, and the algorithm efficiency is improved.

Optionally, the determining target point cloud information based on the second point cloud information projected into the three-dimensional space by the depth prediction map includes:

Specifically, the fourth point cloud information described in the present invention refers to each group of point cloud information obtained by grouping the second point cloud information based on the coordinates of the second point cloud information in the world coordinate system.

In the embodiment of the present invention, in the plurality of sets of fourth point cloud information, a point cloud composed of points having the smallest distance from the camera in each set of point cloud information is used as the target point cloud.

For a depth prediction sampled in the map M ∈ R^u*vAnd (3) if the coordinate of the pixel point is (i, j), dividing the points with the same coordinate i value into a group, and further grouping the second point cloud information, so that a plurality of groups of point sets with the same coordinate i value can be obtained, and the fourth point cloud information can be obtained according to the point sets with the same coordinate i value.

In the fourth point cloud information, the distance L from the closest point to the camera in each row of points with the same i value in the range from the ground height to the camera height, namely

L(j)＝minM(,:j),j∈1,2,3...,v,where M(i,j)≠0；

Wherein M (i, j) represents a distance between each point in the fourth point cloud information and the camera; i represents the abscissa of the pixel point in a world coordinate system, and i belongs to 1,2,3. j represents the ordinate of the pixel point in the world coordinate system, j belongs to 1,2,3.

Further, target point cloud information can be obtained according to the points satisfying the l (j) condition.

According to the method provided by the embodiment of the invention, in the range from the ground height to the camera top height, the point closest to the camera in each row of point clouds obtained by grouping is determined from the point cloud information projected from the depth prediction image to the three-dimensional space, and the target point cloud information is obtained, so that the point cloud data of the obstacle which cannot be detected by the laser radar can be determined, and the plane limitation of the laser radar in the scanning detection process can be favorably solved.

Optionally, the obtaining third point cloud information according to the target point cloud information and the first point cloud information includes:

Specifically, the fifth point cloud information described in the present invention refers to point cloud information obtained by converting target point cloud information into a radar coordinate system.

In the embodiment of the invention, the radar coordinate system is determined based on the first point cloud information, and according to the coordinate information of the first point cloud information in the world coordinate system, the position information of the laser radar in the world coordinate system is taken as the coordinate origin of the radar coordinate system, so that the radar coordinate system of the laser radar and the fixed plane information scanned by the laser radar can be determined.

Because the field angle of the laser radar is larger than that of the camera, the target point cloud information only corresponds to partial data in the radar data.

Further, according to the existing coordinate system conversion algorithm, the target point cloud information can be converted into a radar coordinate system to obtain fifth point cloud information, the fifth point cloud information can correspond to a fixed plane scanned by the laser radar, and each point in the fifth point cloud information replaces each point in the first point cloud information corresponding to the coordinate position of the first point cloud information, so that the first point cloud information is updated, the updated point cloud information is obtained, and the third point cloud information is obtained.

According to the method provided by the embodiment of the invention, the distance from the ground height to the nearest point in the height range of the top end of the camera from the point cloud information of the depth prediction map is determined, and the distance is further used for updating the data of the laser radar, so that the updated radar data contains the point cloud data of the obstacle which cannot be detected by the radar, and the plane limitation of the laser radar detection can be effectively overcome.

Optionally, before inputting the target multivariate state data into the multi-stage trained deep reinforcement learning control decision model, the method further includes:

Specifically, the multivariate state data sample containing the radar point cloud sample data described in the present invention may be a data sample obtained by obtaining real-time state data of the motion device, and may be a quadruple data sample composed of point cloud information detected by a current lidar, a moving speed of the current motion device, position information of the current motion device, and target position information.

The preset convergence condition described in the invention refers to convergence of the revenue reward value corresponding to the last stage in the multi-stage model training process.

Before inputting the target multivariate state data into the depth reinforcement learning control decision model after the multi-stage training, the depth reinforcement learning control decision model also needs to be trained, which is specifically as follows:

and taking a four-tuple data sample consisting of point cloud sample data acquired by the current radar, the moving speed of the current motion equipment, the position information of the current motion equipment and the target position information as a group of training samples, thereby acquiring a plurality of groups of four-tuple data training samples.

Then, after obtaining a plurality of groups of training samples, sequentially inputting a plurality of groups of quadruple data training samples to the deep reinforcement learning control decision model, namely simultaneously inputting a quadruple data sample consisting of point cloud data detected by a current laser radar, a current moving speed of the moving equipment, current position information of the moving equipment and target position information in each group of training samples to the deep reinforcement learning control decision model, and adjusting parameters of the deep reinforcement learning control decision model by calculating a profit reward value of the deep reinforcement learning control decision model according to each output result of the deep reinforcement learning control decision model to finally complete a training process of the deep reinforcement learning control decision model to obtain the multi-stage trained deep reinforcement learning control decision model.

According to the method provided by the embodiment of the invention, a multi-element state data sample containing radar point cloud sample data is used as a group of training samples, and a plurality of groups of training samples are used for training the deep reinforcement learning control decision model, so that the accuracy of the model obtained by training is improved.

Optionally, the multiple training samples are used for performing multi-stage training on the deep reinforcement learning control decision model, including:

Specifically, the first cost function described in the present invention refers to a cost function used in the first stage training of the deep reinforcement learning control decision model.

The first deep reinforcement learning control decision model described in the invention refers to a deep reinforcement learning control decision model obtained after the first-stage training is completed.

Similarly, the second cost function described in the present invention refers to a cost function used in the second stage training of the deep reinforcement learning control decision model.

The second deep reinforcement learning control decision model described in the invention refers to a deep reinforcement learning control decision model obtained after the second stage training is completed.

The third valence function described in the present invention refers to a merit function used in the third-stage training of the deep reinforcement learning control decision model.

Further, for any group of training samples, inputting the training samples into the deep reinforcement learning control decision model to train in three stages in sequence, updating the Q value of the deep reinforcement learning control decision model by using the value function corresponding to each stage, determining the motion control decision corresponding to the Q value, further obtaining the profit reward value for executing the motion control decision, and finishing the model training in each stage when the profit reward value corresponding to each stage is converged. And after the training of the current stage is finished, the next stage is started to train until the training of the multi-stage deep reinforcement learning control decision model is finished, so that the multi-stage trained deep reinforcement learning control decision model is obtained.

According to the method provided by the embodiment of the invention, a multi-stage model training is carried out on the deep reinforcement learning control decision model by using the value function, and the training result of each stage is continuously optimized and promoted, so that the high-precision deep reinforcement learning control decision model is efficiently obtained.

Optionally, the first cost function is determined according to a preset euclidean distance reduction term, a collision penalty term and a rotation angle penalty term;

Specifically, in the embodiment of the invention, when the first-stage deep reinforcement learning control decision model training is performed, the first value function r is set₁Comprising a reduced Euclidean distance r from the current position of the mobile device to the target position_dPenalty term r for collision of moving equipment_cAnd a penalty term r representing the rotation angle of the sports equipment_wT is used for representing the current time step, i is used for representing the number of the sports equipment, and the deep reinforcement learning network is trained by a plurality of sports equipment carrying cameras and laser radars, namely the formula of a first value function is

When the second stage deep reinforcement learning control decision model training is carried out, a global reward and punishment item r is added^resultWhen the result is that the target position is successfully reached, adding a value of

The bonus item of (1); when the result is a collision, the whole moving track is added with a value of

The penalty term of (2) then sets a first cost function r₂Is composed of

Control decision model for deep reinforcement learning in third stageDuring type training, adding a time penalty item, determining the range of the average time step according to the training of the second stage, and determining penalty constants rho and t by taking the range as the center_maxIf it is the set maximum time step, the cost function r of the third stage₃Is composed of

And finally, inputting the updated cloud information of the third point of the radar, the current moving speed of the sports equipment, the current position information of the sports equipment and the target position information into the deep reinforcement learning network trained in the three stages to obtain a target motion control decision of the mobile carrier.

According to the method provided by the embodiment of the invention, a multi-stage value function is used for training the deep reinforcement learning control decision model, the result of each training is transferred to each time step through the value function, so that the global consideration can be realized, the total time step length is transferred to each time step through the value function, the optimization and the improvement on the algorithm speed are realized, and the multi-stage deep reinforcement learning control decision model training can be efficiently completed.

Fig. 4 is a schematic overall flow chart of the motion control decision generation method provided by the present invention, as shown in fig. 4, including:

and step S1, installing the monocular radar at the lower half part of the robot to detect the obstacle close to the ground part, placing the monocular camera at the top of the robot to obtain a better camera view and detect the obstacle at a high position, and collecting the data of the monocular radar and the RGB image data of the monocular camera, wherein the view direction of the camera is vertical to the gravity direction.

And step S2, calibrating the monocular camera and the single line radar, converting the radar data into a camera coordinate system, and enabling the depth data of the single line radar to correspond to the pixel data in the RGB image. And inputting the calibrated single-line radar data and RGB image data into a trained image depth prediction network to obtain a depth estimation image, namely a depth prediction image, corresponding to the RGB image.

In step S3, the distance from the ground height to the nearest point in the range of the robot top height from the depth prediction map is found according to the height of the robot, and the data of the single-line radar is updated using the distance.

Wherein, for step S3, the method further includes:

step S30, calculating the heights of all pixel points in the depth prediction image under a world coordinate system according to the depths of all the pixel points in the depth prediction image;

step S31, filtering all points at ground height according to the fixed height of the camera, wherein the robot is only collided with the objects above the ground by default;

step S32, finding out the point of each line of data closest to the robot in the range from the ground height to the height of the top end of the robot from the filtered depth prediction image to obtain a depth correction image;

and step S33, converting the points with the shortest distance in the depth correction map into a radar coordinate system, and replacing the points with the points corresponding to the coordinate positions in the single-line radar data to obtain the updated single-line radar data.

And (4) inputting the updated single-line radar data obtained in the step (S3), the current speed and the target position of the robot into the deep reinforcement learning network trained in the three stages to obtain a motion control decision of the obstacle avoidance robot.

According to the motion control decision method provided by the embodiment of the invention, the obstacle avoidance problem of the obstacle avoidance robot is solved by simultaneously using the single line radar and the monocular camera, so that the generalization problem caused by the difference between a simulation scene and a real scene in an independent camera scheme is solved, and the limitation of the flatness of obstacle detection in the independent radar scheme is overcome. Meanwhile, due to the low-cost single-line radar and the monocular camera, the obstacle avoidance problem of the obstacle avoidance robot can be better solved under the condition of extremely low cost.

In the context of the present invention, the radar data and camera data need to be calibrated according to the relative positions of the radar and camera and the camera's internal parameters. After calibration is completed, because the radar data obtains relatively accurate depth, the radar depth and RGB are simultaneously used as input data, an image depth prediction result which is more accurate than that of single RGB image data can be obtained, and the more accurate image depth prediction result is very important for subsequent obstacle avoidance.

In the embodiment of the invention, the distance from the ground to the nearest point in the height range of the top end of the robot in the predicted depth picture is found according to the height of the robot, the distance is used for updating the data of the single-line radar, and the high-altitude obstacle data collected by the camera can be reflected in the updated radar data, so that the plane limitation of the single-line radar can be well overcome.

The invention designs a multi-stage cost function on the basis of using the existing deep reinforcement learning model. In the training process, the result of each simulation is transferred to each time step through a cost function to realize global consideration; and transferring the total time step length to each time step through a cost function to realize the optimization on the speed.

Fig. 5 is a schematic structural diagram of a motion control decision making apparatus provided by the present invention, as shown in fig. 5, including:

the prediction module 510 is configured to determine a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and the RGB image information of the target area acquired by a camera;

an updating module 520, configured to determine target point cloud information based on second point cloud information projected into a three-dimensional space by the depth prediction map, and obtain third point cloud information according to the target point cloud information and the first point cloud information;

and the decision module 530 is configured to determine target multivariate state data according to the third point cloud information, and input the target multivariate state data into a deep reinforcement learning control decision model after multi-stage training to obtain a target motion control decision.

The apparatus described in this embodiment may be used to implement the above method embodiments, and the principle and technical effect are similar, which are not described herein again.

Fig. 6 is a schematic structural diagram of the obstacle avoidance robot provided by the present invention, as shown in fig. 6, the obstacle avoidance robot includes an obstacle avoidance robot body 6, an obstacle avoidance processor, a monocular camera 61 and a single line radar 62 are arranged in the obstacle avoidance robot body 6, the monocular camera 61 is arranged on the top of the obstacle avoidance robot body 6, the single line radar 62 is arranged on the obstacle avoidance robot body 6, and a scanning field of the single line radar 62 at least partially coincides with a shooting range of the monocular camera 61.

The obstacle avoidance robot of the present invention further includes a memory and a program or instructions stored on the memory and executable on the obstacle avoidance processor. The obstacle avoidance processor may call a logic instruction in the memory to execute the motion control decision generation method according to the present invention, where the method includes: determining a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and the RGB image information of the target area acquired by a camera; determining target point cloud information based on second point cloud information projected into a three-dimensional space by the depth prediction map, and updating the first point cloud information according to the target point cloud information to obtain third point cloud information; determining target multivariate state data according to the third point cloud information, and inputting the target multivariate state data into a multi-stage trained deep reinforcement learning control decision model to obtain a target motion control decision; the depth reinforcement learning control decision model after the multi-stage training is obtained by performing multi-stage training on a multi-element state data sample containing radar point cloud sample data.

In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a motion control decision generation method provided by the above methods, the method comprising: determining a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and the RGB image information of the target area acquired by a camera; determining target point cloud information based on second point cloud information projected into a three-dimensional space by the depth prediction map, and obtaining third point cloud information according to the target point cloud information and the first point cloud information; and determining target multivariate state data according to the third point cloud information, and inputting the target multivariate state data into a multi-stage trained deep reinforcement learning control decision model to obtain a target motion control decision.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, is implemented to perform the motion control decision generation methods provided above, the method comprising: determining a depth prediction map corresponding to RGB image information based on first point cloud information of a target area acquired by a laser radar and the RGB image information of the target area acquired by a camera; determining target point cloud information based on second point cloud information projected into a three-dimensional space by the depth prediction map, and obtaining third point cloud information according to the target point cloud information and the first point cloud information; and determining target multivariate state data according to the third point cloud information, and inputting the target multivariate state data into a multi-stage trained deep reinforcement learning control decision model to obtain a target motion control decision.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A motion control decision generation method, comprising:

2. The motion control decision making method according to claim 1, further comprising, prior to said determining target point cloud information:

determining the second point cloud information based on the target depth map;

3. The motion control decision making method according to claim 1, wherein the determining target point cloud information based on the second point cloud information of the projection of the depth prediction map into the three-dimensional space comprises:

4. The motion control decision making method according to claim 1, wherein the obtaining third point cloud information from the target point cloud information and the first point cloud information comprises:

wherein the radar coordinate system is determined based on coordinate information of the first point cloud information in a world coordinate system.

5. The motion control decision making method according to claim 1, wherein before inputting the target multivariate state data into the multi-stage trained deep reinforcement learning control decision model, further comprising:

6. The motion control decision making method according to claim 5, wherein the performing a multi-stage training on the deep reinforcement learning control decision model using the plurality of sets of training samples comprises:

7. The motion control decision making method according to claim 6, wherein the first cost function is determined according to a preset Euclidean distance reduction term, a collision penalty term, and a rotation angle penalty term;

8. A motion control decision making apparatus, comprising:

9. An obstacle avoidance robot is characterized by comprising an obstacle avoidance robot body, wherein an obstacle avoidance processor, a monocular camera and a single line radar are arranged in the obstacle avoidance robot body, the monocular camera is arranged at the top of the obstacle avoidance robot body, the single line radar is arranged on the obstacle avoidance robot body, and the scanning view field of the single line radar is at least partially overlapped with the shooting range of the monocular camera; further comprising a memory and a program or instructions stored on the memory and executable on the obstacle avoidance processor, the program or instructions when executed by the obstacle avoidance processor performing the steps of the motion control decision making method according to any one of claims 1 to 7.

10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the motion control decision generation method according to any of claims 1 to 7.