CN117173520A - Method and device for determining three-dimensional fusion data - Google Patents

Method and device for determining three-dimensional fusion data

Info

Publication number
CN117173520A
Authority
CN
China
Prior art keywords
dimensional
dimensional coordinate
target
predicted
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210579966.XA
Other languages
Chinese (zh)
Inventor
杨立荣
孔祥浩
张立鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202210579966.XA
Publication of CN117173520A
Current legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The application discloses a method and a device for determining three-dimensional fusion data, and belongs to the technical field of computers. The method comprises the following steps: acquiring calibration information and the three-dimensional point cloud data and two-dimensional image acquired by a vehicle; determining, based on the calibration information, the target two-dimensional coordinates obtained by converting the three-dimensional coordinates of each three-dimensional space point, so as to determine a depth value corresponding to each target two-dimensional coordinate; determining, based on a depth prediction model, the two-dimensional image and the depth value corresponding to each target two-dimensional coordinate, predicted depth values corresponding to the other two-dimensional coordinates, namely the two-dimensional coordinates of the pixel points of the two-dimensional image other than the target two-dimensional coordinates; and determining three-dimensional fusion data based on the predicted depth values and pixel values corresponding to the other two-dimensional coordinates, the three-dimensional coordinates and pixel values corresponding to the target two-dimensional coordinates, and the calibration information. The method and the device enrich the three-dimensional fusion data, which in turn makes the results of downstream tasks more accurate.

Description

Method and device for determining three-dimensional fusion data
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for determining three-dimensional fusion data.
Background
In scenarios where map data are collected by a mapping vehicle or a vehicle drives autonomously, three-dimensional point cloud data are usually acquired by a laser radar and two-dimensional images are acquired by a camera, and three-dimensional fusion data combining the information of the two are then obtained based on the three-dimensional point cloud data and the two-dimensional images. The three-dimensional point cloud data can reflect the positions and shapes of surrounding objects but lack semantic information such as color, a shortcoming that the two-dimensional images can well make up for.
The existing fusion method between the three-dimensional point cloud data and the two-dimensional image comprises the following steps: first, the target two-dimensional coordinate corresponding to each three-dimensional coordinate in the three-dimensional point cloud data is determined based on the three-dimensional point cloud data and the calibration information; then, the pixel value at each target two-dimensional coordinate in the two-dimensional image is obtained, and the pixel value corresponding to each three-dimensional coordinate is thereby obtained in turn, that is, the three-dimensional point cloud data fused with the pixel values, namely the three-dimensional fusion data, is obtained.
However, because the precision of the laser radar is limited by various factors and is generally not high, the three-dimensional coordinate points are relatively few, and so are the three-dimensional coordinate points that carry pixel values; the precision of the three-dimensional fusion data is therefore low, and the results of downstream tasks may be inaccurate.
Disclosure of Invention
The embodiment of the application provides a method for determining three-dimensional fusion data, which can solve the problems in the prior art of low precision of the three-dimensional fusion data and inaccurate task results.
In a first aspect, a method for determining three-dimensional fusion data is provided, the method comprising:
the method comprises the steps of obtaining calibration information, three-dimensional point cloud data and a two-dimensional image, wherein the three-dimensional point cloud data comprise three-dimensional coordinates and depth values of a plurality of three-dimensional space points, the two-dimensional image comprises two-dimensional coordinates and pixel values of a plurality of pixel points, and the calibration information is a coordinate conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional point cloud data and a two-dimensional coordinate system corresponding to the two-dimensional image;
determining target two-dimensional coordinates obtained by converting the three-dimensional coordinates of each three-dimensional space point based on the calibration information, so as to determine a depth value corresponding to each target two-dimensional coordinate;
based on a depth prediction model, the two-dimensional image and depth values corresponding to each target two-dimensional coordinate, determining predicted depth values corresponding to other two-dimensional coordinates except the target two-dimensional coordinate in the two-dimensional coordinates of all pixel points of the two-dimensional image;
and determining three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinates and the pixel value corresponding to the target two-dimensional coordinates and the calibration information.
In one possible implementation manner, the determining three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinate and the pixel value corresponding to the target two-dimensional coordinate, and the calibration information includes:
determining predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates based on the predicted depth values corresponding to the other two-dimensional coordinates and the calibration information;
the three-dimensional fusion data is determined based on the pixel value and the three-dimensional coordinates corresponding to each target two-dimensional coordinate, and the pixel value and the predicted three-dimensional coordinates corresponding to each other two-dimensional coordinate.
In one possible implementation manner, the determining the three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to each target two-dimensional coordinate and the pixel value and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate includes:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
and determining the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate as three-dimensional fusion data.
In one possible implementation manner, the determining the three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to each target two-dimensional coordinate and the pixel value and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate includes:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
voxel processing is carried out on the three-dimensional point cloud data to obtain initial voxel data;
voxel processing is carried out on the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate, so that predicted voxel data are obtained;
the three-dimensional fusion data is determined based on the initial voxelized data and the predicted voxelized data.
In one possible implementation, the method further includes:
determining an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image based on an object type prediction model, a depth value corresponding to the target two-dimensional coordinates and predicted depth values corresponding to other two-dimensional coordinates;
the determining three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to each target two-dimensional coordinate and the pixel value and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate includes:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate.
In one possible implementation, the object type prediction model includes a feature extraction module and a classification module;
the determining the object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image based on the object type prediction model, the depth value corresponding to the target two-dimensional coordinates, and the predicted depth value corresponding to the other two-dimensional coordinates includes:
inputting the depth value corresponding to each target two-dimensional coordinate and the predicted depth value corresponding to other two-dimensional coordinates into the feature extraction module to obtain intermediate feature information;
inputting the intermediate characteristic information into the classification module to obtain an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image;
the determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate includes:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate, and the intermediate characteristic information.
In one possible implementation manner, the determining three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinate and the pixel value corresponding to the target two-dimensional coordinate, and the calibration information includes:
based on a BA (Bundle Adjustment) algorithm, vehicle coordinate change data corresponding to every two adjacent frames in a plurality of continuous frames and depth values of target two-dimensional coordinates and predicted depth values of other two-dimensional coordinates corresponding to each frame in the plurality of continuous frames, carrying out joint optimization on the calibration information, pixel values and depth values of the target two-dimensional coordinates of the current frame and pixel values and predicted depth values corresponding to other two-dimensional coordinates to obtain optimized calibration information, optimized target two-dimensional coordinates, optimized depth values, optimized other two-dimensional coordinates, optimized pixel values and optimized predicted depth values, wherein the continuous frames comprise the current frame;
determining predicted three-dimensional coordinates corresponding to the optimized other two-dimensional coordinates based on the optimized other two-dimensional coordinates, the optimized predicted depth value and the optimized calibration information;
and determining the three-dimensional fusion data based on the optimized pixel value and the three-dimensional coordinate corresponding to each optimized target two-dimensional coordinate and the optimized pixel value and the predicted three-dimensional coordinate corresponding to each optimized other two-dimensional coordinate.
In a second aspect, there is provided a device for determining three-dimensional fusion data, the device comprising:
an acquisition module, used for acquiring calibration information, three-dimensional point cloud data and a two-dimensional image acquired by a vehicle, wherein the three-dimensional point cloud data comprise three-dimensional coordinates and depth values of a plurality of three-dimensional space points, the two-dimensional image comprises two-dimensional coordinates and pixel values of a plurality of pixel points, and the calibration information is a coordinate conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional point cloud data and a two-dimensional coordinate system corresponding to the two-dimensional image;
the first determining module is used for determining target two-dimensional coordinates obtained by converting the three-dimensional coordinates of each three-dimensional space point based on the calibration information so as to determine a depth value corresponding to each target two-dimensional coordinate;
the second determining module is used for determining predicted depth values corresponding to other two-dimensional coordinates except the target two-dimensional coordinate in the two-dimensional coordinates of all pixel points of the two-dimensional image based on the depth prediction model, the two-dimensional image and the depth values corresponding to each target two-dimensional coordinate;
and the third determining module is used for determining three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinate and the pixel value corresponding to the target two-dimensional coordinate and the calibration information.
In one possible implementation manner, the third determining module is configured to:
determining predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates based on the predicted depth values corresponding to the other two-dimensional coordinates and the calibration information;
the three-dimensional fusion data is determined based on the pixel value and the three-dimensional coordinates corresponding to each target two-dimensional coordinate, and the pixel value and the predicted three-dimensional coordinates corresponding to each other two-dimensional coordinate.
In one possible implementation manner, the third determining module is configured to:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
and determining the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate as three-dimensional fusion data.
In one possible implementation manner, the third determining module is configured to:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
voxel processing is carried out on the three-dimensional point cloud data to obtain initial voxel data;
voxel processing is carried out on the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate, so that predicted voxel data are obtained;
the three-dimensional fusion data is determined based on the initial voxelized data and the predicted voxelized data.
In a possible implementation manner, the apparatus further includes a fourth determining module, configured to:
determining an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image based on an object type prediction model, a depth value corresponding to the target two-dimensional coordinates and predicted depth values corresponding to other two-dimensional coordinates;
the third determining module is configured to:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate.
In one possible implementation, the object type prediction model includes a feature extraction module and a classification module;
the fourth determining module is configured to:
inputting the depth value corresponding to each target two-dimensional coordinate and the predicted depth value corresponding to other two-dimensional coordinates into the feature extraction module to obtain intermediate feature information;
inputting the intermediate characteristic information into the classification module to obtain an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image;
the third determining module is configured to:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate, and the intermediate characteristic information.
In one possible implementation manner, the third determining module is configured to:
based on a BA algorithm, vehicle coordinate change data corresponding to every two adjacent frames in a plurality of continuous frames and a depth value of a target two-dimensional coordinate corresponding to each frame in the plurality of continuous frames and a predicted depth value of other two-dimensional coordinates, carrying out joint optimization on the calibration information, a pixel value and a depth value of the target two-dimensional coordinate of the current frame and a pixel value and a predicted depth value corresponding to other two-dimensional coordinates to obtain optimized calibration information, an optimized target two-dimensional coordinate, an optimized depth value, other optimized two-dimensional coordinates, an optimized pixel value and an optimized predicted depth value, wherein the continuous frames comprise the current frame;
determining predicted three-dimensional coordinates corresponding to the optimized other two-dimensional coordinates based on the optimized other two-dimensional coordinates, the optimized predicted depth value and the optimized calibration information;
and determining the three-dimensional fusion data based on the optimized pixel value and the three-dimensional coordinate corresponding to each optimized target two-dimensional coordinate and the optimized pixel value and the predicted three-dimensional coordinate corresponding to each optimized other two-dimensional coordinate.
In a third aspect, a computer device is provided, comprising a processor and a memory, wherein at least one instruction is stored in the memory and is loaded and executed by the processor to perform the operations performed by the method for determining three-dimensional fusion data.
In a fourth aspect, a computer-readable storage medium is provided, wherein at least one instruction is stored in the storage medium and is loaded and executed by a processor to perform the operations performed by the method for determining three-dimensional fusion data.
In a fifth aspect, a computer program product is provided, comprising at least one instruction that is loaded and executed by a processor to perform the operations performed by the method for determining three-dimensional fusion data.
The technical scheme provided by the embodiment of the application has the beneficial effects that: according to the scheme, the target two-dimensional coordinates corresponding to the three-dimensional coordinates in the three-dimensional point cloud data and the depth values corresponding to the target two-dimensional coordinates can be determined based on the calibration information, and then the pixel values corresponding to the target two-dimensional coordinates can be determined based on the two-dimensional image.
For other two-dimensional coordinates except the target two-dimensional coordinates in the two-dimensional coordinates of all the pixel points of the two-dimensional image, the predicted depth values corresponding to the other two-dimensional coordinates can be determined based on the depth prediction model, the two-dimensional image and the depth values corresponding to each target two-dimensional coordinate, then the predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates can be determined based on the predicted depth values and the calibration information corresponding to the other two-dimensional coordinates, and the pixel values corresponding to the other two-dimensional coordinates can be obtained based on the two-dimensional image.
And then, determining three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to the two-dimensional coordinate of the target, the pixel value and the predicted three-dimensional coordinate corresponding to other two-dimensional coordinates. The three-dimensional fusion data comprises pixel values corresponding to the predicted three-dimensional coordinates besides the pixel values corresponding to the three-dimensional coordinates included in the three-dimensional point cloud data, enriches the three-dimensional fusion data, and further enables the downstream task result to be more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for determining three-dimensional fusion data according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for determining three-dimensional fusion data according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for determining three-dimensional fusion data according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for determining three-dimensional fusion data according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for determining three-dimensional fusion data according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a device for determining three-dimensional fusion data according to an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application;
fig. 8 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The embodiment of the application provides a method for determining three-dimensional fusion data, which may be implemented by a computer device. The computer device may be a terminal, a server, or the like, and the terminal may be a desktop computer, a notebook computer, a tablet computer, a mobile phone, or the like.
The computer device may include a processor, memory, communication components, and the like.
The processor may be a CPU (Central Processing Unit). The processor may be configured to read the instructions and process the data, for example, determine target two-dimensional coordinates transformed from each three-dimensional coordinate, determine a depth value corresponding to each target two-dimensional coordinate, determine predicted depth values corresponding to other two-dimensional coordinates based on the depth prediction model, the two-dimensional image, and the depth value corresponding to each target two-dimensional coordinate, determine three-dimensional fusion data, and so forth.
The memory may include ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic disk, optical data storage device, and the like. The memory may be used for data storage, for example, storage of acquired calibration information, three-dimensional point cloud data and data of a two-dimensional image, storage of data of target two-dimensional coordinates obtained by converting the determined three-dimensional coordinates, storage of data of depth values corresponding to each target two-dimensional coordinate, storage of data of predicted depth values corresponding to the determined other two-dimensional coordinates, storage of data corresponding to a depth prediction model, storage of data of predicted three-dimensional coordinates corresponding to the determined other two-dimensional coordinates, storage of data of determined three-dimensional fusion data, and the like.
The communication means may be a wired network connector, a wireless fidelity module, a bluetooth module, a cellular network communication module, etc. The communication means may be used to receive and transmit signals, e.g. to receive three-dimensional point cloud data and two-dimensional images transmitted by other devices, etc.
The terms used in the embodiments of the present application are described below:
Three-dimensional point cloud data
The three-dimensional point cloud data may be acquired by a lidar mounted on the vehicle, which may be mounted on top of the vehicle for better viewing angles.
The three-dimensional point cloud data includes three-dimensional coordinates and depth values of a plurality of three-dimensional space points, and the depth value of each three-dimensional space point can be calculated based on the three-dimensional coordinates of the three-dimensional space point and is used for representing the distance between the three-dimensional space point and the laser radar.
Two-dimensional image
The two-dimensional image may be acquired by a camera mounted on the vehicle, and in order to meet the driving requirement of the vehicle, the camera may be generally mounted on the periphery of the vehicle, for example, may be mounted on the front end of the vehicle to acquire a two-dimensional image in front of the vehicle, may be mounted on the rear end of the vehicle to acquire a two-dimensional image in rear of the vehicle, and so on.
The two-dimensional image comprises a plurality of pixel points and pixel values corresponding to the pixel points, wherein each pixel point corresponds to one two-dimensional coordinate.
Calibration information
The calibration information is set based on the relative position between the laser radar and the camera, and is the coordinate conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional point cloud data and a two-dimensional coordinate system corresponding to the two-dimensional image.
Fig. 1 is a flowchart of a method for determining three-dimensional fusion data according to an embodiment of the present application. Referring to fig. 1, this embodiment includes:
101. Acquiring calibration information, three-dimensional point cloud data acquired by the vehicle and a two-dimensional image.
In an implementation, when a set of corresponding three-dimensional point cloud data and two-dimensional image is to be acquired, the vehicle may be stopped during acquisition and the three-dimensional point cloud data and two-dimensional image around the current vehicle may then be acquired, or the three-dimensional point cloud data and the two-dimensional image may be acquired at the same time point while the vehicle is running, so that the acquired three-dimensional point cloud data and two-dimensional image describe the same environment relative to the vehicle.
102. Determining target two-dimensional coordinates obtained by converting the three-dimensional coordinates of each three-dimensional space point based on the calibration information, so as to determine a depth value corresponding to each target two-dimensional coordinate.
In implementation, each three-dimensional coordinate in the three-dimensional point cloud data can be converted based on calibration information to obtain a target two-dimensional coordinate corresponding to each three-dimensional coordinate, wherein a coordinate system corresponding to the target two-dimensional coordinate is the same as a coordinate system corresponding to the two-dimensional image, namely, the coordinate of a projection point of the three-dimensional coordinate on the two-dimensional image is the target two-dimensional coordinate.
Then, the correspondence between the three-dimensional coordinates and the target two-dimensional coordinates can be obtained, and the depth value corresponding to each target two-dimensional coordinate can be determined based on the correspondence between the three-dimensional coordinates and the depth values.
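To make this conversion concrete, the sketch below projects laser radar points into the image plane with a pinhole camera model; the intrinsic matrix K, the laser-radar-to-camera extrinsic matrix T_cam_lidar and all numeric values are illustrative assumptions standing in for the calibration information, not values from this application.

```python
import numpy as np

def project_points(points_xyz, K, T_cam_lidar, image_w, image_h):
    """Convert each three-dimensional coordinate into a target two-dimensional
    coordinate and keep its depth value (a minimal sketch with assumed calibration)."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])              # (N, 4) homogeneous lidar coords
    cam = (T_cam_lidar @ homo.T).T[:, :3]                        # camera-frame coordinates
    depth = cam[:, 2]                                            # depth along the optical axis
    valid = depth > 0                                            # keep points in front of the camera
    uvw = (K @ cam[valid].T).T
    uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)          # target two-dimensional coordinates
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < image_w) & (uv[:, 1] >= 0) & (uv[:, 1] < image_h)
    return uv[inside], depth[valid][inside], np.flatnonzero(valid)[inside]

# Toy usage with placeholder calibration values
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
T = np.eye(4)                                                    # assumed lidar-to-camera transform
points = np.random.uniform([-10.0, -10.0, 1.0], [10.0, 10.0, 50.0], size=(100, 3))
target_uv, target_depth, lidar_idx = project_points(points, K, T, 1280, 720)
```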
103. Determining predicted depth values corresponding to other two-dimensional coordinates except the target two-dimensional coordinates in the two-dimensional coordinates of all pixel points of the two-dimensional image based on the depth prediction model, the two-dimensional image and the depth values corresponding to each target two-dimensional coordinate.
In the two-dimensional image, each target two-dimensional coordinate may correspond to one pixel point in the two-dimensional image, and among all the pixel points in the two-dimensional image, other pixel points except those corresponding to the target two-dimensional coordinate also correspond to one two-dimensional coordinate, and the two-dimensional coordinates corresponding to the other pixel points are other two-dimensional coordinates.
In implementation, the two-dimensional image and the depth value corresponding to each target two-dimensional coordinate may be input to the trained depth prediction model to obtain the depth values of the other two-dimensional coordinates predicted by the depth prediction model, i.e., the predicted depth values.
In one possible implementation, the depth prediction model may be a monocular depth completion model; other model structures may of course be used, which is not limited by the embodiment of the present application.
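Purely as an interface illustration of this step (the application does not prescribe a specific network), the stand-in below produces a dense depth map from the two-dimensional image and the sparse depths at the target two-dimensional coordinates; the nearest-neighbour fill is only a placeholder for a trained monocular depth completion model.

```python
import numpy as np
from scipy.interpolate import griddata

def predict_dense_depth(image, target_uv, target_depth):
    """Placeholder for the depth prediction model: given the two-dimensional image
    and the sparse depth values at the target two-dimensional coordinates, return a
    dense depth map whose remaining entries are the predicted depth values of the
    other two-dimensional coordinates. A learned network would replace this fill."""
    h, w = image.shape[:2]
    grid_u, grid_v = np.meshgrid(np.arange(w), np.arange(h))
    dense = griddata(points=target_uv, values=target_depth,
                     xi=(grid_u, grid_v), method='nearest')
    return dense  # shape (h, w)
```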
104. Determining three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinates and the pixel value corresponding to the target two-dimensional coordinates and the calibration information.
As shown in fig. 2, in one possible implementation, the predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates may be determined first based on the predicted depth values and calibration information corresponding to the other two-dimensional coordinates.
In implementation, since the predicted depth values corresponding to the other two-dimensional coordinates are predicted in the above step 103, three-dimensional conversion may be performed on the other two-dimensional coordinates, that is, the predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates may be determined based on the calibration information, the other two-dimensional coordinates, and the predicted depth values corresponding to the other two-dimensional coordinates.
The three-dimensional coordinate system corresponding to the predicted three-dimensional coordinate is the same as the three-dimensional coordinate system corresponding to the three-dimensional point cloud data, and is equivalent to adding the predicted three-dimensional coordinate on the basis of the three-dimensional coordinate included in the three-dimensional point cloud data.
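A minimal sketch of this three-dimensional conversion, under the same assumed calibration (K, T_cam_lidar) as the projection sketch above: each other two-dimensional coordinate is lifted back into the point-cloud coordinate system using its predicted depth value.

```python
import numpy as np

def back_project(other_uv, predicted_depth, K, T_cam_lidar):
    """Recover the predicted three-dimensional coordinates (in the point-cloud frame)
    from the other two-dimensional coordinates and their predicted depth values."""
    ones = np.ones((other_uv.shape[0], 1))
    pixels = np.hstack([other_uv.astype(float), ones])           # homogeneous pixel coordinates
    rays = (np.linalg.inv(K) @ pixels.T).T                       # camera-frame rays with z = 1
    cam_xyz = rays * predicted_depth[:, None]                    # scale each ray by its depth
    cam_homo = np.hstack([cam_xyz, ones])
    lidar_xyz = (np.linalg.inv(T_cam_lidar) @ cam_homo.T).T[:, :3]
    return lidar_xyz                                             # predicted three-dimensional coordinates
```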
Then, three-dimensional fusion data is determined based on the pixel value and the three-dimensional coordinates corresponding to each target two-dimensional coordinate, and the pixel value and the predicted three-dimensional coordinates corresponding to each other two-dimensional coordinate.
In the three-dimensional fusion data, the pixel value corresponding to each three-dimensional coordinate in the three-dimensional point cloud data is added on the basis of the three-dimensional point cloud data, and the predicted three-dimensional coordinates in the environment corresponding to the three-dimensional point cloud data and the pixel value corresponding to each predicted three-dimensional coordinate are also added, so that the three-dimensional fusion information obtained after fusion is more accurate, and the accuracy of downstream task results can be improved when a downstream task is performed.
For example, when objects around the vehicle need to be identified, the more accurate and richer three-dimensional fusion data allows the identification result of each object, obtained based on the three-dimensional fusion data and an identification model, to be more accurate.
In one possible implementation, a more detailed method of determining three-dimensional fusion data may be as follows:
The pixel value corresponding to each three-dimensional coordinate is determined based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate. The pixel value corresponding to each predicted three-dimensional coordinate is determined based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate. The pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate are then determined as the three-dimensional fusion data, that is, the three-dimensional coordinates, the pixel values corresponding to the three-dimensional coordinates, the predicted three-dimensional coordinates and the pixel values corresponding to the predicted three-dimensional coordinates are spliced to obtain the three-dimensional fusion data.
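One possible concrete form of this splicing, assuming the colored points are kept as [x, y, z, r, g, b] rows (a layout chosen here only for illustration, not mandated by the application):

```python
import numpy as np

def splice_fusion_data(lidar_xyz, lidar_rgb, predicted_xyz, predicted_rgb):
    """Stack the measured points with their pixel values and the predicted points
    with their pixel values into one (N, 6) array of three-dimensional fusion data."""
    measured = np.hstack([lidar_xyz, lidar_rgb])
    predicted = np.hstack([predicted_xyz, predicted_rgb])
    return np.vstack([measured, predicted])
```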
In one possible implementation manner, for easier management and fusion, the three-dimensional data may further be voxelized and then fused after voxelization; as shown in fig. 3, the corresponding method may be as follows:
301. Determining the pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate.
In implementation, after determining the target two-dimensional coordinates corresponding to the three-dimensional coordinates of the three-dimensional point cloud data, the pixel value corresponding to each three-dimensional coordinate may be determined based on the pixel value of the pixel point in the two-dimensional image corresponding to each target two-dimensional coordinate and the correspondence between each target two-dimensional coordinate and the three-dimensional coordinate.
302. Determining the pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate.
In an implementation, after determining the predicted three-dimensional coordinates corresponding to each other two-dimensional coordinate, the pixel value corresponding to each predicted three-dimensional coordinate may be determined based on the pixel value of the pixel point in the two-dimensional image corresponding to each other two-dimensional coordinate and the correspondence between each other two-dimensional coordinate and the predicted three-dimensional coordinate.
303. Voxelization processing is carried out on the three-dimensional point cloud data to obtain initial voxelized data.
In implementation, the three-dimensional point cloud data is subjected to voxelization, so that the voxelized three-dimensional point cloud data, namely initial voxelization data, is obtained.
304. Voxelization processing is carried out on the pixel value corresponding to each three-dimensional coordinate, the predicted three-dimensional coordinates and the pixel value corresponding to each predicted three-dimensional coordinate, so as to obtain predicted voxelized data.
In practice, the data is subjected to voxelization, that is, three-dimensional point cloud data added with predicted three-dimensional coordinates and pixel values is subjected to voxelization, so that new three-dimensional point cloud data after the voxelization, that is, predicted voxelization data, is obtained.
305. Three-dimensional fusion data is determined based on the initial voxelized data and the predicted voxelized data.
In implementation, the initial voxel data and the predicted voxel data may be directly spliced to obtain three-dimensional fusion data, and of course, other fusion methods may also be used, which is not limited by the embodiment of the present application.
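The application does not fix a voxelization scheme; the sketch below uses one common choice assumed here for illustration, quantizing coordinates to a regular grid and averaging the contents of each occupied voxel, and shows where the initial and predicted voxelized data would then be kept for splicing.

```python
import numpy as np

def voxelize(points, voxel_size=0.2):
    """Quantize rows of [x, y, z, attributes...] into voxels and average each
    occupied voxel; one common scheme, assumed here for illustration."""
    keys = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((counts.shape[0], points.shape[1]))
    np.add.at(sums, inverse, points)                      # accumulate per voxel
    return sums / counts[:, None]                         # one averaged row per occupied voxel

def fuse_voxel_data(raw_points, colored_points, voxel_size=0.2):
    """Initial voxelized data from the raw point cloud and predicted voxelized data
    from the measured + predicted colored points, ready to be spliced together."""
    initial = voxelize(raw_points[:, :3], voxel_size)
    predicted = voxelize(colored_points, voxel_size)
    return initial, predicted
```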
In a possible implementation manner, in order to increase the semantic information in the three-dimensional point cloud data, the object types corresponding to the three-dimensional coordinates and the predicted three-dimensional coordinates may further be predicted, and when fusion is performed, the object type information is also added to the three-dimensional fusion data to obtain richer and more accurate three-dimensional fusion data. As shown in fig. 4, the corresponding method may be as follows:
and determining the object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image based on the object type prediction model, the depth value corresponding to the target two-dimensional coordinates and the predicted depth value corresponding to other two-dimensional coordinates.
In implementation, the target two-dimensional coordinates, the depth value corresponding to each target two-dimensional coordinate, the other two-dimensional coordinates and the predicted depth value corresponding to each other two-dimensional coordinate may be input into the trained object type prediction model to obtain the predicted object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image, that is, the object type corresponding to the target two-dimensional coordinates and the object type corresponding to the other two-dimensional coordinates.
In one possible implementation, the object type prediction model may be a semantic segmentation model; other model structures may of course be used, which is not limited by the embodiment of the present application.
The corresponding fusion method can be: and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate.
In implementation, the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate can be directly spliced, so that three-dimensional fusion data are obtained.
Alternatively, the voxel processing may be performed on the three-dimensional point cloud data according to the above-mentioned voxel processing method to obtain initial voxel data, and the voxel processing may be performed on the three-dimensional point cloud data, the pixel value and the object type corresponding to each three-dimensional coordinate, the predicted three-dimensional coordinate, and the pixel value and the object type corresponding to each predicted three-dimensional coordinate to obtain predicted voxel data, and then the initial voxel data and the predicted voxel data may be spliced to obtain three-dimensional fusion data.
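As an illustrative layout only (the application does not prescribe one), the object type can be carried as one extra column alongside the coordinates and pixel values when splicing:

```python
import numpy as np

def splice_with_object_type(xyz, rgb, class_ids):
    """Append the predicted object type of each point as a column, giving rows of
    [x, y, z, r, g, b, object_type]; applied to both the measured and the predicted
    points before they are stacked into the three-dimensional fusion data."""
    return np.hstack([xyz, rgb, class_ids.reshape(-1, 1).astype(float)])
```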
In one possible implementation manner, to improve the accuracy of the three-dimensional fusion data, the intermediate feature information output by the object type prediction model may be fused into the three-dimensional fusion data, so as to provide more dimensional information for the three-dimensional fusion data, as shown in fig. 5, a corresponding method may be as follows:
the object type prediction model may include a feature extraction module and a classification module, when predicting an object type, a depth value corresponding to each target two-dimensional coordinate and a predicted depth value corresponding to other two-dimensional coordinates may be input to the feature extraction module, so as to obtain intermediate feature information, and then the intermediate feature information is input to the classification module, so as to obtain an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image.
The corresponding fusion method may be: and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinates corresponding to each target two-dimensional coordinate, the pixel value, the object type and the predicted three-dimensional coordinates corresponding to each other two-dimensional coordinate, and the intermediate characteristic information.
In implementation, the data may be directly spliced to obtain the three-dimensional fusion data, or may be voxelized first and then spliced; the specific method is similar to that described above and is not repeated here.
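The application only states that the object type prediction model contains a feature extraction module and a classification module; the PyTorch sketch below is an assumed minimal architecture showing how the intermediate feature information can be taken from between the two modules and reused in the fusion.

```python
import torch
import torch.nn as nn

class ObjectTypePredictor(nn.Module):
    """Assumed minimal structure: a feature extraction module followed by a
    classification module; the actual model of the application is not specified."""
    def __init__(self, in_channels=1, feat_channels=32, num_classes=10):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(feat_channels, num_classes, kernel_size=1)

    def forward(self, depth_map):
        features = self.feature_extractor(depth_map)   # intermediate feature information
        logits = self.classifier(features)             # per-pixel object type scores
        return logits, features

# depth_map built from the target depths and predicted depths, shape (1, 1, H, W):
# logits, features = ObjectTypePredictor()(torch.zeros(1, 1, 720, 1280))
# object_type = logits.argmax(dim=1); both the per-pixel object type and the per-pixel
# feature vector can be appended when splicing the three-dimensional fusion data
```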
In one possible implementation, since jolting cannot be avoided while the vehicle is running and may cause a small change in the relative position between the laser radar and the camera, the calibration information is no longer accurate in this case. To solve this problem, the following processing may be performed:
and based on the BA algorithm, the vehicle coordinate change data corresponding to every two adjacent frames in the continuous frames and the depth values of the target two-dimensional coordinates and the predicted depth values of other two-dimensional coordinates corresponding to each frame in the continuous frames, carrying out joint optimization on the calibration information, the pixel values and the depth values of the target two-dimensional coordinates of the current frame and the pixel values and the predicted depth values corresponding to other two-dimensional coordinates to obtain optimized calibration information, optimized target two-dimensional coordinates, optimized depth values, optimized other two-dimensional coordinates, optimized pixel values and optimized predicted depth values, wherein the continuous frames comprise the current frame.
In an implementation, three-dimensional point cloud data and two-dimensional images corresponding to a plurality of continuous frames may be acquired first, where the plurality of continuous frames include the current frame and, for example, at least one frame located before and adjacent to the current frame, or at least one frame located before and adjacent to the current frame together with at least one frame located after and adjacent to the current frame; the continuous frames may be set according to the specific situation, which is not limited by the embodiment of the present application.
After the three-dimensional point cloud data and the two-dimensional image corresponding to each of the plurality of continuous frames are obtained, the target two-dimensional coordinates, the pixel value and depth value corresponding to each target two-dimensional coordinate, the other two-dimensional coordinates, and the pixel value and predicted depth value corresponding to each other two-dimensional coordinate are determined for each frame. These data are then input into the BA algorithm for joint optimization, so as to obtain the optimized calibration information, the optimized target two-dimensional coordinates, the optimized depth values, the optimized other two-dimensional coordinates, the optimized pixel values and the optimized predicted depth values. It can be understood that the optimized data are the optimized data corresponding to the current frame.
After the optimized data is obtained, the processing of step 104 may be performed based on the optimized data.
Correspondingly, the processing of step 104 may be: determining the predicted three-dimensional coordinates corresponding to the optimized other two-dimensional coordinates based on the optimized other two-dimensional coordinates, the optimized predicted depth values and the optimized calibration information; and determining three-dimensional fusion data based on the optimized pixel value and the three-dimensional coordinate corresponding to each optimized target two-dimensional coordinate and the optimized pixel value and the predicted three-dimensional coordinate corresponding to each optimized other two-dimensional coordinate.
Through the joint optimization, the vehicle bumping situation can be determined according to the vehicle coordinate change data corresponding to every two adjacent frames, so that the calibration information is optimized based on the vehicle bumping situation and various data, the optimized calibration information is more suitable for the vehicle driving situation, and the accuracy of the calibration information is improved.
Meanwhile, the target two-dimensional coordinates, the pixel values and depth values corresponding to each target two-dimensional coordinate, the other two-dimensional coordinates, and the pixel values and predicted depth values corresponding to the other two-dimensional coordinates are jointly optimized, so that the target two-dimensional coordinates are more accurate.
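The application names only the BA algorithm; the toy sketch below, built on assumed inputs, shows one way such a joint optimization could be set up: a calibration translation offset and the per-point depths are refined so that the points reproject consistently across the continuous frames, given poses derived from the vehicle coordinate change data. It is a simplified stand-in, not the optimization actually used by the application.

```python
import numpy as np
from scipy.optimize import least_squares

def ba_refine(uv_per_frame, depth0, K, frame_poses):
    """Jointly refine a 3-vector calibration translation offset and the depth of
    each point. uv_per_frame[i] is the (N, 2) pixel observation of the points in
    frame i, frame_poses[i] is the assumed 4x4 pose of frame i relative to the
    current frame (from the vehicle coordinate change data), and index 0 is the
    current frame itself."""
    K_inv = np.linalg.inv(K)
    uv0 = uv_per_frame[0]
    rays = (K_inv @ np.hstack([uv0, np.ones((len(uv0), 1))]).T).T   # current-frame viewing rays

    def residuals(params):
        t_offset, depths = params[:3], params[3:]
        pts = rays * depths[:, None] + t_offset                     # refined 3D points
        homo = np.hstack([pts, np.ones((len(pts), 1))])
        res = []
        for pose, uv_obs in zip(frame_poses, uv_per_frame):
            cam = (pose @ homo.T).T[:, :3]
            proj = (K @ cam.T).T
            res.append((proj[:, :2] / proj[:, 2:3] - uv_obs).ravel())
        return np.concatenate(res)                                  # reprojection errors, all frames

    solution = least_squares(residuals, np.concatenate([np.zeros(3), depth0]))
    return solution.x[:3], solution.x[3:]        # optimized translation offset and optimized depths
```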
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.
According to the scheme, the target two-dimensional coordinates corresponding to the three-dimensional coordinates in the three-dimensional point cloud data and the depth values corresponding to the target two-dimensional coordinates can be determined based on the calibration information, and then the pixel values corresponding to the target two-dimensional coordinates can be determined based on the two-dimensional image.
For other two-dimensional coordinates except the target two-dimensional coordinates in the two-dimensional coordinates of all the pixel points of the two-dimensional image, the predicted depth values corresponding to the other two-dimensional coordinates can be determined based on the depth prediction model, the two-dimensional image and the depth values corresponding to each target two-dimensional coordinate, then the predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates can be determined based on the predicted depth values and the calibration information corresponding to the other two-dimensional coordinates, and the pixel values corresponding to the other two-dimensional coordinates can be obtained based on the two-dimensional image.
And then, determining three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to the two-dimensional coordinate of the target, the pixel value and the predicted three-dimensional coordinate corresponding to other two-dimensional coordinates. The three-dimensional fusion data comprises pixel values corresponding to the predicted three-dimensional coordinates besides the pixel values corresponding to the three-dimensional coordinates included in the three-dimensional point cloud data, enriches the three-dimensional fusion data, and further enables the downstream task result to be more accurate.
An embodiment of the present application provides a device for determining three-dimensional fusion data, where the device may be a computer device in the foregoing embodiment, as shown in fig. 6, and the device includes:
the acquiring module 610 is configured to acquire calibration information, three-dimensional point cloud data and a two-dimensional image acquired by a vehicle, where the three-dimensional point cloud data includes three-dimensional coordinates and depth values of a plurality of three-dimensional space points, the two-dimensional image includes two-dimensional coordinates and pixel values of a plurality of pixel points, and the calibration information is a coordinate conversion relationship between a three-dimensional coordinate system corresponding to the three-dimensional point cloud data and a two-dimensional coordinate system corresponding to the two-dimensional image;
a first determining module 620, configured to determine, based on the calibration information, a target two-dimensional coordinate obtained by converting the three-dimensional coordinate of each three-dimensional space point, so as to determine a depth value corresponding to each target two-dimensional coordinate;
a second determining module 630, configured to determine, based on a depth prediction model, the two-dimensional image, and the depth value corresponding to each target two-dimensional coordinate, predicted depth values corresponding to two-dimensional coordinates other than the target two-dimensional coordinate in two-dimensional coordinates of all pixel points of the two-dimensional image;
and a third determining module 640, configured to determine three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinate and the pixel value corresponding to the target two-dimensional coordinate, and the calibration information.
In one possible implementation manner, the third determining module 640 is configured to:
determining predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates based on the predicted depth values corresponding to the other two-dimensional coordinates and the calibration information;
the three-dimensional fusion data is determined based on the pixel value and the three-dimensional coordinates corresponding to each target two-dimensional coordinate, and the pixel value and the predicted three-dimensional coordinates corresponding to each other two-dimensional coordinate.
In one possible implementation manner, the third determining module 640 is configured to:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
and determining the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate as three-dimensional fusion data.
In one possible implementation manner, the third determining module 640 is configured to:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
performing voxelization processing on the three-dimensional point cloud data to obtain initial voxelized data;
performing voxelization processing on the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate to obtain predicted voxelized data;
the three-dimensional fusion data is determined based on the initial voxelized data and the predicted voxelized data.
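A rough sketch of the voxelization step in this implementation is given below; the voxel size and the choice of averaging coordinates and pixel values per voxel are assumptions, since the patent does not fix these details.

```python
# Voxelization sketch: quantize coordinates onto a grid and average per voxel.
import numpy as np

def voxelize(points_xyz, rgb, voxel_size=0.1):
    idx = np.floor(points_xyz / voxel_size).astype(np.int64)
    keys, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse).astype(float)
    centers = np.zeros((len(keys), 3))
    colors = np.zeros((len(keys), rgb.shape[1]))
    for d in range(3):                      # mean coordinate per voxel
        centers[:, d] = np.bincount(inverse, weights=points_xyz[:, d]) / counts
    for c in range(rgb.shape[1]):           # mean pixel value per voxel
        colors[:, c] = np.bincount(inverse, weights=rgb[:, c]) / counts
    return keys, centers, colors            # voxel indices, centers, pixel values
```

Applying this to the raw point cloud would yield the initial voxelized data, and applying it to the colored (original plus predicted) points would yield the predicted voxelized data, which can then be combined into the three-dimensional fusion data.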
In a possible implementation manner, the apparatus further includes a fourth determining module, configured to:
determining an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image based on an object type prediction model, a depth value corresponding to the target two-dimensional coordinates and predicted depth values corresponding to other two-dimensional coordinates;
the third determining module 640 is configured to:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate.
In one possible implementation, the object type prediction model includes a feature extraction module and a classification module;
the fourth determining module is configured to:
inputting the depth value corresponding to each target two-dimensional coordinate and the predicted depth value corresponding to other two-dimensional coordinates into the feature extraction module to obtain intermediate feature information;
inputting the intermediate characteristic information into the classification module to obtain an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image;
the third determining module 640 is configured to:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate, and the intermediate characteristic information.
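The feature extraction module and the classification module are not architecturally specified in the patent; the following PyTorch-style sketch, with an assumed small convolutional encoder and a 1x1 convolution head, only illustrates how intermediate feature information and per-pixel object types could both be returned.

```python
# Schematic sketch only; the architecture is an assumption, not the patented model.
import torch
import torch.nn as nn

class ObjectTypePredictor(nn.Module):
    def __init__(self, num_classes, feat_dim=64):
        super().__init__()
        # Feature extraction module: consumes a dense depth map built from the
        # depth values of the target 2D coordinates and the predicted depth
        # values of the other 2D coordinates.
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Classification module: per-pixel object type logits.
        self.classifier = nn.Conv2d(feat_dim, num_classes, kernel_size=1)

    def forward(self, dense_depth):
        intermediate = self.feature_extractor(dense_depth)   # intermediate feature information
        logits = self.classifier(intermediate)
        object_type = logits.argmax(dim=1)                    # object type per 2D coordinate
        return object_type, intermediate
```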
In one possible implementation manner, the third determining module 640 is configured to:
based on a bundle adjustment (BA) algorithm, vehicle coordinate change data corresponding to every two adjacent frames in a plurality of consecutive frames, and the depth values of the target two-dimensional coordinates and the predicted depth values of the other two-dimensional coordinates corresponding to each frame in the plurality of consecutive frames, performing joint optimization on the calibration information, the pixel values and depth values of the target two-dimensional coordinates of the current frame, and the pixel values and predicted depth values corresponding to the other two-dimensional coordinates, to obtain optimized calibration information, optimized target two-dimensional coordinates, optimized depth values, optimized other two-dimensional coordinates, optimized pixel values, and optimized predicted depth values, where the plurality of consecutive frames include the current frame;
determining predicted three-dimensional coordinates corresponding to the optimized other two-dimensional coordinates based on the optimized other two-dimensional coordinates, the optimized predicted depth value and the optimized calibration information;
and determining the three-dimensional fusion data based on the optimized pixel value and the three-dimensional coordinate corresponding to each optimized target two-dimensional coordinate and the optimized pixel value and the predicted three-dimensional coordinate corresponding to each optimized other two-dimensional coordinate.
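In very reduced form, the BA-based joint optimization of this implementation can be pictured as minimizing reprojection residuals between adjacent frames; the sketch below only optimizes the per-pixel depths of the current frame (calibration parameters could be appended to the optimization variables in the same way), and all array names are hypothetical.

```python
# Highly simplified BA-style sketch; not the patented optimization.
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(depths, uv_cur, uv_next, K, T_pose):
    """Residuals: back-project pixels of the current frame with the current depth
    estimates, move them into the adjacent frame using the vehicle coordinate
    change T_pose, re-project them, and compare with the observed pixels."""
    ones = np.ones((uv_cur.shape[0], 1))
    rays = np.linalg.inv(K) @ np.hstack([uv_cur.astype(float), ones]).T
    pts_cur = (rays * depths).T                         # 3D points in the current camera frame
    pts_next = (T_pose @ np.hstack([pts_cur, ones]).T).T[:, :3]
    proj = (K @ pts_next.T).T
    proj = proj[:, :2] / proj[:, 2:3]
    return (proj - uv_next).ravel()

# Usage sketch (shapes only): refine the depths by least squares.
# result = least_squares(reprojection_residuals, x0=initial_depths,
#                        args=(uv_cur, uv_next, K, T_pose))
```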
The technical scheme provided by the embodiments of the present application has the following beneficial effects. According to the scheme, the target two-dimensional coordinates corresponding to the three-dimensional coordinates in the three-dimensional point cloud data, and the depth value corresponding to each target two-dimensional coordinate, can be determined based on the calibration information; the pixel value corresponding to each target two-dimensional coordinate can then be determined based on the two-dimensional image.
For the other two-dimensional coordinates, that is, the two-dimensional coordinates of the pixel points of the two-dimensional image other than the target two-dimensional coordinates, the predicted depth values can be determined based on the depth prediction model, the two-dimensional image, and the depth value corresponding to each target two-dimensional coordinate. The predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates can then be determined based on the predicted depth values and the calibration information, and the pixel values corresponding to the other two-dimensional coordinates can be obtained from the two-dimensional image.
The three-dimensional fusion data is then determined based on the pixel values and three-dimensional coordinates corresponding to the target two-dimensional coordinates, and the pixel values and predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates. In addition to the pixel values corresponding to the three-dimensional coordinates included in the three-dimensional point cloud data, the three-dimensional fusion data thus also includes the pixel values corresponding to the predicted three-dimensional coordinates, which enriches the three-dimensional fusion data and in turn makes the results of downstream tasks more accurate.
It should be noted that, when the device for determining three-dimensional fusion data provided in the above embodiment determines three-dimensional fusion data, the division of the above functional modules is merely used as an example for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for determining three-dimensional fusion data provided in the foregoing embodiment and the method embodiment for determining three-dimensional fusion data belong to the same concept; for the detailed implementation process, reference is made to the method embodiment, which is not repeated here.
Fig. 7 shows a block diagram of a terminal 700 according to an exemplary embodiment of the present application. The terminal may be the computer device in the above-described embodiments. The terminal 700 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 701 may be implemented in at least one hardware form of a DSP (digital signal processor), an FPGA (field-programmable gate array), or a PLA (programmable logic array). The processor 701 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (central processing unit), is a processor for processing data in an awake state, while the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (graphics processing unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 701 may also include an AI (artificial intelligence) processor for processing computing operations related to machine learning.
The memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 702 is used to store at least one instruction, and the at least one instruction is executed by the processor 701 to implement the method for determining three-dimensional fusion data provided by the method embodiments of the present application.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, a display 705, a camera 706, audio circuitry 707, a positioning component 708, and a power supply 709.
The peripheral interface 703 may be used to connect at least one I/O (input/output) related peripheral device to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 704 is configured to receive and transmit RF (radio frequency) signals, also known as electromagnetic signals. The radio frequency circuit 704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (wireless fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (near field communication) related circuitry, which is not limited in the present application.
The display screen 705 is used to display a UI (user interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, provided on the front panel of the terminal 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display screen 705 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 700. The display screen 705 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display screen 705 may be made of materials such as an LCD (liquid crystal display) or an OLED (organic light-emitting diode).
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (virtual reality) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 701 for processing, or input them to the radio frequency circuit 704 for voice communication. For stereo acquisition or noise reduction, a plurality of microphones may be disposed at different parts of the terminal 700. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, the electrical signal can be converted not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic location of the terminal 700 for navigation or LBS (location-based services). The positioning component 708 may be a positioning component based on the GPS (global positioning system), the BeiDou system, the GLONASS system, or the Galileo system.
A power supply 709 is used to power the various components in the terminal 700. The power supply 709 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 700 further includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyroscope sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display a user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 711. The acceleration sensor 711 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 712 may detect the body direction and rotation angle of the terminal 700, and the gyro sensor 712, in cooperation with the acceleration sensor 711, may collect the user's 3D actions on the terminal 700. Based on the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 713 may be disposed on a side frame of the terminal 700 and/or at a lower layer of the display screen 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, a holding signal of the user on the terminal 700 can be detected, and the processor 701 performs left-hand/right-hand recognition or a quick operation according to the holding signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at the lower layer of the display screen 705, the processor 701 controls the operability controls on the UI according to the user's pressure operation on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 714 is used to collect a fingerprint of the user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 714 may be provided on the front, back or side of the terminal 700. When a physical key or vendor Logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical key or vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 705 is turned up; when the ambient light intensity is low, the display brightness of the display screen 705 is turned down. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 715.
A proximity sensor 716, also referred to as a distance sensor, is typically provided on the front panel of the terminal 700. The proximity sensor 716 is used to collect the distance between the user and the front of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front face of the terminal 700 gradually decreases, the processor 701 controls the display 705 to switch from the bright screen state to the off screen state; when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the display screen 705 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the terminal 700 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPU) 801 and one or more memories 802, where the memory 802 stores at least one instruction, and the at least one instruction is loaded and executed by the central processing unit 801 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and may further include other components for implementing device functions, which are not described in detail herein.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including instructions executable by a processor in a terminal to perform the method for determining three-dimensional fusion data in the above embodiments, is also provided. The computer-readable storage medium may be non-transitory. For example, the computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals (including but not limited to signals transmitted between the user terminal and other devices, etc.) related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant country and region. For example, the "three-dimensional point cloud data and two-dimensional image" referred to in the present application are acquired with sufficient authorization.
The foregoing description of the preferred embodiments of the present application is not intended to limit the application; the scope of protection of the application is defined by the appended claims.

Claims (11)

1. A method for determining three-dimensional fusion data, the method comprising:
the method comprises the steps of obtaining calibration information, three-dimensional point cloud data and a two-dimensional image, wherein the three-dimensional point cloud data comprise three-dimensional coordinates and depth values of a plurality of three-dimensional space points, the two-dimensional image comprises two-dimensional coordinates and pixel values of a plurality of pixel points, and the calibration information is a coordinate conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional point cloud data and a two-dimensional coordinate system corresponding to the two-dimensional image;
determining target two-dimensional coordinates obtained by converting the three-dimensional coordinates of each three-dimensional space point based on the calibration information, so as to determine a depth value corresponding to each target two-dimensional coordinate;
based on a depth prediction model, the two-dimensional image and depth values corresponding to each target two-dimensional coordinate, determining predicted depth values corresponding to other two-dimensional coordinates except the target two-dimensional coordinate in the two-dimensional coordinates of all pixel points of the two-dimensional image;
and determining three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinates and the pixel value corresponding to the target two-dimensional coordinates and the calibration information.
2. The method of claim 1, wherein the determining three-dimensional fusion data based on the predicted depth values and pixel values corresponding to the other two-dimensional coordinates, the three-dimensional coordinates and pixel values corresponding to the target two-dimensional coordinates, and the calibration information comprises:
determining predicted three-dimensional coordinates corresponding to the other two-dimensional coordinates based on the predicted depth values corresponding to the other two-dimensional coordinates and the calibration information;
the three-dimensional fusion data is determined based on the pixel value and the three-dimensional coordinates corresponding to each target two-dimensional coordinate, and the pixel value and the predicted three-dimensional coordinates corresponding to each other two-dimensional coordinate.
3. The method of claim 2, wherein the determining three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to each target two-dimensional coordinate and the pixel value and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate comprises:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
and determining the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate as three-dimensional fusion data.
4. The method of claim 2, wherein the determining three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to each target two-dimensional coordinate and the pixel value and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate comprises:
determining a pixel value corresponding to each three-dimensional coordinate based on the pixel value corresponding to each target two-dimensional coordinate and the three-dimensional coordinate corresponding to each target two-dimensional coordinate;
determining a pixel value corresponding to each predicted three-dimensional coordinate based on the pixel value corresponding to each other two-dimensional coordinate and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate;
performing voxelization processing on the three-dimensional point cloud data to obtain initial voxelized data;
performing voxelization processing on the pixel value corresponding to each three-dimensional coordinate and the pixel value corresponding to each predicted three-dimensional coordinate to obtain predicted voxelized data;
the three-dimensional fusion data is determined based on the initial voxelized data and the predicted voxelized data.
5. The method according to claim 2, wherein the method further comprises:
determining an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image based on an object type prediction model, a depth value corresponding to the target two-dimensional coordinates and predicted depth values corresponding to other two-dimensional coordinates;
the determining three-dimensional fusion data based on the pixel value and the three-dimensional coordinate corresponding to each target two-dimensional coordinate and the pixel value and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate includes:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate.
6. The method of claim 5, wherein the object type prediction model comprises a feature extraction module and a classification module;
the determining the object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image based on the object type prediction model, the depth value corresponding to the target two-dimensional coordinates, and the predicted depth value corresponding to the other two-dimensional coordinates includes:
inputting the depth value corresponding to each target two-dimensional coordinate and the predicted depth value corresponding to other two-dimensional coordinates into the feature extraction module to obtain intermediate feature information;
inputting the intermediate characteristic information into the classification module to obtain an object type corresponding to the two-dimensional coordinates of each pixel point included in the two-dimensional image;
the determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, and the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate includes:
and determining three-dimensional fusion data based on the pixel value, the object type and the three-dimensional coordinate corresponding to each target two-dimensional coordinate, the pixel value, the object type and the predicted three-dimensional coordinate corresponding to each other two-dimensional coordinate, and the intermediate characteristic information.
7. The method of claim 1, wherein the determining three-dimensional fusion data based on the predicted depth values and pixel values corresponding to the other two-dimensional coordinates, the three-dimensional coordinates and pixel values corresponding to the target two-dimensional coordinates, and the calibration information comprises:
based on a bundle adjustment (BA) algorithm, vehicle coordinate change data corresponding to every two adjacent frames in a plurality of consecutive frames, and predicted depth values of target two-dimensional coordinates corresponding to each frame in the plurality of consecutive frames, performing joint optimization on the calibration information, pixel values and depth values of the target two-dimensional coordinates of the current frame, and pixel values and predicted depth values corresponding to other two-dimensional coordinates, to obtain optimized calibration information, optimized target two-dimensional coordinates, optimized depth values, optimized other two-dimensional coordinates, optimized pixel values, and optimized predicted depth values, wherein the plurality of consecutive frames comprise the current frame;
determining predicted three-dimensional coordinates corresponding to the optimized other two-dimensional coordinates based on the optimized other two-dimensional coordinates, the optimized predicted depth value and the optimized calibration information;
and determining the three-dimensional fusion data based on the optimized pixel value and the three-dimensional coordinate corresponding to each optimized target two-dimensional coordinate and the optimized pixel value and the predicted three-dimensional coordinate corresponding to each optimized other two-dimensional coordinate.
8. A device for determining three-dimensional fusion data, the device comprising:
the system comprises an acquisition module, a storage module and a display module, wherein the acquisition module is used for acquiring calibration information, three-dimensional point cloud data and a two-dimensional image acquired by a vehicle, the three-dimensional point cloud data comprise three-dimensional coordinates and depth values of a plurality of three-dimensional space points, the two-dimensional image comprises two-dimensional coordinates and pixel values of a plurality of pixel points, and the calibration information is a coordinate conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional point cloud data and a two-dimensional coordinate system corresponding to the two-dimensional image;
the first determining module is used for determining target two-dimensional coordinates obtained by converting the three-dimensional coordinates of each three-dimensional space point based on the calibration information so as to determine a depth value corresponding to each target two-dimensional coordinate;
the second determining module is used for determining predicted depth values corresponding to other two-dimensional coordinates except the target two-dimensional coordinate in the two-dimensional coordinates of all pixel points of the two-dimensional image based on the depth prediction model, the two-dimensional image and the depth values corresponding to each target two-dimensional coordinate;
and the third determining module is used for determining three-dimensional fusion data based on the predicted depth value and the pixel value corresponding to the other two-dimensional coordinates, the three-dimensional coordinate and the pixel value corresponding to the target two-dimensional coordinate and the calibration information.
9. A computer device comprising a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to perform the operations performed by the method of determining three-dimensional fusion data of any one of claims 1 to 7.
10. A computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the operations performed by the method of determining three-dimensional fusion data of any one of claims 1 to 7.
11. A computer program product comprising at least one instruction for loading and executing by a processor to perform the operations performed by the method of determining three-dimensional fusion data according to any one of claims 1 to 7.
CN202210579966.XA 2022-05-25 2022-05-25 Method and device for determining three-dimensional fusion data Pending CN117173520A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210579966.XA | 2022-05-25 | 2022-05-25 | Method and device for determining three-dimensional fusion data (published as CN117173520A)

Publications (1)

Publication Number | Publication Date
CN117173520A | 2023-12-05

Family

ID=88932242

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210579966.XA (CN117173520A, Pending) | Method and device for determining three-dimensional fusion data | 2022-05-25 | 2022-05-25

Country Status (1)

Country Link
CN (1) CN117173520A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination