WO2023024443A1 - Data matching method and apparatus, and electronic device, storage medium and program product - Google Patents

Data matching method and apparatus, and electronic device, storage medium and program product

Info

Publication number
WO2023024443A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
point cloud
image data
data
bounding box
Prior art date
Application number
PCT/CN2022/075419
Other languages
French (fr)
Chinese (zh)
Inventor
吕伟杰
杨国润
王哲
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023024443A1 publication Critical patent/WO2023024443A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • the present disclosure relates to the technical field of automatic driving, and relates to, but is not limited to, a data matching method and apparatus, an electronic device, a storage medium and a program product.
  • 3D (three-dimensional) object detection identifies the 3D bounding box information of an object, which mainly includes information such as position, orientation, size, and confidence.
  • single-modal approaches based on LiDAR or camera sensors have made steady progress in the field of 3D object detection.
  • the target detection effect of the image unimodal method is affected by the environment: over-exposed or overly dark photos hinder the acquisition of target information, as does occlusion between objects; the point cloud unimodal method needs to cope with point clouds that are sparse, irregular, and lacking in texture and semantic information, with too few points on small objects and distant objects.
  • Embodiments of the present disclosure at least provide a data matching method and device, electronic equipment, a storage medium, and a program product.
  • an embodiment of the present disclosure provides a data matching method, including: respectively detecting point cloud data and image data to be matched, and obtaining a target detection result of the point cloud data and a target detection result of the image data, wherein the target detection result includes bounding box information of the detected target object; determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, the target feature information corresponding to the image data, where the target feature information includes geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • point cloud data can be used to make up for the defect that image data is easily affected by light and occlusion, and image data can be used to make up for the defect that point cloud data is sparse and textureless. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • the determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: projecting the target detection result of the point cloud data into the image data to obtain the projection box of the bounding box corresponding to the point cloud data; and determining the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection box in the image data.
  • in this way, the data formats are unified, so that the relative positions between the 2D bounding boxes corresponding to the image data and the 2D projection boxes can be quickly obtained and used to match the target objects, thereby obtaining the corresponding matching results.
  • the geometric feature information includes position information and/or size information of the bounding box.
  • the geometric feature information can be enriched, further improving the accuracy of data matching.
  • the determining the target feature information corresponding to the image data according to the target detection result of the image data includes: determining the target image data located within the detected bounding box in the image data; and extracting image features of the target image data, and determining the extracted image features as the appearance feature information corresponding to the image data.
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • the determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: determining the target point cloud data located within the detected bounding box in the point cloud data, wherein the target point cloud data includes the number of target points located in the bounding box and/or the coordinate information of the target points in the point cloud coordinate system; and, based on the target point cloud data, determining global point cloud features used to describe the overall features of the target point cloud, and determining the appearance feature information corresponding to the point cloud data according to the global point cloud features.
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • the matching the bounding box determined based on the point cloud data and the bounding box determined based on the image data according to the target feature information includes: performing a correlation calculation between the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determining, according to the correlation calculation result, the matching result between the bounding box in the point cloud data and the bounding box in the image data.
  • in this way, by performing the correlation calculation on the target feature information of the point cloud data and the target feature information of the image data, the correlation between each piece of feature information corresponding to the M 2D bounding boxes and each piece of feature information corresponding to the N 3D bounding boxes can be accurately determined, so that when the bounding boxes are matched according to the correlation calculation results, the matching accuracy of the bounding boxes is improved.
  • performing the correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain the correlation calculation result includes: splicing the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain the target image features; splicing the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain the target point cloud features; and performing the correlation calculation on the target image features and the target point cloud features to obtain the correlation calculation result.
  • the method of determining the matching result of the bounding box can make up for the matching error caused by the weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching, and thus improving the safety factor of automatic driving.
  • the determining the matching result between the bounding box in the point cloud data and the bounding box in the image data according to the correlation calculation result includes: performing convolution on the correlation calculation result to obtain a similarity matrix, wherein the similarity matrix is used to characterize the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; inverting the similarity matrix to obtain a matching cost matrix; and performing bipartite graph matching on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
  • operations such as convolution, inversion, and bipartite graph matching may be performed on the above correlation calculation results to improve the processing efficiency of the matching results and obtain matching results with high accuracy.
  • said respectively detecting the point cloud data and image data to be matched, and obtaining the target detection result of the point cloud data and the target detection result of the image data, includes: performing target detection on the point cloud data and the image data respectively through a trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
  • in this way, performing target detection on the point cloud data through the trained point cloud single-modal detection model and on the image data through the trained image single-modal detection model can improve the accuracy of the target detection results, thereby obtaining more accurate bounding box information that fully encloses the object of interest.
  • the single-modal detection model is trained according to the following steps: determining a training sample set containing multiple training samples, wherein each training sample includes sample image data and sample point cloud data carrying sample labels; performing target detection on the training sample set through the single-modal detection model to be trained to obtain sample target detection results; determining a label matching matrix according to the sample target detection results and the sample labels; and calculating the function value of the target loss function according to the label matching matrix, and adjusting the model parameters of the single-modal detection model according to the target loss function value until a preset condition is reached, to obtain the trained single-modal detection model.
  • the determining the label matching matrix according to the sample target detection result and the sample label includes: calculating the intersection-over-union (IoU) ratio between at least one predicted bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding box corresponding to the sample image data in the sample label, to obtain a first IoU ratio, and filtering the at least one predicted bounding box according to the first IoU ratio to obtain a target predicted 2D bounding box; calculating the IoU ratio between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection result and the labeled bounding box corresponding to the sample point cloud data in the sample label, to obtain a second IoU ratio, and filtering the at least one predicted bounding box according to the second IoU ratio to obtain a target predicted 3D bounding box; and matching the target predicted 2D bounding box with the target predicted 3D bounding box to obtain a label matching result, and determining the label matching matrix according to the label matching result.
  • in this way, the predicted 2D bounding box can be accurately matched with the predicted 3D bounding box to obtain the label matching matrix; when the function value of the target loss function is determined according to the label matching matrix, an accurate function value can be obtained, thereby improving the training accuracy of the unimodal detection model.
  • the determining a training sample set comprising a plurality of training samples includes: acquiring a target tracking data sequence, wherein the target tracking data sequence includes image data and point cloud data; determining at least one data combination in the target tracking data sequence, wherein each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and using the data in each data combination as the data of one training sample.
  • the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multi-modal data set of point clouds and images, through which the weak synchronization that may occur in actual automatic driving scenes can be simulated, and the training sample set constructed in this way allows the trained unimodal detection model to adapt to the weak synchronization scenario of the multimodal data set.
  • the embodiment of the present disclosure also provides a data matching device, including: an acquisition module configured to detect the point cloud data and image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data, wherein the target detection result includes bounding box information of the detected target object; a determination module configured to determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data, and determine the target feature information corresponding to the image data according to the target detection result of the image data, the target feature information including geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box; and a matching module configured to match the bounding box determined based on the point cloud data with the bounding box determined based on the image data according to the target feature information.
  • an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above-mentioned first aspect, or of any possible implementation of the first aspect, are executed.
  • embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above-mentioned first aspect, or of any possible implementation of the first aspect, are executed.
  • an embodiment of the present disclosure further provides a computer program product
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, some or all of the steps of the methods described in the embodiments of the present disclosure can be realized.
  • the computer program product may be a software installation package.
  • FIG. 1 shows a flowchart of a data matching method provided by an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of another data matching method provided by an embodiment of the present disclosure
  • FIG. 3 shows a flow chart of determining a tag matching matrix according to sample target detection results and sample tags in a data matching method provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of a data matching device provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • the above-mentioned single-modal detection methods usually include the image single-modal method and the point cloud single-modal method.
  • the target detection effect of the image single-modal method is affected by the environment: over-exposed or overly dark photos hinder the acquisition of target information, as does occlusion between objects; the point cloud single-modal method needs to cope with point clouds that are sparse and irregular, lack texture and semantic information, and contain too few points on small objects, distant objects, and the like.
  • the present disclosure provides a data matching method and device, electronic equipment, storage media and program products.
  • point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can be used to make up for the defect that point cloud data is sparse and lacks texture. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • the execution subject of the data matching method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capabilities.
  • FIG. 1 is a flow chart of a data matching method provided by an embodiment of the present disclosure, the method includes steps S101 to S105, wherein:
  • S101 Detect the point cloud data and image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data; wherein, the target detection result includes the bounding box information of the detected target object.
  • image data may be collected by a camera device
  • point cloud data may be collected by a laser radar sensor, wherein the camera device and the laser radar sensor are sensors pre-installed on the target vehicle.
  • the target vehicle may be a vehicle with an automatic driving function, for example, a minibus, a car, etc., and the present disclosure does not specifically limit the type of the target vehicle.
  • the target detection result of the point cloud data includes the bounding box information of the detected target object, where the bounding box is a 3D bounding box. For example, if the number of target objects is N, then the bounding box information includes information about the 3D bounding boxes of the N target objects.
  • the target detection result of the image data includes bounding box information of the detected target object, where the bounding box is a 2D bounding box. For example, if the number of target objects is M, then the bounding box information includes information about the 2D bounding boxes of the M target objects.
  • the N target objects and the M target objects may include the same target object, or may include different target objects.
  • S103 Determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data; determine the target feature information corresponding to the image data according to the target detection result of the image data. The target feature information includes geometric feature information of the detected bounding box of the target object, and appearance feature information of the target object within the bounding box.
  • the target feature information may be the target feature information corresponding to the above-mentioned point cloud data, and may also be the target feature information corresponding to the image data.
  • the target feature information corresponding to the point cloud data may include the geometric feature information of the detected bounding box of the target object, and the appearance feature information of the target object within the bounding box; the target feature information corresponding to the image data may also include Geometric feature information of the detected bounding box of the target object, and appearance feature information of the target object within the bounding box.
  • the appearance feature information may be an attribute feature used to characterize the target object framed by the bounding box, and the attribute feature may be object category information of the target object, wherein the category information is the category label of the target object, e.g., vehicle, pedestrian, etc.
  • S105 According to the target feature information, match the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can be used to make up for the defect that point cloud data is sparse and lacks texture. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • an optional implementation is a multi-modal detection method, where the multi-modal detection method refers to detecting objects by combining image data and point cloud data.
  • multimodal detection methods can include point cloud projection methods, image back-projection methods, and similarity matrix methods.
  • the point cloud projection method only considers the prediction results of point cloud 3D candidate frames, and relies heavily on the effect of point cloud single-modal detectors.
  • the image back-projection method only considers the prediction results of image 2D candidate boxes, and relies heavily on the effect of image single-modal detectors.
  • the similarity matrix method has not explored the problem of multimodal matching under weak synchronization in much depth.
  • the prerequisite for the point cloud projection method, image back-projection method and similarity matrix method described above is strong synchronization between the lidar sensor and the camera device. Therefore, the above technical solutions do not consider the case where there is weak synchronization between the lidar sensor and the camera device.
  • the strong synchronization between the camera device and the lidar sensor means that the acquisition times of the point cloud data and the image data are highly synchronized. It can also be understood as meaning that, when the point cloud single-modal detector and the image single-modal detector detect the same object, the point cloud projection 2D box will coincide with the image 2D box, or the image back-projection 3D box will coincide with the point cloud 3D box.
  • the camera device and the lidar sensor may have weak synchronization due to response delay or complex road conditions. At this time, weak synchronization will cause synchronization errors between image data and point cloud data.
  • the result of projecting one modality onto the other will also carry a corresponding synchronization error, which can be understood as meaning that the bounding box of an object detected through the point cloud data and the bounding box of the same object detected through the image data will not coincide.
  • in this case, wrong matching results may be generated, and poor matching results make multi-modal fusion yield worse detection results than single-modal detection.
  • the present disclosure provides a data matching method.
  • in this data matching method, by combining image data and point cloud data for target detection, point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can make up for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • the steps described in the above step S101 to step S105 will now be described in detail, as follows.
  • step S101 the point cloud data and the image data to be matched are detected respectively, and the target detection result of the point cloud data and the target detection result of the image data are obtained, including the following process:
  • Target detection is performed on the point cloud data and the image data respectively through the trained single-modal detection model, and the target detection result of the point cloud data and the target detection result of the image data are obtained.
  • the unimodal detection model includes a point cloud unimodal detection model and an image unimodal detection model.
  • the point cloud single-modal detection model is used for target detection on point cloud data to obtain corresponding target detection results
  • the image single-modal detection model is used for target detection on image data to obtain corresponding target detection results.
  • the point cloud data is collected by the laser radar sensor, and the image data is collected by the camera device. Afterwards, target detection is performed on the point cloud data through the point cloud single-modal detection model, and target detection is performed on the image data through the image single-modal detection model.
  • point cloud unimodal detection models include but are not limited to SECOND, PointPillars, PointRCNN and PV-RCNN, and image unimodal detection models include but are not limited to RRC, MSCNN and Cascade R-CNN.
  • in this way, performing target detection on the point cloud data through the trained point cloud single-modal detection model and on the image data through the trained image single-modal detection model can improve the accuracy of the target detection results, thereby obtaining more accurate bounding box information that fully encloses the object of interest.
  • step S103 after the target detection result is determined, the target feature information corresponding to the point cloud data and the target feature information corresponding to the image data can be determined. Therefore, for step S103, it can be described as the following process:
  • Step S1031 according to the object detection result of the point cloud data, determine the geometric feature information and appearance feature information corresponding to the point cloud data.
  • Step S1032 according to the target detection result of the image data, determine the geometric feature information and the appearance feature information corresponding to the image data.
  • step S1031 and step S1032 may be performed in any order: they can be executed at the same time; step S1031 can be executed first and then step S1032; or step S1032 can be executed first and then step S1031. The two steps are described in detail below.
  • the geometric feature information corresponding to the image data can be determined through the following process, and the detailed process is described as follows:
  • the target detection result of the image data includes 2D bounding boxes of the M target objects.
  • the bounding box information of each 2D bounding box is denoted as I_j, so the bounding box information of the M 2D bounding boxes can be denoted as {I_j}, j = 1, ..., M, where I_j = (x_j1, y_j1, x_j2, y_j2).
  • the bounding box information of each 2D bounding box may be determined as the geometric feature information corresponding to the image data.
  • (x_j1, y_j1) and (x_j2, y_j2) respectively represent the coordinates of the upper left corner and the lower right corner of each 2D bounding box in the image data, under the external-parameter coordinate system of the camera device.
  • for the 3D bounding boxes corresponding to the point cloud data, x_i, y_i, z_i represent the coordinates of the center of the i-th 3D bounding box in the lidar sensor coordinate system; h_i, w_i, l_i represent the three dimensions of the point cloud 3D bounding box, namely its height, width and length; and θ_i represents the orientation of the point cloud 3D bounding box in the bird's-eye view, that is, the rotation angle around the Y-axis of the lidar sensor coordinate system.
  • the target detection result corresponding to the above point cloud data can be projected into the image data according to the external parameters of the camera device and the calibration relationship between the lidar sensor and the camera device, so as to obtain the 2D projection box of the bounding box corresponding to the point cloud data.
  • once the coordinate information of the 2D projection box is obtained, it can be used to determine the geometric feature information of the point cloud data.
  • the geometric feature information of the 2D projection boxes of the bounding boxes corresponding to the above point cloud data comprises, for each projection box, the pixel coordinates of its upper left corner point and lower right corner point, so its data dimension D_G is 4 and its overall size can be recorded as N × D_G.
  • likewise, the geometric feature information of the 2D bounding boxes of the above image data comprises, for each bounding box, the pixel coordinates of its upper left corner point and lower right corner point, so its data dimension D_G is 4 and the size of the geometric features of the image data can be recorded as M × D_G.
  • the bounding box information in the geometric feature information may also be the pixel coordinates of the lower left corner point and the upper right corner point, which is not specifically limited in the present disclosure.
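  • As an illustration of this projection step, the following minimal sketch (not from the patent; the function names, the 3×4 lidar-to-image projection matrix P, and the axis conventions are all assumptions) computes the 2D projection box of a 3D bounding box by projecting its eight corners and taking the enclosing pixel rectangle:

```python
import numpy as np

def box3d_corners(x, y, z, h, w, l, theta):
    """Eight corners of a 3D box centered at (x, y, z) with size (l, w, h) and
    yaw theta in the lidar frame (axis conventions vary; assumed here)."""
    dx = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    dy = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    dz = np.array([0, 0, 0, 0, 1, 1, 1, 1]) * h  # box origin at bottom center (assumption)
    c, s = np.cos(theta), np.sin(theta)
    xy = np.array([[c, -s], [s, c]]) @ np.vstack([dx, dy])  # rotate in the ground plane
    return np.vstack([xy[0] + x, xy[1] + y, dz + z]).T      # (8, 3)

def project_box(corners, P):
    """Project corners with a 3x4 lidar-to-image projection matrix P (camera
    intrinsics combined with the lidar-camera calibration) and return the
    enclosing 2D projection box (u1, v1, u2, v2); assumes all corners lie in
    front of the camera."""
    pts = np.hstack([corners, np.ones((8, 1))])  # homogeneous coordinates (8, 4)
    uvw = (P @ pts.T).T                          # (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])  # D_G = 4 feature
```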
  • the above geometric feature information may further include position information and/or size information of the bounding box.
  • the position information and/or size information of the bounding box can also be expanded to the geometric feature information.
  • the expanded geometric feature information includes position information and/or size information of the bounding box.
  • the expansion process of geometric feature information can be introduced in two cases: the expansion of geometric feature information of image data and the expansion of geometric feature information of point cloud data.
  • Case 1 The process of expanding the geometric feature information of the image data.
  • Case 2 The process of expanding the geometric feature information of point cloud data.
  • in this way, the unification of the data format can be realized, so that the relative positions between the 2D bounding boxes corresponding to the image data and the 2D projection boxes can be quickly obtained and used to match the target objects, thereby obtaining the corresponding matching results.
  • the geometric feature information can be enriched, further improving the accuracy of data matching.
  • M 2D bounding boxes corresponding to the image data are determined in the image data, and the image within each bounding box is cropped to obtain the target image data; for the M 2D bounding boxes, M pieces of target image data can be obtained.
  • the cropped target image data can be scaled to obtain M RGB images with a uniform size of r × r pixels; the scaled target image data can be represented as Image, which records the pixel value of each pixel and has a size of r × r × 3.
  • the target image data Image can be input into a preset image feature extraction network to extract the D_A-dimensional image features of the target image data; the appearance feature information of the target image data is then obtained from these image features, and can be represented as a feature matrix of size M × D_A.
  • the above-mentioned image feature extraction network includes but is not limited to VGG-Net, ResNet, GoogLeNet and other networks that can realize the above-mentioned image feature extraction.
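  • A hedged sketch of this step (the network choice, crop size r and feature dimension D_A are illustrative, not the patent's): crop each 2D bounding box, scale it to r × r, and extract a D_A-dimensional appearance feature with a CNN backbone.

```python
import torch
from torchvision.models import resnet18

r, D_A = 64, 128                      # illustrative values
backbone = resnet18(weights=None)     # stand-in for VGG-Net / ResNet / GoogLeNet
backbone.fc = torch.nn.Linear(backbone.fc.in_features, D_A)

def appearance_features(image, boxes):
    """image: (3, H, W) float tensor; boxes: (M, 4) pixel boxes (x1, y1, x2, y2).
    Returns the M x D_A appearance feature matrix of the image data."""
    crops = []
    for x1, y1, x2, y2 in boxes.round().long().tolist():
        patch = image[:, y1:y2, x1:x2].unsqueeze(0)          # crop the 2D box
        crops.append(torch.nn.functional.interpolate(
            patch, size=(r, r), mode="bilinear", align_corners=False))
    return backbone(torch.cat(crops, dim=0))                 # (M, D_A)
```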
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • determine the target point cloud data located within the detected bounding box in the point cloud data; wherein the target point cloud data includes: the number of target points located in the bounding box and/or the coordinate information of the target points in the point cloud coordinate system.
  • the above-mentioned lidar sensor can scan the road conditions within the scanning range, so as to obtain several point cloud data used to characterize the characteristics of the objects within the collection range.
  • the extracted target point cloud data can be recorded as PC. The target point cloud data is then input into the point cloud feature extraction network, and the D_A-dimensional global point cloud feature of the target point cloud data is extracted; the appearance feature information of the point cloud data is then obtained from the global point cloud features, and can be represented as a feature matrix of size N × D_A.
  • the point cloud feature extraction network includes but is not limited to PointNet, PointNet++, PointSIFT and other extraction networks that can realize the above-mentioned point cloud feature extraction.
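  • A minimal PointNet-style sketch of this step (the layer sizes and feature dimension are assumptions): a shared per-point MLP followed by max pooling yields one D_A-dimensional global point cloud feature per bounding box.

```python
import torch
import torch.nn as nn

class GlobalPointFeature(nn.Module):
    """Shared per-point MLP plus max pooling, as in PointNet's global feature."""
    def __init__(self, d_a=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, d_a))

    def forward(self, points):
        # points: (N, K, 3) xyz coordinates of the K target points in each box
        per_point = self.mlp(points)          # (N, K, d_a)
        return per_point.max(dim=1).values    # max pool over points -> (N, d_a)
```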
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • step S105 according to the target feature information, the bounding box determined based on the point cloud data is matched with the bounding box determined based on the image data, and the detailed process is described as follows:
  • a correlation calculation may be performed on the target feature information of the image data and the target feature information of the point cloud data according to a preset correlation algorithm to obtain a correlation calculation result.
  • the correlation calculation result can be understood as the correlation between each feature information in the feature information corresponding to the M 2D bounding boxes and each feature information in the feature information corresponding to the N 3D bounding boxes.
  • the matching result between the bounding box determined based on the point cloud data and the bounding box determined based on the image data may be determined according to the correlation calculation result.
  • the matching result indicates whether the bounding box determined based on the point cloud data matches the bounding box determined based on the image data, where the matching result may be an N*M matching matrix.
  • when an element in the matching matrix is 1, it indicates that the two corresponding bounding boxes match each other, and when an element in the matching matrix is 0, it indicates that the two corresponding bounding boxes do not match each other.
  • in this way, by performing the correlation calculation on the target feature information of the point cloud data and the target feature information of the image data, the correlation between each piece of feature information corresponding to the M 2D bounding boxes and each piece of feature information corresponding to the N 3D bounding boxes can be accurately determined, so that when the bounding boxes are matched according to the correlation calculation results, the matching accuracy of the bounding boxes is improved.
  • the correlation calculation is performed on the target feature information of the image data and the target feature information of the point cloud data to obtain the correlation calculation result; the detailed process is described as follows:
  • the geometric feature information of the above image data and the appearance feature information of the image data may be spliced, so as to obtain the target image feature of the image data.
  • an appearance feature vector of size M × D_A (the appearance feature information of the image data) and a geometric feature vector of size M × D_G (the geometric feature information of the image data) can be spliced to obtain an image feature vector F_img of size M × (D_A + D_G), which is the above-mentioned target image feature.
  • the geometric feature information of the point cloud data and the appearance feature information of the point cloud data may be concatenated, so as to obtain the target point cloud feature of the point cloud data.
  • an appearance feature vector of size N × D_A (the appearance feature information of the point cloud data) and a geometric feature vector of size N × D_G (the geometric feature information of the point cloud data) can be spliced to obtain the point cloud feature vector F_pc of size N × (D_A + D_G), which is the above-mentioned target point cloud feature.
  • a correlation operation can be performed on the target image features and the target point cloud features through a preset correlation algorithm.
  • the above-mentioned image feature vector F_img and point cloud feature vector F_pc can be processed by a preset correlation algorithm to obtain a correlation matrix F_correlation of size N × M × (D_A + D_G) (that is, the above-mentioned correlation calculation result).
  • the preset correlation algorithm may be an algorithm corresponding to any one of the following calculation formulas:
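  • The specific formulas are not reproduced in this text; purely as a hedged illustration (the patent's actual formulas may differ), one common correlation operation builds the N × M × (D_A + D_G) tensor from pairwise element-wise differences of the two feature sets:

```python
import torch

def correlation_tensor(f_pc, f_img):
    """f_pc: (N, D) target point cloud features; f_img: (M, D) target image
    features, with D = D_A + D_G. Broadcasts to an (N, M, D) correlation
    tensor; the element-wise difference is one plausible correlation op."""
    return f_pc.unsqueeze(1) - f_img.unsqueeze(0)
```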
  • the method of determining the matching result of the bounding box can make up for the matching error caused by the weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching, and thus improving the safety factor of automatic driving.
  • the above correlation calculation result (that is, the correlation matrix F_correlation) can be input into several two-dimensional convolutional networks for convolution calculation, to obtain a similarity matrix of size N × M × 1.
  • each element in the similarity matrix represents: the degree of similarity between each 3D bounding box in the N 3D bounding boxes and each 2D bounding box in the M 2D bounding boxes.
  • the degree of similarity includes: the degree of similarity determined based on geometric feature information, and the degree of similarity determined based on appearance feature information.
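  • A hedged sketch of the convolution step (the number of layers, channel widths and the sigmoid output are assumptions): a small stack of 1 × 1 2D convolutions reduces the (D_A + D_G)-channel correlation tensor to a single-channel N × M similarity matrix.

```python
import torch
import torch.nn as nn

D = 128 + 4  # illustrative D_A + D_G

similarity_head = nn.Sequential(
    nn.Conv2d(D, 64, kernel_size=1), nn.ReLU(),
    nn.Conv2d(64, 1, kernel_size=1), nn.Sigmoid())

def similarity_matrix(f_correlation):
    """f_correlation: (N, M, D) correlation tensor -> (N, M) similarity matrix."""
    x = f_correlation.permute(2, 0, 1).unsqueeze(0)  # (1, D, N, M)
    return similarity_head(x)[0, 0]                  # values in [0, 1]
```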
  • if the geometric feature information of the nth 3D bounding box has a high degree of similarity with that of the mth 2D bounding box, and the appearance feature information of the nth 3D bounding box also has a high degree of similarity with that of the mth 2D bounding box, it can be determined that the nth 3D bounding box and the mth 2D bounding box are matching bounding boxes.
  • the similarity matrix can be inversely calculated to obtain the matching cost matrix; then, the bipartite graph matching process is performed on the matching cost matrix to obtain the bounding box in the point cloud data and the bounding box in the image data. matching results.
  • each detection result can constitute at most one match, and the detection results within the same modality are mutually exclusive and do not match one another.
  • each detection result can only constitute at most one match, which can be understood as: a 2D bounding box determined based on image data can at most match a 3D bounding box determined based on point cloud data.
  • the matching problem of two unimodal detection results can be regarded as a bipartite graph matching problem.
  • the target detection results of image data and the target detection results of point cloud data can be divided into two subsets, for example, the target detection results of image data as a subset, and the target detection results of point cloud data as Another subset.
  • each subset contains multiple vertices, each vertex corresponds to a bounding box, the vertices within each subset are mutually disjoint, and every edge in the undirected graph connects vertices belonging to the two different subsets.
  • the numbers of detections in the two subsets can differ, and the matching goal is to match the two subsets to each other as accurately as possible. Therefore, the similarity matrix is inverted element by element to serve as the matching cost matrix, a matching threshold δ is set so that pairs whose matching cost is higher than δ do not participate in the matching, and the final multimodal matching matrix (that is, the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data) is then calculated by the matching algorithm.
  • the matching algorithms described above include but are not limited to the Hungarian matching algorithm and the Kuhn-Munkres matching algorithm.
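  • A sketch of the inversion and bipartite matching step using SciPy's Hungarian solver (the cost inversion cost = 1 - similarity and the threshold value are assumptions; the patent only states that the similarity matrix is inverted element by element and that costs above δ do not participate):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(similarity, delta=0.5):
    """similarity: (N, M) similarity matrix with values in [0, 1].
    Returns the (N, M) 0/1 multimodal matching matrix."""
    cost = 1.0 - similarity                      # element-wise inversion into matching cost
    rows, cols = linear_sum_assignment(cost)     # Hungarian / Kuhn-Munkres matching
    match = np.zeros_like(similarity, dtype=int)
    for i, j in zip(rows, cols):
        if cost[i, j] <= delta:                  # costs above delta do not participate
            match[i, j] = 1
    return match
```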
  • operations such as convolution, inversion, and bipartite graph matching may be performed on the above correlation calculation results to improve the processing efficiency of the matching results and obtain matching results with high accuracy.
  • under a weakly synchronized multimodal data set, since object matching is performed through the position information of the projection boxes of the 3D bounding boxes and the 2D bounding boxes, the geometric feature information also suffers from the weak synchronization problem. Especially for small objects, the deviation of the geometric feature information leads to a serious decline in the matching effect, so the existing common IoU similarity matrix matching algorithm cannot solve the multimodal matching problem under weak synchronization.
  • the appearance feature information is always extracted from the objects within the 3D box and the 2D box themselves. Therefore, the appearance feature information is not affected by the weak synchronization problem, and it helps to correct the error caused by weak synchronization.
  • the data matching method proposed by the embodiments of the present disclosure can be applied to the many-to-many multi-modal data matching process in both strong synchronization and weak synchronization situations.
  • FIG. 2 a schematic flowchart of another data matching method is also provided, and the method is described in detail as follows:
  • the image data to be matched is collected by a camera device, and the image data is detected by the image single-modal detection model to obtain a target detection result A1, wherein the target detection result A1 includes the 2D bounding boxes of the M objects contained in the image data.
  • the point cloud data to be matched is collected by the lidar sensor, and the point cloud data is detected by the point cloud single-modal detection model to obtain a target detection result A2, wherein the target detection result A2 includes the 3D bounding boxes of the N objects perceived in the point cloud data.
  • the position information of the 2D bounding boxes of the M objects is determined as the geometric feature information in the target feature information corresponding to the image data. The target image data located in each 2D bounding box is determined in the image data, the image features of the target image data are extracted through the image feature extraction network, and the extracted image features are determined as the appearance feature information in the target feature information corresponding to the image data.
  • determine the target point cloud data located within the detected 3D bounding boxes in the point cloud data, wherein the target point cloud data includes: the number of target points located in the 3D bounding box and/or the coordinate information of the target points in the point cloud coordinate system; based on the target point cloud data, determine the global point cloud features used to describe the overall features of the target point cloud, and determine the appearance feature information of the point cloud data according to the global point cloud features.
  • the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data are spliced to obtain the target image feature; the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data are spliced to obtain the target point cloud feature ; Carry out a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
  • a data matching method is proposed for the weak synchronization between the camera device and the lidar sensor caused by response delay or complex road conditions.
  • this method uses the bounding boxes predicted by the point cloud single-modal detection model and the image single-modal detection model: it first obtains the geometric feature information of the projected 2D boxes of the 3D bounding boxes corresponding to the point cloud data and of the 2D bounding boxes corresponding to the image data; it then extracts the appearance feature information within the corresponding bounding boxes through the point cloud feature extraction network and the image feature extraction network; and it finally predicts the similarity matrix between the target detection results of the point cloud data and the target detection results of the image data based on the joint features of the geometric feature information and the appearance feature information.
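  • Tying the steps of FIG. 2 together, the following end-to-end sketch reuses the illustrative helpers defined in the earlier code blocks (none of these names come from the patent itself):

```python
import numpy as np
import torch

def match_detections(pc_boxes, img_boxes, image, box_points, P):
    """pc_boxes: N point cloud 3D boxes (x, y, z, h, w, l, theta); img_boxes:
    (M, 4) image 2D boxes; box_points: (N, K, 3) points inside each 3D box."""
    g_pc = torch.as_tensor(
        np.stack([project_box(box3d_corners(*b), P) for b in pc_boxes]),
        dtype=torch.float32)                              # (N, 4) projected 2D boxes
    a_pc = GlobalPointFeature()(box_points)               # (N, D_A) point cloud appearance
    a_img = appearance_features(image, img_boxes)         # (M, D_A) image appearance
    f_pc = torch.cat([a_pc, g_pc], dim=1)                 # target point cloud features
    f_img = torch.cat([a_img, img_boxes.float()], dim=1)  # target image features
    sim = similarity_matrix(correlation_tensor(f_pc, f_img))
    return match_boxes(sim.detach().numpy())              # (N, M) matching matrix
```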
  • before the target detection result of the point cloud data and the target detection result of the image data are obtained by respectively performing target detection on the point cloud data and the image data through the trained single-modal detection model, the single-modal detection model needs to be trained according to the following steps:
  • each training sample includes: sample image data and sample point cloud data carrying sample labels.
  • a training sample set including multiple training samples is determined, that is, a collection of sample image data and sample point cloud data carrying sample labels.
  • the above-mentioned single-modal detection model to be trained performs target detection on the above-mentioned sample point cloud data and sample image data respectively, so as to obtain the sample target detection results.
  • the label matching matrix can be determined according to the sample labels and the sample target detection results. Furthermore, the target loss function is calculated according to the label matching matrix, and the model parameters of the single-modal detection model are adjusted according to the target loss function until a preset condition is reached, so as to obtain the trained single-modal detection model, wherein the preset condition may be that the number of training iterations of the single-modal detection model meets a preset requirement, and/or that the training accuracy of the single-modal detection model meets a preset accuracy requirement.
  • the target loss function includes but is not limited to mean square error (MSE) loss, mean absolute error (MAE) loss, binary cross-entropy (BCE) loss, and other loss functions that can realize the above-mentioned training of the single-modal detection model.
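  • As a hedged example of one of the listed choices, a BCE loss between the predicted similarity matrix and the 0/1 label matching matrix could look like this (a sketch, not the patent's exact loss):

```python
import torch.nn.functional as F

def matching_loss(pred_similarity, label_matrix):
    """pred_similarity: (N, M) values in (0, 1); label_matrix: (N, M) 0/1
    label matching matrix. Binary cross-entropy over all box pairs."""
    return F.binary_cross_entropy(pred_similarity, label_matrix.float())
```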
  • the single-modal detection model includes a point cloud single-modal detection model and an image single-modal detection model.
  • the image unimodal detection model can be trained based on the sample training set containing sample image data, and the point cloud unimodal detection model can be trained based on the sample training set containing sample point cloud data.
  • the detailed training process is as described above, and will not be described separately here.
  • (1) calculating the intersection-over-union (IoU) ratio between at least one predicted bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding box corresponding to the sample image data in the sample label, to obtain a first IoU ratio; and filtering the at least one predicted bounding box according to the first IoU ratio to obtain a target predicted bounding box.
  • the sample image data included in the training sample can be input into the image unimodal detection model to obtain a sample target detection result containing at least one predicted bounding box, wherein the at least one predicted bounding box may be called a predicted 2D bounding box.
  • for each predicted 2D bounding box, it is judged whether any of the plurality of first IoU ratios is greater than or equal to a preset threshold. If so, the predicted 2D bounding box is determined as a target predicted bounding box; in this case, the labeled bounding box corresponding to the largest of the first IoU ratios may be determined, and that labeled bounding box is determined as the bounding box matching the predicted 2D bounding box. If not, the predicted 2D bounding box is discarded.
  • the sample point cloud data included in the training sample can be input into the point cloud single-modal detection model to obtain a sample target detection result containing at least one predicted bounding box, wherein the at least one predicted bounding box may also be called a predicted 3D bounding box.
  • for each predicted 3D bounding box, it is judged whether any of the plurality of second IoU ratios is greater than or equal to a preset threshold. If so, the predicted 3D bounding box is determined as a target predicted bounding box; in this case, the labeled bounding box corresponding to the largest of the second IoU ratios may be determined, and that labeled bounding box is determined as the bounding box matching the predicted 3D bounding box. If not, the predicted 3D bounding box is discarded.
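  • A sketch of the IoU-based filtering described above (shown for 2D boxes; the threshold is a placeholder, and the 3D case differs only in the IoU computation):

```python
import numpy as np

def iou_2d(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def filter_predictions(pred_boxes, labeled_boxes, thresh=0.5):
    """Keep each predicted box whose best IoU with any labeled box meets the
    preset threshold, paired with the index of its best-matching label."""
    kept = []
    for p in pred_boxes:
        ious = [iou_2d(p, g) for g in labeled_boxes]
        best = int(np.argmax(ious))
        if ious[best] >= thresh:   # otherwise the predicted box is discarded
            kept.append((p, best))
    return kept
```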
  • the target predicted 2D bounding box and the target predicted 3D bounding box that correspond to the same object are regarded as a label matching pair; the corresponding position in the label matching matrix is set to 1 and the unmatched positions are set to 0, so as to obtain the label matching matrix.
  • in this way, the predicted 2D bounding box can be accurately matched with the predicted 3D bounding box to obtain the label matching matrix; when the function value of the target loss function is determined according to the label matching matrix, an accurate function value can be obtained, thereby improving the training accuracy of the unimodal detection model.
  • a training sample set including multiple training samples is determined, and the detailed process is described as follows:
  • at least one data combination is determined in the target tracking data sequence, wherein each of the data combinations includes: target image data and target point cloud data; the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval.
  • target tracking data sequences including image data and point cloud data are respectively acquired, wherein the above target tracking data sequences contain enough data for tracking and training the above single modality detection model.
  • the target image data image_k at the first tracking moment is selected; then, according to the preset interval, the second tracking moment several frames later is determined, and the target point cloud data PC_{k+n} at that moment is selected.
  • the construction principle is the transitivity of weak synchronization: the target image data image_k of the current frame and the image data image_{k+n} several frames later are weakly synchronized in time and space, so the target point cloud data PC_{k+n} that is strongly synchronized with image_{k+n} will also be weakly synchronized, in time and space, with the target image data image_k.
  • the above-mentioned target image data image_k and target point cloud data PC_{k+n} are respectively determined as the sample image data and the sample point cloud data in a training sample.
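  • A minimal sketch of this construction (the list-based sequence storage and the frame offset n standing in for the preset interval are assumptions):

```python
def build_weak_sync_samples(images, point_clouds, n=2):
    """Pair the image at frame k with the point cloud at frame k + n so that
    each training sample carries the simulated spatio-temporal weak sync."""
    return [
        {"sample_image": images[k], "sample_point_cloud": point_clouds[k + n]}
        for k in range(len(images) - n)
    ]
```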
  • the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multi-modal data set of point clouds and images, through which the weak synchronization that may occur in actual automatic driving scenes can be simulated, and the training sample set constructed in this way allows the trained unimodal detection model to adapt to the weak synchronization scenario of the multimodal data set.
  • the embodiment of the present disclosure also provides a data matching device corresponding to the data matching method. Since the problem-solving principle of the device in the embodiment of the present disclosure is similar to that of the above-mentioned data matching method, for the implementation of the device, refer to the implementation of the method.
  • FIG. 4 it is a schematic diagram of a data matching device provided by an embodiment of the present disclosure.
  • the device includes: an acquisition module 41, a determination module 42, and a matching module 43; wherein,
  • The acquisition module 41 is configured to detect the point cloud data and the image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data, where the target detection result includes bounding box information of the detected target object;
  • The determination module 42 is configured to determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data, and to determine the target feature information corresponding to the image data according to the target detection result of the image data, where the target feature information includes geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box;
  • The matching module 43 is configured to match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • Point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can be used to make up for the sparseness and texturelessness of point cloud data. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thereby yield more accurate object detection results.
  • The determination module 42 is further configured to: project the target detection result of the point cloud data into the image data to obtain the projection frame of the bounding box of the point cloud data; and determine the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection frame in the image data.
  • the geometric feature information includes position information and/or size information of the bounding box.
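For illustration, the projection step described above might look like the following sketch, assuming a known LiDAR-to-camera extrinsic matrix and camera intrinsics (calibration inputs not detailed by the disclosure) and that all box corners lie in front of the camera:

```python
import numpy as np

def project_box_to_image(corners_3d, lidar_to_cam, intrinsics):
    """Project the 8 corners of a 3D bounding box into the image plane.

    corners_3d: (8, 3) corner coordinates in the LiDAR frame.
    lidar_to_cam: (4, 4) extrinsic transform (an assumed calibration input).
    intrinsics: (3, 3) camera matrix.
    Returns the axis-aligned 2D projection frame (x1, y1, x2, y2), from which
    position and size features of the frame can be read off.
    """
    homo = np.hstack([corners_3d, np.ones((8, 1))])  # (8, 4) homogeneous coords
    cam = (lidar_to_cam @ homo.T)[:3]                # (3, 8) camera-frame points
    pix = intrinsics @ cam                           # (3, 8)
    pix = pix[:2] / pix[2]                           # perspective divide -> pixels
    x1, y1 = pix.min(axis=1)
    x2, y2 = pix.max(axis=1)
    return x1, y1, x2, y2
```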
  • The determination module 42 is further configured to: determine the target image data located within the detected bounding box in the image data; extract image features of the target image data; and determine the extracted image features as the appearance feature information corresponding to the image data.
  • The determination module 42 is further configured to: determine the target point cloud data located within the detected bounding box in the point cloud data, where the target point cloud data includes the quantity of target points located within the bounding box and/or the coordinate information of the target points in the point cloud coordinate system; and, based on the target point cloud data, determine a global point cloud feature used to describe the overall characteristics of the target point cloud, and determine the appearance feature information corresponding to the point cloud data according to the global point cloud feature.
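A minimal sketch of deriving such a global point cloud feature is given below; the per-point encoder `encode` and the max-pooling aggregation are assumptions (a PointNet-style choice), not a prescription of the disclosure:

```python
def global_point_cloud_feature(points_in_box, encode):
    """Sketch: aggregate per-point features in a box into one global feature.

    points_in_box: (K, 3) coordinates of the K target points inside the box.
    encode: a hypothetical per-point feature extractor returning a (K, C)
            array (any permutation-invariant encoder would do).
    """
    if len(points_in_box) == 0:
        return None                      # an empty box yields no appearance cue
    per_point = encode(points_in_box)    # (K, C) per-point features
    return per_point.max(axis=0)         # (C,) order-invariant global feature
```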
  • The matching module 43 is further configured to: perform a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determine a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
  • The matching module 43 is further configured to: splice the geometric feature information corresponding to the image data with the appearance feature information corresponding to the image data to obtain target image features; splice the geometric feature information corresponding to the point cloud data with the appearance feature information corresponding to the point cloud data to obtain target point cloud features; and perform the correlation calculation on the target image features and the target point cloud features to obtain the correlation calculation result.
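By way of illustration, the splicing and correlation steps could be sketched as follows; using a dot product as the correlation operation is an assumption made for this sketch:

```python
import numpy as np

def splice(geometric, appearance):
    """Splice geometric and appearance features along the feature axis."""
    return np.concatenate([geometric, appearance], axis=-1)

def correlation(target_image_feats, target_cloud_feats):
    """Pairwise correlation between M image features and N point cloud features.

    target_image_feats: (M, C) spliced features of the 2D boxes.
    target_cloud_feats: (N, C) spliced features of the 3D boxes.
    """
    return target_image_feats @ target_cloud_feats.T   # (M, N) correlation map
```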
  • The matching module 43 is further configured to: perform a convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix is used to represent the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; invert the similarity matrix to obtain a matching cost matrix; and perform bipartite graph matching processing on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
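A minimal sketch of the last two steps follows; negating the similarity matrix stands in for the inversion step, and SciPy's Hungarian solver `linear_sum_assignment` stands in for the bipartite graph matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(similarity):
    """Sketch: similarity matrix -> cost matrix -> bipartite matching.

    similarity: (M, N) array scoring image boxes against point cloud boxes.
    """
    cost = -np.asarray(similarity)            # high similarity -> low cost
    rows, cols = linear_sum_assignment(cost)  # Hungarian bipartite matching
    return list(zip(rows, cols))              # (image box idx, cloud box idx)
```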
  • The matching module 43 is further configured to: perform target detection on the point cloud data and the image data respectively through the trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
  • The device is further configured to train the single-modal detection model according to the following steps: determine a training sample set including a plurality of training samples, where each training sample includes sample image data and sample point cloud data carrying sample labels; perform target detection on the training sample set through the single-modal detection model to be trained, to obtain a sample target detection result; determine a label matching matrix according to the sample target detection result and the sample labels; and calculate a target loss function according to the label matching matrix, adjusting the model parameters of the single-modal detection model according to the target loss function until a preset condition is reached, to obtain the trained single-modal detection model.
  • The device is further configured to: calculate the intersection-over-union between at least one predicted 2D bounding box corresponding to the sample image data in the sample target detection result and the label bounding box corresponding to the sample image data in the sample labels, to obtain a first intersection-over-union ratio, and filter the at least one predicted 2D bounding box according to the first ratio to obtain a target predicted 2D bounding box; calculate the intersection-over-union between at least one predicted 3D bounding box corresponding to the sample point cloud data in the sample target detection result and the label bounding box corresponding to the sample point cloud data in the sample labels, to obtain a second intersection-over-union ratio, and filter the at least one predicted 3D bounding box according to the second ratio to obtain a target predicted 3D bounding box; and match the target predicted 2D bounding box with the target predicted 3D bounding box to obtain a label matching result, determining the label matching matrix according to the label matching result. A 2D IoU helper in this spirit is sketched below.
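For reference, the first intersection-over-union can be computed with a standard axis-aligned 2D IoU such as the sketch below (the 3D IoU used for the second ratio additionally accounts for box height and heading, and is omitted here):

```python
def iou_2d(a, b):
    """Axis-aligned IoU of two 2D boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```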
  • The device is further configured to: acquire a target tracking data sequence, where the target tracking data sequence includes image data and point cloud data acquired at each tracking moment; determine at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and use the data in each data combination as the data in each training sample.
  • The embodiment of the present disclosure also provides an electronic device 500. As shown in FIG. 5, a schematic structural diagram of the electronic device 500 provided in the embodiment of the present disclosure, the device includes:
  • a processor 51, a memory 52, and a bus 53. The memory 52 is used for storing execution instructions and includes an internal memory 521 and an external memory 522; the internal memory 521 is used for temporarily storing the computing data of the processor 51 and the data exchanged with the external memory 522, such as a hard disk, and the processor 51 exchanges data with the external memory 522 through the internal memory 521. When the electronic device 500 is running, the processor 51 communicates with the memory 52 through the bus 53, so that the processor 51 executes the following instructions:
  • detect the point cloud data and image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data, where the target detection result includes bounding box information of the detected target object; determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data; determine the target feature information corresponding to the image data according to the target detection result of the image data, where the target feature information includes the geometric feature information of the detected bounding box of the target object and the appearance feature information of the target object within the bounding box; and match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored. When the computer program is run by a processor, the steps of the data matching method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • The embodiment of the present disclosure also provides a computer program product. The computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the data matching method described in the above method embodiments; refer to the above method embodiments for details.
  • the above-mentioned computer program product may be realized by hardware, software or a combination thereof.
  • In an optional embodiment, the computer program product can be embodied as a computer storage medium; in another optional embodiment, the computer program product can be embodied as a software product, such as an SDK (Software Development Kit).
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • The technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause an electronic device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • Embodiments of the present disclosure provide a data matching method and device, an electronic device, a storage medium, and a program product, where the method includes: respectively detecting the point cloud data and image data to be matched, and obtaining the target detection result of the point cloud data and the target detection result of the image data; determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data, and determining, according to the target detection result of the image data, the target feature information corresponding to the image data, where the target feature information includes the geometric feature information of the detected bounding box of the target object and the appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding boxes in the point cloud data with the bounding boxes in the image data. Detecting 3D targets by combining image data and point cloud data in this way can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.


Abstract

Provided in the embodiments of the present disclosure are a data matching method and apparatus, and an electronic device, a storage medium and a program product. The method comprises: respectively performing detection on point cloud data and image data, which are to be subjected to matching, so as to obtain a target detection result of the point cloud data and a target detection result of the image data; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and determining, according to the target detection result of the image data, target feature information corresponding to the image data, wherein each piece of target feature information comprises geometric feature information of a bounding box of a detected target object, and appearance feature information of the target object in the bounding box; and according to the target feature information, matching a bounding box in the point cloud data and a bounding box in the image data. By means of the embodiments of the present disclosure, a 3D target is detected by means of combining image data with point cloud data, such that the detection accuracy of the 3D target can be improved, thereby obtaining a more accurate target detection result.

Description

Data matching method and apparatus, and electronic device, storage medium and program product
Cross-Reference to Related Applications
The present disclosure is based on, and claims priority to, Chinese patent application No. 202110994415.5, filed on August 27, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of autonomous driving, and relates to, but is not limited to, a data matching method and apparatus, an electronic device, a storage medium, and a program product.
Background
The goal of 3D (three-dimensional) target detection is to identify the 3D bounding box information of an object, mainly including its position, orientation, size, and confidence. In recent years, single-modal methods based on LiDAR or camera sensors have made increasing progress in the field of 3D target detection. However, due to the characteristics of the data structures involved, the target detection performance of image single-modal methods is affected by the current environment: overexposed or overly dark photos hinder the acquisition of target information, and detection also suffers from occlusion. Point cloud single-modal methods must cope with point clouds that are sparse, irregular, and lacking in texture and semantic information, and with small or distant objects that contain too few points.
On this basis, there is an urgent need for a stable, reliable, accurate, and highly robust multimodal matching algorithm.
Summary of the Invention
Embodiments of the present disclosure provide at least a data matching method and apparatus, an electronic device, a storage medium, and a program product.
In a first aspect, an embodiment of the present disclosure provides a data matching method, including: respectively detecting point cloud data and image data to be matched to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection result includes bounding box information of a detected target object; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of the detected target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
In the above implementation, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
In an optional implementation, determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: projecting the target detection result of the point cloud data into the image data to obtain a projection frame of the bounding box corresponding to the point cloud data; and determining the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection frame in the image data.
In the above implementation, by projecting the 3D bounding box corresponding to the point cloud data into the image data to obtain a 2D (two-dimensional) projection frame, the data formats can be unified, so that target objects can be quickly matched according to the relative position between the 2D bounding box corresponding to the image data and the 2D projection frame, and the corresponding matching result can be obtained.
In an optional implementation, the geometric feature information includes position information and/or size information of the bounding box.
In the above implementation, by expanding the position information and/or size information of the bounding box into the geometric feature information, the geometric feature information can be enriched, further improving the accuracy of data matching.
In an optional implementation, determining the target feature information corresponding to the image data according to the target detection result of the image data includes: determining target image data located within the detected bounding box in the image data; and extracting image features of the target image data and determining the extracted image features as the appearance feature information corresponding to the image data.
In the above implementation, by extracting the image features of the target image data within the bounding box corresponding to the image data, it is possible, on the basis of matching bounding boxes according to position information, to further verify according to the appearance feature information whether the objects within position-matched bounding boxes are the same object. This processing can compensate for matching errors caused by weak synchronization between image data and point cloud data, thereby improving the accuracy of data matching and in turn improving the safety of autonomous driving.
In an optional implementation, determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: determining target point cloud data located within the detected bounding box in the point cloud data, where the target point cloud data includes the quantity of target points located within the bounding box and/or the coordinate information of the target points in the point cloud coordinate system; and, based on the target point cloud data, determining a global point cloud feature used to describe the overall characteristics of the target point cloud, and determining the appearance feature information corresponding to the point cloud data according to the global point cloud feature.
In the above implementation, by extracting the global point cloud feature of the target point cloud data within the bounding box corresponding to the point cloud data, it is possible, on the basis of matching bounding boxes according to position information, to further verify according to the appearance feature information whether the objects within position-matched bounding boxes are the same object. This processing can compensate for matching errors caused by weak synchronization between image data and point cloud data, thereby improving the accuracy of data matching and in turn improving the safety of autonomous driving.
In an optional implementation, matching the bounding box determined based on the point cloud data with the bounding box determined based on the image data according to the target feature information includes: performing a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determining a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
In the above implementation, by combining geometric feature information and appearance feature information and performing a correlation calculation on the target feature information of the point cloud data and that of the image data, the correlation between each of the feature vectors corresponding to the M 2D bounding boxes and each of the feature vectors corresponding to the N 3D bounding boxes can be accurately determined, so that when bounding boxes are matched according to the correlation calculation result, the matching accuracy of the bounding boxes is improved.
In an optional implementation, performing the correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain the correlation calculation result includes: splicing the geometric feature information corresponding to the image data with the appearance feature information corresponding to the image data to obtain target image features; splicing the geometric feature information corresponding to the point cloud data with the appearance feature information corresponding to the point cloud data to obtain target point cloud features; and performing the correlation operation on the target image features and the target point cloud features to obtain the correlation calculation result.
In the above implementation, the geometric feature information and the appearance feature information are spliced to obtain the corresponding target image features and target point cloud features, and the correlation calculation is then performed on them; determining the bounding box matching result from the correlation calculation result in this way can compensate for matching errors caused by weak synchronization between image data and point cloud data, thereby improving the accuracy of data matching and in turn improving the safety of autonomous driving.
In an optional implementation, determining the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result includes: performing a convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix is used to represent the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; inverting the similarity matrix to obtain a matching cost matrix; and performing bipartite graph matching processing on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
In the above implementation, operations such as convolution, inversion, and bipartite graph matching on the correlation calculation result can improve the processing efficiency of matching and yield highly accurate matching results.
In an optional implementation, respectively detecting the point cloud data and the image data to be matched to obtain the target detection result of the point cloud data and the target detection result of the image data includes: performing target detection on the point cloud data and the image data respectively through a trained single-modal detection model to obtain the target detection result of the point cloud data and the target detection result of the image data.
In the above implementation, performing target detection on the point cloud data through a trained point cloud single-modal detection model and on the image data through an image single-modal detection model can improve the accuracy of the target detection results, thereby yielding more accurate bounding box information that contains the complete target object.
In an optional implementation, the single-modal detection model is trained according to the following steps: determining a training sample set containing multiple training samples, where each training sample includes sample image data and sample point cloud data carrying sample labels; performing target detection on the training sample set through the single-modal detection model to be trained to obtain a sample target detection result; determining a label matching matrix according to the sample target detection result and the sample labels; and calculating the function value of a target loss function according to the label matching matrix, and adjusting the model parameters of the single-modal detection model according to the function value of the target loss function until a preset condition is reached, to obtain the trained single-modal detection model.
In the above implementation, training the single-modal detection model in the manner described above yields a single-modal detection model whose processing precision meets the accuracy requirements; performing target detection with this model can improve the accuracy of the target detection results and thereby improve the accuracy of data matching.
In an optional implementation, determining the label matching matrix according to the sample target detection result and the sample labels includes: calculating the intersection-over-union between at least one predicted 2D bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding box corresponding to the sample image data in the sample labels to obtain a first intersection-over-union ratio, and filtering the at least one predicted 2D bounding box according to the first ratio to obtain a target predicted 2D bounding box; calculating the intersection-over-union between at least one predicted 3D bounding box corresponding to the sample point cloud data in the sample target detection result and the labeled bounding box corresponding to the sample point cloud data in the sample labels to obtain a second intersection-over-union ratio, and filtering the at least one predicted 3D bounding box according to the second ratio to obtain a target predicted 3D bounding box; and matching the target predicted 2D bounding box with the target predicted 3D bounding box to obtain a label matching result, and determining the label matching matrix according to the label matching result.
In the above implementation, through this processing, the predicted 2D bounding boxes can be accurately matched with the predicted 3D bounding boxes to obtain the label matching matrix; when the function value of the target loss function is determined according to this label matching matrix, an accurate function value can be obtained, thereby improving the training accuracy of the single-modal detection model.
In an optional implementation, determining the training sample set containing multiple training samples includes: acquiring a target tracking data sequence, where the target tracking data sequence contains image data and point cloud data acquired at each tracking moment; determining at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and using the data in each data combination as the data in each training sample.
In the above implementation, the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multimodal data set of point clouds and images. This method can simulate the weak synchronization situations that may occur in actual autonomous driving scenes, so that when the single-modal detection model is trained with this training sample set, the trained single-modal detection model can adapt to the weak synchronization scenarios of multimodal data sets.
In a second aspect, an embodiment of the present disclosure further provides a data matching apparatus, including: an acquisition module configured to respectively detect point cloud data and image data to be matched to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection result includes bounding box information of a detected target object; a determination module configured to determine, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and to determine, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of the detected target object and appearance feature information of the target object within the bounding box; and a matching module configured to match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are executed.
In a fourth aspect, embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the first aspect, or of any possible implementation of the first aspect, are executed.
In a fifth aspect, an embodiment of the present disclosure further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, some or all of the steps of the methods described in the embodiments of the present disclosure are implemented. The computer program product may be a software installation package.
In order to make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the accompanying drawings used in the embodiments are briefly introduced below. The accompanying drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and are used together with the description to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 shows a flowchart of a data matching method provided by an embodiment of the present disclosure;
FIG. 2 shows a flowchart of another data matching method provided by an embodiment of the present disclosure;
FIG. 3 shows a flowchart of determining a label matching matrix according to sample target detection results and sample labels in a data matching method provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a data matching apparatus provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Research has found that in the existing field of autonomous driving, 3D target detection is usually performed with single-modal detection methods based on LiDAR or camera sensors, where such single-modal detection methods usually include image single-modal methods and point cloud single-modal methods. However, due to the characteristics of the data structures involved, the target detection performance of image single-modal methods is affected by the current environment: overexposed or overly dark photos affect the acquisition of target information, and detection is also affected by occlusion. Point cloud single-modal methods must cope with point clouds that are sparse, irregular, and lacking in texture and semantic information, and with small or distant objects that contain too few points.
Based on the above research, the present disclosure provides a data matching method and apparatus, an electronic device, a storage medium, and a program product. In the embodiments of the present disclosure, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
To facilitate understanding of this embodiment, a data matching method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the data matching method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capabilities.
Referring to FIG. 1, which is a flowchart of a data matching method provided by an embodiment of the present disclosure, the method includes steps S101 to S105, where:
S101: Respectively detect the point cloud data and image data to be matched, and obtain the target detection result of the point cloud data and the target detection result of the image data, where the target detection result includes bounding box information of the detected target object.
In the embodiments of the present disclosure, image data may be collected by a camera device, and point cloud data may be collected by a LiDAR sensor, where the camera device and the LiDAR sensor are sensors pre-installed on a target vehicle. The target vehicle may be a vehicle with an autonomous driving function, for example, a minibus or a car; the present disclosure does not specifically limit the type of the target vehicle.
For the point cloud data, the target detection result includes the bounding box information of the detected target objects, where the bounding boxes here are 3D bounding boxes. For example, if the number of target objects is N, the bounding box information includes information about the 3D bounding boxes of the N target objects.
For the image data, the target detection result includes the bounding box information of the detected target objects, where the bounding boxes here are 2D bounding boxes. For example, if the number of target objects is M, the bounding box information includes information about the 2D bounding boxes of the M target objects.
It should be noted that the N target objects and the M target objects may include the same target objects or different target objects.
S103: Determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data; determine the target feature information corresponding to the image data according to the target detection result of the image data. The target feature information includes geometric feature information of the bounding box of the detected target object and appearance feature information of the target object within the bounding box.
In the embodiments of the present disclosure, the target feature information may be the target feature information corresponding to the point cloud data, or the target feature information corresponding to the image data.
For example, the target feature information corresponding to the point cloud data may include geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box; likewise, the target feature information corresponding to the image data may include geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box.
In the embodiments of the present disclosure, the appearance feature information may be an attribute feature used to characterize the target object framed by the bounding box; this attribute feature may be the object category information of the target object, where the category information is the category label of the target object, for example, vehicle or pedestrian.
S105: Match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
In the embodiments of the present disclosure, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
In the embodiments of the present disclosure, to address the technical problem of the poor target detection accuracy of existing single-modal detection methods, an optional implementation is a multimodal detection method, where a multimodal detection method refers to detecting objects by combining image data and point cloud data. For example, multimodal detection methods may include the point cloud projection method, the image back-projection method, and the similarity matrix method.
The point cloud projection method only considers the prediction results of the point cloud 3D candidate boxes and relies heavily on the performance of the point cloud single-modal detector. The image back-projection method only considers the prediction results of the image 2D candidate boxes and relies heavily on the performance of the image single-modal detector. The similarity matrix method has not explored the multimodal matching problem under weak synchronization in much depth.
The prerequisite for the point cloud projection method, the image back-projection method, and the similarity matrix method described above is strong synchronization between the LiDAR sensor and the camera device; therefore, these technical solutions do not consider the case of weak synchronization between the LiDAR sensor and the camera device.
Here, strong synchronization between the camera device and the LiDAR sensor means that the acquisition times of the point cloud data and the image data are highly synchronized. It can also be understood as follows: when the point cloud single-modal detector and the image single-modal detector detect the same object, the 2D projection frame of the point cloud coincides with the image 2D box, or the 3D box back-projected from the image coincides with the 3D box of the point cloud. However, in actual autonomous driving scenes, the camera device and the LiDAR sensor may exhibit very weak synchronization due to response delays or complex road conditions. In this case, weak synchronization causes synchronization errors between the image data and the point cloud data. Once a synchronization error occurs, then according to the transitivity of errors, the projection result of one modality also carries a corresponding synchronization error with respect to the other; that is, the bounding box of an object detected from the point cloud data and the bounding box of the same object detected from the image data will not coincide. In this case, incorrect matching results may be produced, and poor matching will give multimodal fusion worse detection results than the single-modal case.
Based on this, the present disclosure provides a data matching method in which, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
In the embodiments of the present disclosure, the steps described in steps S101 to S105 above are described in detail; the detailed description is as follows.
For step S101, respectively detecting the point cloud data and image data to be matched to obtain the target detection result of the point cloud data and the target detection result of the image data includes the following process:
Target detection is performed on the point cloud data and the image data respectively through trained single-modal detection models, to obtain the target detection result of the point cloud data and the target detection result of the image data.
Here, the single-modal detection models include a point cloud single-modal detection model and an image single-modal detection model. The point cloud single-modal detection model is used for target detection on the point cloud data to obtain the corresponding target detection result; the image single-modal detection model is used for target detection on the image data to obtain the corresponding target detection result.
In the embodiments of the present disclosure, point cloud data is first collected by the LiDAR sensor, and image data is collected by the camera device. Then, target detection is performed on the point cloud data through the point cloud single-modal detection model, and on the image data through the image single-modal detection model.
It should be noted that point cloud single-modal detection models include, but are not limited to, SECOND, PointPillars, PointRCNN, and PV-RCNN; image single-modal detection models include, but are not limited to, RRC, MSCNN, and Cascade R-CNN.
In the above implementation, performing target detection on the point cloud data through a trained point cloud single-modal detection model and on the image data through an image single-modal detection model can improve the accuracy of the target detection results, thereby yielding more accurate bounding box information that contains the complete target object.
In step S103, after the target detection results are determined, the target feature information corresponding to the point cloud data and the target feature information corresponding to the image data can be determined. Step S103 can therefore be described as the following process:

Step S1031: according to the target detection result of the point cloud data, determine the geometric feature information and appearance feature information corresponding to the point cloud data.

Step S1032: according to the target detection result of the image data, determine the geometric feature information and appearance feature information corresponding to the image data.

Here, step S1031 and step S1032 are in no particular order: they can be executed at the same time, step S1031 can be executed first and then step S1032, or step S1032 can be executed first and then step S1031. Steps S1031 and S1032 are described in detail below.
For S1032, the geometric feature information corresponding to the image data can be determined through the following process:

(1) Determine the bounding boxes of the target objects according to the target detection result of the image data.

(2) Determine the geometric feature information of the image data according to the bounding boxes of the target objects.
In the embodiment of the present disclosure, assume the image data contains M target objects; the target detection result of the image data then includes the 2D bounding boxes of the M target objects. If the bounding box information of each 2D bounding box is denoted I_j, the bounding box information of the M 2D bounding boxes can be denoted I = {I_j | j = 1, …, M}, where the bounding box information of each 2D bounding box can be expressed as I_j = {x_{j1}, y_{j1}, x_{j2}, y_{j2}}.

At this time, the bounding box information of each 2D bounding box may be determined as the geometric feature information corresponding to the image data.

Here, (x_{j1}, y_{j1}) and (x_{j2}, y_{j2}) respectively denote the coordinates, in the image data, of the upper-left and lower-right corners of each 2D bounding box under the extrinsic coordinate system of the camera device.
For S1031, determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data proceeds as follows:

(1) Project the target detection result of the point cloud data into the image data to obtain the projection frames of the bounding boxes corresponding to the point cloud data.

(2) Determine the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection frames in the image data.

After target detection is performed on the point cloud data through the point cloud single-modal detection model, a target detection result is obtained. Assume this target detection result includes the 3D bounding boxes of N target objects. If the bounding box information of each 3D bounding box is denoted P_i, the bounding box information of the N 3D bounding boxes can be denoted P = {P_i | i = 1, …, N}, where the bounding box information of each 3D bounding box is P_i = {x_i, y_i, z_i, h_i, w_i, l_i, θ_i}.

Here, (x_i, y_i, z_i) are the coordinates of the center of the 3D bounding box in the lidar sensor coordinate system; (h_i, w_i, l_i) are the height, width and length of the point cloud 3D bounding box; and θ_i is the orientation of the point cloud 3D bounding box in the bird's-eye view, i.e. its rotation angle around the Y axis of the lidar sensor coordinate system.
After the target detection result P of the point cloud data is obtained, this result can be projected into the image data according to the extrinsic coordinate system of the camera device and the calibration relationship between the lidar sensor and the camera device, yielding the 2D projection frames of the bounding boxes corresponding to the point cloud data, P^{2D} = {P_i^{2D} | i = 1, …, N}, where P_i^{2D} = {x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}}.

In the embodiment of the present disclosure, (x_{i1}^{2D}, y_{i1}^{2D}) and (x_{i2}^{2D}, y_{i2}^{2D}) in the above 2D projection frame P_i^{2D} are the pixel coordinates of the upper-left and lower-right corners of the 2D projection frame in the image data. After the coordinate information of the 2D projection frames is obtained, this coordinate information can be used to determine the geometric feature information of the point cloud data.
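As an illustration of this projection step, the following is a minimal sketch assuming a pinhole camera with a 3×4 projection matrix P_cam (the product of the camera intrinsics and the lidar-to-camera calibration) and a z-up lidar frame in which the bird's-eye-view yaw is a rotation about the z axis; the function names and these conventions are assumptions for illustration, not part of the original disclosure.

```python
import numpy as np

def box3d_corners(box):
    """Return the 8 corners of a 3D box P_i = (x, y, z, h, w, l, theta)
    in lidar coordinates (assuming a z-up frame with BEV yaw about z)."""
    x, y, z, h, w, l, theta = box
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    corners = np.array([[sx * dx, sy * dy, sz * dz]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    return corners @ rot.T + np.array([x, y, z])

def project_box(box, P_cam):
    """Project one 3D box through a 3x4 lidar-to-image projection matrix and
    return its 2D projection frame (x1, y1, x2, y2)."""
    corners = box3d_corners(box)                   # (8, 3)
    homog = np.hstack([corners, np.ones((8, 1))])  # (8, 4) homogeneous points
    uvw = homog @ P_cam.T                          # (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                  # perspective division
    x1, y1 = uv.min(axis=0)
    x2, y2 = uv.max(axis=0)
    return x1, y1, x2, y2
```

The tight axis-aligned bounds of the eight projected corners are taken here as the 2D projection frame (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}).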
It should be noted that the geometric feature information of the 2D projection frames of the bounding boxes corresponding to the point cloud data can be written as F_pc^G = {f_{pc,i}^G | i = 1, …, N}, where f_{pc,i}^G = (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}). The data dimension D_G of this geometric feature information is 4, i.e. the pixel coordinates of the upper-left and lower-right corner points of P_i^{2D}, so the size of the geometric feature information of the 2D projection frames of the bounding boxes corresponding to the point cloud data can be recorded as N×D_G.
Similarly, the geometric feature information of the 2D bounding boxes of the image data can be written as F_img^G = {f_{img,j}^G | j = 1, …, M}, where f_{img,j}^G = (x_{j1}, y_{j1}, x_{j2}, y_{j2}). The data dimension D_G of this geometric feature information is likewise 4, i.e. the pixel coordinates of the upper-left and lower-right corner points of I_j, so the size of the geometric feature information of the image data can be recorded as M×D_G.
It should be noted here that, besides the pixel coordinates of the upper-left and lower-right corner points, the bounding box information in the geometric feature information may instead be the pixel coordinates of the lower-left and upper-right corner points; the present disclosure does not specifically limit this.

In the embodiment of the present disclosure, the above geometric feature information may further include position information and/or size information of the bounding box.

Here, after the geometric feature information of the point cloud data and of the image data has been determined in the manner described above, the position information and/or size information of the bounding box can also be appended to the geometric feature information, so that the expanded geometric feature information includes the position information and/or size information of the bounding box.

The expansion process is introduced below for two cases: expanding the geometric feature information of the image data, and expanding the geometric feature information of the point cloud data.

Case 1: the process of expanding the geometric feature information of the image data.
In the embodiment of the present disclosure, when expanding the geometric feature information of the image data, the pixel coordinates (x_c, y_c) of the center point of the 2D bounding box can be appended to f_{img,j}^G so that D_G = 6; the expanded feature is then f_{img,j}^G = (x_{j1}, y_{j1}, x_{j2}, y_{j2}, x_c, y_c).

In addition, the size information (h, w) of the 2D bounding box can also be appended to f_{img,j}^G so that D_G = 8; the expanded feature is then f_{img,j}^G = (x_{j1}, y_{j1}, x_{j2}, y_{j2}, x_c, y_c, h, w), where h and w are the height and width of the 2D bounding box respectively.
Case 2: the process of expanding the geometric feature information of the point cloud data.

In the embodiment of the present disclosure, when expanding the geometric feature information of the point cloud data, the pixel coordinates (x_c^{2D}, y_c^{2D}) of the center point of the 2D projection frame can be appended to f_{pc,i}^G so that D_G = 6; the expanded feature is then f_{pc,i}^G = (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}, x_c^{2D}, y_c^{2D}).

In addition, the size information (h^{2D}, w^{2D}) of the 2D projection frame can also be appended to f_{pc,i}^G so that D_G = 8; the expanded feature is then f_{pc,i}^G = (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}, x_c^{2D}, y_c^{2D}, h^{2D}, w^{2D}), where h^{2D} and w^{2D} are the height and width of the 2D projection frame respectively.
In the above embodiments, projecting the 3D bounding boxes corresponding to the point cloud data into the image data to obtain 2D projection frames unifies the data format, so that target objects can be matched quickly according to the relative positions of the 2D bounding boxes corresponding to the image data and the 2D projection frames, and the corresponding matching result obtained.

In the above embodiments, appending the position information and/or size information of the bounding box to the geometric feature information enriches the geometric feature information and further improves the accuracy of data matching.
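As a concrete illustration, the following minimal sketch builds the expanded D_G = 8 geometric feature from a corner-format box; the function name and toy data are illustrative only.

```python
import numpy as np

def geometric_feature(box):
    """Build the expanded geometric feature (D_G = 8) from a 2D box
    given as corner coordinates (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # center point
    h, w = y2 - y1, x2 - x1                     # height and width
    return np.array([x1, y1, x2, y2, xc, yc, h, w])

# Stack per-box features into the M x D_G (or N x D_G) matrices used later.
boxes_img = np.array([[100, 50, 180, 120], [300, 80, 360, 150]])  # toy data
F_img_G = np.stack([geometric_feature(b) for b in boxes_img])     # (M, 8)
```

The same function applies unchanged to the 2D projection frames of the point cloud boxes, since both are represented in the common corner format after projection.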
For S1032, in the case where the target feature information includes appearance feature information, the target feature information corresponding to the image data is determined according to the target detection result of the image data as follows:

(1) Determine the target image data located within the detected bounding boxes in the image data.

(2) Extract the image features of the target image data, and determine the extracted image features as the appearance feature information corresponding to the image data.
In the embodiment of the present disclosure, the images located within the M 2D bounding boxes I = {I_j} corresponding to the image data are determined in the image data, and the determined images are cropped to obtain the target image data; for the M 2D bounding boxes, M pieces of target image data are obtained. The cropped target image data can then be scaled to obtain M RGB images with a uniform resolution of r×r. The scaled target image data can be denoted Image; it represents the pixel values of the individual pixel points and has size r×r×3. After the target image data is obtained, the target image data Image can be input into a preset image feature extraction network to extract the D_A-dimensional image features of the target image data, and the appearance feature information F_img^A of the image data is then obtained from these image features, where F_img^A can be expressed as a vector of size M×D_A. It should be noted that the above image feature extraction network includes, but is not limited to, VGG-Net, ResNet, GoogleNet and other networks capable of the above image feature extraction.
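A minimal sketch of this appearance-feature extraction under stated assumptions: boxes are cropped, resized to r×r, and passed through a ResNet-18 backbone with its classification head removed, giving a D_A = 512 feature per box. The backbone choice and the value of r are assumptions; the patent only requires some image feature extraction network.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

r = 224                                   # illustrative crop resolution
backbone = models.resnet18(weights=None)  # any feature extractor would do
backbone.fc = torch.nn.Identity()         # keep the 512-d global feature
backbone.eval()

def image_appearance_features(image, boxes):
    """image: (3, H, W) float tensor; boxes: list of (x1, y1, x2, y2).
    Returns the (M, 512) appearance feature matrix F_img^A."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        crop = image[:, int(y1):int(y2), int(x1):int(x2)]
        crops.append(TF.resize(crop, [r, r]))   # scale every crop to r x r
    batch = torch.stack(crops)                  # (M, 3, r, r)
    with torch.no_grad():
        return backbone(batch)                  # (M, 512)
```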
In the above implementation, by extracting the image features of the target image data within the bounding boxes corresponding to the image data, it becomes possible, on top of matching bounding boxes by position information, to further verify from the appearance feature information whether the objects in position-matched bounding boxes are the same object. This compensates for matching errors caused by weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching and, in turn, the safety factor of automatic driving.
For S1031, in the case where the target feature information includes appearance feature information, the step of determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data proceeds as follows:

(1) Determine the target point cloud data located within the detected bounding boxes in the point cloud data, where the target point cloud data includes the number of target points located within the bounding box and/or the coordinate information of the target points in the point cloud coordinate system.

(2) Based on the target point cloud data, determine the global point cloud feature describing the overall characteristics of the target point cloud, and determine the appearance feature information of the point cloud data according to the global point cloud feature.

In the embodiment of the present disclosure, the above lidar sensor can scan the road conditions within its scanning range, obtaining a number of point cloud points that characterize the objects within the collection range. When determining the appearance feature information corresponding to the point cloud data, the target point cloud data located within a 3D bounding box can be determined in the point cloud data, where the target point cloud data contains the number L of target points located within the bounding box and/or the coordinate information C of the target points in the point cloud coordinate system (i.e., the lidar coordinate system), where C = 3.
Here, the extracted target point cloud data can be denoted PC, of size L×C. The target point cloud data is then input into a point cloud feature extraction network to extract the D_A-dimensional global point cloud feature of the target point cloud data, and the appearance feature information F_pc^A of the point cloud data is obtained from this global point cloud feature, where F_pc^A can be expressed as a vector of size N×D_A.
It should be noted that the above point cloud feature extraction network includes, but is not limited to, Pointnet, Pointnet++, PointSIFT and other extraction networks capable of the above point cloud feature extraction.

In the above implementation, by extracting the global point cloud features of the target point cloud data within the bounding boxes corresponding to the point cloud data, it becomes possible, on top of matching bounding boxes by position information, to further verify from the appearance feature information whether the objects in position-matched bounding boxes are the same object. This compensates for matching errors caused by weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching and, in turn, the safety factor of automatic driving.
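For illustration, here is a minimal Pointnet-style sketch that maps the L×3 points inside one box to a single global feature via a shared per-point MLP and max pooling; the layer sizes and the pooling choice are assumptions in the spirit of Pointnet, not the patent's prescribed network.

```python
import torch
import torch.nn as nn

class GlobalPointFeature(nn.Module):
    """Shared per-point MLP followed by max pooling: (L, 3) -> (D_A,)."""
    def __init__(self, d_a=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, d_a),
        )

    def forward(self, points):                 # points: (L, 3)
        per_point = self.mlp(points)           # (L, D_A) per-point features
        return per_point.max(dim=0).values     # order-invariant global feature

net = GlobalPointFeature()
boxes_points = [torch.randn(50, 3), torch.randn(80, 3)]  # toy per-box point sets
F_pc_A = torch.stack([net(p) for p in boxes_points])     # (N, D_A)
```

Max pooling makes the feature invariant to the ordering of the points, which matches the unordered nature of the points inside each 3D box.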
For step S105, matching the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data according to the target feature information proceeds as follows:

(1) Perform correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result.

(2) Determine the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
In the embodiment of the present disclosure, after the target feature information is determined, a correlation calculation can be performed on the target feature information of the image data and the target feature information of the point cloud data according to a preset correlation algorithm, giving a correlation calculation result.

Assume the target feature information of the image data contains the feature information corresponding to M 2D bounding boxes, and the target feature information of the point cloud data contains the feature information corresponding to N 3D bounding boxes. The correlation calculation result can then be understood as the correlation between each piece of feature information corresponding to the M 2D bounding boxes and each piece of feature information corresponding to the N 3D bounding boxes.

After the above correlation calculation result is obtained, the matching result between the bounding boxes determined based on the point cloud data and the bounding boxes determined based on the image data can be determined from it.

In some embodiments, the matching result indicates whether a bounding box determined based on the point cloud data and a bounding box determined based on the image data match, and may take the form of an N×M matching matrix: an element of 1 indicates that the two bounding boxes match each other, and an element of 0 indicates that they do not.

In the above implementation, by combining geometric feature information and appearance feature information in the correlation calculation between the target feature information of the point cloud data and that of the image data, the correlation between each piece of feature information of the M 2D bounding boxes and each piece of feature information of the N 3D bounding boxes can be determined accurately, so that when the bounding boxes are matched according to the correlation calculation result, the matching accuracy of the bounding boxes is improved.
In the embodiment of the present disclosure, the above step of performing correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result proceeds as follows:

(1) Concatenate the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain the target image feature.

(2) Concatenate the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain the target point cloud feature.

(3) Perform a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
In the embodiment of the present disclosure, the geometric feature information of the image data and the appearance feature information of the image data can be concatenated to obtain the target image feature of the image data. For example, the appearance feature vector F_img^A of size M×D_A and the geometric feature vector F_img^G of size M×D_G can be concatenated to obtain the image feature vector F_img of size M×(D_A+D_G), i.e. the above target image feature. Likewise, the geometric feature information of the point cloud data and the appearance feature information of the point cloud data can be concatenated to obtain the target point cloud feature of the point cloud data: the appearance feature vector F_pc^A of size N×D_A and the geometric feature vector F_pc^G of size N×D_G are concatenated to obtain the point cloud feature vector F_pc of size N×(D_A+D_G), i.e. the above target point cloud feature. Afterwards, a correlation operation can be performed on the target image feature and the target point cloud feature through a preset correlation algorithm. For example, the image feature vector F_img and the point cloud feature vector F_pc can be processed by the preset correlation algorithm to obtain a correlation matrix F_correlation of size N×M×(D_A+D_G) (i.e., the above correlation calculation result).
In an optional implementation manner, the preset correlation algorithm may be an algorithm corresponding to any one of several predefined calculation formulas combining the target image feature and the target point cloud feature.
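Since F_correlation has shape N×M×(D_A+D_G), one natural instantiation, offered here only as an assumption consistent with that shape (the patent's own formulas are not reproduced), is an element-wise pairwise combination of every point cloud feature with every image feature, e.g. their element-wise difference:

```python
import torch

def correlation(F_pc, F_img):
    """F_pc: (N, D), F_img: (M, D), with D = D_A + D_G.
    Returns an (N, M, D) tensor pairing every point cloud feature with
    every image feature by element-wise difference (one possible choice)."""
    return F_pc[:, None, :] - F_img[None, :, :]   # broadcasting over pairs

F_correlation = correlation(torch.randn(4, 264), torch.randn(6, 264))
print(F_correlation.shape)  # torch.Size([4, 6, 264])
```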
In the above implementation, concatenating the geometric feature information and the appearance feature information to obtain the corresponding target image feature and target point cloud feature, performing correlation calculation on these features, and then determining the bounding box matching result from the correlation calculation result compensates for matching errors caused by weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching and, in turn, the safety factor of automatic driving.
In the embodiment of the present disclosure, the above step of determining the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result proceeds as follows:

(1) Perform convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix characterizes the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data.

(2) Invert the similarity matrix to obtain a matching cost matrix.

(3) Perform bipartite graph matching processing on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.

In the embodiment of the present disclosure, after the correlation calculation result is obtained, the correlation calculation result (i.e., the correlation matrix F_correlation) can be input into several two-dimensional convolutional networks for convolution calculation, giving a similarity matrix of size N×M×1. Each element of the similarity matrix represents the degree of similarity between one of the N 3D bounding boxes and one of the M 2D bounding boxes.
Here, the degree of similarity includes the similarity determined from the geometric feature information and the similarity determined from the appearance feature information.

For example, if the geometric feature information of the n-th 3D bounding box and the m-th 2D bounding box is highly similar, and the appearance feature information of the objects framed by the n-th 3D bounding box and the m-th 2D bounding box is also highly similar, it can be determined that the n-th 3D bounding box and the m-th 2D bounding box match each other.

After the similarity matrix is obtained, it can be inverted to obtain the matching cost matrix; bipartite graph matching processing is then performed on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
It is assumed that, between the two single-modal detection results (i.e., the target detection result of the image data and the target detection result of the point cloud data), each detection result can form at most one match, while the detection results within the same modality are distinct from one another.

Here, "each detection result can form at most one match" can be understood as: a 2D bounding box determined based on the image data can match at most one 3D bounding box determined based on the point cloud data.

The matching problem between the two single-modal detection results can thus be treated as a bipartite graph matching problem. For example, in an undirected graph, the target detection results can be divided into two subsets: the target detection results of the image data form one subset, and the target detection results of the point cloud data form the other. Each subset contains multiple vertices, each vertex corresponds to a bounding box, the vertices within each subset are mutually disjoint, and every edge of the undirected graph connects vertices belonging to the two different subsets. For a bipartite graph, the number of matches formed can vary, and the matching goal is to pair the two subsets as accurately as possible. Therefore, through the matching algorithm, the similarity matrix is inverted element by element to form the matching cost matrix; a matching threshold δ is then set, pairs whose matching cost is higher than δ do not participate in the matching, and the final multimodal matching matrix (i.e., the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data) is computed.

It should be noted that the matching algorithms described above include, but are not limited to, the Hungarian matching algorithm and the Kuhn-Munkres matching algorithm.
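A minimal sketch of this final stage, assuming the N×M similarity matrix is already available, using SciPy's Hungarian solver (one concrete choice among the matching algorithms named above) and assuming "inversion" is realized as cost = 1 − similarity:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(similarity, delta):
    """similarity: (N, M) similarity matrix with entries in [0, 1].
    Returns an (N, M) 0/1 multimodal matching matrix."""
    cost = 1.0 - similarity                  # invert similarity into a cost
    rows, cols = linear_sum_assignment(cost) # Hungarian / Kuhn-Munkres step
    match = np.zeros_like(similarity, dtype=int)
    for i, j in zip(rows, cols):
        if cost[i, j] <= delta:              # pairs costlier than delta are rejected
            match[i, j] = 1
    return match
```

The solver enforces at-most-one match per detection on each side, and the threshold δ removes low-confidence pairs, matching the behavior described above.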
In the above implementation, performing convolution, inversion and bipartite graph matching on the correlation calculation result improves the processing efficiency of the matching and yields a matching result of high accuracy.

Under a weakly synchronized multimodal data set, the geometric feature information also suffers from the weak synchronization problem, because objects are matched through the position information of the 2D bounding boxes and the projection frames of the 3D bounding boxes. In particular, for small objects, deviations in the geometric feature information cause the matching performance to degrade severely, so the commonly used IOU similarity-matrix matching algorithm cannot solve the multimodal matching problem under weak synchronization. The appearance feature information, by contrast, is always extracted from the objects inside the 3D and 2D boxes themselves and is therefore unaffected by the weak synchronization problem; it thus helps correct the errors introduced by weak synchronization.

In summary, the data matching method proposed by the embodiments of the present disclosure is applicable to the many-to-many matching of multimodal data under both strong and weak synchronization.
In an embodiment of the present disclosure, as shown in Fig. 2, a schematic flowchart of another data matching method is also provided. The method is described in detail as follows:

(1) Determine the target detection results.

The image data to be matched is collected by a camera device and detected by the image single-modal detection model to obtain a target detection result A1, where A1 contains the 2D bounding boxes I = {I_j | j = 1, …, M} of the M objects contained in the image data.

The point cloud data to be matched is collected by a lidar sensor and detected by the point cloud single-modal detection model to obtain a target detection result A2, where A2 contains the 3D bounding boxes P = {P_i | i = 1, …, N} of the N objects perceived in the point cloud data.
(2) Determine the target feature information corresponding to the image data according to the target detection result A1.

The position information of the 2D bounding boxes I = {I_j} of the M objects is determined as the geometric feature information in the target feature information corresponding to the image data. The target image data located within each 2D bounding box is determined in the image data, the image features of the target image data are extracted through the image feature extraction network, and the extracted image features are determined as the appearance feature information, of size M×D_A, in the target feature information corresponding to the image data.

(3) Determine the target feature information corresponding to the point cloud data according to the target detection result A2.

The target detection result corresponding to the point cloud data is projected into the image data to obtain the projection frames of the 3D bounding boxes corresponding to the point cloud data, and the geometric feature information of the point cloud data is determined according to the pixel coordinates of the vertices of the projection frames in the image data.

The target point cloud data located within each detected 3D bounding box is determined in the point cloud data, where the target point cloud data includes the number of target points located within the 3D bounding box and/or the coordinate information of the target points in the point cloud coordinate system. Based on the target point cloud data, the global point cloud feature describing the overall characteristics of the target point cloud is determined, and the appearance feature information of the point cloud data is determined according to the global point cloud feature.
(4) Correlation calculation.

The geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data are concatenated to obtain the target image feature; the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data are concatenated to obtain the target point cloud feature; a correlation operation is performed on the target image feature and the target point cloud feature to obtain the correlation calculation result.

(5) Data matching process.

Convolution calculation is performed on the above correlation calculation result to obtain a similarity matrix of size N×M×1; the similarity matrix is inverted to obtain a matching cost matrix of size N×M; bipartite graph matching processing is performed on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data, where the matching result may be a matching matrix of size N×M.

As the above description shows, the embodiments of the present disclosure propose a data matching method for the weak synchronization that arises between the camera device and the lidar sensor due to response delays or complex road conditions. Using the bounding boxes predicted by the point cloud single-modal detection model and the image single-modal detection model, the method first obtains the geometric feature information of the projected 2D frames of the 3D bounding boxes corresponding to the point cloud data and of the 2D bounding boxes corresponding to the image data, then extracts the appearance feature information within the corresponding boxes through the point cloud feature extraction network and the image feature extraction network, and finally predicts the similarity matrix between the target detection result of the point cloud data and that of the image data from the joint features of the geometric and appearance feature information.
In the embodiment of the present disclosure, before the point cloud data and the image data are each subjected to target detection by the trained single-modal detection model to obtain the target detection result of the point cloud data and that of the image data, the single-modal detection model also needs to be trained according to the following steps:

(1) Determine a training sample set containing multiple training samples, where each training sample contains sample image data and sample point cloud data carrying sample labels.

(2) Perform target detection on the training sample set through the single-modal detection model to be trained, to obtain sample target detection results.

(3) Determine a label matching matrix according to the sample target detection results and the sample labels.

(4) Calculate the function value of a target loss function according to the label matching matrix, and adjust the model parameters of the single-modal detection model according to the function value of the target loss function until a preset condition is reached, giving the trained single-modal detection model.
In the embodiment of the present disclosure, when training the single-modal detection model, a training sample set containing multiple training samples must first be constructed, that is, a collection of sample image data or sample point cloud data carrying sample labels.

In the embodiment of the present disclosure, by inputting the training sample set into the single-modal detection model to be trained, the model can be trained to recognize the labeled samples in the point cloud data and the image data respectively, yielding the sample target detection results.

After the sample target detection results are obtained, the label matching matrix can be determined according to the sample labels and the sample target detection results. The function value of the target loss function is then calculated according to the label matching matrix, and the model parameters of the single-modal detection model are adjusted according to it until the preset condition is reached, giving the trained single-modal detection model. The preset condition may be that the number of training iterations of the single-modal detection model meets a preset requirement, and/or that the training accuracy of the model meets a preset accuracy requirement.

It should be noted that the target loss function includes, but is not limited to, mean square error loss (MSE), absolute error loss (MAE), cross-entropy loss (BCE) and other loss functions capable of training the above single-modal detection model.

As the above description shows, the single-modal detection model includes the point cloud single-modal detection model and the image single-modal detection model. When training them, the image single-modal detection model can be trained on the sample training set containing the sample image data, and the point cloud single-modal detection model on the sample training set containing the sample point cloud data; the detailed training process is as described above and is not repeated separately here.

In the above implementation, training the single-modal detection model in the manner described above yields a model whose processing precision meets the accuracy requirements; when target detection is performed with this model, the accuracy of the target detection results is improved, and with it the accuracy of data matching.
In the embodiment of the present disclosure, as shown in Fig. 3, the above step of determining the label matching matrix according to the sample target detection results and the sample labels proceeds as follows:

(1) Calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding boxes corresponding to the sample image data in the sample labels, to obtain first intersection-over-union ratios; and filter the at least one predicted bounding box according to the first intersection-over-union ratios to obtain target predicted 2D boxes.

For example, in the embodiment of the present disclosure, the sample image data contained in a training sample can be input into the image single-modal detection model to obtain a sample target detection result containing at least one predicted bounding box, which may also be called a predicted 2D bounding box.

Afterwards, the intersection-over-union (IOU, Intersection Over Union) between each predicted 2D bounding box and each labeled bounding box in the sample image data is calculated, giving multiple first intersection-over-union ratios; the at least one predicted bounding box is then filtered according to these ratios to obtain the target predicted 2D boxes, through the following screening process:

First, for each predicted 2D bounding box, it is judged whether any of its first intersection-over-union ratios is greater than or equal to a preset threshold. If so, the predicted 2D bounding box is determined as a target predicted box; the labeled bounding box with the largest intersection-over-union ratio is then identified and determined as the bounding box matching this predicted 2D bounding box. If not, the predicted 2D bounding box is discarded.
(2) Calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection result and the labeled bounding boxes corresponding to the sample point cloud data in the sample labels, to obtain second intersection-over-union ratios; and filter the at least one predicted bounding box according to the second intersection-over-union ratios to obtain target predicted 3D boxes.

For example, in the embodiment of the present disclosure, the sample point cloud data contained in a training sample can be input into the point cloud single-modal detection model to obtain a sample target detection result containing at least one predicted bounding box, which may also be called a predicted 3D bounding box.

Afterwards, the intersection-over-union between each predicted 3D bounding box and each labeled bounding box in the sample point cloud data is calculated, giving multiple second intersection-over-union ratios; the at least one predicted bounding box is then filtered according to these ratios to obtain the target predicted 3D boxes, through the following screening process:

First, for each predicted 3D bounding box, it is judged whether any of its second intersection-over-union ratios is greater than or equal to a preset threshold. If so, the predicted 3D bounding box is determined as a target predicted box; the labeled bounding box with the largest intersection-over-union ratio is then identified and determined as the bounding box matching this predicted 3D bounding box. If not, the predicted 3D bounding box is discarded.
(3) Match the target predicted 2D boxes with the target predicted 3D boxes to obtain a label matching result, and determine the label matching matrix according to the label matching result.

After the target predicted 2D boxes and target predicted 3D boxes are determined, a target predicted 2D box and a target predicted 3D box that correspond to the same object are taken as a label matching pair; the corresponding position of the label matching matrix is set to 1 and the unmatched positions to 0, giving the label matching matrix.
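A minimal sketch of the 2D screening step described above, assuming axis-aligned (x1, y1, x2, y2) boxes; the 3D case follows the same pattern with volumetric IoU, and the function names are illustrative only:

```python
import numpy as np

def iou_2d(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def screen_predictions(pred_boxes, gt_boxes, thresh=0.5):
    """Keep predictions whose best IoU with any labeled box reaches thresh.
    Returns (kept prediction indices, matched label index per kept prediction)."""
    kept, matched = [], []
    for i, p in enumerate(pred_boxes):
        ious = [iou_2d(p, g) for g in gt_boxes]
        j = int(np.argmax(ious))
        if ious[j] >= thresh:
            kept.append(i)
            matched.append(j)
    return kept, matched
```

Predicted 2D and 3D boxes that this screening maps to the same labeled object then form the 1-entries of the label matching matrix.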
In the above implementation, this processing accurately matches the predicted 2D boxes with the predicted 3D boxes to obtain the label matching matrix; when the function value of the target loss function is determined according to this label matching matrix, an accurate function value is obtained, which improves the training precision of the single-modal detection model.
In the embodiment of the present disclosure, determining the training sample set containing multiple training samples proceeds as follows:

(1) Acquire a target tracking data sequence, where the target tracking data sequence contains the image data and point cloud data acquired at each tracking moment.

(2) Determine at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval.

(3) Use the data in each data combination as the data in one training sample.
In the embodiment of the present disclosure, target tracking data sequences containing image data and point cloud data are first acquired, where the target tracking data sequences contain enough data for tracking and for training the above single-modal detection model.

In the embodiment of the present disclosure, within the same target tracking data sequence, the target image data image_k at the first tracking moment is first selected; then, according to the preset interval, the second tracking moment several frames later is determined and the target point cloud data PC_{k+n} at that moment is selected. The construction principle is the transitivity of weak synchronization: the target image data image_k of the current frame is weakly synchronized in time and space with the image data image_{k+n} several frames later, so the strongly synchronized target point cloud data PC_{k+n} corresponding to image_{k+n} is also weakly synchronized in time and space with the target image data image_k.

In the embodiment of the present disclosure, the above target image data image_k and target point cloud data PC_{k+n} are determined as the sample image data and sample point cloud data in a training sample, respectively.
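As an illustrative sketch of this construction, assuming the tracking sequence is stored as two time-aligned lists of frames and n is the preset frame interval:

```python
def build_weak_sync_pairs(images, point_clouds, n):
    """images[k] and point_clouds[k] are strongly synchronized frames of one
    tracking sequence; pairing images[k] with point_clouds[k + n] simulates
    the weak synchronization described above."""
    return [(images[k], point_clouds[k + n])
            for k in range(len(images) - n)]
```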
In the above implementation, the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multimodal data set of point clouds and images, which can simulate the weak synchronization that may occur in real automatic driving scenarios; when the single-modal detection model is trained on this training sample set, the trained model can adapt to the weakly synchronized scenario of multimodal data sets.

Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the actual execution order of the steps should be determined by their functions and possible internal logic.

Based on the same inventive concept, the embodiments of the present disclosure also provide a data matching apparatus corresponding to the data matching method. Since the problem-solving principle of the apparatus in the embodiments of the present disclosure is similar to that of the above data matching method, reference may be made to the implementation of the method for the implementation of the apparatus.
参照图4所示,为本公开实施例提供的一种数据匹配装置的示意图,所述装置包括:获取模块41、确定模块42、匹配模块43;其中,Referring to FIG. 4 , it is a schematic diagram of a data matching device provided by an embodiment of the present disclosure. The device includes: an acquisition module 41, a determination module 42, and a matching module 43; wherein,
the acquisition module 41 is configured to detect point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection results include bounding box information of detected target objects;
the determination module 42 is configured to determine, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and to determine, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and
the matching module 43 is configured to match, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
In the embodiments of the present disclosure, by combining image data and point cloud data for target detection, the point cloud data can compensate for the susceptibility of the image data to illumination and occlusion, while the image data can compensate for the sparsity and lack of texture of the point cloud data. Combining image data and point cloud data to detect 3D targets can therefore improve detection accuracy and yield more accurate target detection results.
In a possible implementation, the determination module 42 is further configured to: project the target detection result of the point cloud data into the image data to obtain a projection box of the bounding box of the point cloud data; and determine geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection box in the image data.
In a possible implementation, the geometric feature information includes position information and/or size information of the bounding box.
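For concreteness, the projection and the derivation of position and size information might be sketched as follows. The pinhole camera model, the matrix shapes, and the assumption that all box corners lie in front of the camera are illustrative choices, not mandated by the disclosure.

```python
import numpy as np

def projected_box_geometry(corners_3d, intrinsics, lidar_to_cam):
    """corners_3d: (8, 3) bounding box corners in the point cloud frame;
    intrinsics: (3, 3) camera matrix; lidar_to_cam: (4, 4) transform.
    Returns the position (center) and size of the projection box in pixels.
    Assumes every corner projects in front of the camera."""
    homo = np.hstack([corners_3d, np.ones((8, 1))])   # (8, 4) homogeneous
    cam = (lidar_to_cam @ homo.T)[:3]                 # (3, 8) camera frame
    uv = intrinsics @ cam
    uv = uv[:2] / uv[2:3]                             # (2, 8) pixel coords
    u_min, v_min = uv.min(axis=1)
    u_max, v_max = uv.max(axis=1)
    center = ((u_min + u_max) / 2.0, (v_min + v_max) / 2.0)  # position info
    size = (u_max - u_min, v_max - v_min)                    # size info
    return center, size
```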
In a possible implementation, the determination module 42 is further configured to: determine target image data located within a detected bounding box in the image data; and extract image features of the target image data and determine the extracted image features as appearance feature information corresponding to the image data.
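One plausible realization is to crop the box region and run it through an off-the-shelf feature extractor; the choice of ResNet-18 and the 224×224 input size are assumptions standing in for whatever extractor the disclosure intends.

```python
import torch
import torchvision

backbone = torchvision.models.resnet18(weights=None)  # assumed extractor
backbone.fc = torch.nn.Identity()                      # keep pooled features
backbone.eval()

def image_appearance_feature(image, box):
    """image: (3, H, W) float tensor; box: (x1, y1, x2, y2) in pixels.
    Crops the target image data inside the detected bounding box and
    extracts its image features as appearance feature information."""
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = image[:, y1:y2, x1:x2].unsqueeze(0)
    crop = torch.nn.functional.interpolate(
        crop, size=(224, 224), mode="bilinear", align_corners=False)
    with torch.no_grad():
        return backbone(crop).squeeze(0)               # (512,) feature vector
```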
In a possible implementation, the determination module 42 is further configured to: determine target point cloud data located within a detected bounding box in the point cloud data, where the target point cloud data includes the number of target points located within the bounding box and/or coordinate information of the target points in the point cloud coordinate system; and determine, based on the target point cloud data, a global point cloud feature that describes the overall characteristics of the target point cloud, and determine appearance feature information corresponding to the point cloud data according to the global point cloud feature.
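A PointNet-style encoder is one common way to obtain such a global feature from a variable number of points; the layer sizes below are illustrative assumptions, not values taken from the disclosure.

```python
import torch

class GlobalPointCloudFeature(torch.nn.Module):
    """Per-point MLP followed by max pooling: the pooled vector describes
    the overall characteristics of the target points inside one box."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, out_dim),
        )

    def forward(self, points):
        # points: (N, 3) coordinates in the point cloud coordinate system;
        # N varies per bounding box, which max pooling tolerates
        return self.mlp(points).max(dim=0).values      # (out_dim,)
```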
In a possible implementation, the matching module 43 is further configured to: perform a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determine, according to the correlation calculation result, a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
In a possible implementation, the matching module 43 is further configured to: concatenate the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain a target image feature; concatenate the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain a target point cloud feature; and perform a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
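The concatenation and correlation steps might look like the following sketch, where an element-wise product over all box pairs stands in for the correlation operation; the disclosure does not fix the exact form of that operation, so this is an assumption.

```python
import torch

def pairwise_correlation(image_geo, image_app, cloud_geo, cloud_app):
    """Each argument is a (num_boxes, dim) tensor. Concatenating geometry
    with appearance gives the target image / point cloud features; the two
    concatenated features are assumed to share the same dimension D."""
    img = torch.cat([image_geo, image_app], dim=1)   # (M, D) image features
    pc = torch.cat([cloud_geo, cloud_app], dim=1)    # (N, D) cloud features
    # (M, N, D): correlation between every image box and every cloud box
    return img[:, None, :] * pc[None, :, :]
```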
In a possible implementation, the matching module 43 is further configured to: perform a convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix characterizes the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; negate the similarity matrix to obtain a matching cost matrix; and perform bipartite graph matching on the matching cost matrix to obtain a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
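Putting these three operations together, a minimal sketch follows; the 1×1 convolution shape and the use of SciPy's Hungarian solver for the bipartite matching are assumptions about implementation details the disclosure leaves open.

```python
import torch
from scipy.optimize import linear_sum_assignment

class BoxMatcher(torch.nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        # 1x1 convolution reducing the (M, N, D) correlation tensor
        # to an (M, N) similarity matrix
        self.conv = torch.nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, correlation):                    # (M, N, D)
        x = correlation.permute(2, 0, 1).unsqueeze(0)  # (1, D, M, N)
        similarity = self.conv(x)[0, 0]                # (M, N)
        cost = -similarity                 # negation -> matching cost matrix
        # bipartite matching on the cost matrix (Hungarian algorithm)
        rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
        return similarity, list(zip(rows.tolist(), cols.tolist()))
```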
In a possible implementation, the matching module 43 is further configured to: perform target detection on the point cloud data and the image data respectively through a trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
In a possible implementation, the apparatus is further configured to train the single-modal detection model according to the following steps: determining a training sample set containing a plurality of training samples, where each training sample contains sample image data and sample point cloud data carrying sample labels; performing target detection on the training sample set through the single-modal detection model to be trained, to obtain sample target detection results; determining a label matching matrix according to the sample target detection results and the sample labels; and calculating a target loss function according to the label matching matrix and adjusting model parameters of the single-modal detection model according to the target loss function until a preset condition is met, to obtain the trained single-modal detection model.
In a possible implementation, the apparatus is further configured to: calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample image data in the sample target detection results and the annotated bounding box corresponding to the sample image data in the sample labels, to obtain a first intersection-over-union value, and filter the at least one predicted bounding box according to the first intersection-over-union value to obtain a target predicted image bounding box; calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection results and the annotated bounding box corresponding to the sample point cloud data in the sample labels, to obtain a second intersection-over-union value, and filter the at least one predicted bounding box according to the second intersection-over-union value to obtain a target predicted point cloud bounding box; and match the target predicted image bounding box with the target predicted point cloud bounding box to obtain a label matching result, and determine the label matching matrix according to the label matching result.
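A hedged sketch of this IoU-based filtering on the image side follows; the 0.5 threshold and the axis-aligned 2D box format are assumptions, and the point cloud side would apply the same pattern with a 3D IoU.

```python
import numpy as np

def iou_2d(a, b):
    """Axis-aligned IoU for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_target_predictions(pred_boxes, annotated_boxes, thresh=0.5):
    """For each annotated box, keep the predicted box with the highest IoU
    above `thresh`; the kept boxes are the target predicted boxes that feed
    the label matching matrix. Assumes `pred_boxes` is non-empty."""
    kept = {}
    for gi, gt in enumerate(annotated_boxes):
        ious = [iou_2d(p, gt) for p in pred_boxes]
        best = int(np.argmax(ious))
        if ious[best] >= thresh:
            kept[gi] = best    # annotated box index -> predicted box index
    return kept
```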
In a possible implementation, the apparatus is further configured to: acquire a target tracking data sequence, where the target tracking data sequence contains image data and point cloud data acquired at each tracking moment; determine at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and use the data in each data combination as the data of each training sample.
For a description of the processing flow of each module in the apparatus and of the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments.
Corresponding to the data matching method in FIG. 1, an embodiment of the present disclosure further provides an electronic device 500. As shown in FIG. 5, a schematic structural diagram of the electronic device 500 provided by an embodiment of the present disclosure, the electronic device includes:
a processor 51, a memory 52, and a bus 53. The memory 52 is configured to store execution instructions and includes an internal memory 521 and an external memory 522. The internal memory 521, also called internal storage, temporarily stores operation data of the processor 51 and data exchanged with the external memory 522, such as a hard disk; the processor 51 exchanges data with the external memory 522 through the internal memory 521. When the electronic device 500 runs, the processor 51 communicates with the memory 52 through the bus 53, so that the processor 51 executes the following instructions:
detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection results include bounding box information of detected target objects; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when run by a processor, performs the steps of the data matching method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code, where the instructions included in the program code can be used to perform the steps of the data matching method described in the above method embodiments; reference may be made to the above method embodiments.
The above computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product may be embodied as a computer storage medium; in another optional embodiment, it may be embodied as a software product, such as an SDK (Software Development Kit).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the detailed working processes of the systems and apparatuses described above, reference may be made to the corresponding processes in the foregoing method embodiments. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other division methods in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may physically exist separately, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, or the part thereof contributing to the prior art, or parts of the technical solutions, may essentially be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present disclosure, still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent replacements of some of the technical features therein; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial Applicability
Embodiments of the present disclosure provide a data matching method and apparatus, an electronic device, a storage medium, and a program product. The method includes: detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding boxes in the point cloud data with the bounding boxes in the image data. By combining image data and point cloud data to detect 3D targets, the embodiments of the present disclosure can improve the detection accuracy of 3D targets and thereby obtain more accurate target detection results.

Claims (16)

  1. A data matching method, comprising:
    detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, wherein the target detection results comprise bounding box information of detected target objects;
    determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and determining, according to the target detection result of the image data, target feature information corresponding to the image data, wherein the target feature information comprises geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and
    matching, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
  2. The method according to claim 1, wherein the determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data comprises:
    projecting the target detection result of the point cloud data into the image data to obtain a projection box of the bounding box corresponding to the point cloud data; and
    determining geometric feature information of the point cloud data according to pixel coordinates of vertices of the projection box in the image data.
  3. The method according to claim 1 or 2, wherein the geometric feature information comprises position information and/or size information of the bounding box.
  4. The method according to any one of claims 1 to 3, wherein the determining, according to the target detection result of the image data, the target feature information corresponding to the image data comprises:
    determining target image data located within a detected bounding box in the image data; and
    extracting image features of the target image data, and determining the extracted image features as appearance feature information corresponding to the image data.
  5. The method according to any one of claims 1 to 4, wherein the determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data comprises:
    determining target point cloud data located within a detected bounding box in the point cloud data, wherein the target point cloud data comprises: the number of target points located within the bounding box and/or coordinate information of the target points in a point cloud coordinate system; and
    determining, based on the target point cloud data, a global point cloud feature for describing overall characteristics of the target point cloud, and determining appearance feature information corresponding to the point cloud data according to the global point cloud feature.
  6. The method according to any one of claims 1 to 5, wherein the matching, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data comprises:
    performing a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and
    determining a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
  7. The method according to claim 6, wherein the performing a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result comprises:
    concatenating the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain a target image feature;
    concatenating the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain a target point cloud feature; and
    performing a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
  8. The method according to claim 6 or 7, wherein the determining a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result comprises:
    performing a convolution calculation on the correlation calculation result to obtain a similarity matrix, wherein the similarity matrix is used to characterize the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data;
    negating the similarity matrix to obtain a matching cost matrix; and
    performing bipartite graph matching on the matching cost matrix to obtain a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
  9. The method according to any one of claims 1 to 8, wherein the detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data comprises:
    performing target detection on the point cloud data and the image data respectively through a trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
  10. The method according to claim 9, wherein the single-modal detection model is trained according to the following steps:
    determining a training sample set comprising a plurality of training samples, wherein each training sample comprises sample image data or sample point cloud data carrying sample labels;
    performing target detection on the training sample set through the single-modal detection model to be trained, to obtain sample target detection results;
    determining a label matching matrix according to the sample target detection results and the sample labels; and
    calculating a function value of a target loss function according to the label matching matrix, and adjusting model parameters of the single-modal detection model according to the function value of the target loss function until a preset condition is met, to obtain the trained single-modal detection model.
  11. The method according to claim 10, wherein the determining a label matching matrix according to the sample target detection results and the sample labels comprises:
    calculating an intersection-over-union between at least one predicted bounding box corresponding to the sample image data in the sample target detection results and an annotated bounding box corresponding to the sample image data in the sample labels, to obtain a first intersection-over-union value, and filtering the at least one predicted bounding box according to the first intersection-over-union value to obtain a target predicted image bounding box;
    calculating an intersection-over-union between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection results and an annotated bounding box corresponding to the sample point cloud data in the sample labels, to obtain a second intersection-over-union value, and filtering the at least one predicted bounding box according to the second intersection-over-union value to obtain a target predicted point cloud bounding box; and
    matching the target predicted image bounding box with the target predicted point cloud bounding box to obtain a label matching result, and determining the label matching matrix according to the label matching result.
  12. The method according to claim 10 or 11, wherein the determining a training sample set comprising a plurality of training samples comprises:
    acquiring a target tracking data sequence, wherein the target tracking data sequence comprises image data and point cloud data acquired at each tracking moment;
    determining at least one data combination in the target tracking data sequence, wherein each data combination comprises target image data and target point cloud data, a first tracking moment of the target image data differs from a second tracking moment of the target point cloud data, and a time interval between the first tracking moment and the second tracking moment is a preset interval; and
    using the data in each data combination as the data of each training sample.
  13. A data matching apparatus, comprising:
    an acquisition module, configured to detect point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, wherein the target detection results comprise bounding box information of detected target objects;
    a determination module, configured to determine, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and to determine, according to the target detection result of the image data, target feature information corresponding to the image data, wherein the target feature information comprises geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and
    a matching module, configured to match, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
  14. An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus; and the machine-readable instructions, when executed by the processor, perform the steps of the data matching method according to any one of claims 1 to 12.
  15. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when run by a processor, performs the steps of the data matching method according to any one of claims 1 to 12.
  16. A computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program, when read and executed by a computer, implements the steps of the data matching method according to any one of claims 1 to 12.
PCT/CN2022/075419 2021-08-27 2022-02-07 Data matching method and apparatus, and electronic device, storage medium and program product WO2023024443A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110994415.5 2021-08-27
CN202110994415.5A CN113705669A (en) 2021-08-27 2021-08-27 Data matching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023024443A1 true WO2023024443A1 (en) 2023-03-02

Family

ID=78655867

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075419 WO2023024443A1 (en) 2021-08-27 2022-02-07 Data matching method and apparatus, and electronic device, storage medium and program product

Country Status (2)

Country Link
CN (1) CN113705669A (en)
WO (1) WO2023024443A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705669A (en) * 2021-08-27 2021-11-26 上海商汤临港智能科技有限公司 Data matching method and device, electronic equipment and storage medium
CN114310875B (en) * 2021-12-20 2023-12-05 珠海格力智能装备有限公司 Crankshaft positioning identification method, device, storage medium and equipment
CN114241011A (en) * 2022-02-22 2022-03-25 阿里巴巴达摩院(杭州)科技有限公司 Target detection method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210012124A1 (en) * 2019-07-09 2021-01-14 Mobiltech Method of collecting road sign information using mobile mapping system
CN110675431A (en) * 2019-10-08 2020-01-10 中国人民解放军军事科学院国防科技创新研究院 Three-dimensional multi-target tracking method fusing image and laser point cloud
CN110988912A (en) * 2019-12-06 2020-04-10 中国科学院自动化研究所 Road target and distance detection method, system and device for automatic driving vehicle
CN113705669A (en) * 2021-08-27 2021-11-26 上海商汤临港智能科技有限公司 Data matching method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117894015A (en) * 2024-03-15 2024-04-16 浙江华是科技股份有限公司 Point cloud annotation data optimization method and system
CN117894015B (en) * 2024-03-15 2024-05-24 浙江华是科技股份有限公司 Point cloud annotation data optimization method and system

Also Published As

Publication number Publication date
CN113705669A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
WO2023024443A1 (en) Data matching method and apparatus, and electronic device, storage medium and program product
US11393173B2 (en) Mobile augmented reality system
US20200279121A1 (en) Method and system for determining at least one property related to at least part of a real environment
US11205298B2 (en) Method and system for creating a virtual 3D model
US10580164B2 (en) Automatic camera calibration
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
Zhou et al. Moving object detection and segmentation in urban environments from a moving platform
EP3206163B1 (en) Image processing method, mobile device and method for generating a video image database
Balali et al. Multi-class US traffic signs 3D recognition and localization via image-based point cloud model using color candidate extraction and texture-based recognition
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
EP3414641A1 (en) System and method for achieving fast and reliable time-to-contact estimation using vision and range sensor data for autonomous navigation
CN110070578B (en) Loop detection method
CN113888458A (en) Method and system for object detection
JP6172432B2 (en) Subject identification device, subject identification method, and subject identification program
JP6240706B2 (en) Line tracking using automatic model initialization with graph matching and cycle detection
Jung et al. Object detection and tracking-based camera calibration for normalized human height estimation
GB2566443A (en) Cross-source point cloud registration
US11189053B2 (en) Information processing apparatus, method of controlling information processing apparatus, and non-transitory computer-readable storage medium
Lee et al. Temporally consistent road surface profile estimation using stereo vision
CN112989877A (en) Method and device for labeling object in point cloud data
CN114550117A (en) Image detection method and device
Ibisch et al. Arbitrary object localization and tracking via multiple-camera surveillance system embedded in a parking garage
Maidi et al. Open augmented reality system for mobile markerless tracking
Dong et al. Monocular visual-IMU odometry using multi-channel image patch exemplars
Jiang et al. A dense map optimization method based on common-view geometry

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE