WO2023024443A1 - Data matching method and apparatus, and electronic device, storage medium and program product - Google Patents

Data matching method and apparatus, and electronic device, storage medium and program product

Info

Publication number
WO2023024443A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
point cloud
image data
data
bounding box
Prior art date
Application number
PCT/CN2022/075419
Other languages
French (fr)
Chinese (zh)
Inventor
吕伟杰
杨国润
王哲
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023024443A1 publication Critical patent/WO2023024443A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Definitions

  • the present disclosure relates to the technical field of automatic driving, and relates to, but is not limited to, a data matching method and apparatus, an electronic device, a storage medium and a program product.
  • 3D (three-dimensional) object detection identifies the 3D bounding box information of an object, which mainly includes information such as position, orientation, size, and confidence.
  • single-modal approaches based on LiDAR or camera sensors have made steady progress in the field of 3D object detection.
  • the target detection effect of the image unimodal method is affected by the environment: over-exposed or overly dark photos hinder the acquisition of target information, as does occlusion between objects; the point cloud unimodal method needs to cope with point clouds that are sparse, irregular, and lacking in texture and semantic information, with too few points on small objects and distant objects.
  • Embodiments of the present disclosure at least provide a data matching method and device, electronic equipment, a storage medium, and a program product.
  • an embodiment of the present disclosure provides a data matching method, including: respectively detecting point cloud data and image data to be matched, and obtaining a target detection result of the point cloud data and a target detection result of the image data, wherein the target detection result includes bounding box information of the detected target object; determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, the target feature information corresponding to the image data, where the target feature information includes geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • point cloud data can be used to make up for the defect that image data is easily affected by light and occlusion, and image data can be used to make up for the defect that point cloud data is sparse and textureless. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • the determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: projecting the target detection result of the point cloud data into the image data to obtain the projection box of the bounding box corresponding to the point cloud data; and determining the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection box in the image data.
  • in this way, the data formats are unified, so that the relative positions between the 2D bounding boxes corresponding to the image data and the 2D projection boxes can be quickly obtained and used to match the target objects, thereby obtaining the corresponding matching results.
  • the geometric feature information includes position information and/or size information of the bounding box.
  • the geometric feature information can be enriched, further improving the accuracy of data matching.
  • the determining the target feature information corresponding to the image data according to the target detection result of the image data includes: determining the target image data located within the detected bounding box in the image data; and extracting image features of the target image data, and determining the extracted image features as the appearance feature information corresponding to the image data.
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • the determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: determining the target point cloud data located within the detected bounding box in the point cloud data, wherein the target point cloud data includes the number of target points located in the bounding box and/or the coordinate information of the target points in the point cloud coordinate system; and, based on the target point cloud data, determining global point cloud features used to describe the overall features of the target point cloud, and determining the appearance feature information corresponding to the point cloud data according to the global point cloud features.
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • the matching the bounding box determined based on the point cloud data and the bounding box determined based on the image data according to the target feature information includes: performing a correlation calculation between the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determining, according to the correlation calculation result, the matching result between the bounding box in the point cloud data and the bounding box in the image data.
  • in this way, by performing the correlation calculation on the target feature information of the point cloud data and the target feature information of the image data, the correlation between each piece of feature information corresponding to the M 2D bounding boxes and each piece of feature information corresponding to the N 3D bounding boxes can be accurately determined, so that when the bounding boxes are matched according to the correlation calculation results, the matching accuracy of the bounding boxes is improved.
  • performing the correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain the correlation calculation result includes: splicing the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain the target image features; splicing the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain the target point cloud features; and performing the correlation calculation on the target image features and the target point cloud features to obtain the correlation calculation result.
  • the method of determining the matching result of the bounding box can make up for the matching error caused by the weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching, and thus improving the safety factor of automatic driving.
  • the determining the matching result between the bounding box in the point cloud data and the bounding box in the image data according to the correlation calculation result includes: performing convolution on the correlation calculation result to obtain a similarity matrix, wherein the similarity matrix is used to characterize the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; inverting the similarity matrix to obtain a matching cost matrix; and performing bipartite graph matching on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
  • operations such as convolution, inversion, and bipartite graph matching may be performed on the above correlation calculation results to improve the processing efficiency of the matching results and obtain matching results with high accuracy.
  • said respectively detecting the point cloud data and image data to be matched, and obtaining the target detection result of the point cloud data and the target detection result of the image data, includes: performing target detection on the point cloud data and the image data respectively through a trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
  • in this way, performing target detection on the point cloud data through the trained point cloud single-modal detection model and on the image data through the trained image single-modal detection model can improve the accuracy of the target detection results, thereby obtaining more accurate bounding box information that fully encloses the object of interest.
  • the single-modal detection model is trained according to the following steps: determining a training sample set containing multiple training samples, wherein each training sample includes sample image data and sample point cloud data carrying sample labels; performing target detection on the training sample set through the single-modal detection model to be trained to obtain sample target detection results; determining a label matching matrix according to the sample target detection results and the sample labels; and calculating the function value of the target loss function according to the label matching matrix, and adjusting the model parameters of the single-modal detection model according to the target loss function value until a preset condition is reached, to obtain the trained single-modal detection model.
  • the determining the label matching matrix according to the sample target detection result and the sample label includes: calculating the intersection-over-union (IoU) ratio between at least one predicted bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding box corresponding to the sample image data in the sample label, to obtain a first IoU ratio, and filtering the at least one predicted bounding box according to the first IoU ratio to obtain a target predicted 2D bounding box; calculating the IoU ratio between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection result and the labeled bounding box corresponding to the sample point cloud data in the sample label, to obtain a second IoU ratio, and filtering the at least one predicted bounding box according to the second IoU ratio to obtain a target predicted 3D bounding box; and matching the target predicted 2D bounding box with the target predicted 3D bounding box to obtain a label matching result, and determining the label matching matrix according to the label matching result.
  • in this way, the predicted 2D bounding box can be accurately matched with the predicted 3D bounding box to obtain the label matching matrix; when the function value of the target loss function is determined according to the label matching matrix, an accurate function value can be obtained, thereby improving the training accuracy of the unimodal detection model.
  • the determining a training sample set comprising a plurality of training samples includes: acquiring a target tracking data sequence, wherein the target tracking data sequence includes image data and point cloud data; determining at least one data combination in the target tracking data sequence, wherein each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and using the data in each data combination as the data of one training sample.
  • the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multi-modal data set of point clouds and images, through which the weak synchronization that may occur in actual automatic driving scenes can be simulated, and the training sample set constructed in this way allows the trained unimodal detection model to adapt to the weak synchronization scenario of the multimodal data set.
  • the embodiment of the present disclosure also provides a data matching device, including: an acquisition module configured to detect the point cloud data and image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data, wherein the target detection result includes bounding box information of the detected target object; a determination module configured to determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data, and determine the target feature information corresponding to the image data according to the target detection result of the image data, the target feature information including geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box; and a matching module configured to match the bounding box determined based on the point cloud data with the bounding box determined based on the image data according to the target feature information.
  • an embodiment of the present disclosure further provides an electronic device, including: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above-mentioned first aspect, or of any possible implementation of the first aspect, are executed.
  • embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above-mentioned first aspect, or of any possible implementation of the first aspect, are executed.
  • an embodiment of the present disclosure further provides a computer program product
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, some or all of the steps of the methods described in the embodiments of the present disclosure can be realized.
  • the computer program product may be a software installation package.
  • FIG. 1 shows a flowchart of a data matching method provided by an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of another data matching method provided by an embodiment of the present disclosure
  • FIG. 3 shows a flow chart of determining a tag matching matrix according to sample target detection results and sample tags in a data matching method provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of a data matching device provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • the above-mentioned single-modal detection methods usually include the image single-modal method and the point cloud single-modal method.
  • the target detection effect of the image single-modal method is affected by the environment: over-exposed or overly dark photos hinder the acquisition of target information, as does occlusion between objects; the point cloud single-modal method needs to cope with point clouds that are sparse and irregular, lack texture and semantic information, and contain too few points on small objects, distant objects, and the like.
  • the present disclosure provides a data matching method and device, electronic equipment, storage media and program products.
  • point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can be used to make up for the defect that point cloud data is sparse and lacks texture. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • the execution subject of the data matching method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capabilities.
  • FIG. 1 is a flow chart of a data matching method provided by an embodiment of the present disclosure, the method includes steps S101 to S105, wherein:
  • S101 Detect the point cloud data and image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data; wherein, the target detection result includes the bounding box information of the detected target object.
  • image data may be collected by a camera device
  • point cloud data may be collected by a laser radar sensor, wherein the camera device and the laser radar sensor are sensors pre-installed on the target vehicle.
  • the target vehicle may be a vehicle with an automatic driving function, for example, a minibus, a car, etc., and the present disclosure does not specifically limit the type of the target vehicle.
  • the target detection result of the point cloud data includes the bounding box information of the detected target object, where the bounding box is a 3D bounding box. For example, if the number of target objects is N, then the bounding box information includes information about the 3D bounding boxes of the N target objects.
  • the target detection result of the image data includes bounding box information of the detected target object, where the bounding box is a 2D bounding box. For example, if the number of target objects is M, then the bounding box information includes information about the 2D bounding boxes of the M target objects.
  • the N target objects and the M target objects may include the same target object, or may include different target objects.
  • S103 Determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data; determine the target feature information corresponding to the image data according to the target detection result of the image data. The target feature information includes geometric feature information of the detected bounding box of the target object, and appearance feature information of the target object within the bounding box.
  • the target feature information may be the target feature information corresponding to the above-mentioned point cloud data, and may also be the target feature information corresponding to the image data.
  • the target feature information corresponding to the point cloud data may include the geometric feature information of the detected bounding box of the target object, and the appearance feature information of the target object within the bounding box; the target feature information corresponding to the image data may also include Geometric feature information of the detected bounding box of the target object, and appearance feature information of the target object within the bounding box.
  • the appearance feature information may be an attribute feature used to characterize the target object framed by the bounding box, and the attribute feature may be object category information of the target object, wherein the category information is the category label of the target object, e.g., vehicle, pedestrian, etc.
  • S105 According to the target feature information, match the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can be used to make up for the defect that point cloud data is sparse and lacks texture. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • an optional implementation is a multi-modal detection method, where the multi-modal detection method refers to detecting objects by combining image data and point cloud data.
  • multimodal detection methods can include point cloud projection methods, image back-projection methods, and similarity matrix methods.
  • the point cloud projection method only considers the prediction results of point cloud 3D candidate frames, and relies heavily on the effect of point cloud single-modal detectors.
  • the image back-projection method only considers the prediction results of image 2D candidate boxes, and relies heavily on the effect of image single-modal detectors.
  • the similarity matrix method has not explored the problem of multimodal matching under weak synchronization in much depth.
  • the prerequisite for the point cloud projection method, image back-projection method and similarity matrix method described above is strong synchronization between the lidar sensor and the camera device. Therefore, the above technical solutions do not consider the case where there is weak synchronization between the lidar sensor and the camera device.
  • the strong synchronization between the camera device and the lidar sensor means that the acquisition times of the point cloud data and the image data are highly synchronized. It can also be understood as meaning that, when the point cloud single-modal detector and the image single-modal detector detect the same object, the point cloud projection 2D box will coincide with the image 2D box, or the image back-projection 3D box will coincide with the point cloud 3D box.
  • the camera device and the lidar sensor may have weak synchronization due to response delay or complex road conditions. At this time, weak synchronization will cause synchronization errors between image data and point cloud data.
  • the result of projecting one modality onto the other will also carry a corresponding synchronization error, which can be understood as meaning that the bounding box of an object detected through the point cloud data and the bounding box of the same object detected through the image data will not coincide.
  • in this case, wrong matching results may be generated, and poor matching results make multi-modal fusion yield worse detection results than single-modal detection.
  • the present disclosure provides a data matching method.
  • in this data matching method, by combining image data and point cloud data for target detection, point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can make up for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thus obtain more accurate object detection results.
  • the steps described in the above step S101 to step S105 will now be described in detail, as follows.
  • step S101 the point cloud data and the image data to be matched are detected respectively, and the target detection result of the point cloud data and the target detection result of the image data are obtained, including the following process:
  • Target detection is performed on the point cloud data and the image data respectively through the trained single-modal detection model, and the target detection result of the point cloud data and the target detection result of the image data are obtained.
  • the unimodal detection model includes a point cloud unimodal detection model and an image unimodal detection model.
  • the point cloud single-modal detection model is used for target detection on point cloud data to obtain corresponding target detection results
  • the image single-modal detection model is used for target detection on image data to obtain corresponding target detection results.
  • the point cloud data is collected by the laser radar sensor, and the image data is collected by the camera device. Afterwards, target detection is performed on the point cloud data through the point cloud single-modal detection model, and target detection is performed on the image data through the image single-modal detection model.
  • point cloud unimodal detection models include but are not limited to SECOND, PointPillars, PointRCNN and PV-RCNN, and image unimodal detection models include but are not limited to RRC, MSCNN and Cascade R-CNN.
  • in this way, performing target detection on the point cloud data through the trained point cloud single-modal detection model and on the image data through the trained image single-modal detection model can improve the accuracy of the target detection results, thereby obtaining more accurate bounding box information that fully encloses the object of interest.
  • step S103 after the target detection result is determined, the target feature information corresponding to the point cloud data and the target feature information corresponding to the image data can be determined. Therefore, for step S103, it can be described as the following process:
  • Step S1031 according to the object detection result of the point cloud data, determine the geometric feature information and appearance feature information corresponding to the point cloud data.
  • Step S1032 according to the target detection result of the image data, determine the geometric feature information and the appearance feature information corresponding to the image data.
  • step S1031 and step S1032 may be performed in any order: they can be executed at the same time; step S1031 can be executed first and then step S1032; or step S1032 can be executed first and then step S1031. The two steps are described in detail below.
  • the geometric feature information corresponding to the image data can be determined through the following process, and the detailed process is described as follows:
  • the target detection result of the image data includes 2D bounding boxes of the M target objects.
  • the bounding box information of each 2D bounding box is denoted as I_j, so the bounding box information of the M 2D bounding boxes can be denoted as {I_j}, j = 1, ..., M, where I_j = (x_j1, y_j1, x_j2, y_j2).
  • the bounding box information of each 2D bounding box may be determined as the geometric feature information corresponding to the image data.
  • (x_j1, y_j1) and (x_j2, y_j2) respectively represent the coordinates of the upper left corner and the lower right corner of each 2D bounding box in the image data, under the external-parameter coordinate system of the camera device.
  • for the 3D bounding boxes corresponding to the point cloud data, x_i, y_i, z_i represent the coordinates of the center of the i-th 3D bounding box in the lidar sensor coordinate system; h_i, w_i, l_i represent the three dimensions of the point cloud 3D bounding box, namely its height, width and length; and θ_i represents the orientation of the point cloud 3D bounding box in the bird's-eye view, that is, the rotation angle around the Y-axis of the lidar sensor coordinate system.
  • the target detection result corresponding to the above point cloud data can be projected into the image data according to the external parameters of the camera device and the calibration relationship between the lidar sensor and the camera device, so as to obtain the 2D projection box of the bounding box corresponding to the point cloud data.
  • once the coordinate information of the 2D projection box is obtained, it can be used to determine the geometric feature information of the point cloud data.
  • the geometric feature information of the 2D projection boxes of the bounding boxes corresponding to the above point cloud data comprises, for each projection box, the pixel coordinates of its upper left corner point and lower right corner point, so its data dimension D_G is 4 and its overall size can be recorded as N × D_G.
  • likewise, the geometric feature information of the 2D bounding boxes of the above image data comprises, for each bounding box, the pixel coordinates of its upper left corner point and lower right corner point, so its data dimension D_G is 4 and the size of the geometric features of the image data can be recorded as M × D_G.
  • the bounding box information in the geometric feature information may also be the pixel coordinates of the lower left corner point and the upper right corner point, which is not specifically limited in the present disclosure.
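  • As an illustration of this projection step, the following minimal sketch (not from the patent; the function names, the 3×4 lidar-to-image projection matrix P, and the axis conventions are all assumptions) computes the 2D projection box of a 3D bounding box by projecting its eight corners and taking the enclosing pixel rectangle:

```python
import numpy as np

def box3d_corners(x, y, z, h, w, l, theta):
    """Eight corners of a 3D box centered at (x, y, z) with size (l, w, h) and
    yaw theta in the lidar frame (axis conventions vary; assumed here)."""
    dx = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    dy = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    dz = np.array([0, 0, 0, 0, 1, 1, 1, 1]) * h  # box origin at bottom center (assumption)
    c, s = np.cos(theta), np.sin(theta)
    xy = np.array([[c, -s], [s, c]]) @ np.vstack([dx, dy])  # rotate in the ground plane
    return np.vstack([xy[0] + x, xy[1] + y, dz + z]).T      # (8, 3)

def project_box(corners, P):
    """Project corners with a 3x4 lidar-to-image projection matrix P (camera
    intrinsics combined with the lidar-camera calibration) and return the
    enclosing 2D projection box (u1, v1, u2, v2); assumes all corners lie in
    front of the camera."""
    pts = np.hstack([corners, np.ones((8, 1))])  # homogeneous coordinates (8, 4)
    uvw = (P @ pts.T).T                          # (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])  # D_G = 4 feature
```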
  • the above geometric feature information may further include position information and/or size information of the bounding box.
  • the position information and/or size information of the bounding box can also be expanded to the geometric feature information.
  • the expanded geometric feature information includes position information and/or size information of the bounding box.
  • the expansion process of geometric feature information can be introduced in two cases: the expansion of geometric feature information of image data and the expansion of geometric feature information of point cloud data.
  • Case 1 The process of expanding the geometric feature information of the image data.
  • Case 2 The process of expanding the geometric feature information of point cloud data.
  • in this way, the unification of the data format can be realized, so that the relative positions between the 2D bounding boxes corresponding to the image data and the 2D projection boxes can be quickly obtained and used to match the target objects, thereby obtaining the corresponding matching results.
  • the geometric feature information can be enriched, further improving the accuracy of data matching.
  • M 2D bounding boxes corresponding to the image data are determined in the image data, and the image within each bounding box is cropped to obtain the target image data; for the M 2D bounding boxes, M pieces of target image data can be obtained.
  • the cropped target image data can be scaled to obtain M RGB images with a uniform size of r × r pixels; the scaled target image data can be represented as Image, which records the pixel value of each pixel and has a size of r × r × 3.
  • the target image data Image can be input into a preset image feature extraction network to extract the D_A-dimensional image features of the target image data; the appearance feature information of the target image data is then obtained from these image features, and can be represented as a feature matrix of size M × D_A.
  • the above-mentioned image feature extraction network includes but is not limited to VGG-Net, ResNet, GoogLeNet and other networks that can realize the above-mentioned image feature extraction.
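  • A hedged sketch of this step (the network choice, crop size r and feature dimension D_A are illustrative, not the patent's): crop each 2D bounding box, scale it to r × r, and extract a D_A-dimensional appearance feature with a CNN backbone.

```python
import torch
from torchvision.models import resnet18

r, D_A = 64, 128                      # illustrative values
backbone = resnet18(weights=None)     # stand-in for VGG-Net / ResNet / GoogLeNet
backbone.fc = torch.nn.Linear(backbone.fc.in_features, D_A)

def appearance_features(image, boxes):
    """image: (3, H, W) float tensor; boxes: (M, 4) pixel boxes (x1, y1, x2, y2).
    Returns the M x D_A appearance feature matrix of the image data."""
    crops = []
    for x1, y1, x2, y2 in boxes.round().long().tolist():
        patch = image[:, y1:y2, x1:x2].unsqueeze(0)          # crop the 2D box
        crops.append(torch.nn.functional.interpolate(
            patch, size=(r, r), mode="bilinear", align_corners=False))
    return backbone(torch.cat(crops, dim=0))                 # (M, D_A)
```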
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • determine the target point cloud data located within the detected bounding box in the point cloud data; wherein the target point cloud data includes: the number of target points located in the bounding box and/or the coordinate information of the target points in the point cloud coordinate system.
  • the above-mentioned lidar sensor can scan the road conditions within the scanning range, so as to obtain several point cloud data used to characterize the characteristics of the objects within the collection range.
  • the extracted target point cloud data can be recorded as PC. The target point cloud data is then input into the point cloud feature extraction network, and the D_A-dimensional global point cloud feature of the target point cloud data is extracted; the appearance feature information of the point cloud data is then obtained from the global point cloud features, and can be represented as a feature matrix of size N × D_A.
  • the point cloud feature extraction network includes but is not limited to PointNet, PointNet++, PointSIFT and other extraction networks that can realize the above-mentioned point cloud feature extraction.
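  • A minimal PointNet-style sketch of this step (the layer sizes and feature dimension are assumptions): a shared per-point MLP followed by max pooling yields one D_A-dimensional global point cloud feature per bounding box.

```python
import torch
import torch.nn as nn

class GlobalPointFeature(nn.Module):
    """Shared per-point MLP plus max pooling, as in PointNet's global feature."""
    def __init__(self, d_a=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, d_a))

    def forward(self, points):
        # points: (N, K, 3) xyz coordinates of the K target points in each box
        per_point = self.mlp(points)          # (N, K, d_a)
        return per_point.max(dim=1).values    # max pool over points -> (N, d_a)
```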
  • the matching error caused by the weak synchronization between the image data and the point cloud data can be compensated, thereby improving the accuracy of data matching, and further improving the safety factor of automatic driving.
  • step S105 according to the target feature information, the bounding box determined based on the point cloud data is matched with the bounding box determined based on the image data, and the detailed process is described as follows:
  • a correlation calculation may be performed on the target feature information of the image data and the target feature information of the point cloud data according to a preset correlation algorithm to obtain a correlation calculation result.
  • the correlation calculation result can be understood as the correlation between each feature information in the feature information corresponding to the M 2D bounding boxes and each feature information in the feature information corresponding to the N 3D bounding boxes.
  • the matching result between the bounding box determined based on the point cloud data and the bounding box determined based on the image data may be determined according to the correlation calculation result.
  • the matching result indicates whether the bounding box determined based on the point cloud data matches the bounding box determined based on the image data, where the matching result may be an N*M matching matrix.
  • when an element in the matching matrix is 1, it indicates that the two corresponding bounding boxes match each other, and when an element in the matching matrix is 0, it indicates that the two corresponding bounding boxes do not match each other.
  • in this way, by performing the correlation calculation on the target feature information of the point cloud data and the target feature information of the image data, the correlation between each piece of feature information corresponding to the M 2D bounding boxes and each piece of feature information corresponding to the N 3D bounding boxes can be accurately determined, so that when the bounding boxes are matched according to the correlation calculation results, the matching accuracy of the bounding boxes is improved.
  • the correlation calculation is performed on the target feature information of the image data and the target feature information of the point cloud data to obtain the correlation calculation result; the detailed process is described as follows:
  • the geometric feature information of the above image data and the appearance feature information of the image data may be spliced, so as to obtain the target image feature of the image data.
  • an appearance feature vector of size M × D_A (the appearance feature information of the image data) and a geometric feature vector of size M × D_G (the geometric feature information of the image data) can be spliced to obtain an image feature vector F_img of size M × (D_A + D_G), which is the above-mentioned target image feature.
  • the geometric feature information of the point cloud data and the appearance feature information of the point cloud data may be concatenated, so as to obtain the target point cloud feature of the point cloud data.
  • an appearance feature vector of size N × D_A (the appearance feature information of the point cloud data) and a geometric feature vector of size N × D_G (the geometric feature information of the point cloud data) can be spliced to obtain the point cloud feature vector F_pc of size N × (D_A + D_G), which is the above-mentioned target point cloud feature.
  • a correlation operation can be performed on the target image features and the target point cloud features through a preset correlation algorithm.
  • the above-mentioned image feature vector F_img and point cloud feature vector F_pc can be processed by a preset correlation algorithm to obtain a correlation matrix F_correlation of size N × M × (D_A + D_G) (that is, the above-mentioned correlation calculation result).
  • the preset correlation algorithm may be an algorithm corresponding to any one of the following calculation formulas:
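  • The specific formulas are not reproduced in this text; purely as a hedged illustration (the patent's actual formulas may differ), one common correlation operation builds the N × M × (D_A + D_G) tensor from pairwise element-wise differences of the two feature sets:

```python
import torch

def correlation_tensor(f_pc, f_img):
    """f_pc: (N, D) target point cloud features; f_img: (M, D) target image
    features, with D = D_A + D_G. Broadcasts to an (N, M, D) correlation
    tensor; the element-wise difference is one plausible correlation op."""
    return f_pc.unsqueeze(1) - f_img.unsqueeze(0)
```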
  • the method of determining the matching result of the bounding box can make up for the matching error caused by the weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching, and thus improving the safety factor of automatic driving.
  • the above correlation calculation result (that is, the correlation matrix F_correlation) can be input into several two-dimensional convolutional networks for convolution calculation, to obtain a similarity matrix of size N × M × 1.
  • each element in the similarity matrix represents: the degree of similarity between each 3D bounding box in the N 3D bounding boxes and each 2D bounding box in the M 2D bounding boxes.
  • the degree of similarity includes: the degree of similarity determined based on geometric feature information, and the degree of similarity determined based on appearance feature information.
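  • A hedged sketch of the convolution step (the number of layers, channel widths and the sigmoid output are assumptions): a small stack of 1 × 1 2D convolutions reduces the (D_A + D_G)-channel correlation tensor to a single-channel N × M similarity matrix.

```python
import torch
import torch.nn as nn

D = 128 + 4  # illustrative D_A + D_G

similarity_head = nn.Sequential(
    nn.Conv2d(D, 64, kernel_size=1), nn.ReLU(),
    nn.Conv2d(64, 1, kernel_size=1), nn.Sigmoid())

def similarity_matrix(f_correlation):
    """f_correlation: (N, M, D) correlation tensor -> (N, M) similarity matrix."""
    x = f_correlation.permute(2, 0, 1).unsqueeze(0)  # (1, D, N, M)
    return similarity_head(x)[0, 0]                  # values in [0, 1]
```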
  • if the geometric feature information of the nth 3D bounding box has a high degree of similarity with that of the mth 2D bounding box, and the appearance feature information of the nth 3D bounding box also has a high degree of similarity with that of the mth 2D bounding box, it can be determined that the nth 3D bounding box and the mth 2D bounding box are matching bounding boxes.
  • the similarity matrix can be inversely calculated to obtain the matching cost matrix; then, the bipartite graph matching process is performed on the matching cost matrix to obtain the bounding box in the point cloud data and the bounding box in the image data. matching results.
  • each detection result can constitute at most one match, and the detection results within the same modality are mutually exclusive and do not match one another.
  • each detection result can only constitute at most one match, which can be understood as: a 2D bounding box determined based on image data can at most match a 3D bounding box determined based on point cloud data.
  • the matching problem of two unimodal detection results can be regarded as a bipartite graph matching problem.
  • the target detection results of image data and the target detection results of point cloud data can be divided into two subsets, for example, the target detection results of image data as a subset, and the target detection results of point cloud data as Another subset.
  • each subset contains multiple vertices, each vertex corresponds to a bounding box, the vertices within each subset are mutually disjoint, and every edge in the undirected graph connects vertices belonging to the two different subsets.
  • the numbers of detections in the two subsets can differ, and the matching goal is to match the two subsets to each other as accurately as possible. Therefore, the similarity matrix is inverted element by element to serve as the matching cost matrix, a matching threshold δ is set so that pairs whose matching cost is higher than δ do not participate in the matching, and the final multimodal matching matrix (that is, the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data) is then calculated by the matching algorithm.
  • the matching algorithms described above include but are not limited to the Hungarian matching algorithm and the Kuhn-Munkres matching algorithm.
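  • A sketch of the inversion and bipartite matching step using SciPy's Hungarian solver (the cost inversion cost = 1 - similarity and the threshold value are assumptions; the patent only states that the similarity matrix is inverted element by element and that costs above δ do not participate):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(similarity, delta=0.5):
    """similarity: (N, M) similarity matrix with values in [0, 1].
    Returns the (N, M) 0/1 multimodal matching matrix."""
    cost = 1.0 - similarity                      # element-wise inversion into matching cost
    rows, cols = linear_sum_assignment(cost)     # Hungarian / Kuhn-Munkres matching
    match = np.zeros_like(similarity, dtype=int)
    for i, j in zip(rows, cols):
        if cost[i, j] <= delta:                  # costs above delta do not participate
            match[i, j] = 1
    return match
```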
  • operations such as convolution, inversion, and bipartite graph matching may be performed on the above correlation calculation results to improve the processing efficiency of the matching results and obtain matching results with high accuracy.
  • under a weakly synchronized multimodal data set, since object matching is performed through the position information of the projection boxes of the 3D bounding boxes and the 2D bounding boxes, the geometric feature information also suffers from the weak synchronization problem. Especially for small objects, the deviation of the geometric feature information leads to a serious decline in the matching effect, so the existing common IoU similarity matrix matching algorithm cannot solve the multimodal matching problem under weak synchronization.
  • the appearance feature information is always extracted from the objects within the 3D box and the 2D box themselves. Therefore, the appearance feature information is not affected by the weak synchronization problem, and it helps to correct the error caused by weak synchronization.
  • the data matching method proposed by the embodiments of the present disclosure can be applied to the many-to-many multi-modal data matching process in both strong synchronization and weak synchronization situations.
  • FIG. 2 a schematic flowchart of another data matching method is also provided, and the method is described in detail as follows:
  • the image data to be matched is collected by a camera device, and the image data is detected by the image single-modal detection model to obtain a target detection result A1, wherein the target detection result A1 includes the 2D bounding boxes of the M objects contained in the image data.
  • the point cloud data to be matched is collected by the lidar sensor, and the point cloud data is detected by the point cloud single-modal detection model to obtain a target detection result A2, wherein the target detection result A2 includes the 3D bounding boxes of the N objects perceived in the point cloud data.
  • the position information of the 2D bounding boxes of the M objects is determined as the geometric feature information in the target feature information corresponding to the image data. The target image data located in each 2D bounding box is determined in the image data, the image features of the target image data are extracted through the image feature extraction network, and the extracted image features are determined as the appearance feature information in the target feature information corresponding to the image data.
  • determine the target point cloud data located within the detected 3D bounding boxes in the point cloud data, wherein the target point cloud data includes: the number of target points located in the 3D bounding box and/or the coordinate information of the target points in the point cloud coordinate system; based on the target point cloud data, determine the global point cloud features used to describe the overall features of the target point cloud, and determine the appearance feature information of the point cloud data according to the global point cloud features.
  • the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data are spliced to obtain the target image feature; the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data are spliced to obtain the target point cloud feature ; Carry out a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
  • a data matching method is proposed for the weak synchronization between the camera device and the lidar sensor caused by response delay or complex road conditions.
  • this method uses the bounding boxes predicted by the point cloud single-modal detection model and the image single-modal detection model: it first obtains the geometric feature information of the projected 2D boxes of the 3D bounding boxes corresponding to the point cloud data and of the 2D bounding boxes corresponding to the image data; it then extracts the appearance feature information within the corresponding bounding boxes through the point cloud feature extraction network and the image feature extraction network; and it finally predicts the similarity matrix between the target detection results of the point cloud data and the target detection results of the image data based on the joint features of the geometric feature information and the appearance feature information.
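  • Tying the steps of FIG. 2 together, the following end-to-end sketch reuses the illustrative helpers defined in the earlier code blocks (none of these names come from the patent itself):

```python
import numpy as np
import torch

def match_detections(pc_boxes, img_boxes, image, box_points, P):
    """pc_boxes: N point cloud 3D boxes (x, y, z, h, w, l, theta); img_boxes:
    (M, 4) image 2D boxes; box_points: (N, K, 3) points inside each 3D box."""
    g_pc = torch.as_tensor(
        np.stack([project_box(box3d_corners(*b), P) for b in pc_boxes]),
        dtype=torch.float32)                              # (N, 4) projected 2D boxes
    a_pc = GlobalPointFeature()(box_points)               # (N, D_A) point cloud appearance
    a_img = appearance_features(image, img_boxes)         # (M, D_A) image appearance
    f_pc = torch.cat([a_pc, g_pc], dim=1)                 # target point cloud features
    f_img = torch.cat([a_img, img_boxes.float()], dim=1)  # target image features
    sim = similarity_matrix(correlation_tensor(f_pc, f_img))
    return match_boxes(sim.detach().numpy())              # (N, M) matching matrix
```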
  • before the target detection result of the point cloud data and the target detection result of the image data are obtained by respectively performing target detection on the point cloud data and the image data through the trained single-modal detection model, the single-modal detection model needs to be trained according to the following steps:
  • each training sample includes: sample image data and sample point cloud data carrying sample labels.
  • a training sample set including multiple training samples is determined, that is, a collection of sample image data and sample point cloud data carrying sample labels.
  • the above-mentioned single-modal detection model to be trained performs target detection on the above-mentioned sample point cloud data and sample image data respectively, so as to obtain the sample target detection results.
  • the label matching matrix can be determined according to the sample labels and the sample target detection results. Furthermore, the target loss function is calculated according to the label matching matrix, and the model parameters of the single-modal detection model are adjusted according to the target loss function until a preset condition is reached, so as to obtain the trained single-modal detection model, wherein the preset condition may be that the number of training iterations of the single-modal detection model meets a preset requirement, and/or that the training accuracy of the single-modal detection model meets a preset accuracy requirement.
  • the target loss function includes but is not limited to mean square error (MSE) loss, mean absolute error (MAE) loss, binary cross-entropy (BCE) loss, and other loss functions that can realize the above-mentioned training of the single-modal detection model.
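  • As a hedged example of one of the listed choices, a BCE loss between the predicted similarity matrix and the 0/1 label matching matrix could look like this (a sketch, not the patent's exact loss):

```python
import torch.nn.functional as F

def matching_loss(pred_similarity, label_matrix):
    """pred_similarity: (N, M) values in (0, 1); label_matrix: (N, M) 0/1
    label matching matrix. Binary cross-entropy over all box pairs."""
    return F.binary_cross_entropy(pred_similarity, label_matrix.float())
```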
  • the single-modal detection model includes a point cloud single-modal detection model and an image single-modal detection model.
  • the image unimodal detection model can be trained based on the sample training set containing sample image data, and the point cloud unimodal detection model can be trained based on the sample training set containing sample point cloud data.
  • the detailed training process is as described above, and will not be described separately here.
  • (1) calculating the intersection-over-union (IoU) ratio between at least one predicted bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding box corresponding to the sample image data in the sample label, to obtain a first IoU ratio; and filtering the at least one predicted bounding box according to the first IoU ratio to obtain a target predicted bounding box.
  • the sample image data included in the training sample can be input into the image unimodal detection model to obtain a sample target detection result containing at least one predicted bounding box, wherein the at least one predicted bounding box may be called a predicted 2D bounding box.
  • for each predicted 2D bounding box, it is judged whether any of the plurality of first IoU ratios is greater than or equal to a preset threshold. If so, the predicted 2D bounding box is determined as a target predicted bounding box; in this case, the labeled bounding box corresponding to the largest of the first IoU ratios may be determined, and that labeled bounding box is determined as the bounding box matching the predicted 2D bounding box. If not, the predicted 2D bounding box is discarded.
  • the sample point cloud data included in the training sample can be input into the point cloud single-modal detection model to obtain a sample target detection result containing at least one predicted bounding box, wherein the at least one predicted bounding box may also be called a predicted 3D bounding box.
  • for each predicted 3D bounding box, it is judged whether any of the plurality of second IoU ratios is greater than or equal to a preset threshold. If so, the predicted 3D bounding box is determined as a target predicted bounding box; in this case, the labeled bounding box corresponding to the largest of the second IoU ratios may be determined, and that labeled bounding box is determined as the bounding box matching the predicted 3D bounding box. If not, the predicted 3D bounding box is discarded.
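  • A sketch of the IoU-based filtering described above (shown for 2D boxes; the threshold is a placeholder, and the 3D case differs only in the IoU computation):

```python
import numpy as np

def iou_2d(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def filter_predictions(pred_boxes, labeled_boxes, thresh=0.5):
    """Keep each predicted box whose best IoU with any labeled box meets the
    preset threshold, paired with the index of its best-matching label."""
    kept = []
    for p in pred_boxes:
        ious = [iou_2d(p, g) for g in labeled_boxes]
        best = int(np.argmax(ious))
        if ious[best] >= thresh:   # otherwise the predicted box is discarded
            kept.append((p, best))
    return kept
```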
  • the target predicted 2D bounding box and the target predicted 3D bounding box that correspond to the same object are regarded as a label matching pair; the corresponding position in the label matching matrix is set to 1 and the unmatched positions are set to 0, so as to obtain the label matching matrix.
  • in this way, the predicted 2D bounding box can be accurately matched with the predicted 3D bounding box to obtain the label matching matrix; when the function value of the target loss function is determined according to the label matching matrix, an accurate function value can be obtained, thereby improving the training accuracy of the unimodal detection model.
  • a training sample set including multiple training samples is determined, and the detailed process is described as follows:
  • at least one data combination is determined in the target tracking data sequence, wherein each of the data combinations includes: target image data and target point cloud data; the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval.
  • target tracking data sequences including image data and point cloud data are respectively acquired, wherein the above target tracking data sequences contain enough data for tracking and training the above single modality detection model.
  • the target image data image_k at the first tracking moment is selected; then, according to the preset interval, the second tracking moment several frames later is determined, and the target point cloud data PC_{k+n} at that moment is selected.
  • the construction principle is the transitivity of weak synchronization: the target image data image_k of the current frame and the image data image_{k+n} several frames later are weakly synchronized in time and space, so the target point cloud data PC_{k+n} that is strongly synchronized with image_{k+n} will also be weakly synchronized, in time and space, with the target image data image_k.
  • the above-mentioned target image data image_k and target point cloud data PC_{k+n} are respectively determined as the sample image data and the sample point cloud data in a training sample.
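  • A minimal sketch of this construction (the list-based sequence storage and the frame offset n standing in for the preset interval are assumptions):

```python
def build_weak_sync_samples(images, point_clouds, n=2):
    """Pair the image at frame k with the point cloud at frame k + n so that
    each training sample carries the simulated spatio-temporal weak sync."""
    return [
        {"sample_image": images[k], "sample_point_cloud": point_clouds[k + n]}
        for k in range(len(images) - n)
    ]
```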
  • the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multi-modal data set of point clouds and images, through which the weak synchronization that may occur in actual automatic driving scenes can be simulated, and the training sample set constructed in this way allows the trained unimodal detection model to adapt to the weak synchronization scenario of the multimodal data set.
  • the embodiment of the present disclosure also provides a data matching device corresponding to the data matching method. Since the problem-solving principle of the device in the embodiment of the present disclosure is similar to that of the above-mentioned data matching method, for the implementation of the device, refer to the implementation of the method.
  • FIG. 4 it is a schematic diagram of a data matching device provided by an embodiment of the present disclosure.
  • the device includes: an acquisition module 41, a determination module 42, and a matching module 43; wherein,
  • The acquisition module 41 is configured to detect the point cloud data and the image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data, where the target detection result includes bounding box information of the detected target object;
  • The determination module 42 is configured to determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data, and to determine the target feature information corresponding to the image data according to the target detection result of the image data, where the target feature information includes geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box;
  • The matching module 43 is configured to match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • Point cloud data can be used to make up for the defect that image data is easily affected by illumination and occlusion, and image data can be used to make up for the sparseness and texturelessness of point cloud data. Therefore, combining image data and point cloud data to detect 3D objects can improve the detection accuracy of 3D objects and thereby yield more accurate object detection results.
  • The determination module 42 is further configured to: project the target detection result of the point cloud data into the image data to obtain the projection frame of the bounding box of the point cloud data; and determine the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection frame in the image data.
  • the geometric feature information includes position information and/or size information of the bounding box.
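For illustration, the projection step described above might look like the following sketch, assuming a known LiDAR-to-camera extrinsic matrix and camera intrinsics (calibration inputs not detailed by the disclosure) and that all box corners lie in front of the camera:

```python
import numpy as np

def project_box_to_image(corners_3d, lidar_to_cam, intrinsics):
    """Project the 8 corners of a 3D bounding box into the image plane.

    corners_3d: (8, 3) corner coordinates in the LiDAR frame.
    lidar_to_cam: (4, 4) extrinsic transform (an assumed calibration input).
    intrinsics: (3, 3) camera matrix.
    Returns the axis-aligned 2D projection frame (x1, y1, x2, y2), from which
    position and size features of the frame can be read off.
    """
    homo = np.hstack([corners_3d, np.ones((8, 1))])  # (8, 4) homogeneous coords
    cam = (lidar_to_cam @ homo.T)[:3]                # (3, 8) camera-frame points
    pix = intrinsics @ cam                           # (3, 8)
    pix = pix[:2] / pix[2]                           # perspective divide -> pixels
    x1, y1 = pix.min(axis=1)
    x2, y2 = pix.max(axis=1)
    return x1, y1, x2, y2
```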
  • The determination module 42 is further configured to: determine the target image data located within the detected bounding box in the image data; extract image features of the target image data; and determine the extracted image features as the appearance feature information corresponding to the image data.
  • The determination module 42 is further configured to: determine the target point cloud data located within the detected bounding box in the point cloud data, where the target point cloud data includes the quantity of target points located within the bounding box and/or the coordinate information of the target points in the point cloud coordinate system; and, based on the target point cloud data, determine a global point cloud feature used to describe the overall characteristics of the target point cloud, and determine the appearance feature information corresponding to the point cloud data according to the global point cloud feature.
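A minimal sketch of deriving such a global point cloud feature is given below; the per-point encoder `encode` and the max-pooling aggregation are assumptions (a PointNet-style choice), not a prescription of the disclosure:

```python
def global_point_cloud_feature(points_in_box, encode):
    """Sketch: aggregate per-point features in a box into one global feature.

    points_in_box: (K, 3) coordinates of the K target points inside the box.
    encode: a hypothetical per-point feature extractor returning a (K, C)
            array (any permutation-invariant encoder would do).
    """
    if len(points_in_box) == 0:
        return None                      # an empty box yields no appearance cue
    per_point = encode(points_in_box)    # (K, C) per-point features
    return per_point.max(axis=0)         # (C,) order-invariant global feature
```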
  • The matching module 43 is further configured to: perform a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determine a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
  • The matching module 43 is further configured to: splice the geometric feature information corresponding to the image data with the appearance feature information corresponding to the image data to obtain target image features; splice the geometric feature information corresponding to the point cloud data with the appearance feature information corresponding to the point cloud data to obtain target point cloud features; and perform the correlation calculation on the target image features and the target point cloud features to obtain the correlation calculation result.
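By way of illustration, the splicing and correlation steps could be sketched as follows; using a dot product as the correlation operation is an assumption made for this sketch:

```python
import numpy as np

def splice(geometric, appearance):
    """Splice geometric and appearance features along the feature axis."""
    return np.concatenate([geometric, appearance], axis=-1)

def correlation(target_image_feats, target_cloud_feats):
    """Pairwise correlation between M image features and N point cloud features.

    target_image_feats: (M, C) spliced features of the 2D boxes.
    target_cloud_feats: (N, C) spliced features of the 3D boxes.
    """
    return target_image_feats @ target_cloud_feats.T   # (M, N) correlation map
```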
  • The matching module 43 is further configured to: perform a convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix is used to represent the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; invert the similarity matrix to obtain a matching cost matrix; and perform bipartite graph matching processing on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
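A minimal sketch of the last two steps follows; negating the similarity matrix stands in for the inversion step, and SciPy's Hungarian solver `linear_sum_assignment` stands in for the bipartite graph matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(similarity):
    """Sketch: similarity matrix -> cost matrix -> bipartite matching.

    similarity: (M, N) array scoring image boxes against point cloud boxes.
    """
    cost = -np.asarray(similarity)            # high similarity -> low cost
    rows, cols = linear_sum_assignment(cost)  # Hungarian bipartite matching
    return list(zip(rows, cols))              # (image box idx, cloud box idx)
```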
  • The matching module 43 is further configured to: perform target detection on the point cloud data and the image data respectively through the trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
  • The device is further configured to train the single-modal detection model according to the following steps: determine a training sample set including a plurality of training samples, where each training sample includes sample image data and sample point cloud data carrying sample labels; perform target detection on the training sample set through the single-modal detection model to be trained, to obtain a sample target detection result; determine a label matching matrix according to the sample target detection result and the sample labels; and calculate a target loss function according to the label matching matrix, adjusting the model parameters of the single-modal detection model according to the target loss function until a preset condition is reached, to obtain the trained single-modal detection model.
  • The device is further configured to: calculate the intersection-over-union between at least one predicted 2D bounding box corresponding to the sample image data in the sample target detection result and the label bounding box corresponding to the sample image data in the sample labels, to obtain a first intersection-over-union ratio, and filter the at least one predicted 2D bounding box according to the first ratio to obtain a target predicted 2D bounding box; calculate the intersection-over-union between at least one predicted 3D bounding box corresponding to the sample point cloud data in the sample target detection result and the label bounding box corresponding to the sample point cloud data in the sample labels, to obtain a second intersection-over-union ratio, and filter the at least one predicted 3D bounding box according to the second ratio to obtain a target predicted 3D bounding box; and match the target predicted 2D bounding box with the target predicted 3D bounding box to obtain a label matching result, determining the label matching matrix according to the label matching result. A 2D IoU helper in this spirit is sketched below.
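For reference, the first intersection-over-union can be computed with a standard axis-aligned 2D IoU such as the sketch below (the 3D IoU used for the second ratio additionally accounts for box height and heading, and is omitted here):

```python
def iou_2d(a, b):
    """Axis-aligned IoU of two 2D boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```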
  • The device is further configured to: acquire a target tracking data sequence, where the target tracking data sequence includes image data and point cloud data acquired at each tracking moment; determine at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and use the data in each data combination as the data in each training sample.
  • The embodiment of the present disclosure also provides an electronic device 500. As shown in FIG. 5, a schematic structural diagram of the electronic device 500 provided in the embodiment of the present disclosure, the device includes:
  • a processor 51, a memory 52, and a bus 53. The memory 52 is used for storing execution instructions and includes an internal memory 521 and an external memory 522; the internal memory 521 is used for temporarily storing the computing data of the processor 51 and the data exchanged with the external memory 522, such as a hard disk, and the processor 51 exchanges data with the external memory 522 through the internal memory 521. When the electronic device 500 is running, the processor 51 communicates with the memory 52 through the bus 53, so that the processor 51 executes the following instructions:
  • detect the point cloud data and image data to be matched respectively, and obtain the target detection result of the point cloud data and the target detection result of the image data, where the target detection result includes bounding box information of the detected target object; determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data; determine the target feature information corresponding to the image data according to the target detection result of the image data, where the target feature information includes the geometric feature information of the detected bounding box of the target object and the appearance feature information of the target object within the bounding box; and match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored. When the computer program is run by a processor, the steps of the data matching method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • The embodiment of the present disclosure also provides a computer program product. The computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the data matching method described in the above method embodiments; refer to the above method embodiments for details.
  • the above-mentioned computer program product may be realized by hardware, software or a combination thereof.
  • In an optional embodiment, the computer program product can be embodied as a computer storage medium; in another optional embodiment, the computer program product can be embodied as a software product, such as an SDK (Software Development Kit).
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • The technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause an electronic device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
  • Embodiments of the present disclosure provide a data matching method and device, an electronic device, a storage medium, and a program product, where the method includes: respectively detecting the point cloud data and image data to be matched, and obtaining the target detection result of the point cloud data and the target detection result of the image data; determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data, and determining, according to the target detection result of the image data, the target feature information corresponding to the image data, where the target feature information includes the geometric feature information of the detected bounding box of the target object and the appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding boxes in the point cloud data with the bounding boxes in the image data. Detecting 3D targets by combining image data and point cloud data in this way can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.


Abstract

Provided in the embodiments of the present disclosure are a data matching method and apparatus, and an electronic device, a storage medium and a program product. The method comprises: respectively performing detection on point cloud data and image data, which are to be subjected to matching, so as to obtain a target detection result of the point cloud data and a target detection result of the image data; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and determining, according to the target detection result of the image data, target feature information corresponding to the image data, wherein each piece of target feature information comprises geometric feature information of a bounding box of a detected target object, and appearance feature information of the target object in the bounding box; and according to the target feature information, matching a bounding box in the point cloud data and a bounding box in the image data. By means of the embodiments of the present disclosure, a 3D target is detected by means of combining image data with point cloud data, such that the detection accuracy of the 3D target can be improved, thereby obtaining a more accurate target detection result.

Description

Data matching method and apparatus, and electronic device, storage medium and program product
Cross-Reference to Related Applications
The present disclosure is based on, and claims priority to, Chinese patent application No. 202110994415.5, filed on August 27, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of autonomous driving, and relates to, but is not limited to, a data matching method and apparatus, an electronic device, a storage medium, and a program product.
Background
The goal of 3D (three-dimensional) target detection is to identify the 3D bounding box information of an object, mainly including its position, orientation, size, and confidence. In recent years, single-modal methods based on LiDAR or camera sensors have made increasing progress in the field of 3D target detection. However, due to the characteristics of the data structures involved, the target detection performance of image single-modal methods is affected by the current environment: overexposed or overly dark photos hinder the acquisition of target information, and detection also suffers from occlusion. Point cloud single-modal methods must cope with point clouds that are sparse, irregular, and lacking in texture and semantic information, and with small or distant objects that contain too few points.
On this basis, there is an urgent need for a stable, reliable, accurate, and highly robust multimodal matching algorithm.
Summary of the Invention
Embodiments of the present disclosure provide at least a data matching method and apparatus, an electronic device, a storage medium, and a program product.
In a first aspect, an embodiment of the present disclosure provides a data matching method, including: respectively detecting point cloud data and image data to be matched to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection result includes bounding box information of a detected target object; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of the detected target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
In the above implementation, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
In an optional implementation, determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: projecting the target detection result of the point cloud data into the image data to obtain a projection frame of the bounding box corresponding to the point cloud data; and determining the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection frame in the image data.
In the above implementation, by projecting the 3D bounding box corresponding to the point cloud data into the image data to obtain a 2D (two-dimensional) projection frame, the data formats can be unified, so that target objects can be quickly matched according to the relative position between the 2D bounding box corresponding to the image data and the 2D projection frame, and the corresponding matching result can be obtained.
In an optional implementation, the geometric feature information includes position information and/or size information of the bounding box.
In the above implementation, by expanding the position information and/or size information of the bounding box into the geometric feature information, the geometric feature information can be enriched, further improving the accuracy of data matching.
In an optional implementation, determining the target feature information corresponding to the image data according to the target detection result of the image data includes: determining target image data located within the detected bounding box in the image data; and extracting image features of the target image data and determining the extracted image features as the appearance feature information corresponding to the image data.
In the above implementation, by extracting the image features of the target image data within the bounding box corresponding to the image data, it is possible, on the basis of matching bounding boxes according to position information, to further verify according to the appearance feature information whether the objects within position-matched bounding boxes are the same object. This processing can compensate for matching errors caused by weak synchronization between image data and point cloud data, thereby improving the accuracy of data matching and in turn improving the safety of autonomous driving.
In an optional implementation, determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data includes: determining target point cloud data located within the detected bounding box in the point cloud data, where the target point cloud data includes the quantity of target points located within the bounding box and/or the coordinate information of the target points in the point cloud coordinate system; and, based on the target point cloud data, determining a global point cloud feature used to describe the overall characteristics of the target point cloud, and determining the appearance feature information corresponding to the point cloud data according to the global point cloud feature.
In the above implementation, by extracting the global point cloud feature of the target point cloud data within the bounding box corresponding to the point cloud data, it is possible, on the basis of matching bounding boxes according to position information, to further verify according to the appearance feature information whether the objects within position-matched bounding boxes are the same object. This processing can compensate for matching errors caused by weak synchronization between image data and point cloud data, thereby improving the accuracy of data matching and in turn improving the safety of autonomous driving.
In an optional implementation, matching the bounding box determined based on the point cloud data with the bounding box determined based on the image data according to the target feature information includes: performing a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determining a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
In the above implementation, by combining geometric feature information and appearance feature information and performing a correlation calculation on the target feature information of the point cloud data and that of the image data, the correlation between each of the feature vectors corresponding to the M 2D bounding boxes and each of the feature vectors corresponding to the N 3D bounding boxes can be accurately determined, so that when bounding boxes are matched according to the correlation calculation result, the matching accuracy of the bounding boxes is improved.
In an optional implementation, performing the correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain the correlation calculation result includes: splicing the geometric feature information corresponding to the image data with the appearance feature information corresponding to the image data to obtain target image features; splicing the geometric feature information corresponding to the point cloud data with the appearance feature information corresponding to the point cloud data to obtain target point cloud features; and performing the correlation operation on the target image features and the target point cloud features to obtain the correlation calculation result.
In the above implementation, the geometric feature information and the appearance feature information are spliced to obtain the corresponding target image features and target point cloud features, and the correlation calculation is then performed on them; determining the bounding box matching result from the correlation calculation result in this way can compensate for matching errors caused by weak synchronization between image data and point cloud data, thereby improving the accuracy of data matching and in turn improving the safety of autonomous driving.
In an optional implementation, determining the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result includes: performing a convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix is used to represent the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; inverting the similarity matrix to obtain a matching cost matrix; and performing bipartite graph matching processing on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
In the above implementation, operations such as convolution, inversion, and bipartite graph matching on the correlation calculation result can improve the processing efficiency of matching and yield highly accurate matching results.
In an optional implementation, respectively detecting the point cloud data and the image data to be matched to obtain the target detection result of the point cloud data and the target detection result of the image data includes: performing target detection on the point cloud data and the image data respectively through a trained single-modal detection model to obtain the target detection result of the point cloud data and the target detection result of the image data.
In the above implementation, performing target detection on the point cloud data through a trained point cloud single-modal detection model and on the image data through an image single-modal detection model can improve the accuracy of the target detection results, thereby yielding more accurate bounding box information that contains the complete target object.
In an optional implementation, the single-modal detection model is trained according to the following steps: determining a training sample set containing multiple training samples, where each training sample includes sample image data and sample point cloud data carrying sample labels; performing target detection on the training sample set through the single-modal detection model to be trained to obtain a sample target detection result; determining a label matching matrix according to the sample target detection result and the sample labels; and calculating the function value of a target loss function according to the label matching matrix, and adjusting the model parameters of the single-modal detection model according to the function value of the target loss function until a preset condition is reached, to obtain the trained single-modal detection model.
In the above implementation, training the single-modal detection model in the manner described above yields a single-modal detection model whose processing precision meets the accuracy requirements; performing target detection with this model can improve the accuracy of the target detection results and thereby improve the accuracy of data matching.
In an optional implementation, determining the label matching matrix according to the sample target detection result and the sample labels includes: calculating the intersection-over-union between at least one predicted 2D bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding box corresponding to the sample image data in the sample labels to obtain a first intersection-over-union ratio, and filtering the at least one predicted 2D bounding box according to the first ratio to obtain a target predicted 2D bounding box; calculating the intersection-over-union between at least one predicted 3D bounding box corresponding to the sample point cloud data in the sample target detection result and the labeled bounding box corresponding to the sample point cloud data in the sample labels to obtain a second intersection-over-union ratio, and filtering the at least one predicted 3D bounding box according to the second ratio to obtain a target predicted 3D bounding box; and matching the target predicted 2D bounding box with the target predicted 3D bounding box to obtain a label matching result, and determining the label matching matrix according to the label matching result.
In the above implementation, through this processing, the predicted 2D bounding boxes can be accurately matched with the predicted 3D bounding boxes to obtain the label matching matrix; when the function value of the target loss function is determined according to this label matching matrix, an accurate function value can be obtained, thereby improving the training accuracy of the single-modal detection model.
In an optional implementation, determining the training sample set containing multiple training samples includes: acquiring a target tracking data sequence, where the target tracking data sequence contains image data and point cloud data acquired at each tracking moment; determining at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and using the data in each data combination as the data in each training sample.
In the above implementation, the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multimodal data set of point clouds and images. This method can simulate the weak synchronization situations that may occur in actual autonomous driving scenes, so that when the single-modal detection model is trained with this training sample set, the trained single-modal detection model can adapt to the weak synchronization scenarios of multimodal data sets.
In a second aspect, an embodiment of the present disclosure further provides a data matching apparatus, including: an acquisition module configured to respectively detect point cloud data and image data to be matched to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection result includes bounding box information of a detected target object; a determination module configured to determine, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and to determine, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of the detected target object and appearance feature information of the target object within the bounding box; and a matching module configured to match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are executed.
In a fourth aspect, embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the first aspect, or of any possible implementation of the first aspect, are executed.
In a fifth aspect, an embodiment of the present disclosure further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program; when the computer program is read and executed by a computer, some or all of the steps of the methods described in the embodiments of the present disclosure are implemented. The computer program product may be a software installation package.
In order to make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the accompanying drawings used in the embodiments are briefly introduced below. The accompanying drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and are used together with the description to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 shows a flowchart of a data matching method provided by an embodiment of the present disclosure;
FIG. 2 shows a flowchart of another data matching method provided by an embodiment of the present disclosure;
FIG. 3 shows a flowchart of determining a label matching matrix according to sample target detection results and sample labels in a data matching method provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a data matching apparatus provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Research has found that in the existing field of autonomous driving, 3D target detection is usually performed with single-modal detection methods based on LiDAR or camera sensors, where such single-modal detection methods usually include image single-modal methods and point cloud single-modal methods. However, due to the characteristics of the data structures involved, the target detection performance of image single-modal methods is affected by the current environment: overexposed or overly dark photos affect the acquisition of target information, and detection is also affected by occlusion. Point cloud single-modal methods must cope with point clouds that are sparse, irregular, and lacking in texture and semantic information, and with small or distant objects that contain too few points.
Based on the above research, the present disclosure provides a data matching method and apparatus, an electronic device, a storage medium, and a program product. In the embodiments of the present disclosure, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
To facilitate understanding of this embodiment, a data matching method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the data matching method provided in the embodiments of the present disclosure is generally an electronic device with certain computing capabilities.
Referring to FIG. 1, which is a flowchart of a data matching method provided by an embodiment of the present disclosure, the method includes steps S101 to S105, where:
S101: Respectively detect the point cloud data and image data to be matched, and obtain the target detection result of the point cloud data and the target detection result of the image data, where the target detection result includes bounding box information of the detected target object.
In the embodiments of the present disclosure, image data may be collected by a camera device, and point cloud data may be collected by a LiDAR sensor, where the camera device and the LiDAR sensor are sensors pre-installed on a target vehicle. The target vehicle may be a vehicle with an autonomous driving function, for example, a minibus or a car; the present disclosure does not specifically limit the type of the target vehicle.
For the point cloud data, the target detection result includes the bounding box information of the detected target objects, where the bounding boxes here are 3D bounding boxes. For example, if the number of target objects is N, the bounding box information includes information about the 3D bounding boxes of the N target objects.
For the image data, the target detection result includes the bounding box information of the detected target objects, where the bounding boxes here are 2D bounding boxes. For example, if the number of target objects is M, the bounding box information includes information about the 2D bounding boxes of the M target objects.
It should be noted that the N target objects and the M target objects may include the same target objects or different target objects.
S103: Determine the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data; determine the target feature information corresponding to the image data according to the target detection result of the image data. The target feature information includes geometric feature information of the bounding box of the detected target object and appearance feature information of the target object within the bounding box.
In the embodiments of the present disclosure, the target feature information may be the target feature information corresponding to the point cloud data, or the target feature information corresponding to the image data.
For example, the target feature information corresponding to the point cloud data may include geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box; likewise, the target feature information corresponding to the image data may include geometric feature information of the detected bounding box of the target object and appearance feature information of the target object within the bounding box.
In the embodiments of the present disclosure, the appearance feature information may be an attribute feature used to characterize the target object framed by the bounding box; this attribute feature may be the object category information of the target object, where the category information is the category label of the target object, for example, vehicle or pedestrian.
S105: Match, according to the target feature information, the bounding box determined based on the point cloud data with the bounding box determined based on the image data.
In the embodiments of the present disclosure, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
In the embodiments of the present disclosure, to address the technical problem of the poor target detection accuracy of existing single-modal detection methods, an optional implementation is a multimodal detection method, where a multimodal detection method refers to detecting objects by combining image data and point cloud data. For example, multimodal detection methods may include the point cloud projection method, the image back-projection method, and the similarity matrix method.
The point cloud projection method only considers the prediction results of the point cloud 3D candidate boxes and relies heavily on the performance of the point cloud single-modal detector. The image back-projection method only considers the prediction results of the image 2D candidate boxes and relies heavily on the performance of the image single-modal detector. The similarity matrix method has not explored the multimodal matching problem under weak synchronization in much depth.
The prerequisite for the point cloud projection method, the image back-projection method, and the similarity matrix method described above is strong synchronization between the LiDAR sensor and the camera device; therefore, these technical solutions do not consider the case of weak synchronization between the LiDAR sensor and the camera device.
Here, strong synchronization between the camera device and the LiDAR sensor means that the acquisition times of the point cloud data and the image data are highly synchronized. It can also be understood as follows: when the point cloud single-modal detector and the image single-modal detector detect the same object, the 2D projection frame of the point cloud coincides with the image 2D box, or the 3D box back-projected from the image coincides with the 3D box of the point cloud. However, in actual autonomous driving scenes, the camera device and the LiDAR sensor may exhibit very weak synchronization due to response delays or complex road conditions. In this case, weak synchronization causes synchronization errors between the image data and the point cloud data. Once a synchronization error occurs, then according to the transitivity of errors, the projection result of one modality also carries a corresponding synchronization error with respect to the other; that is, the bounding box of an object detected from the point cloud data and the bounding box of the same object detected from the image data will not coincide. In this case, incorrect matching results may be produced, and poor matching will give multimodal fusion worse detection results than the single-modal case.
Based on this, the present disclosure provides a data matching method in which, by combining image data and point cloud data for target detection, the point cloud data can compensate for the defect that image data is easily affected by illumination and occlusion, while the image data can compensate for the sparseness and lack of texture of point cloud data. Therefore, combining image data and point cloud data to detect 3D targets can improve the detection accuracy of 3D targets and thereby yield more accurate target detection results.
In the embodiments of the present disclosure, the steps described in steps S101 to S105 above are described in detail; the detailed description is as follows.
For step S101, respectively detecting the point cloud data and image data to be matched to obtain the target detection result of the point cloud data and the target detection result of the image data includes the following process:
Target detection is performed on the point cloud data and the image data respectively through trained single-modal detection models, to obtain the target detection result of the point cloud data and the target detection result of the image data.
Here, the single-modal detection models include a point cloud single-modal detection model and an image single-modal detection model. The point cloud single-modal detection model is used for target detection on the point cloud data to obtain the corresponding target detection result; the image single-modal detection model is used for target detection on the image data to obtain the corresponding target detection result.
In the embodiments of the present disclosure, point cloud data is first collected by the LiDAR sensor, and image data is collected by the camera device. Then, target detection is performed on the point cloud data through the point cloud single-modal detection model, and on the image data through the image single-modal detection model.
It should be noted that point cloud single-modal detection models include, but are not limited to, SECOND, PointPillars, PointRCNN, and PV-RCNN; image single-modal detection models include, but are not limited to, RRC, MSCNN, and Cascade R-CNN.
In the above implementation, performing target detection on the point cloud data through a trained point cloud single-modal detection model and on the image data through an image single-modal detection model can improve the accuracy of the target detection results, thereby yielding more accurate bounding box information that contains the complete target object.
In step S103, after the target detection results are determined, the target feature information corresponding to the point cloud data and the target feature information corresponding to the image data can be determined. Step S103 can therefore be described as the following process:

Step S1031: according to the target detection result of the point cloud data, determine the geometric feature information and appearance feature information corresponding to the point cloud data.

Step S1032: according to the target detection result of the image data, determine the geometric feature information and appearance feature information corresponding to the image data.

Here, step S1031 and step S1032 are in no particular order: they can be executed at the same time, step S1031 can be executed first and then step S1032, or step S1032 can be executed first and then step S1031. Steps S1031 and S1032 are described in detail below.
For S1032, the geometric feature information corresponding to the image data can be determined through the following process:

(1) Determine the bounding boxes of the target objects according to the target detection result of the image data.

(2) Determine the geometric feature information of the image data according to the bounding boxes of the target objects.
In the embodiment of the present disclosure, assume the image data contains M target objects; the target detection result of the image data then includes the 2D bounding boxes of the M target objects. If the bounding box information of each 2D bounding box is denoted I_j, the bounding box information of the M 2D bounding boxes can be denoted I = {I_j | j = 1, …, M}, where the bounding box information of each 2D bounding box can be expressed as I_j = {x_{j1}, y_{j1}, x_{j2}, y_{j2}}.

At this time, the bounding box information of each 2D bounding box may be determined as the geometric feature information corresponding to the image data.

Here, (x_{j1}, y_{j1}) and (x_{j2}, y_{j2}) respectively denote the coordinates, in the image data, of the upper-left and lower-right corners of each 2D bounding box under the extrinsic coordinate system of the camera device.
For S1031, determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data proceeds as follows:

(1) Project the target detection result of the point cloud data into the image data to obtain the projection frames of the bounding boxes corresponding to the point cloud data.

(2) Determine the geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection frames in the image data.

After target detection is performed on the point cloud data through the point cloud single-modal detection model, a target detection result is obtained. Assume this target detection result includes the 3D bounding boxes of N target objects. If the bounding box information of each 3D bounding box is denoted P_i, the bounding box information of the N 3D bounding boxes can be denoted P = {P_i | i = 1, …, N}, where the bounding box information of each 3D bounding box is P_i = {x_i, y_i, z_i, h_i, w_i, l_i, θ_i}.

Here, (x_i, y_i, z_i) are the coordinates of the center of the 3D bounding box in the lidar sensor coordinate system; (h_i, w_i, l_i) are the height, width and length of the point cloud 3D bounding box; and θ_i is the orientation of the point cloud 3D bounding box in the bird's-eye view, i.e. its rotation angle around the Y axis of the lidar sensor coordinate system.
After the target detection result P of the point cloud data is obtained, this result can be projected into the image data according to the extrinsic coordinate system of the camera device and the calibration relationship between the lidar sensor and the camera device, yielding the 2D projection frames of the bounding boxes corresponding to the point cloud data, P^{2D} = {P_i^{2D} | i = 1, …, N}, where P_i^{2D} = {x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}}.

In the embodiment of the present disclosure, (x_{i1}^{2D}, y_{i1}^{2D}) and (x_{i2}^{2D}, y_{i2}^{2D}) in the above 2D projection frame P_i^{2D} are the pixel coordinates of the upper-left and lower-right corners of the 2D projection frame in the image data. After the coordinate information of the 2D projection frames is obtained, this coordinate information can be used to determine the geometric feature information of the point cloud data.
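As an illustration of this projection step, the following is a minimal sketch assuming a pinhole camera with a 3×4 projection matrix P_cam (the product of the camera intrinsics and the lidar-to-camera calibration) and a z-up lidar frame in which the bird's-eye-view yaw is a rotation about the z axis; the function names and these conventions are assumptions for illustration, not part of the original disclosure.

```python
import numpy as np

def box3d_corners(box):
    """Return the 8 corners of a 3D box P_i = (x, y, z, h, w, l, theta)
    in lidar coordinates (assuming a z-up frame with BEV yaw about z)."""
    x, y, z, h, w, l, theta = box
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    corners = np.array([[sx * dx, sy * dy, sz * dz]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    return corners @ rot.T + np.array([x, y, z])

def project_box(box, P_cam):
    """Project one 3D box through a 3x4 lidar-to-image projection matrix and
    return its 2D projection frame (x1, y1, x2, y2)."""
    corners = box3d_corners(box)                   # (8, 3)
    homog = np.hstack([corners, np.ones((8, 1))])  # (8, 4) homogeneous points
    uvw = homog @ P_cam.T                          # (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                  # perspective division
    x1, y1 = uv.min(axis=0)
    x2, y2 = uv.max(axis=0)
    return x1, y1, x2, y2
```

The tight axis-aligned bounds of the eight projected corners are taken here as the 2D projection frame (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}).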
It should be noted that the geometric feature information of the 2D projection frames of the bounding boxes corresponding to the point cloud data can be written as F_pc^G = {f_{pc,i}^G | i = 1, …, N}, where f_{pc,i}^G = (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}). The data dimension D_G of this geometric feature information is 4, i.e. the pixel coordinates of the upper-left and lower-right corner points of P_i^{2D}, so the size of the geometric feature information of the 2D projection frames of the bounding boxes corresponding to the point cloud data can be recorded as N×D_G.
Similarly, the geometric feature information of the 2D bounding boxes of the image data can be written as F_img^G = {f_{img,j}^G | j = 1, …, M}, where f_{img,j}^G = (x_{j1}, y_{j1}, x_{j2}, y_{j2}). The data dimension D_G of this geometric feature information is likewise 4, i.e. the pixel coordinates of the upper-left and lower-right corner points of I_j, so the size of the geometric feature information of the image data can be recorded as M×D_G.
It should be noted here that, besides the pixel coordinates of the upper-left and lower-right corner points, the bounding box information in the geometric feature information may instead be the pixel coordinates of the lower-left and upper-right corner points; the present disclosure does not specifically limit this.

In the embodiment of the present disclosure, the above geometric feature information may further include position information and/or size information of the bounding box.

Here, after the geometric feature information of the point cloud data and of the image data has been determined in the manner described above, the position information and/or size information of the bounding box can also be appended to the geometric feature information, so that the expanded geometric feature information includes the position information and/or size information of the bounding box.

The expansion process is introduced below for two cases: expanding the geometric feature information of the image data, and expanding the geometric feature information of the point cloud data.

Case 1: the process of expanding the geometric feature information of the image data.
In the embodiment of the present disclosure, when expanding the geometric feature information of the image data, the pixel coordinates (x_c, y_c) of the center point of the 2D bounding box can be appended to f_{img,j}^G so that D_G = 6; the expanded feature is then f_{img,j}^G = (x_{j1}, y_{j1}, x_{j2}, y_{j2}, x_c, y_c).

In addition, the size information (h, w) of the 2D bounding box can also be appended to f_{img,j}^G so that D_G = 8; the expanded feature is then f_{img,j}^G = (x_{j1}, y_{j1}, x_{j2}, y_{j2}, x_c, y_c, h, w), where h and w are the height and width of the 2D bounding box respectively.
Case 2: the process of expanding the geometric feature information of the point cloud data.

In the embodiment of the present disclosure, when expanding the geometric feature information of the point cloud data, the pixel coordinates (x_c^{2D}, y_c^{2D}) of the center point of the 2D projection frame can be appended to f_{pc,i}^G so that D_G = 6; the expanded feature is then f_{pc,i}^G = (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}, x_c^{2D}, y_c^{2D}).

In addition, the size information (h^{2D}, w^{2D}) of the 2D projection frame can also be appended to f_{pc,i}^G so that D_G = 8; the expanded feature is then f_{pc,i}^G = (x_{i1}^{2D}, y_{i1}^{2D}, x_{i2}^{2D}, y_{i2}^{2D}, x_c^{2D}, y_c^{2D}, h^{2D}, w^{2D}), where h^{2D} and w^{2D} are the height and width of the 2D projection frame respectively.
In the above embodiments, projecting the 3D bounding boxes corresponding to the point cloud data into the image data to obtain 2D projection frames unifies the data format, so that target objects can be matched quickly according to the relative positions of the 2D bounding boxes corresponding to the image data and the 2D projection frames, and the corresponding matching result obtained.

In the above embodiments, appending the position information and/or size information of the bounding box to the geometric feature information enriches the geometric feature information and further improves the accuracy of data matching.
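As a concrete illustration, the following minimal sketch builds the expanded D_G = 8 geometric feature from a corner-format box; the function name and toy data are illustrative only.

```python
import numpy as np

def geometric_feature(box):
    """Build the expanded geometric feature (D_G = 8) from a 2D box
    given as corner coordinates (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # center point
    h, w = y2 - y1, x2 - x1                     # height and width
    return np.array([x1, y1, x2, y2, xc, yc, h, w])

# Stack per-box features into the M x D_G (or N x D_G) matrices used later.
boxes_img = np.array([[100, 50, 180, 120], [300, 80, 360, 150]])  # toy data
F_img_G = np.stack([geometric_feature(b) for b in boxes_img])     # (M, 8)
```

The same function applies unchanged to the 2D projection frames of the point cloud boxes, since both are represented in the common corner format after projection.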
For S1032, in the case where the target feature information includes appearance feature information, the target feature information corresponding to the image data is determined according to the target detection result of the image data as follows:

(1) Determine the target image data located within the detected bounding boxes in the image data.

(2) Extract the image features of the target image data, and determine the extracted image features as the appearance feature information corresponding to the image data.
In the embodiment of the present disclosure, the images located within the M 2D bounding boxes I = {I_j} corresponding to the image data are determined in the image data, and the determined images are cropped to obtain the target image data; for the M 2D bounding boxes, M pieces of target image data are obtained. The cropped target image data can then be scaled to obtain M RGB images with a uniform resolution of r×r. The scaled target image data can be denoted Image; it represents the pixel values of the individual pixel points and has size r×r×3. After the target image data is obtained, the target image data Image can be input into a preset image feature extraction network to extract the D_A-dimensional image features of the target image data, and the appearance feature information F_img^A of the image data is then obtained from these image features, where F_img^A can be expressed as a vector of size M×D_A. It should be noted that the above image feature extraction network includes, but is not limited to, VGG-Net, ResNet, GoogleNet and other networks capable of the above image feature extraction.
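A minimal sketch of this appearance-feature extraction under stated assumptions: boxes are cropped, resized to r×r, and passed through a ResNet-18 backbone with its classification head removed, giving a D_A = 512 feature per box. The backbone choice and the value of r are assumptions; the patent only requires some image feature extraction network.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

r = 224                                   # illustrative crop resolution
backbone = models.resnet18(weights=None)  # any feature extractor would do
backbone.fc = torch.nn.Identity()         # keep the 512-d global feature
backbone.eval()

def image_appearance_features(image, boxes):
    """image: (3, H, W) float tensor; boxes: list of (x1, y1, x2, y2).
    Returns the (M, 512) appearance feature matrix F_img^A."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        crop = image[:, int(y1):int(y2), int(x1):int(x2)]
        crops.append(TF.resize(crop, [r, r]))   # scale every crop to r x r
    batch = torch.stack(crops)                  # (M, 3, r, r)
    with torch.no_grad():
        return backbone(batch)                  # (M, 512)
```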
In the above implementation, by extracting the image features of the target image data within the bounding boxes corresponding to the image data, it becomes possible, on top of matching bounding boxes by position information, to further verify from the appearance feature information whether the objects in position-matched bounding boxes are the same object. This compensates for matching errors caused by weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching and, in turn, the safety factor of automatic driving.
For S1031, in the case where the target feature information includes appearance feature information, the step of determining the target feature information corresponding to the point cloud data according to the target detection result of the point cloud data proceeds as follows:

(1) Determine the target point cloud data located within the detected bounding boxes in the point cloud data, where the target point cloud data includes the number of target points located within the bounding box and/or the coordinate information of the target points in the point cloud coordinate system.

(2) Based on the target point cloud data, determine the global point cloud feature describing the overall characteristics of the target point cloud, and determine the appearance feature information of the point cloud data according to the global point cloud feature.

In the embodiment of the present disclosure, the above lidar sensor can scan the road conditions within its scanning range, obtaining a number of point cloud points that characterize the objects within the collection range. When determining the appearance feature information corresponding to the point cloud data, the target point cloud data located within a 3D bounding box can be determined in the point cloud data, where the target point cloud data contains the number L of target points located within the bounding box and/or the coordinate information C of the target points in the point cloud coordinate system (i.e., the lidar coordinate system), where C = 3.
Here, the extracted target point cloud data can be denoted PC, of size L×C. The target point cloud data is then input into a point cloud feature extraction network to extract the D_A-dimensional global point cloud feature of the target point cloud data, and the appearance feature information F_pc^A of the point cloud data is obtained from this global point cloud feature, where F_pc^A can be expressed as a vector of size N×D_A.
It should be noted that the above point cloud feature extraction network includes, but is not limited to, Pointnet, Pointnet++, PointSIFT and other extraction networks capable of the above point cloud feature extraction.

In the above implementation, by extracting the global point cloud features of the target point cloud data within the bounding boxes corresponding to the point cloud data, it becomes possible, on top of matching bounding boxes by position information, to further verify from the appearance feature information whether the objects in position-matched bounding boxes are the same object. This compensates for matching errors caused by weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching and, in turn, the safety factor of automatic driving.
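For illustration, here is a minimal Pointnet-style sketch that maps the L×3 points inside one box to a single global feature via a shared per-point MLP and max pooling; the layer sizes and the pooling choice are assumptions in the spirit of Pointnet, not the patent's prescribed network.

```python
import torch
import torch.nn as nn

class GlobalPointFeature(nn.Module):
    """Shared per-point MLP followed by max pooling: (L, 3) -> (D_A,)."""
    def __init__(self, d_a=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, d_a),
        )

    def forward(self, points):                 # points: (L, 3)
        per_point = self.mlp(points)           # (L, D_A) per-point features
        return per_point.max(dim=0).values     # order-invariant global feature

net = GlobalPointFeature()
boxes_points = [torch.randn(50, 3), torch.randn(80, 3)]  # toy per-box point sets
F_pc_A = torch.stack([net(p) for p in boxes_points])     # (N, D_A)
```

Max pooling makes the feature invariant to the ordering of the points, which matches the unordered nature of the points inside each 3D box.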
For step S105, matching the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data according to the target feature information proceeds as follows:

(1) Perform correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result.

(2) Determine the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
In the embodiment of the present disclosure, after the target feature information is determined, a correlation calculation can be performed on the target feature information of the image data and the target feature information of the point cloud data according to a preset correlation algorithm, giving a correlation calculation result.

Assume the target feature information of the image data contains the feature information corresponding to M 2D bounding boxes, and the target feature information of the point cloud data contains the feature information corresponding to N 3D bounding boxes. The correlation calculation result can then be understood as the correlation between each piece of feature information corresponding to the M 2D bounding boxes and each piece of feature information corresponding to the N 3D bounding boxes.

After the above correlation calculation result is obtained, the matching result between the bounding boxes determined based on the point cloud data and the bounding boxes determined based on the image data can be determined from it.

In some embodiments, the matching result indicates whether a bounding box determined based on the point cloud data and a bounding box determined based on the image data match, and may take the form of an N×M matching matrix: an element of 1 indicates that the two bounding boxes match each other, and an element of 0 indicates that they do not.

In the above implementation, by combining geometric feature information and appearance feature information in the correlation calculation between the target feature information of the point cloud data and that of the image data, the correlation between each piece of feature information of the M 2D bounding boxes and each piece of feature information of the N 3D bounding boxes can be determined accurately, so that when the bounding boxes are matched according to the correlation calculation result, the matching accuracy of the bounding boxes is improved.
In the embodiment of the present disclosure, the above step of performing correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result proceeds as follows:

(1) Concatenate the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain the target image feature.

(2) Concatenate the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain the target point cloud feature.

(3) Perform a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
In the embodiment of the present disclosure, the geometric feature information of the image data and the appearance feature information of the image data can be concatenated to obtain the target image feature of the image data. For example, the appearance feature vector F_img^A of size M×D_A and the geometric feature vector F_img^G of size M×D_G can be concatenated to obtain the image feature vector F_img of size M×(D_A+D_G), i.e. the above target image feature. Likewise, the geometric feature information of the point cloud data and the appearance feature information of the point cloud data can be concatenated to obtain the target point cloud feature of the point cloud data: the appearance feature vector F_pc^A of size N×D_A and the geometric feature vector F_pc^G of size N×D_G are concatenated to obtain the point cloud feature vector F_pc of size N×(D_A+D_G), i.e. the above target point cloud feature. Afterwards, a correlation operation can be performed on the target image feature and the target point cloud feature through a preset correlation algorithm. For example, the image feature vector F_img and the point cloud feature vector F_pc can be processed by the preset correlation algorithm to obtain a correlation matrix F_correlation of size N×M×(D_A+D_G) (i.e., the above correlation calculation result).
In an optional implementation manner, the preset correlation algorithm may be an algorithm corresponding to any one of several predefined calculation formulas combining the target image feature and the target point cloud feature.
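Since F_correlation has shape N×M×(D_A+D_G), one natural instantiation, offered here only as an assumption consistent with that shape (the patent's own formulas are not reproduced), is an element-wise pairwise combination of every point cloud feature with every image feature, e.g. their element-wise difference:

```python
import torch

def correlation(F_pc, F_img):
    """F_pc: (N, D), F_img: (M, D), with D = D_A + D_G.
    Returns an (N, M, D) tensor pairing every point cloud feature with
    every image feature by element-wise difference (one possible choice)."""
    return F_pc[:, None, :] - F_img[None, :, :]   # broadcasting over pairs

F_correlation = correlation(torch.randn(4, 264), torch.randn(6, 264))
print(F_correlation.shape)  # torch.Size([4, 6, 264])
```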
In the above implementation, concatenating the geometric feature information and the appearance feature information to obtain the corresponding target image feature and target point cloud feature, performing correlation calculation on these features, and then determining the bounding box matching result from the correlation calculation result compensates for matching errors caused by weak synchronization between the image data and the point cloud data, thereby improving the accuracy of data matching and, in turn, the safety factor of automatic driving.
In the embodiment of the present disclosure, the above step of determining the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result proceeds as follows:

(1) Perform convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix characterizes the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data.

(2) Invert the similarity matrix to obtain a matching cost matrix.

(3) Perform bipartite graph matching processing on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.

In the embodiment of the present disclosure, after the correlation calculation result is obtained, the correlation calculation result (i.e., the correlation matrix F_correlation) can be input into several two-dimensional convolutional networks for convolution calculation, giving a similarity matrix of size N×M×1. Each element of the similarity matrix represents the degree of similarity between one of the N 3D bounding boxes and one of the M 2D bounding boxes.
Here, the degree of similarity includes the similarity determined from the geometric feature information and the similarity determined from the appearance feature information.

For example, if the geometric feature information of the n-th 3D bounding box and the m-th 2D bounding box is highly similar, and the appearance feature information of the objects framed by the n-th 3D bounding box and the m-th 2D bounding box is also highly similar, it can be determined that the n-th 3D bounding box and the m-th 2D bounding box match each other.

After the similarity matrix is obtained, it can be inverted to obtain the matching cost matrix; bipartite graph matching processing is then performed on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
It is assumed that, between the two single-modal detection results (i.e., the target detection result of the image data and the target detection result of the point cloud data), each detection result can form at most one match, while the detection results within the same modality are distinct from one another.

Here, "each detection result can form at most one match" can be understood as: a 2D bounding box determined based on the image data can match at most one 3D bounding box determined based on the point cloud data.

The matching problem between the two single-modal detection results can thus be treated as a bipartite graph matching problem. For example, in an undirected graph, the target detection results can be divided into two subsets: the target detection results of the image data form one subset, and the target detection results of the point cloud data form the other. Each subset contains multiple vertices, each vertex corresponds to a bounding box, the vertices within each subset are mutually disjoint, and every edge of the undirected graph connects vertices belonging to the two different subsets. For a bipartite graph, the number of matches formed can vary, and the matching goal is to pair the two subsets as accurately as possible. Therefore, through the matching algorithm, the similarity matrix is inverted element by element to form the matching cost matrix; a matching threshold δ is then set, pairs whose matching cost is higher than δ do not participate in the matching, and the final multimodal matching matrix (i.e., the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data) is computed.

It should be noted that the matching algorithms described above include, but are not limited to, the Hungarian matching algorithm and the Kuhn-Munkres matching algorithm.
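A minimal sketch of this final stage, assuming the N×M similarity matrix is already available, using SciPy's Hungarian solver (one concrete choice among the matching algorithms named above) and assuming "inversion" is realized as cost = 1 − similarity:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(similarity, delta):
    """similarity: (N, M) similarity matrix with entries in [0, 1].
    Returns an (N, M) 0/1 multimodal matching matrix."""
    cost = 1.0 - similarity                  # invert similarity into a cost
    rows, cols = linear_sum_assignment(cost) # Hungarian / Kuhn-Munkres step
    match = np.zeros_like(similarity, dtype=int)
    for i, j in zip(rows, cols):
        if cost[i, j] <= delta:              # pairs costlier than delta are rejected
            match[i, j] = 1
    return match
```

The solver enforces at-most-one match per detection on each side, and the threshold δ removes low-confidence pairs, matching the behavior described above.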
In the above implementation, performing convolution, inversion and bipartite graph matching on the correlation calculation result improves the processing efficiency of the matching and yields a matching result of high accuracy.

Under a weakly synchronized multimodal data set, the geometric feature information also suffers from the weak synchronization problem, because objects are matched through the position information of the 2D bounding boxes and the projection frames of the 3D bounding boxes. In particular, for small objects, deviations in the geometric feature information cause the matching performance to degrade severely, so the commonly used IOU similarity-matrix matching algorithm cannot solve the multimodal matching problem under weak synchronization. The appearance feature information, by contrast, is always extracted from the objects inside the 3D and 2D boxes themselves and is therefore unaffected by the weak synchronization problem; it thus helps correct the errors introduced by weak synchronization.

In summary, the data matching method proposed by the embodiments of the present disclosure is applicable to the many-to-many matching of multimodal data under both strong and weak synchronization.
In an embodiment of the present disclosure, as shown in Fig. 2, a schematic flowchart of another data matching method is also provided. The method is described in detail as follows:

(1) Determine the target detection results.

The image data to be matched is collected by a camera device and detected by the image single-modal detection model to obtain a target detection result A1, where A1 contains the 2D bounding boxes I = {I_j | j = 1, …, M} of the M objects contained in the image data.

The point cloud data to be matched is collected by a lidar sensor and detected by the point cloud single-modal detection model to obtain a target detection result A2, where A2 contains the 3D bounding boxes P = {P_i | i = 1, …, N} of the N objects perceived in the point cloud data.
(2) Determine the target feature information corresponding to the image data according to the target detection result A1.

The position information of the 2D bounding boxes I = {I_j} of the M objects is determined as the geometric feature information in the target feature information corresponding to the image data. The target image data located within each 2D bounding box is determined in the image data, the image features of the target image data are extracted through the image feature extraction network, and the extracted image features are determined as the appearance feature information, of size M×D_A, in the target feature information corresponding to the image data.

(3) Determine the target feature information corresponding to the point cloud data according to the target detection result A2.

The target detection result corresponding to the point cloud data is projected into the image data to obtain the projection frames of the 3D bounding boxes corresponding to the point cloud data, and the geometric feature information of the point cloud data is determined according to the pixel coordinates of the vertices of the projection frames in the image data.

The target point cloud data located within each detected 3D bounding box is determined in the point cloud data, where the target point cloud data includes the number of target points located within the 3D bounding box and/or the coordinate information of the target points in the point cloud coordinate system. Based on the target point cloud data, the global point cloud feature describing the overall characteristics of the target point cloud is determined, and the appearance feature information of the point cloud data is determined according to the global point cloud feature.
(4) Correlation calculation.

The geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data are concatenated to obtain the target image feature; the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data are concatenated to obtain the target point cloud feature; a correlation operation is performed on the target image feature and the target point cloud feature to obtain the correlation calculation result.

(5) Data matching process.

Convolution calculation is performed on the above correlation calculation result to obtain a similarity matrix of size N×M×1; the similarity matrix is inverted to obtain a matching cost matrix of size N×M; bipartite graph matching processing is performed on the matching cost matrix to obtain the matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data, where the matching result may be a matching matrix of size N×M.

As the above description shows, the embodiments of the present disclosure propose a data matching method for the weak synchronization that arises between the camera device and the lidar sensor due to response delays or complex road conditions. Using the bounding boxes predicted by the point cloud single-modal detection model and the image single-modal detection model, the method first obtains the geometric feature information of the projected 2D frames of the 3D bounding boxes corresponding to the point cloud data and of the 2D bounding boxes corresponding to the image data, then extracts the appearance feature information within the corresponding boxes through the point cloud feature extraction network and the image feature extraction network, and finally predicts the similarity matrix between the target detection result of the point cloud data and that of the image data from the joint features of the geometric and appearance feature information.
In the embodiment of the present disclosure, before the point cloud data and the image data are each subjected to target detection by the trained single-modal detection model to obtain the target detection result of the point cloud data and that of the image data, the single-modal detection model also needs to be trained according to the following steps:

(1) Determine a training sample set containing multiple training samples, where each training sample contains sample image data and sample point cloud data carrying sample labels.

(2) Perform target detection on the training sample set through the single-modal detection model to be trained, to obtain sample target detection results.

(3) Determine a label matching matrix according to the sample target detection results and the sample labels.

(4) Calculate the function value of a target loss function according to the label matching matrix, and adjust the model parameters of the single-modal detection model according to the function value of the target loss function until a preset condition is reached, giving the trained single-modal detection model.
In the embodiment of the present disclosure, when training the single-modal detection model, a training sample set containing multiple training samples must first be constructed, that is, a collection of sample image data or sample point cloud data carrying sample labels.

In the embodiment of the present disclosure, by inputting the training sample set into the single-modal detection model to be trained, the model can be trained to recognize the labeled samples in the point cloud data and the image data respectively, yielding the sample target detection results.

After the sample target detection results are obtained, the label matching matrix can be determined according to the sample labels and the sample target detection results. The function value of the target loss function is then calculated according to the label matching matrix, and the model parameters of the single-modal detection model are adjusted according to it until the preset condition is reached, giving the trained single-modal detection model. The preset condition may be that the number of training iterations of the single-modal detection model meets a preset requirement, and/or that the training accuracy of the model meets a preset accuracy requirement.

It should be noted that the target loss function includes, but is not limited to, mean square error loss (MSE), absolute error loss (MAE), cross-entropy loss (BCE) and other loss functions capable of training the above single-modal detection model.

As the above description shows, the single-modal detection model includes the point cloud single-modal detection model and the image single-modal detection model. When training them, the image single-modal detection model can be trained on the sample training set containing the sample image data, and the point cloud single-modal detection model on the sample training set containing the sample point cloud data; the detailed training process is as described above and is not repeated separately here.

In the above implementation, training the single-modal detection model in the manner described above yields a model whose processing precision meets the accuracy requirements; when target detection is performed with this model, the accuracy of the target detection results is improved, and with it the accuracy of data matching.
In the embodiment of the present disclosure, as shown in Fig. 3, the above step of determining the label matching matrix according to the sample target detection results and the sample labels proceeds as follows:

(1) Calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample image data in the sample target detection result and the labeled bounding boxes corresponding to the sample image data in the sample labels, to obtain first intersection-over-union ratios; and filter the at least one predicted bounding box according to the first intersection-over-union ratios to obtain target predicted 2D boxes.

For example, in the embodiment of the present disclosure, the sample image data contained in a training sample can be input into the image single-modal detection model to obtain a sample target detection result containing at least one predicted bounding box, which may also be called a predicted 2D bounding box.

Afterwards, the intersection-over-union (IOU, Intersection Over Union) between each predicted 2D bounding box and each labeled bounding box in the sample image data is calculated, giving multiple first intersection-over-union ratios; the at least one predicted bounding box is then filtered according to these ratios to obtain the target predicted 2D boxes, through the following screening process:

First, for each predicted 2D bounding box, it is judged whether any of its first intersection-over-union ratios is greater than or equal to a preset threshold. If so, the predicted 2D bounding box is determined as a target predicted box; the labeled bounding box with the largest intersection-over-union ratio is then identified and determined as the bounding box matching this predicted 2D bounding box. If not, the predicted 2D bounding box is discarded.
(2) Calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection result and the labeled bounding boxes corresponding to the sample point cloud data in the sample labels, to obtain second intersection-over-union ratios; and filter the at least one predicted bounding box according to the second intersection-over-union ratios to obtain target predicted 3D boxes.

For example, in the embodiment of the present disclosure, the sample point cloud data contained in a training sample can be input into the point cloud single-modal detection model to obtain a sample target detection result containing at least one predicted bounding box, which may also be called a predicted 3D bounding box.

Afterwards, the intersection-over-union between each predicted 3D bounding box and each labeled bounding box in the sample point cloud data is calculated, giving multiple second intersection-over-union ratios; the at least one predicted bounding box is then filtered according to these ratios to obtain the target predicted 3D boxes, through the following screening process:

First, for each predicted 3D bounding box, it is judged whether any of its second intersection-over-union ratios is greater than or equal to a preset threshold. If so, the predicted 3D bounding box is determined as a target predicted box; the labeled bounding box with the largest intersection-over-union ratio is then identified and determined as the bounding box matching this predicted 3D bounding box. If not, the predicted 3D bounding box is discarded.
(3) Match the target predicted 2D boxes with the target predicted 3D boxes to obtain a label matching result, and determine the label matching matrix according to the label matching result.

After the target predicted 2D boxes and target predicted 3D boxes are determined, a target predicted 2D box and a target predicted 3D box that correspond to the same object are taken as a label matching pair; the corresponding position of the label matching matrix is set to 1 and the unmatched positions to 0, giving the label matching matrix.
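A minimal sketch of the 2D screening step described above, assuming axis-aligned (x1, y1, x2, y2) boxes; the 3D case follows the same pattern with volumetric IoU, and the function names are illustrative only:

```python
import numpy as np

def iou_2d(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def screen_predictions(pred_boxes, gt_boxes, thresh=0.5):
    """Keep predictions whose best IoU with any labeled box reaches thresh.
    Returns (kept prediction indices, matched label index per kept prediction)."""
    kept, matched = [], []
    for i, p in enumerate(pred_boxes):
        ious = [iou_2d(p, g) for g in gt_boxes]
        j = int(np.argmax(ious))
        if ious[j] >= thresh:
            kept.append(i)
            matched.append(j)
    return kept, matched
```

Predicted 2D and 3D boxes that this screening maps to the same labeled object then form the 1-entries of the label matching matrix.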
In the above implementation, this processing accurately matches the predicted 2D boxes with the predicted 3D boxes to obtain the label matching matrix; when the function value of the target loss function is determined according to this label matching matrix, an accurate function value is obtained, which improves the training precision of the single-modal detection model.
In the embodiment of the present disclosure, determining the training sample set containing multiple training samples proceeds as follows:

(1) Acquire a target tracking data sequence, where the target tracking data sequence contains the image data and point cloud data acquired at each tracking moment.

(2) Determine at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval.

(3) Use the data in each data combination as the data in one training sample.
In the embodiment of the present disclosure, target tracking data sequences containing image data and point cloud data are first acquired, where the target tracking data sequences contain enough data for tracking and for training the above single-modal detection model.

In the embodiment of the present disclosure, within the same target tracking data sequence, the target image data image_k at the first tracking moment is first selected; then, according to the preset interval, the second tracking moment several frames later is determined and the target point cloud data PC_{k+n} at that moment is selected. The construction principle is the transitivity of weak synchronization: the target image data image_k of the current frame is weakly synchronized in time and space with the image data image_{k+n} several frames later, so the strongly synchronized target point cloud data PC_{k+n} corresponding to image_{k+n} is also weakly synchronized in time and space with the target image data image_k.

In the embodiment of the present disclosure, the above target image data image_k and target point cloud data PC_{k+n} are determined as the sample image data and sample point cloud data in a training sample, respectively.
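As an illustrative sketch of this construction, assuming the tracking sequence is stored as two time-aligned lists of frames and n is the preset frame interval:

```python
def build_weak_sync_pairs(images, point_clouds, n):
    """images[k] and point_clouds[k] are strongly synchronized frames of one
    tracking sequence; pairing images[k] with point_clouds[k + n] simulates
    the weak synchronization described above."""
    return [(images[k], point_clouds[k + n])
            for k in range(len(images) - n)]
```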
In the above implementation, the technical solution of the present disclosure proposes a method for constructing a weakly synchronized multimodal data set of point clouds and images, which can simulate the weak synchronization that may occur in real automatic driving scenarios; when the single-modal detection model is trained on this training sample set, the trained model can adapt to the weakly synchronized scenario of multimodal data sets.

Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the actual execution order of the steps should be determined by their functions and possible internal logic.

Based on the same inventive concept, the embodiments of the present disclosure also provide a data matching apparatus corresponding to the data matching method. Since the problem-solving principle of the apparatus in the embodiments of the present disclosure is similar to that of the above data matching method, reference may be made to the implementation of the method for the implementation of the apparatus.
参照图4所示,为本公开实施例提供的一种数据匹配装置的示意图,所述装置包括:获取模块41、确定模块42、匹配模块43;其中,Referring to FIG. 4 , it is a schematic diagram of a data matching device provided by an embodiment of the present disclosure. The device includes: an acquisition module 41, a determination module 42, and a matching module 43; wherein,
the acquisition module 41 is configured to detect point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection results include bounding box information of detected target objects;
the determination module 42 is configured to determine, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and to determine, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and
the matching module 43 is configured to match, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
In the embodiments of the present disclosure, by combining image data and point cloud data for target detection, the point cloud data can compensate for the susceptibility of the image data to illumination and occlusion, while the image data can compensate for the sparsity and lack of texture of the point cloud data. Combining image data and point cloud data to detect 3D targets can therefore improve detection accuracy and yield more accurate target detection results.
In a possible implementation, the determination module 42 is further configured to: project the target detection result of the point cloud data into the image data to obtain a projection box of the bounding box of the point cloud data; and determine geometric feature information of the point cloud data according to the pixel coordinates of the vertices of the projection box in the image data.
In a possible implementation, the geometric feature information includes position information and/or size information of the bounding box.
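For concreteness, the projection and the derivation of position and size information might be sketched as follows. The pinhole camera model, the matrix shapes, and the assumption that all box corners lie in front of the camera are illustrative choices, not mandated by the disclosure.

```python
import numpy as np

def projected_box_geometry(corners_3d, intrinsics, lidar_to_cam):
    """corners_3d: (8, 3) bounding box corners in the point cloud frame;
    intrinsics: (3, 3) camera matrix; lidar_to_cam: (4, 4) transform.
    Returns the position (center) and size of the projection box in pixels.
    Assumes every corner projects in front of the camera."""
    homo = np.hstack([corners_3d, np.ones((8, 1))])   # (8, 4) homogeneous
    cam = (lidar_to_cam @ homo.T)[:3]                 # (3, 8) camera frame
    uv = intrinsics @ cam
    uv = uv[:2] / uv[2:3]                             # (2, 8) pixel coords
    u_min, v_min = uv.min(axis=1)
    u_max, v_max = uv.max(axis=1)
    center = ((u_min + u_max) / 2.0, (v_min + v_max) / 2.0)  # position info
    size = (u_max - u_min, v_max - v_min)                    # size info
    return center, size
```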
In a possible implementation, the determination module 42 is further configured to: determine target image data located within a detected bounding box in the image data; and extract image features of the target image data and determine the extracted image features as appearance feature information corresponding to the image data.
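One plausible realization is to crop the box region and run it through an off-the-shelf feature extractor; the choice of ResNet-18 and the 224×224 input size are assumptions standing in for whatever extractor the disclosure intends.

```python
import torch
import torchvision

backbone = torchvision.models.resnet18(weights=None)  # assumed extractor
backbone.fc = torch.nn.Identity()                      # keep pooled features
backbone.eval()

def image_appearance_feature(image, box):
    """image: (3, H, W) float tensor; box: (x1, y1, x2, y2) in pixels.
    Crops the target image data inside the detected bounding box and
    extracts its image features as appearance feature information."""
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = image[:, y1:y2, x1:x2].unsqueeze(0)
    crop = torch.nn.functional.interpolate(
        crop, size=(224, 224), mode="bilinear", align_corners=False)
    with torch.no_grad():
        return backbone(crop).squeeze(0)               # (512,) feature vector
```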
In a possible implementation, the determination module 42 is further configured to: determine target point cloud data located within a detected bounding box in the point cloud data, where the target point cloud data includes the number of target points located within the bounding box and/or coordinate information of the target points in the point cloud coordinate system; and determine, based on the target point cloud data, a global point cloud feature that describes the overall characteristics of the target point cloud, and determine appearance feature information corresponding to the point cloud data according to the global point cloud feature.
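A PointNet-style encoder is one common way to obtain such a global feature from a variable number of points; the layer sizes below are illustrative assumptions, not values taken from the disclosure.

```python
import torch

class GlobalPointCloudFeature(torch.nn.Module):
    """Per-point MLP followed by max pooling: the pooled vector describes
    the overall characteristics of the target points inside one box."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, out_dim),
        )

    def forward(self, points):
        # points: (N, 3) coordinates in the point cloud coordinate system;
        # N varies per bounding box, which max pooling tolerates
        return self.mlp(points).max(dim=0).values      # (out_dim,)
```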
In a possible implementation, the matching module 43 is further configured to: perform a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and determine, according to the correlation calculation result, a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
In a possible implementation, the matching module 43 is further configured to: concatenate the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain a target image feature; concatenate the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain a target point cloud feature; and perform a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
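The concatenation and correlation steps might look like the following sketch, where an element-wise product over all box pairs stands in for the correlation operation; the disclosure does not fix the exact form of that operation, so this is an assumption.

```python
import torch

def pairwise_correlation(image_geo, image_app, cloud_geo, cloud_app):
    """Each argument is a (num_boxes, dim) tensor. Concatenating geometry
    with appearance gives the target image / point cloud features; the two
    concatenated features are assumed to share the same dimension D."""
    img = torch.cat([image_geo, image_app], dim=1)   # (M, D) image features
    pc = torch.cat([cloud_geo, cloud_app], dim=1)    # (N, D) cloud features
    # (M, N, D): correlation between every image box and every cloud box
    return img[:, None, :] * pc[None, :, :]
```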
In a possible implementation, the matching module 43 is further configured to: perform a convolution calculation on the correlation calculation result to obtain a similarity matrix, where the similarity matrix characterizes the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data; negate the similarity matrix to obtain a matching cost matrix; and perform bipartite graph matching on the matching cost matrix to obtain a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
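Putting these three operations together, a minimal sketch follows; the 1×1 convolution shape and the use of SciPy's Hungarian solver for the bipartite matching are assumptions about implementation details the disclosure leaves open.

```python
import torch
from scipy.optimize import linear_sum_assignment

class BoxMatcher(torch.nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        # 1x1 convolution reducing the (M, N, D) correlation tensor
        # to an (M, N) similarity matrix
        self.conv = torch.nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, correlation):                    # (M, N, D)
        x = correlation.permute(2, 0, 1).unsqueeze(0)  # (1, D, M, N)
        similarity = self.conv(x)[0, 0]                # (M, N)
        cost = -similarity                 # negation -> matching cost matrix
        # bipartite matching on the cost matrix (Hungarian algorithm)
        rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
        return similarity, list(zip(rows.tolist(), cols.tolist()))
```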
In a possible implementation, the matching module 43 is further configured to: perform target detection on the point cloud data and the image data respectively through a trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
In a possible implementation, the apparatus is further configured to train the single-modal detection model according to the following steps: determining a training sample set containing a plurality of training samples, where each training sample contains sample image data and sample point cloud data carrying sample labels; performing target detection on the training sample set through the single-modal detection model to be trained, to obtain sample target detection results; determining a label matching matrix according to the sample target detection results and the sample labels; and calculating a target loss function according to the label matching matrix and adjusting model parameters of the single-modal detection model according to the target loss function until a preset condition is met, to obtain the trained single-modal detection model.
In a possible implementation, the apparatus is further configured to: calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample image data in the sample target detection results and the annotated bounding box corresponding to the sample image data in the sample labels, to obtain a first intersection-over-union value, and filter the at least one predicted bounding box according to the first intersection-over-union value to obtain a target predicted image bounding box; calculate the intersection-over-union between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection results and the annotated bounding box corresponding to the sample point cloud data in the sample labels, to obtain a second intersection-over-union value, and filter the at least one predicted bounding box according to the second intersection-over-union value to obtain a target predicted point cloud bounding box; and match the target predicted image bounding box with the target predicted point cloud bounding box to obtain a label matching result, and determine the label matching matrix according to the label matching result.
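A hedged sketch of this IoU-based filtering on the image side follows; the 0.5 threshold and the axis-aligned 2D box format are assumptions, and the point cloud side would apply the same pattern with a 3D IoU.

```python
import numpy as np

def iou_2d(a, b):
    """Axis-aligned IoU for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_target_predictions(pred_boxes, annotated_boxes, thresh=0.5):
    """For each annotated box, keep the predicted box with the highest IoU
    above `thresh`; the kept boxes are the target predicted boxes that feed
    the label matching matrix. Assumes `pred_boxes` is non-empty."""
    kept = {}
    for gi, gt in enumerate(annotated_boxes):
        ious = [iou_2d(p, gt) for p in pred_boxes]
        best = int(np.argmax(ious))
        if ious[best] >= thresh:
            kept[gi] = best    # annotated box index -> predicted box index
    return kept
```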
In a possible implementation, the apparatus is further configured to: acquire a target tracking data sequence, where the target tracking data sequence contains image data and point cloud data acquired at each tracking moment; determine at least one data combination in the target tracking data sequence, where each data combination includes target image data and target point cloud data, the first tracking moment of the target image data differs from the second tracking moment of the target point cloud data, and the time interval between the first tracking moment and the second tracking moment is a preset interval; and use the data in each data combination as the data of each training sample.
For a description of the processing flow of each module in the apparatus and of the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments.
Corresponding to the data matching method in FIG. 1, an embodiment of the present disclosure further provides an electronic device 500. As shown in FIG. 5, a schematic structural diagram of the electronic device 500 provided by an embodiment of the present disclosure, the electronic device includes:
a processor 51, a memory 52, and a bus 53. The memory 52 is configured to store execution instructions and includes an internal memory 521 and an external memory 522. The internal memory 521, also called internal storage, temporarily stores operation data of the processor 51 and data exchanged with the external memory 522, such as a hard disk; the processor 51 exchanges data with the external memory 522 through the internal memory 521. When the electronic device 500 runs, the processor 51 communicates with the memory 52 through the bus 53, so that the processor 51 executes the following instructions:
detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, where the target detection results include bounding box information of detected target objects; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when run by a processor, performs the steps of the data matching method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code, where the instructions included in the program code can be used to perform the steps of the data matching method described in the above method embodiments; reference may be made to the above method embodiments.
The above computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product may be embodied as a computer storage medium; in another optional embodiment, it may be embodied as a software product, such as an SDK (Software Development Kit).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the detailed working processes of the systems and apparatuses described above, reference may be made to the corresponding processes in the foregoing method embodiments. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other division methods in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may physically exist separately, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, or the part thereof contributing to the prior art, or parts of the technical solutions, may essentially be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present disclosure, still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent replacements of some of the technical features therein; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial Applicability
Embodiments of the present disclosure provide a data matching method and apparatus, an electronic device, a storage medium, and a program product. The method includes: detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data; determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data; determining, according to the target detection result of the image data, target feature information corresponding to the image data, where the target feature information includes geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and matching, according to the target feature information, the bounding boxes in the point cloud data with the bounding boxes in the image data. By combining image data and point cloud data to detect 3D targets, the embodiments of the present disclosure can improve the detection accuracy of 3D targets and thereby obtain more accurate target detection results.

Claims (16)

  1. A data matching method, comprising:
    detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, wherein the target detection results comprise bounding box information of detected target objects;
    determining, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and determining, according to the target detection result of the image data, target feature information corresponding to the image data, wherein the target feature information comprises geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and
    matching, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
  2. The method according to claim 1, wherein the determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data comprises:
    projecting the target detection result of the point cloud data into the image data to obtain a projection box of the bounding box corresponding to the point cloud data; and
    determining geometric feature information of the point cloud data according to pixel coordinates of vertices of the projection box in the image data.
  3. The method according to claim 1 or 2, wherein the geometric feature information comprises position information and/or size information of the bounding box.
  4. The method according to any one of claims 1 to 3, wherein the determining, according to the target detection result of the image data, the target feature information corresponding to the image data comprises:
    determining target image data located within a detected bounding box in the image data; and
    extracting image features of the target image data, and determining the extracted image features as appearance feature information corresponding to the image data.
  5. The method according to any one of claims 1 to 4, wherein the determining, according to the target detection result of the point cloud data, the target feature information corresponding to the point cloud data comprises:
    determining target point cloud data located within a detected bounding box in the point cloud data, wherein the target point cloud data comprises: the number of target points located within the bounding box and/or coordinate information of the target points in a point cloud coordinate system; and
    determining, based on the target point cloud data, a global point cloud feature for describing overall characteristics of the target point cloud, and determining appearance feature information corresponding to the point cloud data according to the global point cloud feature.
  6. The method according to any one of claims 1 to 5, wherein the matching, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data comprises:
    performing a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result; and
    determining a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result.
  7. The method according to claim 6, wherein the performing a correlation calculation on the target feature information of the image data and the target feature information of the point cloud data to obtain a correlation calculation result comprises:
    concatenating the geometric feature information corresponding to the image data and the appearance feature information corresponding to the image data to obtain a target image feature;
    concatenating the geometric feature information corresponding to the point cloud data and the appearance feature information corresponding to the point cloud data to obtain a target point cloud feature; and
    performing a correlation operation on the target image feature and the target point cloud feature to obtain the correlation calculation result.
  8. The method according to claim 6 or 7, wherein the determining a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data according to the correlation calculation result comprises:
    performing a convolution calculation on the correlation calculation result to obtain a similarity matrix, wherein the similarity matrix is used to characterize the degree of similarity between the bounding boxes in the point cloud data and the bounding boxes in the image data;
    negating the similarity matrix to obtain a matching cost matrix; and
    performing bipartite graph matching on the matching cost matrix to obtain a matching result between the bounding boxes in the point cloud data and the bounding boxes in the image data.
  9. The method according to any one of claims 1 to 8, wherein the detecting point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data comprises:
    performing target detection on the point cloud data and the image data respectively through a trained single-modal detection model, to obtain the target detection result of the point cloud data and the target detection result of the image data.
  10. The method according to claim 9, wherein the single-modal detection model is trained according to the following steps:
    determining a training sample set comprising a plurality of training samples, wherein each training sample comprises sample image data or sample point cloud data carrying sample labels;
    performing target detection on the training sample set through the single-modal detection model to be trained, to obtain sample target detection results;
    determining a label matching matrix according to the sample target detection results and the sample labels; and
    calculating a function value of a target loss function according to the label matching matrix, and adjusting model parameters of the single-modal detection model according to the function value of the target loss function until a preset condition is met, to obtain the trained single-modal detection model.
  11. The method according to claim 10, wherein the determining a label matching matrix according to the sample target detection results and the sample labels comprises:
    calculating an intersection-over-union between at least one predicted bounding box corresponding to the sample image data in the sample target detection results and an annotated bounding box corresponding to the sample image data in the sample labels, to obtain a first intersection-over-union value, and filtering the at least one predicted bounding box according to the first intersection-over-union value to obtain a target predicted image bounding box;
    calculating an intersection-over-union between at least one predicted bounding box corresponding to the sample point cloud data in the sample target detection results and an annotated bounding box corresponding to the sample point cloud data in the sample labels, to obtain a second intersection-over-union value, and filtering the at least one predicted bounding box according to the second intersection-over-union value to obtain a target predicted point cloud bounding box; and
    matching the target predicted image bounding box with the target predicted point cloud bounding box to obtain a label matching result, and determining the label matching matrix according to the label matching result.
  12. The method according to claim 10 or 11, wherein the determining a training sample set comprising a plurality of training samples comprises:
    acquiring a target tracking data sequence, wherein the target tracking data sequence comprises image data and point cloud data acquired at each tracking moment;
    determining at least one data combination in the target tracking data sequence, wherein each data combination comprises target image data and target point cloud data, a first tracking moment of the target image data differs from a second tracking moment of the target point cloud data, and a time interval between the first tracking moment and the second tracking moment is a preset interval; and
    using the data in each data combination as the data of each training sample.
  13. A data matching apparatus, comprising:
    an acquisition module, configured to detect point cloud data and image data to be matched respectively, to obtain a target detection result of the point cloud data and a target detection result of the image data, wherein the target detection results comprise bounding box information of detected target objects;
    a determination module, configured to determine, according to the target detection result of the point cloud data, target feature information corresponding to the point cloud data, and to determine, according to the target detection result of the image data, target feature information corresponding to the image data, wherein the target feature information comprises geometric feature information of the bounding box of a detected target object and appearance feature information of the target object within the bounding box; and
    a matching module, configured to match, according to the target feature information, the bounding boxes determined based on the point cloud data with the bounding boxes determined based on the image data.
  14. An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus; and the machine-readable instructions, when executed by the processor, perform the steps of the data matching method according to any one of claims 1 to 12.
  15. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when run by a processor, performs the steps of the data matching method according to any one of claims 1 to 12.
  16. A computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program, when read and executed by a computer, implements the steps of the data matching method according to any one of claims 1 to 12.
PCT/CN2022/075419 2021-08-27 2022-02-07 Data matching method and apparatus, and electronic device, storage medium and program product WO2023024443A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110994415.5 2021-08-27
CN202110994415.5A CN113705669A (en) 2021-08-27 2021-08-27 Data matching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023024443A1 true WO2023024443A1 (en) 2023-03-02

Family

ID=78655867

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075419 WO2023024443A1 (en) 2021-08-27 2022-02-07 Data matching method and apparatus, and electronic device, storage medium and program product

Country Status (2)

Country Link
CN (1) CN113705669A (en)
WO (1) WO2023024443A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705669A (en) * 2021-08-27 2021-11-26 上海商汤临港智能科技有限公司 Data matching method and device, electronic equipment and storage medium
CN114310875B (en) * 2021-12-20 2023-12-05 珠海格力智能装备有限公司 Crankshaft positioning identification method, device, storage medium and equipment
CN114241011A (en) * 2022-02-22 2022-03-25 阿里巴巴达摩院(杭州)科技有限公司 Target detection method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210012124A1 (en) * 2019-07-09 2021-01-14 Mobiltech Method of collecting road sign information using mobile mapping system
CN110675431A (en) * 2019-10-08 2020-01-10 中国人民解放军军事科学院国防科技创新研究院 Three-dimensional multi-target tracking method fusing image and laser point cloud
CN110988912A (en) * 2019-12-06 2020-04-10 中国科学院自动化研究所 Road target and distance detection method, system and device for automatic driving vehicle
CN113705669A (en) * 2021-08-27 2021-11-26 上海商汤临港智能科技有限公司 Data matching method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117894015A (en) * 2024-03-15 2024-04-16 浙江华是科技股份有限公司 Point cloud annotation data optimization method and system
CN117894015B (en) * 2024-03-15 2024-05-24 浙江华是科技股份有限公司 Point cloud annotation data optimization method and system

Also Published As

Publication number Publication date
CN113705669A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
WO2023024443A1 (en) Data matching method and apparatus, and electronic device, storage medium and program product
US11393173B2 (en) Mobile augmented reality system
US20200279121A1 (en) Method and system for determining at least one property related to at least part of a real environment
US11205298B2 (en) Method and system for creating a virtual 3D model
US10580164B2 (en) Automatic camera calibration
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
Zhou et al. Moving object detection and segmentation in urban environments from a moving platform
EP3206163B1 (en) Image processing method, mobile device and method for generating a video image database
Balali et al. Multi-class US traffic signs 3D recognition and localization via image-based point cloud model using color candidate extraction and texture-based recognition
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
EP3414641A1 (en) System and method for achieving fast and reliable time-to-contact estimation using vision and range sensor data for autonomous navigation
CN110070578B (en) Loop detection method
CN113888458A (en) Method and system for object detection
JP6172432B2 (en) Subject identification device, subject identification method, and subject identification program
JP6240706B2 (en) Line tracking using automatic model initialization with graph matching and cycle detection
Jung et al. Object detection and tracking-based camera calibration for normalized human height estimation
GB2566443A (en) Cross-source point cloud registration
US11189053B2 (en) Information processing apparatus, method of controlling information processing apparatus, and non-transitory computer-readable storage medium
Lee et al. Temporally consistent road surface profile estimation using stereo vision
CN112989877A (en) Method and device for labeling object in point cloud data
CN114550117A (en) Image detection method and device
Ibisch et al. Arbitrary object localization and tracking via multiple-camera surveillance system embedded in a parking garage
Maidi et al. Open augmented reality system for mobile markerless tracking
Dong et al. Monocular visual-IMU odometry using multi-channel image patch exemplars
Jiang et al. A dense map optimization method based on common-view geometry

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE