CN114998436A - Object labeling method and device, electronic equipment and storage medium - Google Patents

Object labeling method and device, electronic equipment and storage medium

Info

Publication number
CN114998436A
CN114998436A (application CN202210742488.XA)
Authority
CN
China
Prior art keywords
data
point cloud
laser point
vehicle body
rtk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210742488.XA
Other languages
Chinese (zh)
Inventor
范圣印
李雪
孙文昭
曲倩文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yihang Yuanzhi Technology Co Ltd
Original Assignee
Beijing Yihang Yuanzhi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yihang Yuanzhi Technology Co Ltd
Priority to CN202210742488.XA
Publication of CN114998436A
Legal status: Pending

Classifications

    • G06T 7/73 — Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G01S 17/86 — Lidar systems combined with systems other than lidar, radar or sonar, e.g. with direction finders
    • G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Neural networks; learning methods
    • G06T 7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/10028 — Image acquisition modality: range image; depth image; 3D point clouds
    • G06T 2207/20081 — Special algorithmic details: training; learning
    • G06T 2207/20084 — Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/30244 — Subject/context of image processing: camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an object labeling method, including: acquiring data collected by a plurality of sensors; performing primary time-space alignment on the image data and the laser point cloud data to obtain laser point cloud data and image data after the primary time-space alignment; segmenting the laser point cloud data and the image data after the primary time-space alignment into segments of a preset duration and obtaining an RTK variance corresponding to each data segment; judging, based on the RTK variance corresponding to each data segment, whether to use vehicle body IMU/wheel speed data to perform pose optimization on the vehicle body RTK pose sequence data, so as to perform secondary space-time alignment on the image data and the laser point cloud data; and respectively performing 3D object detection and 2D object detection on the laser point cloud data and the image data so as to perform object labeling. The present disclosure also provides an automatic labeling method based on multi-frame data accumulation, as well as an object labeling apparatus, an electronic device and a storage medium.

Description

Object labeling method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of object labeling technologies, and in particular, to an object labeling method and apparatus, an electronic device, and a storage medium.
Background
The high-precision map is one of the core technologies of unmanned driving, and its degree of precision plays an important role in the positioning, navigation, control and safety of the unmanned vehicle. Because a high-precision map is a combination of a three-dimensional point cloud and related semantic information, information loss is inevitable; besides the influences of the natural environment, occlusion by foreign objects, noise and the like, changes in road conditions also add many difficulties to improving the precision of the high-precision map. Therefore, how to design, on the basis of the existing high-precision map, an automatic labeling method that has wide applicability and can cope with complex and changeable road conditions has important theoretical significance and practical application value.
The core idea of automatic labeling for high-precision maps is to assign RGB semantic information to the point cloud and to improve labeling efficiency by means of the boundaries and colors of the objects to be generated. Such a method performs semantic segmentation on the acquired image data, fuses the point cloud data with the segmented image data, and draws the map using algorithms such as neural networks.
Technical scheme 1: patent CN113674287A, "High-precision map drawing method, apparatus, device and storage medium", relates to the technical fields of computer vision, automatic driving, intelligent transportation and the like. The specific implementation is as follows: 1) performing semantic segmentation on the acquired image data to obtain image data identifying the object to be labeled; 2) fusing the acquired point cloud data with the image data identifying the object to be labeled to obtain semantic map data; 3) obtaining, from the semantic map and the point cloud data and by means of a pre-trained neural network model, a map in which the position, color and boundary of the object to be labeled are drawn. The method integrates the characteristics of image data and point cloud data to draw and label the map automatically; it can generate not only the geometric position of the object to be labeled but also its color and boundary. Through the fusion of computer vision, automatic driving, intelligent transportation and related techniques, the method automates map drawing and labeling while maintaining labeling precision.
Technical scheme 2: patent CN113255578A discloses a traffic sign recognition method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: 1) acquiring images and point cloud data collected while the vehicle is driving; 2) detecting the images to obtain an initial detection value sequence for each traffic sign in the images, and selecting a target detection value from each initial detection value sequence to obtain a target detection value sequence for each traffic sign; 3) grouping the target detection values in the target detection value sequence, and selecting the point cloud data corresponding to each group; 4) determining, from the point cloud data corresponding to each group, the point cloud data corresponding to each traffic sign, and determining the outline of each traffic sign in a reference coordinate system according to that point cloud data; 5) labeling each traffic sign in the map according to its outline in the reference coordinate system. By combining software and hardware, the scheme improves the accuracy and efficiency of labeling traffic signs in a high-precision map and addresses the low coincidence caused by timestamp-induced distortion of some laser point clouds. However, when the coincidence of an image exceeds the third preset threshold, the traffic sign with the lower confidence among the two detection values is deleted and the other is directly taken as correct, so certain potential safety hazards remain even with manual rechecking after automatic labeling; moreover, the method mainly focuses on static objects to be recognized while the unmanned vehicle is driving, and its applicability to dynamic object recognition is not high.
Technical scheme 3: patent CN111325136A, "Method and device for labeling objects in an intelligent vehicle, and unmanned vehicle", relates to a method and device for labeling objects in an intelligent vehicle, and to an unmanned vehicle. The method comprises the following steps: acquiring initial image information of the area surrounding the current vehicle in a target map, wherein the initial image information includes the positions and footprints of the objects; identifying the composition structure of each object in the initial image information to obtain object structure information; and labeling all the objects on the target map based on the object structure information and a preset labeling library, wherein the preset labeling library records a first mapping relation between each object and an object identifier and a second mapping relation between each object structure and a structure identifier, and the point cloud data of each object and each object structure is displayed in the target map. The method solves the technical problem in the related art that images displayed on the vehicle display screen cannot mark surrounding obstacles, making the vehicle prone to collide with other objects and affecting driving; however, it mainly focuses on static objects to be recognized while the unmanned vehicle is driving, and its applicability to dynamic object recognition is low.
Technical schemes 1 and 2 integrate computer vision and neural networks and improve the application range and precision of automatic labeling while ensuring wide adaptability. However, technical scheme 1 does not align the acquired data in time during data fusion, so it is difficult to improve the labeling precision; in technical scheme 2, to deal with the low coincidence rate of multi-image data, the data with the lower confidence is simply deleted, so the processing result is not ideal and the applicability of the method to dynamic object recognition is not high. Technical scheme 3 labels part of the road conditions accurately and specifically by means of algorithms such as deep learning and a combination of software and hardware, but its scope of action is rather specific and it cannot be applied to all road conditions.
Disclosure of Invention
In order to solve at least one of the above technical problems, the present disclosure provides an object labeling method, an object labeling apparatus, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided an object labeling method, including:
s110, acquiring data collected by a plurality of sensors, wherein the data comprises image data, laser point cloud data, vehicle body RTK pose sequence data and vehicle body IMU/wheel speed data;
s120, performing primary time-space alignment on the image data and the laser point cloud data based on the time stamp of the image data, the time stamp of the laser point cloud data and RTK pose sequence data of the vehicle body to obtain the laser point cloud data and the image data after the primary time-space alignment;
s130, grouping the laser point cloud data and the image data subjected to primary time-space alignment by a preset time length to obtain an RTK variance corresponding to each segment data;
s140, judging whether to perform pose optimization on the RTK pose sequence data of the vehicle body by using IMU/wheel speed data of the vehicle body so as to perform secondary space-time alignment on the image data and the laser point cloud data based on the RTK variance corresponding to each segment data, and obtaining the laser point cloud data and the image data after the secondary space-time alignment;
and S160, respectively carrying out 3D object detection and 2D object detection on the accumulated laser point cloud data and the accumulated image data so as to carry out object labeling.
According to the object labeling method of at least one embodiment of the present disclosure, after step S140, the method further includes: s115, accumulating the laser point cloud data and the image data; in step S160, the 3D object detection and the 2D object detection are performed on the accumulated laser point cloud data and the accumulated image data, respectively.
According to an object labeling method of at least one embodiment of the present disclosure, the 3D object and the 2D object include a vehicle, a pedestrian, a lane line, a traffic sign, a traffic light, and the like.
According to an object labeling method of at least one embodiment of the present disclosure, S110, acquiring data collected by a plurality of sensors includes:
and converting the image data from a camera coordinate system to a vehicle body coordinate system, and converting the laser point cloud data from a laser radar coordinate system to the vehicle body coordinate system.
According to the object labeling method of at least one embodiment of the present disclosure, each type of data collected by a plurality of sensors is recorded on a uniform time axis.
According to the object labeling method of at least one embodiment of the present disclosure, the step S120 of performing primary time-space alignment on the image data and the laser point cloud data based on the timestamp of the image data, the timestamp of the laser point cloud data, and the vehicle body RTK pose sequence data to obtain the laser point cloud data and the image data after the primary time-space alignment includes:
s1202, for laser point cloud data at a certain moment, searching image data at the moment closest to the moment, and acquiring a time difference between the two moments;
s1204, extracting an RTK pose sequence under a world coordinate system including the two moments from the RTK pose sequence data of the vehicle body to acquire poses corresponding to the two moments respectively;
s1206, acquiring a pose difference under a world coordinate system corresponding to the time difference;
and S1208, correcting the laser point cloud data under the vehicle body coordinate system based on the pose difference to obtain the laser point cloud data which is aligned with the image data under the vehicle body coordinate system in a space-time mode.
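By way of illustration only, a minimal Python (numpy) sketch of steps S1202–S1208 follows; the function names, the translation-only correction, and the linear interpolation of the RTK pose are simplifying assumptions made for the example, not part of the claimed method.

```python
import numpy as np

def nearest_image_time(lidar_t, image_times):
    """S1202: camera timestamp closest to a lidar timestamp, and the time difference."""
    image_times = np.asarray(image_times, dtype=float)
    idx = int(np.argmin(np.abs(image_times - lidar_t)))
    return image_times[idx], image_times[idx] - lidar_t

def interpolate_rtk_translation(rtk_times, rtk_xyz, t):
    """S1204: linearly interpolate the RTK translation at time t (rotation handled analogously)."""
    return np.array([np.interp(t, rtk_times, rtk_xyz[:, k]) for k in range(3)])

def correct_cloud(cloud_body, pose_at_lidar_t, pose_at_image_t):
    """S1206/S1208: shift the point cloud by the pose difference between the two moments
    (translation-only here for brevity)."""
    return cloud_body + (pose_at_image_t - pose_at_lidar_t)
```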
According to the object labeling method of at least one embodiment of the present disclosure, the step S120 of performing primary time-space alignment on the image data and the laser point cloud data based on the timestamp of the image data, the timestamp of the laser point cloud data, and the vehicle body RTK pose sequence data to obtain the laser point cloud data and the image data after the primary time-space alignment includes:
s1202, in the case where the laser point cloud data and the image data are accumulated (accumulation over a preset time period), for the laser point cloud data at a certain moment, acquiring the time difference between that moment and the initial radar image acquisition moment within the accumulation period;
s1204, extracting an RTK pose sequence under a world coordinate system including the two moments from the RTK pose sequence data of the vehicle body to acquire poses corresponding to the two moments respectively;
s1206, acquiring a pose difference under a world coordinate system corresponding to the time difference;
and S1208, correcting the laser point cloud data under the vehicle body coordinate system based on the pose difference to obtain the laser point cloud data which is aligned with the image data under the vehicle body coordinate system in a space-time mode.
According to the object labeling method of at least one embodiment of the present disclosure, S130, segmenting the laser point cloud data and the image data after the primary time-space alignment by a preset time length, and acquiring an RTK variance corresponding to each segment data, includes:
and when the RTK variances are obtained, respectively counting the RTK variances of all the degrees of freedom.
According to the object labeling method of at least one embodiment of the present disclosure, S140, based on the RTK variance corresponding to each segment data, determining whether to perform pose optimization on the vehicle body RTK pose sequence data using the vehicle body IMU/wheel speed data to perform secondary space-time alignment on the image data and the laser point cloud data, and obtaining the laser point cloud data and the image data after the secondary space-time alignment, includes:
and when the RTK variance of each degree of freedom is smaller than or equal to the variance threshold value, determining that the position optimization is not carried out on the RTK position sequence data of the vehicle body by using the IMU/wheel speed data of the vehicle body, otherwise, determining that the position optimization is carried out on the RTK position sequence data of the vehicle body by using the IMU/wheel speed data of the vehicle body.
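As an illustration of this segment-wise decision, a small Python (numpy) sketch follows; the 6-DoF pose layout and the threshold value are assumptions made for the example.

```python
import numpy as np

def segment_rtk_variance(rtk_poses):
    """rtk_poses: (M, 6) array of [x, y, z, roll, pitch, yaw] RTK samples within one segment."""
    return np.var(np.asarray(rtk_poses, dtype=float), axis=0)

def needs_imu_wheel_refinement(rtk_poses, var_thresh=0.05):
    """S140: pose optimization is triggered only if any degree of freedom exceeds the threshold."""
    return bool(np.any(segment_rtk_variance(rtk_poses) > var_thresh))
```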
According to an object labeling method of at least one embodiment of the present disclosure, in step S140, performing pose optimization on vehicle body RTK pose sequence data using vehicle body IMU/wheel speed data, including:
for a certain section of data with the RTK variance exceeding the variance threshold, acquiring first IMU data which is closest to and earlier than the starting time of the section of data, taking the corresponding time as a starting point, acquiring second IMU data which is closest to and later than the ending time of the section of data, and taking the corresponding time as an end point;
between the start point and the end point, for each acquisition moment t_1 of image data and each acquisition moment t_2 of laser point cloud data, the following processing is performed:
acquiring the translation Δx of the vehicle body from the starting moment to the end moment based on the wheel speed data; acquiring the rotation angle differences Δyaw, Δpitch and Δroll of the vehicle body about the x, y and z directions from the starting moment to the end moment based on the IMU data;
integrating the rotation angle differences Δyaw, Δpitch and Δroll with the translation Δx to obtain the translations Δy and Δz of the vehicle body along the y- and z-axes within the period from t_1 to t_2, i.e., the pose change of the vehicle body within the period from t_1 to t_2.
According to the object labeling method of at least one embodiment of the present disclosure, in step S140, performing pose optimization on vehicle body RTK pose sequence data using vehicle body IMU/wheel speed data, further includes:
and replacing the pose difference corresponding to the data of which the RTK variance exceeds the variance threshold value by the pose variation.
According to the object labeling method of at least one embodiment of the present disclosure, the S160, respectively performing 3D object detection and 2D object detection on the laser point cloud data and the image data to perform object labeling, includes:
detecting a 3D object in the laser point cloud data by using a BEV-based target detection method, a range view-based target detection method, a point-wise feature-based target detection method or a fusion feature-based target detection method;
detecting the 2D object in the image data by using a Cascade/Haar/SVM-based target detection algorithm, an R-CNN/Fast R-CNN-based candidate region/frame + deep learning classification algorithm or an RRC detection/Deformable CNN-based deep learning detection method;
modeling the weights of the feature map of the image data and the feature map of the laser point cloud data by using a Mixture of Experts, and performing feature fusion through a concatenation operation, so as to complete the automatic detection of the 3D object and the 2D object.
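The sketch below illustrates one possible form of such a Mixture-of-Experts-style weighting followed by concatenation of the two feature maps; the per-pixel softmax gate, the tensor shapes, and the numpy implementation are illustrative assumptions rather than the patented network.

```python
import numpy as np

def gated_concat_fusion(img_feat, pc_feat, gate_logits):
    """img_feat, pc_feat: (C, H, W) feature maps; gate_logits: (2, H, W) expert scores."""
    w = np.exp(gate_logits - gate_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)                      # per-pixel softmax over the two experts
    weighted = np.stack([w[0] * img_feat, w[1] * pc_feat])    # weight each modality
    return np.concatenate(weighted, axis=0)                   # (2C, H, W) fused feature map
```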
According to the object labeling method of at least one embodiment of the present disclosure, S160, respectively performing 3D object detection and 2D object detection on the laser point cloud data and the image data to perform object labeling, includes:
and projecting the 3D object to a corresponding 2D object, judging whether the detected object is correct or not based on the projection contact ratio of 3D and 2D of the same object, outputting the object if the detected object is correct, and not outputting the object if the detected object is incorrect so as to label the object.
According to the object labeling method of at least one embodiment of the present disclosure, in step S160, if the object is incorrect, the object labeling is performed again by performing the reprojection based on the time-domain multi-frame optimization.
According to an object labeling method of at least one embodiment of the present disclosure, a 3D object is projected to a corresponding 2D object, whether a detected object is correct is determined based on a degree of coincidence of 3D and 2D projections of the same object, and if correct, the object is output, and if incorrect, the object is not output, including:
if the projection contact ratio of a certain object is greater than or equal to the contact ratio threshold value, judging that the detected object is correct;
and if the projection coincidence degree of a certain object is lower than the coincidence degree threshold value, further detecting through pose optimization.
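A minimal sketch of this coincidence check, assuming axis-aligned 2D boxes, a pinhole intrinsic matrix K, and an illustrative IoU threshold (none of which are fixed by the disclosure):

```python
import numpy as np

def iou_2d(box_a, box_b):
    """Boxes as [x_min, y_min, x_max, y_max] in pixels."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def project_box3d(corners_cam, K):
    """corners_cam: (8, 3) 3D box corners in the camera frame; K: (3, 3) intrinsics."""
    uv = (K @ corners_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])

def detection_is_consistent(corners_cam, box2d, K, iou_thresh=0.5):
    """Output the object only when the projected 3D box and the 2D box coincide sufficiently."""
    return iou_2d(project_box3d(corners_cam, K), box2d) >= iou_thresh
```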
According to the object labeling method of at least one embodiment of the present disclosure, in step S160, the method further includes:
and projecting each vector object in the high-precision map to the laser point cloud data and the image data to update the laser point cloud data and the image data based on the RTK pose sequence data of the vehicle body so as to label the object.
And each vector object in the high-precision map is a static object and comprises a fixed traffic identification.
According to the object labeling method of at least one embodiment of the present disclosure, based on vehicle body RTK pose sequence data, projecting each vector object in a high-precision map into laser point cloud data and image data to update the laser point cloud data and the image data so as to perform object labeling, the method includes:
s1602, acquiring coordinates of each vector object in a high-precision map under a world coordinate system and coordinates of a vehicle body under the world coordinate system within an observation range of the laser radar based on RTK pose sequence data of the vehicle body, wherein the observation range takes the center of the laser radar as an origin;
s1604, transforming the coordinates of each vector object in the world coordinate system into a vehicle body coordinate system;
s1606, forming projection of each vector object in the vehicle body coordinate system in the laser point cloud data;
s1608, transforming the coordinates of each vector object in the vehicle body coordinate system to the camera coordinate system, and forming a projection in the image data;
s1610, when the variance of the RTK data is smaller than or equal to a variance threshold, projecting each vector object in the high-precision map to the laser point cloud data and the image data, and when the variance of the RTK data is larger than the variance threshold, performing pose optimization based on the image data on the point cloud data to perform re-projection, and completing detection of the 3D object and the 2D object.
According to the object labeling method of at least one embodiment of the present disclosure, in step S1610, when the variance of the RTK data is greater than the variance threshold, re-projection is performed based on time-domain multiframe optimization, and detection of the 3D object and the 2D object is performed again.
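A small sketch of the coordinate transformations in S1602–S1608, assuming the body pose from RTK (T_wb, body → world) and the camera extrinsics (T_cb, body → camera) are available as 4×4 homogeneous matrices; the names and conventions are illustrative assumptions.

```python
import numpy as np

def to_homogeneous(pts):
    return np.hstack([pts, np.ones((pts.shape[0], 1))])

def world_to_body(pts_world, T_wb):
    """T_wb maps body coordinates to world coordinates (from the RTK pose); invert for world -> body."""
    return (np.linalg.inv(T_wb) @ to_homogeneous(pts_world).T).T[:, :3]

def body_to_image(pts_body, T_cb, K):
    """T_cb maps body coordinates to camera coordinates; K is the camera intrinsic matrix."""
    pts_cam = (T_cb @ to_homogeneous(pts_body).T).T[:, :3]
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]
```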
According to the object labeling method of at least one embodiment of the present disclosure, in step S160, the method further includes:
and overlapping the laser point cloud data in a preset time period to mark the static object and the dynamic object respectively.
According to the object labeling method of at least one embodiment of the present disclosure, superimposing laser point cloud data within a preset time period to label a static object and a dynamic object respectively, includes:
taking a laser image acquired by a laser radar in a preset time period, and performing point cloud accumulation and superposition to obtain superposed data;
carrying out one-time object detection and marking on static objects in the superposed data;
and for the dynamic object in the superposed data, extracting the movement route of the dynamic object, judging whether the movement route accords with a preset object movement model, if so, directly marking the dynamic object based on the preset object movement model, and if not, detecting and marking the dynamic object based on a deep learning algorithm.
According to the object labeling method of at least one embodiment of the present disclosure, performing one-time object detection and labeling for a static object in the superposed data includes:
performing statistics over all frame images within the preset time period to obtain a mean variance, and detecting changes in the position of the counted object so as to perform re-projection and object labeling; wherein the statistics include removing extreme frame images (i.e., frames removed as extreme values) from all frame images within the preset time period.
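As an illustration of the accumulation-based labeling described above, the sketch below checks the positional statistics of a static object over the accumulated frames (dropping the most extreme frame) and tests whether a dynamic object's track fits a simple constant-velocity motion model; the model choice and the thresholds are assumptions made for the example.

```python
import numpy as np

def static_position_stable(positions, var_thresh=0.1):
    """positions: (F, 3) detected centers of one static object over the F accumulated frames."""
    p = np.asarray(positions, dtype=float)
    if len(p) > 2:                                            # drop the most extreme frame first
        d = np.linalg.norm(p - p.mean(axis=0), axis=1)
        p = np.delete(p, int(np.argmax(d)), axis=0)
    return bool(np.all(np.var(p, axis=0) <= var_thresh))      # False -> re-project and re-label

def fits_constant_velocity(track, times, resid_thresh=0.5):
    """track: (F, 2) ground-plane positions of a dynamic object; times: (F,) timestamps."""
    t = np.asarray(times, dtype=float)
    A = np.stack([t, np.ones_like(t)], axis=1)                # linear motion model per axis
    resid = 0.0
    for k in range(np.asarray(track).shape[1]):
        _, res, *_ = np.linalg.lstsq(A, np.asarray(track, dtype=float)[:, k], rcond=None)
        resid += float(res[0]) if res.size else 0.0
    return resid / len(t) <= resid_thresh
```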
According to another aspect of the present disclosure, there is provided an object labeling apparatus including:
the system comprises a sensor data acquisition module, a data acquisition module and a data acquisition module, wherein the sensor data acquisition module acquires data acquired by a plurality of sensors, and the data comprises image data, laser point cloud data, vehicle body RTK (real time kinematic) position sequence data and vehicle body IMU/wheel speed data;
the system comprises a primary alignment module, a time stamp module and a vehicle body RTK position sequence data module, wherein the primary alignment module is used for carrying out primary time-space alignment on image data and laser point cloud data based on the time stamp of the image data, the time stamp of the laser point cloud data and the vehicle body RTK position sequence data to obtain the laser point cloud data and the image data after the primary time-space alignment;
the data segmentation module is used for grouping the laser point cloud data and the image data subjected to primary time-space alignment by a preset time length;
the RTK variance calculating/judging module is used for acquiring the RTK variance corresponding to each segment data; judging whether vehicle body IMU/wheel speed data are used for carrying out pose optimization on vehicle body RTK pose sequence data or not based on RTK variances corresponding to the segmented data so as to carry out secondary space-time alignment on the image data and the laser point cloud data;
the secondary alignment module performs pose optimization on the vehicle body RTK pose sequence data by using vehicle body IMU/wheel speed data so as to perform secondary space-time alignment on the image data and the laser point cloud data and obtain the laser point cloud data and the image data after the secondary space-time alignment;
and the object labeling module is used for respectively carrying out 3D object detection and 2D object detection on the laser point cloud data and the image data so as to label the object.
The object labeling apparatus according to at least one embodiment of the present disclosure further includes:
a data accumulation module that accumulates laser point cloud data and image data;
the object labeling module performs the 3D object detection and the 2D object detection on the accumulated laser point cloud data and the accumulated image data.
According to still another aspect of the present disclosure, there is provided an electronic device including: a memory storing execution instructions; and a processor executing the execution instructions stored in the memory, so that the processor executes the object labeling method of any embodiment of the present disclosure.
According to still another aspect of the present disclosure, there is provided a readable storage medium having stored therein execution instructions, which when executed by a processor, are used for implementing the object labeling method of any one of the embodiments of the present disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and together with the description serve to explain the principles of the disclosure.
Fig. 1 is a schematic flowchart of an object labeling method according to an embodiment of the present disclosure.
Fig. 2 shows a triggering schematic of a camera and lidar of one embodiment of the present disclosure.
Fig. 3 is a schematic flow chart of primary spatiotemporal alignment of image data and laser point cloud data according to an embodiment of the present disclosure.
Fig. 4 is a flowchart of pose adjustment based on RTK variance according to an embodiment of the present disclosure.
Fig. 5 is a schematic flow diagram of 3D object detection and 2D object detection of accumulated laser point cloud data and accumulated image data, respectively, according to an embodiment of the disclosure.
Fig. 6 is a flowchart of fusing the vector object of the high-precision map and the 2D detection object result, verifying the accuracy of the annotation by using RTK data, and performing optimization adjustment according to an embodiment of the present disclosure.
Fig. 7 is a block diagram schematically illustrating a structure of an object labeling apparatus implemented in hardware using a processing system according to an embodiment of the present disclosure.
Fig. 8 is a block diagram schematically illustrating the structure of an object labeling apparatus using a hardware-implemented method of a processing system according to still another embodiment of the present disclosure.
Description of the reference numerals
1000 object labeling device
1002 sensor data acquisition module
1004 primary alignment module
1006 data segmentation module
1008 RTK variance calculating/judging module
1010 secondary alignment module
1012 data accumulation module
1014 object tagging module
1100 bus
1200 processor
1300 memory
1400 and other circuits.
Detailed Description
The present disclosure will be described in further detail with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limitations of the present disclosure. It should be further noted that, for the convenience of description, only the portions relevant to the present disclosure are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. Technical solutions of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Unless otherwise indicated, the illustrated exemplary embodiments/examples are to be understood as providing exemplary features of various details of some ways in which the technical concepts of the present disclosure may be practiced. Accordingly, unless otherwise indicated, features of the various embodiments may be additionally combined, separated, interchanged, and/or rearranged without departing from the technical concept of the present disclosure.
The use of cross-hatching and/or shading in the drawings is generally used to clarify the boundaries between adjacent components. As such, unless otherwise noted, the presence or absence of cross-hatching or shading does not convey or indicate any preference or requirement for a particular material, material property, size, proportion, commonality between the illustrated components and/or any other characteristic, attribute, property, etc., of a component. Further, in the drawings, the size and relative sizes of components may be exaggerated for clarity and/or descriptive purposes. While example embodiments may be practiced differently, the specific process sequence may be performed in a different order than that described. For example, two processes described consecutively may be performed substantially simultaneously or in reverse order to that described. In addition, like reference numerals denote like parts.
When an element is referred to as being "on", "connected to" or "coupled to" another element, it can be directly on, connected or coupled to the other element, or intervening elements may be present. However, when an element is referred to as being "directly on," "directly connected to" or "directly coupled to" another element, there are no intervening elements present. For purposes of this disclosure, the term "connected" may refer to physically connected, electrically connected, and the like, with or without intervening components.
The terminology used herein is for the purpose of describing particular embodiments and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, when the terms "comprises" and/or "comprising" and variations thereof are used in this specification, the presence of stated features, integers, steps, operations, elements, components and/or groups thereof are stated but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It is also noted that, as used herein, the terms "substantially," "about," and other similar terms are used as approximate terms and not as degree terms, and as such, are used to interpret inherent deviations in measured values, calculated values, and/or provided values that would be recognized by one of ordinary skill in the art.
The following describes in detail an object labeling method, an object labeling apparatus, an electronic device, and a readable storage medium according to the present disclosure with reference to fig. 1 to 7.
Fig. 1 is a schematic flowchart of an object labeling method according to an embodiment of the present disclosure.
Referring to fig. 1, an object labeling method S100 according to this embodiment includes:
s110, acquiring data collected by a plurality of sensors, wherein the data comprises image data, laser point cloud data, vehicle body RTK pose sequence data and vehicle body IMU/wheel speed data;
s120, performing primary time-space alignment on the image data and the laser point cloud data based on the time stamp of the image data, the time stamp of the laser point cloud data and RTK pose sequence data of the vehicle body to obtain the laser point cloud data and the image data after the primary time-space alignment;
s130, segmenting the laser point cloud data and the image data subjected to the primary time-space alignment by a preset time length (for example, 20S) to obtain an RTK variance corresponding to each segment data;
s140, judging whether to use vehicle body IMU/wheel speed data to carry out position and orientation optimization on vehicle body RTK position and orientation sequence data so as to carry out secondary space-time alignment on the image data and the laser point cloud data and obtain laser point cloud data and image data after the secondary space-time alignment based on the RTK variance corresponding to each segment data;
and S160, respectively carrying out 3D object detection and 2D object detection on the laser point cloud data and the image data so as to carry out object labeling (motor vehicles, non-motor vehicles, pedestrians, lane lines, traffic lights, traffic signs and the like).
The object labeling method S100 of the present embodiment can realize real-time object labeling, that is, single frame labeling.
In some embodiments of the present disclosure, the object labeling method S100 of the present disclosure further includes:
s115, accumulating the laser point cloud data and the image data;
in step S160, the 3D object detection and the 2D object detection are performed on the accumulated laser point cloud data and the accumulated image data, respectively.
In some embodiments of the present disclosure, sensors used by the present disclosure include cameras, lidar, Inertial Measurement Units (IMUs), and RTKs for vehicle positioning, for the image data, laser point cloud data, vehicle body IMU/wheel speed data, vehicle body RTK pose sequence data, respectively, described above.
According to their mounting positions on the vehicle, the cameras may be divided into six cameras: a front-view camera, a rear-view camera, a left front-view camera, a left rear-view camera, a right front-view camera and a right rear-view camera; the number and mounting positions of the cameras may be adjusted by those skilled in the art, and such adjustments fall within the protection scope of the present disclosure.
In some embodiments of the present disclosure, the relevant internal and external parameters of each sensor are also obtained.
The relevant extrinsic parameters of the camera are represented by a matrix T_CW, which can be decomposed into a rotation matrix R_CW and a translation matrix t_CW; they determine the relative positional relationship between the camera coordinate system and the world coordinate system.
The relationship between the camera coordinates P_c and the world coordinates P_w is:
P_c = T_CW · P_w, i.e., P_c = R_CW · P_w + t_CW.
The relevant intrinsic parameters of the camera can be represented as a matrix K, which includes six parameters {f, κ, S_x, S_y, C_x, C_y}, where f is the focal length; κ represents the magnitude of radial distortion (barrel distortion if κ is negative, pincushion distortion if κ is positive); S_x and S_y are scaling factors representing the distance between adjacent pixels in the horizontal direction and in the vertical direction on the sensor; and C_x, C_y is the perpendicular projection of the projection center onto the imaging plane, which is also the center of radial distortion.
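For illustration, the camera model described above can be sketched in Python as follows; the single-coefficient radial distortion is a simplified reading of the κ parameter, and the function name is hypothetical.

```python
import numpy as np

def project_point(p_world, R_cw, t_cw, f, kappa, s_x, s_y, c_x, c_y):
    p_cam = R_cw @ p_world + t_cw                               # P_c = R_CW * P_w + t_CW
    x, y = f * p_cam[0] / p_cam[2], f * p_cam[1] / p_cam[2]
    r2 = x * x + y * y
    x_d, y_d = x * (1.0 + kappa * r2), y * (1.0 + kappa * r2)   # barrel (kappa < 0) / pincushion (kappa > 0)
    return np.array([x_d / s_x + c_x, y_d / s_y + c_y])         # pixel coordinates
```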
The lidar of the present disclosure may be a solid-state lidar, and the number of the lidars may be four, and those skilled in the art may select/adjust the type and number of the lidars, all of which fall within the protection scope of the present disclosure.
The relevant extrinsic parameter of the laser radar is the rotation-translation transformation matrix T_LW; the relevant configuration parameters may include: field of view, resolution, ranging range, refresh frequency, scan frequency, laser wavelength, and maximum radiated power.
An Inertial Measurement Unit (IMU) is used to provide relative positioning information, the function of which is to measure the course of movement of an object (e.g., a vehicle) relative to a starting point. An Inertial Measurement Unit (IMU) may include a three-axis accelerometer and a three-axis gyroscope as sensing elements that output acceleration and angular velocity, respectively.
The RTK (real time kinematic) carrier phase division technology for vehicle positioning is a commonly used satellite positioning measurement technology in the prior art.
In the present disclosure, since the data acquisition process involves the joint operation of a plurality of sensors, but the acquisition frequencies, the self coordinate systems, and the like of the sensors are not consistent, the present disclosure aligns the data in time and space before using the data.
In some embodiments of the present disclosure, the body coordinate system uses the center of the rear axle of the vehicle as an origin, the forward direction of the vehicle is directly in front, the x-axis points directly in front, the y-axis points directly to the left, and the z-axis points directly above; the camera coordinate system takes the optical center of the camera as an origin, the z axis points to the front of the camera, the x axis points to the right of the camera, and the y axis points to the lower part of the camera; the laser radar coordinate system takes the geometric center of the laser radar as an origin, the z axis points to the front of the laser radar, the x axis points to the right of the laser radar, and the y axis points to the lower part of the laser radar; the coordinate system of the IMU is consistent with the vehicle body coordinate system, the center of a rear axle of the vehicle is taken as an origin, the advancing direction of the vehicle is in the front, the x-axis points to the front, the y-axis points to the left, and the z-axis points to the upper.
Because the collection of data involves multiple sensors and the collected data all contain timestamps, the present disclosure performs time synchronization processing on the multiple sensor data.
Different sensors have different sensor time stamps and the sensor frequencies are also different.
In some embodiments of the present disclosure, the 6 cameras used can be triggered simultaneously, with a camera frame rate of 30 fps, whereas the 4 solid-state lidars cannot be triggered synchronously and have a frame rate of 10 fps.
Fig. 2 shows a triggering schematic of a camera and lidar of one embodiment of the present disclosure.
In fig. 2, the grey bars indicate the acquisition frequencies of the camera and the lidar, and the horizontal axis indicates the unified timestamp. Each sampling instant of each sensor is recorded on a unified time sequence. Because the acquisition rate of the lidar is lower, after the multi-sensor data are uniformly converted into the vehicle body coordinate system, alignment proceeds from the lidar toward the camera: for each lidar acquisition moment, the image data acquired by the camera at the closest moment is searched, and the spatio-temporal alignment between the data of the two sensors is carried out.
In some embodiments of the present disclosure, in step S110, the image data is transformed from the camera coordinate system to the vehicle body coordinate system, and the laser point cloud data is transformed from the laser radar coordinate system to the vehicle body coordinate system.
In some embodiments of the present disclosure, the various types of data collected by multiple sensors (lidar, cameras, IMU, RTK, etc.) are recorded on a unified timeline.
Fig. 3 is a schematic flow chart of primary spatiotemporal alignment of image data and laser point cloud data according to an embodiment of the present disclosure.
Referring to fig. 3, preferably, the step S120 of performing primary time-space alignment on the image data and the laser point cloud data based on the timestamp of the image data, the timestamp of the laser point cloud data, and the vehicle body RTK pose sequence data, to obtain the laser point cloud data and the image data after the primary time-space alignment, includes:
S1202, for the laser point cloud data at a certain moment t_{n,j}, searching the image data at the closest moment t_{n,i}, and obtaining the time difference Δt_n between the two moments;
S1204, extracting, from the vehicle body RTK pose sequence data, the RTK pose sequence in the world coordinate system that contains the two moments t_{n,j} and t_{n,i} (j and i denote different moments), so as to obtain the poses T_C^W and T_L^W corresponding to the two moments (where the subscript C denotes the camera, the subscript L denotes the laser radar, and the superscript W denotes the world coordinate system), i.e., the vehicle body pose when the camera collects the image data and the vehicle body pose when the laser radar collects the laser point cloud data;
S1206, obtaining the pose difference ΔT_n^W in the world coordinate system corresponding to the time difference;
S1208, correcting the laser point cloud data in the vehicle body coordinate system based on the pose difference ΔT_n^W, so as to obtain laser point cloud data that is spatio-temporally aligned with the image data in the vehicle body coordinate system.
In the present embodiment, preferably, the timestamp of the forward-looking camera among the cameras is used as the reference. Here n denotes the index of the laser radar, with n = 1, 2, 3, 4.
In this embodiment, owing to the difference in acquisition frequency, the acquisition moment of the RTK does not necessarily coincide with the sampling moment of the camera or the laser radar; in this embodiment the moments are considered not to coincide (if they coincide, the corresponding RTK pose data is taken directly as the pose data at the sampling moment of the camera or the laser radar).
Take the RTK pose sequence in the world coordinate system that contains t_{n,j} and t_{n,i} (suppose t_{n,j} is later than t_{n,i}), {T_0^W, T_1^W, …, T_k^W}: the moment t_{n,i} lies between the moments corresponding to T_0^W and T_1^W, and the moment t_{n,j} likewise lies between the moments corresponding to two adjacent poses of the sequence. Preferably, the pose T_C^W corresponding to the camera at moment t_{n,i} and the pose T_L^W corresponding to the laser radar at moment t_{n,j} are determined by linear interpolation.
Preferably, based on the camera pose T_C^W at moment t_{n,i} and the laser radar pose T_L^W at moment t_{n,j}, the pose difference ΔT_n^W between the camera and the laser radar sampling moments within Δt_n, expressed in the world coordinate system, is determined by the formula ΔT_n^W = (T_C^W)^(-1) · T_L^W.
Preferably, the laser point cloud data collected by the laser radar in the laser radar coordinate system is spatially transformed into the vehicle body coordinate system as P^B = T_BL · P^L, where the superscript B denotes the vehicle body coordinate system and T_BL is the transformation matrix.
Further preferably, the laser point cloud data at moment t_{n,j} is aligned to moment t_{n,i} by applying the pose difference ΔT_n^W to the point cloud in the vehicle body coordinate system, thereby completing the time alignment of the laser radar to the camera.
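A compact sketch of the interpolation and correction just described, using scipy for the rotational part; treating the RTK samples as translation plus rotation and the use of SLERP for the rotation interpolation are implementation assumptions for the example.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interp_pose(times, translations, rotations, t):
    """times: (M,) timestamps; translations: (M, 3); rotations: a scipy Rotation holding M rotations."""
    trans = np.array([np.interp(t, times, translations[:, k]) for k in range(3)])
    rot = Slerp(times, rotations)([t]).as_matrix()[0]
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = rot, trans
    return T                                   # body pose T^W at time t (body -> world)

def align_to_camera_time(cloud_body_j, T_w_i, T_w_j):
    """Apply DeltaT = (T^W_i)^-1 . T^W_j to move points observed at t_j into the body frame at t_i."""
    delta = np.linalg.inv(T_w_i) @ T_w_j
    pts = np.hstack([cloud_body_j, np.ones((cloud_body_j.shape[0], 1))])
    return (delta @ pts.T).T[:, :3]
```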
Preferably, the relative pose sequence is also adjusted based on the variance condition of the RTK and the point cloud matching.
In some embodiments of the present disclosure, S120, performing primary time-space alignment on the image data and the laser point cloud data based on the timestamp of the image data, the timestamp of the laser point cloud data, and the vehicle body RTK pose sequence data, to obtain the laser point cloud data and the image data after the primary time-space alignment, includes:
s1202, in the case where the laser point cloud data and the image data are accumulated (accumulation over a preset time period), for the laser point cloud data at a certain moment, acquiring the time difference between that moment and the initial radar image acquisition moment within the accumulation period;
s1204, extracting an RTK pose sequence under a world coordinate system including the two moments from the RTK pose sequence data of the vehicle body to acquire poses corresponding to the two moments respectively;
s1206, acquiring a pose difference under a world coordinate system corresponding to the time difference;
and S1208, correcting the laser point cloud data under the vehicle body coordinate system based on the pose difference to obtain the laser point cloud data which is aligned with the image data under the vehicle body coordinate system in a space-time mode.
Fig. 4 is a flowchart of pose adjustment based on RTK variance according to an embodiment of the present disclosure.
Referring to fig. 4, in S130, the laser point cloud data and the image data after the primary time-space alignment are segmented by a preset time length (for example, 20 s), and the RTK variance corresponding to each segment of data is acquired; when acquiring the RTK variances, the RTK variance of each degree of freedom is counted separately for each data segment.
According to a preferred embodiment of the present disclosure, in S140, based on the RTK variance corresponding to each segment data, determining whether to perform pose optimization on the vehicle body RTK pose sequence data by using the vehicle body IMU/wheel speed data to perform secondary space-time alignment on the image data and the laser point cloud data, and obtaining the laser point cloud data and the image data after the secondary space-time alignment, includes:
and when the RTK variance of each degree of freedom is smaller than or equal to the variance threshold value, determining that the position optimization is not carried out on the RTK position sequence data of the vehicle body by using the IMU/wheel speed data of the vehicle body, otherwise, determining that the position optimization is carried out on the RTK position sequence data of the vehicle body by using the IMU/wheel speed data of the vehicle body.
Referring to fig. 4, in some embodiments of the present disclosure, when the variance of the RTK is less than or equal to a variance threshold, 3D laser point cloud accumulation is performed directly; and when the RTK variance is larger than a variance threshold value, carrying out pose transformation by means of IMU data and wheel speed meter data, and then carrying out 3D laser point cloud accumulation.
Under normal weather conditions and when the satellite signal is not interfered with, accurate data processing can be performed with the spatio-temporal alignment and RTK pose transformation described above. However, in some situations, such as severe weather or the vehicle passing tall buildings or viaducts, the RTK variance within a certain time period may become large, so that the actual positions of points after laser point cloud accumulation are not accurate enough. Therefore, the present disclosure preferably replaces the RTK-computed pose transformation described above with a vehicle pose transformation obtained by fusing the IMU data with the wheel-speed data, and, in combination with the spatio-temporal alignment method described above, reprocesses the laser point cloud data for the time periods in which the RTK variance is greater than the variance threshold.
For the object labeling method S100 of the present disclosure, preferably, in step S140, the pose optimization is performed on the vehicle body RTK pose sequence data by using the vehicle body IMU/wheel speed data, including:
for a certain segment of data whose RTK variance exceeds the variance threshold, acquiring the first IMU data that is closest to and earlier than the start time of the segment, taking the corresponding moment as the starting point t_start, and acquiring the second IMU data that is closest to and later than the end time of the segment, taking the corresponding moment as the end point t_end;
between the starting point and the end point, for each acquisition moment t_1 of image data and each acquisition moment t_2 of laser point cloud data, the following processing is performed:
acquiring the translation Δx of the vehicle body from the starting moment to the end moment based on the wheel speed data; acquiring the rotation angle differences Δyaw, Δpitch and Δroll of the vehicle body about the x, y and z directions (vehicle body coordinate system) from the starting moment to the end moment based on the IMU data;
integrating the rotation angle differences Δyaw, Δpitch and Δroll with the translation Δx to obtain the translations Δy and Δz of the vehicle body along the y- and z-axes within the period from t_1 to t_2, i.e., the pose change of the vehicle body within the period from t_1 to t_2;
and replacing the pose difference corresponding to the data whose RTK variance exceeds the variance threshold with this pose change.
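The following sketch illustrates how the pose change of such a noisy-RTK segment could be assembled from wheel-speed and IMU gyro samples; the planar, small-angle integration and the sign conventions are illustrative simplifications rather than the exact integration of the disclosure.

```python
import numpy as np

def pose_change_from_imu_wheel(times, wheel_speed, gyro_rates):
    """times: (M,) s; wheel_speed: (M,) m/s; gyro_rates: (M, 3) rad/s about the x, y, z axes."""
    times = np.asarray(times, dtype=float)
    wheel_speed = np.asarray(wheel_speed, dtype=float)
    gyro_rates = np.asarray(gyro_rates, dtype=float)
    dt = np.diff(times)
    delta_x = float(np.sum(wheel_speed[:-1] * dt))                    # distance travelled along body x
    d_roll, d_pitch, d_yaw = np.sum(gyro_rates[:-1] * dt[:, None], axis=0)
    delta_y = delta_x * np.sin(d_yaw)                                 # lateral offset from the heading change
    delta_z = delta_x * np.sin(d_pitch)                               # vertical offset (sign depends on convention)
    return np.array([delta_x, delta_y, delta_z]), (float(d_roll), float(d_pitch), float(d_yaw))
```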
According to the preferred embodiment of the present disclosure, 3D object detection and 2D object detection are performed on laser point cloud data and image data based on a deep learning algorithm.
In some embodiments of the present disclosure, in step S160, performing 3D object detection and 2D object detection on the laser point cloud data and the image data, respectively, to perform object labeling, includes:
detecting a 3D object in the laser point cloud data by using a BEV-based target detection method, a range view-based target detection method, a point-wise feature-based target detection method or a fusion feature-based target detection method;
detecting the 2D object in the image data by using a Cascade/Haar/SVM-based target detection algorithm, an R-CNN/Fast R-CNN-based candidate region/frame + deep learning classification algorithm or an RRC detection/Deformable CNN-based deep learning detection method;
modeling the weights of the image-data feature map and the laser point cloud feature map using a Mixture of Experts, and performing feature fusion through a concatenation operation, thereby completing automatic detection of the 3D object and the 2D object.
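As an illustration only (the disclosure does not give the network details), a Mixture-of-Experts style gating over the two modalities followed by concatenation could look like the following sketch; the module name and channel sizes are assumptions:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Hypothetical sketch: a gating network predicts per-location weights for the image
    feature map and the point-cloud feature map, then the weighted maps are concatenated
    for the downstream detection heads."""
    def __init__(self, img_channels: int, pc_channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(img_channels + pc_channels, 2, kernel_size=1),
            nn.Softmax(dim=1),  # one weight per modality, per spatial location
        )

    def forward(self, img_feat: torch.Tensor, pc_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([img_feat, pc_feat], dim=1))
        fused = torch.cat([img_feat * w[:, :1], pc_feat * w[:, 1:2]], dim=1)
        return fused

# Example with BEV-aligned feature maps of matching spatial size.
fusion = GatedFusion(img_channels=64, pc_channels=64)
out = fusion(torch.randn(1, 64, 100, 100), torch.randn(1, 64, 100, 100))
print(out.shape)  # torch.Size([1, 128, 100, 100])
```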
In some embodiments of the present disclosure, for a static object under the accumulation condition, point cloud merging is not performed, that is, object detection is not run on the merged point clouds; instead, the objects of multiple frames are merged directly with the help of the detection bounding boxes. This omits the object detection step that would otherwise follow point cloud merging, reducing the amount of computation, and also avoids the unclear observations caused by heavy overlap of multiple point clouds.
Fig. 5 is a schematic flow diagram of 3D object detection and 2D object detection performed on laser point cloud data and image data, respectively, according to an embodiment of the present disclosure.
In step S160 of the object labeling method S100 of the present disclosure, performing 3D object detection and 2D object detection on the laser point cloud data and the image data, respectively, to perform object labeling, includes:
projecting the 3D object onto the corresponding 2D object, judging whether the detected object is correct based on the overlap between the 3D projection and the 2D detection of the same object, and outputting the object if it is correct and not outputting it otherwise, so as to label the object.
If the projection overlap of an object is greater than or equal to an overlap threshold, the detected object is judged to be correct; if the projection overlap of an object is lower than the overlap threshold, further detection is performed through pose optimization.
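A rough sketch of such an overlap check, assuming the 3D box corners are available in the body frame and the camera intrinsics/extrinsics are known; the threshold value is illustrative:

```python
import numpy as np

OVERLAP_THRESHOLD = 0.5  # assumed value; the disclosure does not fix a number

def iou_2d(box_a, box_b) -> float:
    """Axis-aligned IoU between two boxes given as (x1, y1, x2, y2)."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = inter_w * inter_h
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union if union > 0 else 0.0

def project_box_corners(corners_3d: np.ndarray, K: np.ndarray, T_body_to_cam: np.ndarray):
    """Project the 8 corners of a 3D box (body frame) to an axis-aligned image box."""
    pts = (T_body_to_cam @ np.hstack([corners_3d, np.ones((8, 1))]).T)[:3]  # camera frame
    uv = (K @ pts) / pts[2]                                                 # perspective divide
    return uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()

# A detection is accepted when iou_2d(projected_box, detected_2d_box) >= OVERLAP_THRESHOLD.
```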
A low projection overlap indicates that the object labeling method/system of the present disclosure has certain errors, which may be caused by an unstable or inaccurate vehicle pose due to bumping during vehicle motion, or by inaccurate detection of the 3D object or the 2D object.
In some embodiments of the present disclosure, the projection relationship between the 3D object and the 2D object is preferably optimized by searching, via an optimization method, for the vehicle pose that minimizes the 2D projection error between the observed 3D objects and their corresponding 2D detections, and by locating abnormal 3D and 2D detection results; the following method is preferably adopted.
Since one 3D object may be observed by multiple frames of multiple cameras, the projection error is the sum of the 2D projection errors of multiple 3D objects. In the optimization process, the present disclosure therefore performs multi-frame optimization over a local time window; if a time period to be re-projected contains m frames of data, m may be set to 5 (or adjusted to another frame count).
The reprojection error (with the pose expressed using the Lie algebra element ξ) is calculated from the geometric center points P_i of the n three-dimensional objects and the geometric centers p_i of their 2D projections.
Suppose a spatial point P_i = [X_i, Y_i, Z_i]^T has the projected pixel coordinate p_i = [u_i, v_i]^T; the relationship between the pixel position and the spatial point position is:
s_i · p_i = K · exp(ξ^) · P_i
where K denotes the camera intrinsic parameters, s_i denotes the depth (scale factor) of the point, and exp(ξ^) is the pose transformation from the world coordinate system to the camera coordinate system.
However, the above equation has an error because the camera pose is inaccurate and the observation points are noisy. The errors of all pixel points of the observed object are summed over multiple frames to construct a least squares problem, and a better vehicle pose is then sought that minimizes it:
ξ* = arg min_ξ (1/2) Σ_i || p_i − (1/s_i) K exp(ξ^) P_i ||²
Here, since there are 6 cameras and 4 lidars on the vehicle body and a 3D object can be observed by multiple sensors over multiple frames, for the sensor data observing a given object the pose transformation from the lidar observation to the camera observation needs to be carried out; E_i denotes the extrinsic parameters from the vehicle body coordinate system to the i-th camera coordinate system and completes this transformation.
The vehicle pose P_B is solved using the Gauss-Newton method. Using the adjusted vehicle pose P_B, the 3D-to-2D projection relationship is re-established by combining the intrinsic and extrinsic parameters of the cameras and the lidars, so as to judge whether each automatically labeled object is correct; any object that is still inaccurate is labeled individually.
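A compact numerical sketch of this refinement, assuming a single camera with intrinsics K and a 6-DOF pose parameterized as translation plus Euler angles; for brevity the example uses scipy's least_squares (a trust-region/Levenberg–Marquardt relative of Gauss–Newton) rather than a hand-written Gauss–Newton loop:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])  # assumed camera intrinsics

def reprojection_residuals(pose, pts_3d, pts_2d):
    """pose = [tx, ty, tz, yaw, pitch, roll]; pts_3d are object centers in the world
    frame, pts_2d the matching 2D detection centers in pixels."""
    R = Rotation.from_euler("zyx", pose[3:]).as_matrix()
    cam_pts = (R @ pts_3d.T) + pose[:3, None]   # world -> camera
    proj = K @ cam_pts
    proj = proj[:2] / proj[2]                   # perspective divide
    return (proj.T - pts_2d).ravel()

# Toy data: 3D object centers and their observed 2D projections.
pts_3d = np.array([[2.0, 0.5, 10.0], [-1.0, 0.2, 8.0],
                   [0.5, -0.3, 12.0], [1.5, 1.0, 9.0]])
true_pose = np.array([0.1, -0.05, 0.2, 0.02, 0.0, 0.0])
pts_2d = reprojection_residuals(true_pose, pts_3d, np.zeros((4, 2))).reshape(-1, 2)

# Start from the initial pose (here: zeros, standing in for the RTK pose) and refine.
result = least_squares(reprojection_residuals, x0=np.zeros(6), args=(pts_3d, pts_2d))
print(result.x)  # close to true_pose
```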
In this way, the 3D and 2D results are fused visually and all adjusted object detection results are output, completing automatic 3D labeling of objects such as motor vehicles, non-motor vehicles, pedestrians, traffic signs and traffic lights.
According to the object labeling method S100 of the preferred embodiment of the present disclosure, in step S160, the method further includes:
projecting each vector object in the high-precision map into the laser point cloud data and the image data, based on the vehicle body RTK pose sequence data, to update the laser point cloud data and the image data and thus label the objects.
This embodiment is used together with the object labeling method described above to complete the automatic labeling process. It mainly targets objects such as ground traffic markings and above-ground objects such as poles, traffic signs and traffic lights, for which it is difficult to automatically detect 3D objects directly from the laser point cloud; therefore, only 2D object detection is performed on the image data. In this embodiment, the vector objects of the high-precision map are fused with the 2D detection results, the accuracy of the annotation is verified using the RTK data, and optimization adjustment is performed. Referring to fig. 6, the method preferably includes the following steps:
s1602, acquiring coordinates of each vector object in a high-precision map under a world coordinate system and coordinates of the vehicle body under the world coordinate system within an observation range of the laser radar based on the RTK pose sequence data of the vehicle body, wherein the observation range takes the center of the laser radar as an origin;
s1604, transforming the coordinates of each vector object in the world coordinate system into a vehicle body coordinate system;
s1606, forming projection of each vector object in the vehicle body coordinate system in the laser point cloud data;
s1608, transforming the coordinates of each vector object in the vehicle body coordinate system to the camera coordinate system, and forming a projection in the image data;
and S1610, when the variance of the RTK data is less than or equal to a variance threshold, projecting each vector object in the high-precision map to the laser point cloud data and the image data, and when the variance of the RTK data is greater than the variance threshold, performing pose optimization based on the image data on the point cloud data to perform re-projection, so as to complete the detection of the 3D object and the 2D object.
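A minimal sketch of the projection chain in steps S1604–S1608, assuming the world-to-body transform comes from the RTK pose and the body-to-camera transform from calibration; the function and parameter names are illustrative:

```python
import numpy as np

def to_homogeneous(pts: np.ndarray) -> np.ndarray:
    """N x 3 points -> N x 4 homogeneous points."""
    return np.hstack([pts, np.ones((pts.shape[0], 1))])

def project_map_objects(map_pts_world, T_world_to_body, T_body_to_cam, K):
    """Project high-precision-map vector points (world frame) into the body frame
    (for overlay on the lidar point cloud) and into a camera image.

    map_pts_world  : N x 3 vector-object points in the world frame
    T_world_to_body: 4 x 4 transform derived from the vehicle body RTK pose
    T_body_to_cam  : 4 x 4 camera extrinsics
    K              : 3 x 3 camera intrinsics
    """
    pts_body = (T_world_to_body @ to_homogeneous(map_pts_world).T)[:3].T  # S1604/S1606
    pts_cam = (T_body_to_cam @ to_homogeneous(pts_body).T)[:3].T          # S1608
    in_front = pts_cam[:, 2] > 0                                          # keep points ahead of the camera
    uv = K @ pts_cam[in_front].T
    uv = (uv[:2] / uv[2]).T                                               # pixel coordinates
    return pts_body, uv
```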
In this embodiment, the world-frame coordinates of each vector object in the high-precision map within the lidar observation range, and the world-frame coordinates of the vehicle body, are obtained via RTK. The observation range takes the lidar center as its origin and may be, for example, 150 m to the front and 50 m to each of the left, right and rear.
In this embodiment, the detection object categories are divided, according to their content, into categories with clear attributes (lights, signs, poles, lane lines, crosswalks, stop lines, arrows, manhole covers, etc.) and categories without clear attributes (building outlines, ground text, etc.).
Since each image contains only a small number of objects, the corresponding objects can be found quickly from the object categories and the positions of their bounding boxes, and the feature points of the objects are then used for fine matching.
The detection results are then combined, and the world-frame coordinates of each vector object are converted into the vehicle body coordinate system according to the relative positions of the vector object and the vehicle body in the world coordinate system:
P_body = T_world→body · P_world
forming a projection in the lidar data; the coordinates of each vector object in the vehicle body coordinate system are then converted into the camera coordinate system:
P_cam = T_body→cam · P_body
and form a projection in the camera.
The variance of the RTK data involved in the vector object projection is calculated; when the RTK variance is less than or equal to the variance threshold, each vector object in the high-precision map is projected directly into the cameras and the lidars, completing the labeling of the 3D and 2D objects.
When the RTK variance is greater than the variance threshold, an overall pose optimization is performed on the vector object data of the high-precision map by minimizing the re-projection error, and the projection is then carried out. The adjustment and optimization steps are as follows:
A time period is selected separately for each segment whose RTK variance is greater than the variance threshold. Because there are multiple cameras and lidars, the same object can typically be detected and projected in m (m > 2) image frames within the same time window. Therefore, in actual processing, it is preferable to associate the m-th frame image with its preceding frames (m-1, m-2, …, 2, 1) and to match based on the three-dimensional detection results associated with the current frame image.
The detected and matched 2D and 3D objects are then re-projected. In the re-projection process, a detection object in the 3D lidar data is projected into the 2D camera image by means of the formula
s · p = K · T_body→cam · P_body
The pose given by the RTK is taken as the initial pose, and the Gauss-Newton method is used to solve for the optimized pose with the minimum reprojection error.
Further, according to the optimized pose obtained from the re-projection result, the original RTK pose is updated by combining the vehicle pose transformation calculated from the wheel speed odometer and the IMU within the time window, and the 3D and 2D projections are carried out directly on the basis of the new pose to complete the labeling of the vector objects.
In addition, when labeling vector objects from the high-precision map, if the projection of one camera shows a deviation or the projection overlap of a certain object is low, it is generally considered that the camera's intrinsic/extrinsic parameters or that individual object has a problem, and such individual problems can be adjusted directly by hand. After the adjustment is completed, the 3D and 2D objects are projected.
According to a preferred embodiment of the present disclosure, in step S160 of the object labeling method S100 of the present disclosure, the method further includes:
and overlapping the laser point cloud data in a preset time period to mark the static object and the dynamic object respectively.
This embodiment improves accuracy by multi-frame superposition: all point clouds within a preset time period are projected into the world coordinate system with a certain time as reference, so that there is sufficient data support for the automatic labeling process.
Preferably, the present disclosure processes dynamic and static objects in a scene differently: a static object is labeled and processed once, while a dynamic object is labeled and adjusted multiple times, where the adjustment may be performed in the 3D view or fine-tuned in the 2D image view.
Preferably, to enhance the applicability and accuracy of the object labeling method of the present disclosure, accuracy is improved by means of multi-frame point cloud accumulation.
The laser point cloud data collected by the lidar within a preset time period (for example, 20 s; this can be adjusted) is taken as one frame sequence. The origin of the vehicle body coordinate system in the first frame acquired within the preset time period is taken as the origin of the world coordinate system, and the time t corresponding to the first frame is taken as the initial time. The space-time alignment step described above is then performed on the laser point cloud data of all remaining frames, aligning them in time to time t and in space to the world coordinate system, and the aligned laser point cloud data are accumulated.
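A minimal sketch of this accumulation, assuming each frame's body-to-world transform is available (from RTK or the optimized poses) and the first frame's pose defines the world origin; the helper names are illustrative:

```python
import numpy as np

def accumulate_frames(frames, body_poses):
    """Accumulate lidar frames into the world frame anchored at the first frame.

    frames     : list of N_i x 3 point arrays, each in the body frame at its own time
    body_poses : list of 4 x 4 body-to-world transforms, body_poses[0] defines the origin
    """
    T0_inv = np.linalg.inv(body_poses[0])  # re-anchor the world at frame 0
    merged = []
    for pts, T in zip(frames, body_poses):
        hom = np.hstack([pts, np.ones((pts.shape[0], 1))])
        merged.append((T0_inv @ T @ hom.T)[:3].T)
    return np.vstack(merged)

# Example: two frames, the second taken after the body moved 1 m forward.
f0 = np.random.rand(100, 3)
f1 = np.random.rand(100, 3)
T0 = np.eye(4)
T1 = np.eye(4)
T1[0, 3] = 1.0
cloud = accumulate_frames([f0, f1], [T0, T1])
print(cloud.shape)  # (200, 3)
```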
For static objects accumulated within the preset time period, including objects that cannot move (road signs and the like) and objects that have motion attributes but remain stationary within the preset time period (parked roadside vehicles and the like), object detection is performed once:
calculating an RTK variance;
for data whose RTK variance is less than or equal to the variance threshold, the vehicle positioning accuracy is judged to be high and labeling is performed directly; for data whose RTK variance is greater than the variance threshold, direct labeling is not performed: the space-time alignment method described above and the related processing of the RTK-based vector object projection method are applied first, and then object re-projection and automatic labeling are carried out.
Under the point cloud accumulation condition, statistics are gathered for a static object over all frames within the preset time period, and the change of the detected object's position is checked via the mean and variance. This statistical process includes outlier removal: when one or even several of the multi-frame point clouds within the preset time period are abnormal, those frames are not adjusted by re-projection or manual labeling, but are simply removed as extreme values.
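A rough sketch of such mean/variance based outlier removal, assuming the detected center of the same static object is available for each frame; the sigma multiplier is an assumed value:

```python
import numpy as np

def remove_outlier_frames(centers: np.ndarray, n_sigma: float = 2.0) -> np.ndarray:
    """Given the detected center of the same static object in each frame (M x 3, one row
    per frame), drop frames whose position deviates from the per-axis mean by more than
    n_sigma standard deviations. Returns the indices of the frames that are kept."""
    mean = centers.mean(axis=0)
    std = centers.std(axis=0) + 1e-9  # avoid division by zero
    keep = np.all(np.abs(centers - mean) <= n_sigma * std, axis=1)
    return np.where(keep)[0]

# Example: the object stays put except in one abnormal frame.
centers = np.tile([10.0, 2.0, 0.5], (20, 1)) + np.random.normal(scale=0.02, size=(20, 3))
centers[7] += [1.5, 0.0, 0.0]          # an extreme value to be dropped
print(remove_outlier_frames(centers))  # frame 7 is excluded
```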
The method can ensure the accuracy and reduce the manpower required in the marking process.
In addition, for static objects, when the multi-frame images within a time window are accumulated, the objects of the multiple frames are merged directly by means of the detection bounding boxes; the object detection step after point cloud merging can thus be omitted, reducing the amount of computation, and the unclear observations caused by superimposing a large number of point clouds are avoided.
For a dynamic object within the accumulated preset time period, whose accumulation produces a smeared trace of its motion route over the period, detection still needs to be performed frame by frame, and outliers cannot simply be removed as for a static object. All frames within the preset time period are fitted to check whether the object's motion conforms to a motion model. If it conforms, the detection is considered correct and can be labeled directly; if the fit does not match the motion model (for example, differs significantly), object detection based on the deep learning approach described above, or manual adjustment, is performed. A rough sketch of such a consistency check is given below.
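This sketch assumes a constant-velocity motion model and per-frame object centers; the model choice and tolerance are assumptions, not specified by the disclosure:

```python
import numpy as np

def fits_constant_velocity(track: np.ndarray, times: np.ndarray, tol: float = 0.3) -> bool:
    """Fit a constant-velocity model to per-frame object centers (M x 3) and check
    whether the residuals stay below a tolerance (meters).

    track : detected centers of the same dynamic object, one row per frame
    times : acquisition times of the frames (seconds)
    """
    residuals = []
    for axis in range(3):
        coeffs = np.polyfit(times, track[:, axis], deg=1)  # x(t) = v*t + x0
        residuals.append(track[:, axis] - np.polyval(coeffs, times))
    max_err = np.max(np.abs(np.stack(residuals)))
    return max_err <= tol

times = np.linspace(0.0, 2.0, 21)
track = np.stack([3.0 * times, 0.1 * times, np.zeros_like(times)], axis=1)  # 3 m/s along x
print(fits_constant_velocity(track, times))  # True: motion matches the model
```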
The present disclosure also provides an object labeling apparatus, referring to fig. 7, an object labeling apparatus 1000 according to an embodiment of the present disclosure includes:
the sensor data acquisition module 1002 is used for acquiring data acquired by a plurality of sensors, wherein the data comprises image data, laser point cloud data, vehicle body RTK pose sequence data and vehicle body IMU/wheel speed data;
the primary alignment module 1004 is used for performing primary space-time alignment on the image data and the laser point cloud data based on the time stamp of the image data, the time stamp of the laser point cloud data and the RTK pose sequence data of the vehicle body to obtain the laser point cloud data and the image data after the primary space-time alignment;
the data segmentation module 1006, the data segmentation module 1006 groups the laser point cloud data and the image data after the primary time-space alignment by a preset time length;
an RTK variance calculation/determination module 1008, the RTK variance calculation/determination module 1008 obtains the RTK variance corresponding to each segment of data, and determines, based on the RTK variance corresponding to each segment of data, whether to perform pose optimization on the vehicle body RTK pose sequence data using the vehicle body IMU/wheel speed data so as to perform secondary time-space alignment on the image data and the laser point cloud data;
the secondary alignment module 1010 is used for performing pose optimization on the vehicle body RTK pose sequence data by using the vehicle body IMU/wheel speed data so as to perform secondary space-time alignment on the image data and the laser point cloud data and obtain the laser point cloud data and the image data after the secondary space-time alignment;
and the object labeling module 1014, wherein the object labeling module 1014 respectively performs 3D object detection and 2D object detection on the laser point cloud data and the image data so as to label the object.
Fig. 8 is a schematic configuration diagram of an object labeling apparatus 1000 according to still another embodiment of the present disclosure.
On the basis of fig. 7, the object labeling apparatus 1000 according to this embodiment further includes a data accumulation module 1012, and the data accumulation module 1012 accumulates the laser point cloud data and the image data;
the object labeling module 1014 performs 3D object detection and 2D object detection on the accumulated laser point cloud data and the accumulated image data, respectively.
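As a purely illustrative software sketch of how these modules might be orchestrated (the class and method names are assumptions; each placeholder method corresponds to one of the modules 1002–1014 described above):

```python
from dataclasses import dataclass

@dataclass
class SensorBundle:
    """Illustrative container mirroring the data handled by module 1002."""
    images: list
    lidar_frames: list
    rtk_poses: list
    imu_wheel: list

class ObjectLabelingPipeline:
    """Hypothetical orchestration of the modules described above."""
    def __init__(self, variance_threshold: float = 0.05):
        self.variance_threshold = variance_threshold

    def run(self, bundle: SensorBundle):
        aligned = self.primary_alignment(bundle)            # module 1004
        segments = self.segment(aligned)                    # module 1006
        processed = []
        for seg in segments:                                # module 1008
            if self.rtk_variance(seg) > self.variance_threshold:
                seg = self.secondary_alignment(seg)         # module 1010
            processed.append(seg)
        accumulated = self.accumulate(processed)            # module 1012
        return self.detect_and_label(accumulated)           # module 1014

    # Placeholder methods; each stands in for one module of the apparatus.
    def primary_alignment(self, bundle): return bundle
    def segment(self, aligned): return [aligned]
    def rtk_variance(self, seg): return 0.0
    def secondary_alignment(self, seg): return seg
    def accumulate(self, segments): return segments
    def detect_and_label(self, accumulated): return []
```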
The object labeling apparatus of the present disclosure may be implemented in a manner based on a computer software program architecture.
The object labeling method of the present disclosure is an automatic labeling method. The technical solution of the present disclosure improves the accuracy of automatic labeling by utilizing multiple items of data, including timestamped 3D laser point cloud data, video data from multiple cameras at different positions, and the RTK, IMU, wheel speed and related state data used for vehicle positioning. In some embodiments of the present disclosure, RTK and IMU + wheel speed fusion are applied progressively according to the RTK variance, completing the time and space alignment of the various data and reducing the impact of asynchronous lidar acquisition on accuracy. The technical solution performs automatic laser 3D object detection and image 2D object detection using deep learning algorithms, and completes a fast automatic labeling and optimization-adjustment process for 3D boxes (laser point cloud 3D boxes and video 3D boxes) and image 2D boxes with the help of high-precision map and RTK positioning data and a re-projection approach, improving automatic labeling efficiency. In some embodiments of the present disclosure, point cloud accumulation is performed by multi-frame superposition, static and dynamic objects in the scene are labeled separately, and the labeling and optimization adjustment of all static objects across the multiple frames is completed with a single static adjustment, improving labeling efficiency.
In particular, the technical solution of the present disclosure can complete highly accurate and broadly applicable automatic labeling for high-precision map scenes.
Fig. 7 and 8 are schematic block diagrams of the structures of the object labeling apparatus according to the hardware implementation of the processing system of the present disclosure.
The apparatus may include corresponding means for performing each or several of the steps of the flowcharts described above. Thus, each step or several steps in the above-described flow charts may be performed by a respective module, and the apparatus may comprise one or more of these modules. The modules may be one or more hardware modules specifically configured to perform the respective steps, or implemented by a processor configured to perform the respective steps, or stored within a computer-readable medium for implementation by a processor, or by some combination.
The hardware architecture may be implemented using a bus architecture. The bus architecture may include any number of interconnecting buses and bridges depending on the specific application of the hardware and the overall design constraints. The bus 1100 couples various circuits including the one or more processors 1200, the memory 1300, and/or the hardware modules together. The bus 1100 may also connect various other circuits 1400, such as peripherals, voltage regulators, power management circuits, external antennas, and the like.
The bus 1100 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one connecting line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present disclosure includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the implementations of the present disclosure. The processor performs the various methods and processes described above. For example, method embodiments in the present disclosure may be implemented as a software program tangibly embodied in a machine-readable medium, such as a memory. In some embodiments, some or all of the software program may be loaded and/or installed via memory and/or a communication interface. When the software program is loaded into memory and executed by a processor, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the processor may be configured to perform one of the methods described above by any other suitable means (e.g., by means of firmware).
The logic and/or steps represented in the flowcharts or otherwise described herein may be embodied in any readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
For the purposes of this description, a "readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). In addition, the readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in the memory.
It should be understood that portions of the present disclosure may be implemented in hardware, software, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the method implementing the above embodiments may be implemented by hardware that is related to instructions of a program, and the program may be stored in a readable storage medium, and when executed, the program may include one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The present disclosure also provides an electronic device, including: a memory storing execution instructions; and a processor or other hardware module, which executes the execution instructions stored in the memory, so that the processor or other hardware module executes the object labeling method.
The disclosure also provides a readable storage medium, in which an execution instruction is stored, and the execution instruction is used for implementing the object labeling method when being executed by a processor.
In the description of the present specification, reference to the description of "one embodiment/implementation", "some embodiments/implementations", "examples", "specific examples", or "some examples", etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment/implementation or example is included in at least one embodiment/implementation or example of the present application. In this specification, the schematic representations of the terms described above are not necessarily the same embodiment/mode or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments/modes or examples. Furthermore, the various embodiments/aspects or examples and features of the various embodiments/aspects or examples described in this specification can be combined and combined by one skilled in the art without conflicting therewith.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.
It will be understood by those skilled in the art that the foregoing embodiments are provided merely for clarity of explanation and are not intended to limit the scope of the disclosure. Other variations or modifications may occur to those skilled in the art, based on the foregoing disclosure, and are still within the scope of the present disclosure.

Claims (10)

1. An object labeling method, comprising:
s110, acquiring data collected by a plurality of sensors, wherein the data comprises image data, laser point cloud data, vehicle body RTK pose sequence data and vehicle body IMU/wheel speed data;
s120, performing primary time-space alignment on the image data and the laser point cloud data based on the time stamp of the image data, the time stamp of the laser point cloud data and RTK pose sequence data of the vehicle body to obtain the laser point cloud data and the image data after the primary time-space alignment;
s130, grouping the laser point cloud data and the image data subjected to primary time-space alignment by a preset time length to obtain an RTK variance corresponding to each segment data;
s140, judging whether to perform pose optimization on the RTK pose sequence data of the vehicle body by using IMU/wheel speed data of the vehicle body so as to perform secondary space-time alignment on the image data and the laser point cloud data based on the RTK variance corresponding to each segment data, and obtaining the laser point cloud data and the image data after the secondary space-time alignment; and
and S160, respectively carrying out 3D object detection and 2D object detection on the laser point cloud data and the image data so as to carry out object labeling.
2. The object labeling method of claim 1, after the step S110, further comprising:
s115, accumulating the laser point cloud data and the image data;
in step S160, the 3D object detection and the 2D object detection are performed on the accumulated laser point cloud data and the accumulated image data, respectively.
3. The object labeling method according to claim 1 or 2, wherein the step S110 of acquiring data collected by a plurality of sensors comprises:
and converting the image data from a camera coordinate system to a vehicle body coordinate system, and converting the laser point cloud data from a laser radar coordinate system to the vehicle body coordinate system.
4. The object labeling method of claim 3, wherein the types of data collected by the plurality of sensors are recorded on a uniform time axis.
5. The object labeling method according to claim 1, wherein the step S120 of performing a primary time-space alignment on the image data and the laser point cloud data based on the timestamp of the image data, the timestamp of the laser point cloud data, and the RTK pose sequence data of the car body to obtain the laser point cloud data and the image data after the primary time-space alignment comprises:
s1202, for laser point cloud data at a certain moment, searching image data at the moment closest to the moment, and acquiring a time difference between the two moments;
s1204, extracting an RTK pose sequence under a world coordinate system including the two moments from the RTK pose sequence data of the vehicle body to acquire poses corresponding to the two moments respectively;
s1206, acquiring a pose difference under a world coordinate system corresponding to the time difference; and
and S1208, correcting the laser point cloud data under the vehicle body coordinate system based on the pose difference to obtain the laser point cloud data which is aligned with the image data under the vehicle body coordinate system in a space-time mode.
6. The object labeling method according to claim 2, wherein the step S120 of performing a primary time-space alignment on the image data and the laser point cloud data based on the timestamp of the image data, the timestamp of the laser point cloud data, and the RTK pose sequence data of the car body to obtain the laser point cloud data and the image data after the primary time-space alignment comprises:
s1202, for laser point cloud data at a certain moment, acquiring a time difference between the moment and the initial radar image acquisition moment in an accumulation time period;
s1204, extracting an RTK pose sequence under a world coordinate system including the two moments from the RTK pose sequence data of the vehicle body to acquire poses corresponding to the two moments respectively;
s1206, acquiring a pose difference under a world coordinate system corresponding to the time difference; and
and S1208, correcting the laser point cloud data under the vehicle body coordinate system based on the pose difference to obtain the laser point cloud data which is aligned with the image data under the vehicle body coordinate system in a space-time mode.
7. The object labeling method according to claim 5 or 6, wherein the step S130 of segmenting the laser point cloud data and the image data after the initial time-space alignment by a preset time length to obtain the RTK variance corresponding to each segmented data comprises:
when the RTK variances are obtained, respectively counting the RTK variances of all the degrees of freedom;
preferably, S140, based on the RTK variance corresponding to each segment data, determining whether to perform pose optimization on the vehicle body RTK pose sequence data by using the vehicle body IMU/wheel speed data to perform secondary space-time alignment on the image data and the laser point cloud data, and obtaining the laser point cloud data and the image data after the secondary space-time alignment, includes:
when the RTK variance of each degree of freedom is less than or equal to the variance threshold, determining not to perform pose optimization on the vehicle body RTK pose sequence data using the vehicle body IMU/wheel speed data; otherwise, determining to perform pose optimization on the vehicle body RTK pose sequence data using the vehicle body IMU/wheel speed data;
preferably, in step S140, performing pose optimization on the vehicle body RTK pose sequence data using the vehicle body IMU/wheel speed data, includes:
for a certain section of data with the RTK variance exceeding the variance threshold, acquiring first IMU data which is closest to and earlier than the starting time of the section of data, taking the corresponding time as a starting point, acquiring second IMU data which is closest to and later than the ending time of the section of data, and taking the corresponding time as an end point;
between the start point and the end point, for each image data acquisition time t_1 and each laser point cloud data acquisition time t_2, the following processing is performed:
acquiring the translation amount Δx of the vehicle body from the start time to the end time based on the wheel speed data; acquiring the rotation angle differences Δyaw, Δpitch and Δroll of the vehicle body in the x, y and z directions from the start time to the end time based on the IMU data;
integrating the rotation angle differences Δyaw, Δpitch and Δroll with the translation amount Δx to obtain the translation amounts Δy and Δz of the vehicle body along the y-axis and z-axis directions within the time period from t_1 to t_2, i.e., the pose variation of the vehicle body within the time period from t_1 to t_2;
preferably, in step S140, performing pose optimization on the vehicle body RTK pose sequence data using the vehicle body IMU/wheel speed data further includes:
replacing the pose difference corresponding to the data of which the RTK variance exceeds the variance threshold value with the pose variation;
preferably, S160, respectively performing 3D object detection and 2D object detection on the laser point cloud data and the image data for object labeling, including:
detecting a 3D object in the laser point cloud data by using a BEV-based target detection method, a range view-based target detection method, a point-wise feature-based target detection method or a fusion feature-based target detection method;
detecting the 2D object in the image data by using a Cascade/Haar/SVM-based target detection algorithm, an R-CNN/Fast R-CNN-based candidate region/frame + deep learning classification algorithm or an RRC detection/Deformable CNN-based deep learning detection method; and
modeling the weights of the image-data feature map and the laser point cloud feature map using a Mixture of Experts, and performing feature fusion through a concatenation operation, to complete the automatic detection of the 3D object and the 2D object;
preferably, the S160, respectively performing 3D object detection and 2D object detection on the laser point cloud data and the image data for object labeling, includes:
projecting the 3D object onto the corresponding 2D object, judging whether the detected object is correct based on the projection overlap between the 3D projection and the 2D detection of the same object, outputting the object if correct and not outputting it otherwise, so as to label the object;
preferably, projecting the 3D object onto the corresponding 2D object, determining whether the detected object is correct based on the 3D and 2D projection overlap of the same object, and outputting the object if correct and not outputting it otherwise, includes:
if the projection overlap of an object is greater than or equal to an overlap threshold, judging that the detected object is correct;
if the projection overlap of an object is lower than the overlap threshold, further detecting through pose optimization;
preferably, step S160 further includes:
based on the vehicle body RTK pose sequence data, projecting each vector object in the high-precision map into the laser point cloud data and the image data to update the laser point cloud data and the image data, so as to label the object;
preferably, based on the vehicle body RTK pose sequence data, projecting each vector object in the high-precision map into the laser point cloud data and the image data to update the laser point cloud data and the image data so as to label the object comprises the following steps:
s1602, acquiring coordinates of each vector object in a high-precision map under a world coordinate system and coordinates of a vehicle body under the world coordinate system within an observation range of the laser radar based on RTK pose sequence data of the vehicle body, wherein the observation range takes the center of the laser radar as an origin;
s1604, transforming the coordinates of each vector object in the world coordinate system into a vehicle body coordinate system;
s1606, forming projection of each vector object in the vehicle body coordinate system in the laser point cloud data;
s1608, transforming the coordinates of each vector object in the vehicle body coordinate system to the camera coordinate system, and forming a projection in the image data; and
s1610, when the variance of the RTK data is smaller than or equal to a variance threshold, projecting each vector object in the high-precision map to the laser point cloud data and the image data, and when the variance of the RTK data is larger than the variance threshold, performing pose optimization based on the image data on the point cloud data to perform re-projection, and completing detection of a 3D object and a 2D object;
preferably, step S160 further includes:
overlapping the laser point cloud data in a preset time period to respectively mark the static object and the dynamic object;
preferably, the superimposing the laser point cloud data in the preset time period to label the static object and the dynamic object respectively, includes:
taking a laser image acquired by a laser radar in a preset time period, and performing point cloud accumulation and superposition to obtain superposed data;
carrying out one-time object detection and marking on static objects in the superposed data;
and for the dynamic object in the superposed data, extracting the movement route of the dynamic object, judging whether the movement route accords with a preset object movement model, if so, directly marking the dynamic object based on the preset object movement model, and if not, detecting and marking the dynamic object based on a deep learning algorithm.
8. An object labeling apparatus, comprising:
the system comprises a sensor data acquisition module, a data acquisition module and a data acquisition module, wherein the sensor data acquisition module acquires data acquired by a plurality of sensors, and the data comprises image data, laser point cloud data, vehicle body RTK (real time kinematic) position sequence data and vehicle body IMU/wheel speed data;
the system comprises a primary alignment module, a time stamp module and a vehicle body RTK position sequence data module, wherein the primary alignment module is used for carrying out primary time-space alignment on image data and laser point cloud data based on the time stamp of the image data, the time stamp of the laser point cloud data and the vehicle body RTK position sequence data to obtain the laser point cloud data and the image data after the primary time-space alignment;
the data segmentation module is used for grouping the laser point cloud data and the image data subjected to primary time-space alignment by a preset time length;
the RTK variance calculating/judging module is used for acquiring the RTK variance corresponding to each segment data; judging whether vehicle body RTK position and position sequence data are subjected to position and position optimization by using vehicle body IMU/wheel speed data based on the RTK variance corresponding to each segment data so as to perform secondary time-space alignment on the image data and the laser point cloud data;
the secondary alignment module performs pose optimization on the vehicle body RTK pose sequence data by using vehicle body IMU/wheel speed data so as to perform secondary space-time alignment on the image data and the laser point cloud data and obtain the laser point cloud data and the image data after the secondary space-time alignment; and
the object labeling module is used for respectively carrying out 3D object detection and 2D object detection on the laser point cloud data and the image data so as to label the object;
preferably, the method further comprises the following steps:
a data accumulation module that accumulates laser point cloud data and image data;
the object labeling module performs the 3D object detection and the 2D object detection on the accumulated laser point cloud data and the accumulated image data.
9. An electronic device, comprising:
a memory storing execution instructions; and
a processor executing execution instructions stored by the memory to cause the processor to perform the object labeling method of any of claims 1 to 7.
10. A readable storage medium, wherein an execution instruction is stored in the readable storage medium, and when executed by a processor, the execution instruction is used for implementing the object labeling method according to any one of claims 1 to 7.
CN202210742488.XA 2022-06-27 2022-06-27 Object labeling method and device, electronic equipment and storage medium Pending CN114998436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210742488.XA CN114998436A (en) 2022-06-27 2022-06-27 Object labeling method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210742488.XA CN114998436A (en) 2022-06-27 2022-06-27 Object labeling method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114998436A true CN114998436A (en) 2022-09-02

Family

ID=83036362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210742488.XA Pending CN114998436A (en) 2022-06-27 2022-06-27 Object labeling method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114998436A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385528A (en) * 2023-03-28 2023-07-04 小米汽车科技有限公司 Method and device for generating annotation information, electronic equipment, vehicle and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650278B1 (en) * 2017-07-21 2020-05-12 Apple Inc. Semantic labeling of point clouds using images
CN112859051A (en) * 2021-01-11 2021-05-28 桂林电子科技大学 Method for correcting laser radar point cloud motion distortion
CN113255578A (en) * 2021-06-18 2021-08-13 湖北亿咖通科技有限公司 Traffic identification recognition method and device, electronic equipment and storage medium
CN113551666A (en) * 2021-07-06 2021-10-26 杭州鸿泉物联网技术股份有限公司 Automatic driving multi-sensor fusion positioning method and device, equipment and medium
CN113671551A (en) * 2020-05-13 2021-11-19 千寻位置网络有限公司 RTK positioning resolving method
US20220018962A1 (en) * 2020-07-16 2022-01-20 Beijing Tusen Weilai Technology Co., Ltd. Positioning method and device based on multi-sensor fusion
CN114076956A (en) * 2021-11-12 2022-02-22 北京斯年智驾科技有限公司 Lane line calibration method based on laser radar point cloud assistance

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650278B1 (en) * 2017-07-21 2020-05-12 Apple Inc. Semantic labeling of point clouds using images
CN113671551A (en) * 2020-05-13 2021-11-19 千寻位置网络有限公司 RTK positioning resolving method
US20220018962A1 (en) * 2020-07-16 2022-01-20 Beijing Tusen Weilai Technology Co., Ltd. Positioning method and device based on multi-sensor fusion
CN112859051A (en) * 2021-01-11 2021-05-28 桂林电子科技大学 Method for correcting laser radar point cloud motion distortion
CN113255578A (en) * 2021-06-18 2021-08-13 湖北亿咖通科技有限公司 Traffic identification recognition method and device, electronic equipment and storage medium
CN113551666A (en) * 2021-07-06 2021-10-26 杭州鸿泉物联网技术股份有限公司 Automatic driving multi-sensor fusion positioning method and device, equipment and medium
CN114076956A (en) * 2021-11-12 2022-02-22 北京斯年智驾科技有限公司 Lane line calibration method based on laser radar point cloud assistance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN SIHAN: "Research on Intelligent Driving Environment Perception and Prediction Methods for Different In-Vehicle Computing Platforms", China Doctoral Dissertations Full-text Database, 30 April 2022 (2022-04-30) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385528A (en) * 2023-03-28 2023-07-04 小米汽车科技有限公司 Method and device for generating annotation information, electronic equipment, vehicle and storage medium
CN116385528B (en) * 2023-03-28 2024-04-30 小米汽车科技有限公司 Method and device for generating annotation information, electronic equipment, vehicle and storage medium

Similar Documents

Publication Publication Date Title
CN109461211B (en) Semantic vector map construction method and device based on visual point cloud and electronic equipment
CN111986506B (en) Mechanical parking space parking method based on multi-vision system
CA3028653C (en) Methods and systems for color point cloud generation
US10860871B2 (en) Integrated sensor calibration in natural scenes
CN111830953B (en) Vehicle self-positioning method, device and system
JP2020525809A (en) System and method for updating high resolution maps based on binocular images
CN108692719B (en) Object detection device
CN111046743B (en) Barrier information labeling method and device, electronic equipment and storage medium
CN112740225B (en) Method and device for determining road surface elements
CN108647638A (en) A kind of vehicle location detection method and device
JP7190261B2 (en) position estimator
CN114913290A (en) Multi-view-angle fusion scene reconstruction method, perception network training method and device
KR101255461B1 (en) Position Measuring Method for street facility
CN114663852A (en) Method and device for constructing lane line graph, electronic equipment and readable storage medium
CN117576652B (en) Road object identification method and device, storage medium and electronic equipment
CN114998436A (en) Object labeling method and device, electronic equipment and storage medium
CN114120254A (en) Road information identification method, device and storage medium
CN113884090A (en) Intelligent platform vehicle environment sensing system and data fusion method thereof
CN114503044B (en) System and method for automatically marking objects in a 3D point cloud
CN113435224A (en) Method and device for acquiring 3D information of vehicle
CN114677658B (en) Billion-pixel dynamic large scene image acquisition and multi-target detection method and device
CN115496873A (en) Monocular vision-based large-scene lane mapping method and electronic equipment
WO2022133986A1 (en) Accuracy estimation method and system
AU2018102199A4 (en) Methods and systems for color point cloud generation
WO2021056185A1 (en) Systems and methods for partially updating high-definition map based on sensor data matching

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination