CN115994934A - Data time alignment method and device and domain controller - Google Patents


Info

Publication number
CN115994934A
Authority
CN
China
Prior art keywords
target
sequence
image
point cloud
image sequence
Prior art date
Legal status
Granted
Application number
CN202310251493.5A
Other languages
Chinese (zh)
Other versions
CN115994934B (en)
Inventor
董静毅
钱鑫明
王靖博
张兆年
周飞
邵展坚
Current Assignee
Foss Hangzhou Intelligent Technology Co Ltd
Original Assignee
Foss Hangzhou Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Foss Hangzhou Intelligent Technology Co Ltd
Priority to CN202310251493.5A
Publication of CN115994934A
Application granted
Publication of CN115994934B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a data time alignment method, a data time alignment device, and a domain controller. The method comprises: acquiring an image sequence and a point cloud sequence of a target environment, wherein the image sequence comprises a plurality of consecutive image frames and the point cloud sequence comprises a plurality of consecutive point cloud data frames; performing motion feature matching between the image sequence and a depth image sequence corresponding to the point cloud sequence, and determining a corresponding target image frame and a corresponding target depth image frame; and time-aligning the image sequence and the point cloud sequence based on the target image frame and the target depth image frame. The data time alignment method provided by the embodiments of the application can time-align the image sequence and the point cloud sequence without relying on timestamps or a shared trigger pulse, thereby saving resource cost and widening the application range.

Description

Data time alignment method and device and domain controller
Technical Field
The present disclosure relates to the field of radar data processing technologies, and in particular, to a data time alignment method, a device, and a domain controller.
Background
At present, the intelligent driving functions of a vehicle depend on the perception of one or more sensors. If a single sensor is used, the collected traffic data tends to be incomplete and of low quality due to the limitations of that sensor's characteristics. Against this background, fusing and analyzing data from a plurality of sensors, such as image data collected by a camera and point cloud data collected by a radar, has become an indispensable means of improving the recognizability and comprehensiveness of the data. However, during data acquisition, objective factors such as the acquisition environment, different sensor start-up times, and different acquisition frequencies can leave the data from different sensors temporally unsynchronized, which poses a challenge to multi-source sensor data fusion.
In the related art, time alignment methods for multi-source sensor data fusion are generally divided into two types: one is to add timestamp information to the data acquired by the plurality of sensors and perform time alignment according to the timestamp information; the other is to trigger the plurality of sensors with the same pulse carried by a unified global navigation satellite system (such as the GPS of the United States). Both approaches incur additional processing cost and have a limited application range, and their alignment accuracy is bounded by the accuracy of the original timestamp information or of the trigger itself.
Therefore, a data time alignment method with wide application range and high accuracy is needed in the related art.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data time alignment method, apparatus, and domain controller capable of improving alignment accuracy.
In a first aspect, embodiments of the present application provide a data time alignment method, the method including:
acquiring an image sequence and a point cloud sequence of a target environment, wherein the image sequence comprises a plurality of continuous image frames, and the point cloud sequence comprises a plurality of continuous point cloud data;
performing motion feature matching on the image sequence and a depth image sequence corresponding to the point cloud sequence, and determining a corresponding target image frame and a corresponding target depth image frame;
and time-aligning the image sequence and the point cloud sequence based on the target image frame and the target depth image frame.
According to the data time alignment method provided by the embodiments of the present application, a target image frame and a target depth image frame that can be aligned are determined based on the motion information of the targets contained in the image sequence and the point cloud sequence, and the image sequence and the point cloud sequence can then be aligned frame by frame. Since the motion information of a target object is not changed by the use of different acquisition devices, the determined target image frame and target depth image frame are more accurate. Moreover, the image sequence and the point cloud sequence can be time-aligned without relying on timestamps or a shared trigger pulse, which saves resource cost and widens the application range.
Optionally, in an embodiment of the present application, the matching the motion characteristics of the image sequence and the depth image sequence corresponding to the point cloud sequence to determine a corresponding target image frame and a target depth image frame includes:
determining a depth image sequence corresponding to the point cloud sequence;
Respectively determining a first motion track corresponding to a plurality of first targets in the image sequence and a second motion track corresponding to a plurality of second targets in the depth image sequence;
and performing motion feature matching on the image sequence and the depth image sequence based on the first motion tracks and the second motion tracks, and determining corresponding target image frames and target depth image frames.
Optionally, in an embodiment of the present application, the performing motion feature matching on the image sequence and the depth image sequence based on the first motion track and the second motion track, and determining the corresponding target image frame and the target depth image frame includes:
determining a target image sequence and at least one matched target depth image sequence according to the first motion trajectories and the second motion trajectories;
and selecting corresponding target image frames and target depth image frames from the target image sequence and the at least one target depth image sequence.
Optionally, in an embodiment of the present application, the selecting the corresponding target image frame and the target depth image frame from the target image sequence and the at least one target depth image sequence includes:
And performing target matching on the target image sequence and the at least one target depth image sequence, and determining corresponding target image frames and target depth image frames.
Optionally, in an embodiment of the present application, the performing object matching on the object image sequence and the at least one object depth image sequence, determining a corresponding object image frame and a corresponding object depth image frame includes:
determining a target image frame in the target image sequence;
performing target matching on the target image frames and the candidate depth image frames corresponding to the target depth image sequences, and determining at least one group of target pairs; the target pair comprises a matched reference first target and a matched reference second target;
and determining a target depth image frame corresponding to the target image frame under the condition that the number of the target pairs meets the preset requirement.
Optionally, in an embodiment of the present application, the determining the target image frame in the target image sequence includes:
respectively determining the number of first targets contained in each frame of image frame in the target image sequence;
based on the first motion tracks, respectively determining the total displacement amount of the first targets in the image frames of each frame relative to the image frame of the previous frame;
And taking the image frames of which the number of the first targets is larger than a preset number threshold value and/or the total displacement is larger than a preset total displacement threshold value as target image frames.
Optionally, in an embodiment of the present application, performing object matching on the target image frame and a candidate depth image frame corresponding to each of the target depth image sequences, determining at least one group of object pairs includes:
determining a difference in position of a reference first object in the object image frame and each second object in the candidate depth image frame, respectively;
and under the condition that the position difference is smaller than a preset position difference threshold value, determining a group of target pairs consisting of the reference first target and the matched reference second target.
Optionally, in an embodiment of the present application, the determining a first motion trajectory of the plurality of first targets in the image sequence includes:
performing multi-target tracking on the image sequence, and determining a plurality of first targets which are successfully tracked;
a first motion trajectory of the first object in the image sequence is determined based on tracking positions of the first object in a plurality of consecutive image frames.
Optionally, in an embodiment of the present application, the performing multi-object tracking on the image sequence includes:
Respectively inputting the plurality of continuous image frames into a target detection model, and outputting the position information and the identification information of a plurality of first candidate targets in the image frames through the target detection model;
and performing multi-target tracking on the image sequence based on the position information and the identification information of the first candidate targets.
Optionally, in an embodiment of the present application, the determining a depth image sequence corresponding to the point cloud sequence includes:
performing plane fitting on the point cloud data to determine a target point cloud;
clustering the target point clouds, and determining each point cloud set and a corresponding target boundary box;
and constructing a depth image sequence under the same view angle with the image sequence based on the target boundary box and the point cloud set.
Optionally, in an embodiment of the present application, the constructing a depth image sequence at a same view angle as the image sequence based on the target bounding box and the point cloud set includes:
acquiring internal parameters of a camera and internal parameters of a radar, wherein the camera is used for acquiring the image sequence and the radar is used for acquiring the point cloud sequence;
determining a field angle of the camera according to the camera internal parameters and the radar internal parameters;
And constructing a depth image sequence under the same view angle as the image sequence based on the target bounding box, the point cloud set and the view angle.
In a second aspect, embodiments of the present application provide a data time alignment apparatus, where the apparatus includes:
the acquisition module is used for acquiring an image sequence and a point cloud sequence of the target environment, wherein the image sequence comprises a plurality of continuous image frames, and the point cloud sequence comprises a plurality of continuous point cloud data;
the feature matching module is used for matching the motion features of the image sequence and the depth image sequence corresponding to the point cloud sequence, and determining a corresponding target image frame and a corresponding target depth image frame;
and the time alignment module is used for performing time alignment on the image sequence and the point cloud sequence based on the target image frame and the target depth image frame.
In a third aspect, embodiments of the present application provide a domain controller, comprising a memory storing a computer program and a processor implementing the steps of the method described in the above embodiments when the processor executes the computer program.
Drawings
Fig. 1 is an application scenario diagram provided in one embodiment of the present application;
Fig. 2 is an application scenario diagram provided in another embodiment of the present application;
FIG. 3 is a method flow chart of a data time alignment method according to one embodiment of the present application;
FIG. 4 is a schematic diagram of input and output of an object detection model according to an embodiment of the present application;
FIG. 5 is a schematic illustration of a circumscribed matrix according to an embodiment of the present application;
Fig. 6 is a schematic block diagram of a data time alignment device according to an embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of a domain controller according to an embodiment of the present disclosure;
fig. 8 is a conceptual partial view of a computer program product provided by embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. All other embodiments that can be made by one of ordinary skill in the art without undue burden, based on the embodiments provided herein, are intended to fall within the scope of the present application. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the embodiments described herein can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar terms herein do not denote a limitation of quantity, but rather denote the singular or plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means greater than or equal to two. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. The terms "first," "second," "third," and the like, as used herein, are merely distinguishing between similar objects and not representing a particular ordering of objects.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, devices, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the subject matter of the present application.
In order to clearly show the technical solutions of the various embodiments of the present application, one of the exemplary scenarios of the embodiments of the present application is described below by means of fig. 1.
The data time alignment method provided by the embodiment of the application can be applied to application scenarios including, but not limited to, the one shown in fig. 1. As shown in fig. 1, the scenario includes an image acquisition device 101, a point cloud acquisition device 103, and a data time alignment device 105. The image acquisition device 101 may in particular be a camera, including but not limited to a monocular camera, a multi-view camera, a depth camera, etc. The point cloud acquisition device 103 may specifically be a lidar, including a single-line lidar or a multi-line lidar. The image acquisition device 101 and the point cloud acquisition device 103 may communicate with the data time alignment device 105 via a network. The data time alignment device 105 may be a processing device with data processing and data transceiving capabilities, having a central processing unit (Central Processing Unit, CPU) and/or a graphics processing unit (Graphics Processing Unit, GPU) for processing the image data acquired by the image acquisition device 101 and the point cloud data acquired by the point cloud acquisition device 103, thereby achieving time alignment of the image data and the point cloud data. It should be noted that the data time alignment device 105 may be a physical device or a physical device cluster, such as a terminal, a server, or a server cluster. Of course, the data time alignment device 105 may also be a virtualized cloud device, such as at least one cloud computing device in a cloud computing cluster.
In a specific implementation, the image capturing device 101 may capture image data of a target environment, and the point cloud capturing device 103 may capture point cloud data in the target environment, such as image data and point cloud data of the same road segment. Note that, the acquisition timings of the image acquisition apparatus 101 and the point cloud acquisition apparatus 103 may be different. The image acquisition device 101 then transmits the image data to the data time alignment device 105 and the point cloud acquisition device 103 transmits the point cloud data to the data time alignment device 105. The data time alignment device 105, after receiving the image data and the point cloud data, may align the image data and the point cloud data in a time domain based on motion characteristics of the image data and the point cloud data.
In another implementation scenario, as shown in FIG. 2, the data time alignment device 105 may specifically be a vehicle 200. The vehicle 200 may be equipped with one or more sensors such as a lidar, a camera, a global navigation satellite system (Global Navigation Satellite System, GNSS), an inertial measurement unit (Inertial Measurement Unit, IMU), and the like. In the vehicle 200, the data time alignment device 105 may be provided in a domain controller (Domain Control Unit, DCU); the lidar and the camera may transmit the acquired data to the domain controller, and the domain controller time-aligns the image data and the point cloud data. In one embodiment of the present application, as shown in fig. 2, a plurality of image acquisition devices 101 and point cloud acquisition devices 103 may be installed on the vehicle 200, and the installation positions may include the front, the rear, and the two sides of the vehicle 200, so as to capture look-around images and three-dimensional point cloud data from multiple angles around the vehicle 200; the number and installation positions of the image acquisition devices 101 and the point cloud acquisition devices 103 on the vehicle 200 are not limited. After the vehicle 200 obtains the image data acquired by the image acquisition device 101 and the point cloud data acquired by the point cloud acquisition device 103, the image data and the point cloud data can be time-aligned and subjected to fusion analysis, so that driving decisions such as route planning and obstacle avoidance can be made.
The data time alignment method described in the present application is explained in detail below with reference to the accompanying drawings. Fig. 3 is a method flow chart of an embodiment of the data time alignment method provided herein. Although the present application provides the method operation steps illustrated in the following embodiments or figures, the method may include more or fewer operation steps based on routine or non-inventive effort. For steps that have no logically necessary causal relationship, the execution order is not limited to the order provided in the embodiments of the present application. During actual data time alignment, or when the apparatus executes the method, the steps may be performed sequentially or in parallel (for example, in a parallel-processor or multithreaded environment) according to the methods shown in the embodiments or figures.
Specifically, an embodiment of a data time alignment method provided in the present application is shown in fig. 3, where the method may include:
s301: an image sequence of a target environment is acquired, the image sequence comprising a plurality of consecutive image frames, and a point cloud sequence comprising a plurality of consecutive point cloud data.
In the embodiment of the application, the image sequence and the point cloud sequence of the target environment can be acquired by an acquisition vehicle. The acquisition vehicle may be deployed with an image acquisition device 101 and a point cloud acquisition device 103, and may of course further include one or more other sensors, such as a global navigation satellite system (GNSS) and an inertial measurement unit (IMU), to record information such as the time and location at which the image sequence or the point cloud sequence is acquired. Specifically, the image acquisition device 101 is mainly used for capturing images of objects such as pedestrians, vehicles, roads, and greenery in the target environment. The storage format of the images may include BMP, JPEG, PNG, SVG, or any other format, and the image types may include RGB images, grayscale images, binary images, and the like. The point cloud acquisition device 103 is mainly used for collecting point cloud data of the same target environment; because a point cloud acquisition device 103 such as a lidar can accurately reflect position information, the width of the road surface, the height of pedestrians, the width of vehicles, the height of signal lamps, and other such information can be obtained through the point cloud acquisition device 103. The GNSS may be used to record the coordinates at which the current image and point cloud data are acquired, and the IMU is mainly used for recording the angle and acceleration information of the acquisition vehicle and the like. It should be noted that the image sequence may include a plurality of images of the target environment acquired from different angles at the same time, for example 360-degree look-around images acquired by the plurality of image acquisition devices 101 on the vehicle 200; the image sequence may further include a panoramic image stitched from the plurality of images at different angles, which is not limited herein. In one embodiment of the present application, the image sequence may include a plurality of consecutive image frames acquired at a preset frame rate, which may be set to, for example, 1 fps, 5 fps, 10 fps, etc., without limitation. It should be noted that, in the embodiment of the present application, the acquisition of the image sequence is a continuous process, and the image sequence may be acquired at preset time intervals. For example, the image acquisition device 101 may be configured to upload the images or video captured in the previous minute to the data time alignment device 105 every minute, and the uploaded images or video are processed by the data time alignment device 105; the preset time interval may be set to 1 s, 2 s, 5 s, etc. according to actual requirements, which is not limited herein. Likewise, the point cloud sequence may comprise a plurality of consecutive point cloud data frames. The point cloud data may include the three-dimensional coordinates and reflected intensities of objects in the target environment, for example in the form (x, y, z, intensity). It is understood that the acquisition times of the image sequence and the point cloud sequence may not be synchronized, that is, the start frame of the image sequence and the start point cloud data of the point cloud sequence are acquired at different times.
The acquisition frame rates of the point cloud acquisition device 103 and the image acquisition device 101 may be the same, for example, 10Hz, 15Hz, 20Hz, etc.
Of course, in other embodiments, the image sequence and the point cloud sequence of the target environment may also be acquired by other acquisition devices. For example, the acquisition devices may include a roadside acquisition device on which the image acquisition device 101 and the point cloud acquisition device 103 are provided. The acquisition devices may further include a robot on which the image acquisition device 101 and the point cloud acquisition device 103 are mounted, which is not limited in this application.
S303: and performing motion feature matching on the image sequence and a depth image sequence corresponding to the point cloud sequence, and determining a corresponding target image frame and a corresponding target depth image frame.
In practical applications, differences in the acquisition times of the image acquisition device 101 and the point cloud acquisition device 103 may leave the image sequence and the point cloud sequence unsynchronized in time. Since the image data and the point cloud data are usually projected into the same coordinate system for fusion analysis, it is necessary to obtain the relative external parameters between the image acquisition device 101 and the point cloud acquisition device 103, and solving for these relative external parameters requires the data acquired by the two devices to be time aligned. In addition, the image acquisition device 101 and the point cloud acquisition device 103 capture the same target environment, and in that environment the motion state of a target object, such as the travelling speed of a vehicle, is not changed in the time domain by the use of different acquisition devices. In other words, the motion states of the same object acquired by the image acquisition device 101 and the point cloud acquisition device 103 at the same moment are the same. Based on this, in the embodiment of the present application, motion feature matching may be performed between the image sequence and the depth image sequence corresponding to the point cloud sequence according to the motion information contained in the image sequence and in the point cloud sequence, so as to determine the corresponding target image frame and target depth image frame. The motion information contained in the image sequence may be the motion information of a plurality of first targets in the target environment, such as surrounding vehicles and pedestrians, as they appear in the image sequence, for example their position information and moving speed. Similarly, the motion information contained in the point cloud sequence may be the motion information of a plurality of second targets in the target environment, such as surrounding vehicles and pedestrians, as they appear in the point cloud sequence, for example their azimuth information and distance information.
In an embodiment of the present application, in order to obtain a more accurate target image frame and a target depth image frame that need to be aligned, the motion information may be combined with time information, that is, the acquisition time, to obtain motion trajectories of a plurality of first targets or second targets. The motion trail may be position change information of the first object or the second object at different acquisition moments. Based on the above, in an embodiment of the present application, motion feature matching may be performed on the image sequence and a depth image sequence corresponding to the point cloud sequence according to a first motion track of a plurality of first targets in the image sequence and a second motion track of a plurality of second targets in the point cloud sequence. Specifically, in one embodiment of the present application, the matching the motion characteristics of the image sequence and the depth image sequence corresponding to the point cloud sequence to determine the corresponding target image frame and the target depth image frame includes:
s401: determining a depth image sequence corresponding to the point cloud sequence;
s403: respectively determining a first motion track corresponding to a plurality of first targets in the image sequence and a second motion track corresponding to a plurality of second targets in the depth image sequence;
S405: and performing motion feature matching on the image sequence and the depth image sequence based on the first motion track and the second motion track, and determining a corresponding target image frame and a corresponding target depth image frame.
In this embodiment of the present application, since the point cloud data are generally three-dimensional while the image data are generally two-dimensional images, the point cloud data contained in the point cloud sequence may be projected to obtain a corresponding depth image sequence in order to improve the efficiency and accuracy of motion feature matching. For example, the depth image sequence corresponding to the point cloud sequence may be obtained using a projection principle and the orthogonality theorem. Specifically, in one embodiment of the present application, the determining the depth image sequence corresponding to the point cloud sequence includes:
s501: performing plane fitting on the point cloud data to determine a target point cloud;
s503: clustering the target point clouds, and determining each point cloud set and a corresponding target boundary box;
s505: and constructing a depth image sequence under the same view angle with the image sequence based on the target boundary box and the point cloud set.
In the embodiment of the application, the point cloud data can be preprocessed before further processing, so as to improve the accuracy of the point cloud data or the processing efficiency. For example, in one embodiment of the present application, the point cloud data may be filtered to remove outliers, scattered points, and the like. The filter may include, for example, a pass-through filter, a statistical filter, or a Gaussian filter. Of course, in other embodiments of the present application, if the volume of the point cloud data is large, the point cloud data may further be downsampled to improve computational efficiency, for example with a voxel filter or a uniform sampling filter. Downsampling reduces the scale of the point cloud data while ensuring that the accuracy with which its overall geometric and topological characteristics are described is not obviously reduced.
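For illustration, a minimal NumPy/SciPy sketch of such preprocessing is given below: a statistical outlier-removal pass followed by voxel downsampling. The voxel size, neighbour count, and standard-deviation ratio are assumed values chosen for the example, not parameters prescribed by this embodiment.

import numpy as np
from scipy.spatial import cKDTree

def preprocess_point_cloud(points, voxel_size=0.2, k=8, std_ratio=2.0):
    """points: (N, 3) array of x, y, z coordinates."""
    # Statistical outlier removal: drop points whose mean distance to their
    # k nearest neighbours is far above the global average.
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)        # first neighbour is the point itself
    mean_dist = dists[:, 1:].mean(axis=1)
    keep = mean_dist < mean_dist.mean() + std_ratio * mean_dist.std()
    points = points[keep]

    # Voxel downsampling: keep one representative point per occupied voxel.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    _, unique_rows = np.unique(voxel_idx, axis=0, return_index=True)
    return points[np.sort(unique_rows)]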
In the embodiment of the application, the point cloud data can be processed using a plane fitting method to remove the ground point cloud and determine the target point cloud corresponding to the point cloud data. Specifically, the plane equation corresponding to the point cloud data may be determined using the plane fitting method, and the ground point cloud may then be cut away using a pass-through filter, leaving only the target point cloud in the road area. The plane fitting method may include an ordinary least squares (OLS) based plane fitting method or a random sample consensus (RANSAC) based plane fitting method. In one embodiment of the present application, after the target point cloud is determined, a clustering algorithm may be used to divide the target point cloud into different point cloud sets and to determine the boundary region of each point cloud set, that is, its target bounding box. The clustering algorithm may include the k-means clustering algorithm, the DBSCAN clustering algorithm, the Euclidean clustering algorithm, the KD-Tree algorithm, and the like. For example, in one example, a certain point P in space may first be selected from the target point cloud, and the k points closest to P may be found using a KD-Tree neighbor search and added to a set Q, so as to obtain each point cloud set and its corresponding target bounding box (a sketch of this ground-removal and clustering step is given after the numbered steps below). In one embodiment of the present application, after each point cloud set and target bounding box have been determined, a depth image under the same field of view as each image frame in the image sequence may be constructed. Specifically, in one embodiment of the present application, the constructing a depth image sequence under the same view angle as the image sequence based on the target bounding box and the point cloud set may include:
S601: acquiring internal parameters of a camera and internal parameters of a radar, wherein the camera is used for acquiring an image sequence and the radar is used for acquiring a point cloud sequence;
s603: determining a field angle of the camera according to the camera internal parameters and the radar internal parameters;
s605: and constructing a depth image sequence under the same view angle as the image sequence based on the target bounding box, the point cloud set and the view angle.
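As noted above, a minimal sketch of the plane-fitting (ground removal) and clustering step follows. It uses a RANSAC-style plane fit and DBSCAN clustering with an axis-aligned bounding box per cluster; the iteration count, distance threshold, and clustering parameters are assumed illustration values rather than parameters of this embodiment.

import numpy as np
from sklearn.cluster import DBSCAN

def remove_ground_ransac(points, dist_thresh=0.15, iters=200, seed=0):
    """RANSAC-style plane fit; returns the non-ground (target) points."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-6:                       # degenerate sample, skip
            continue
        normal /= norm
        dist = np.abs((points - p0) @ normal)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return points[~best_inliers]              # drop the fitted ground plane

def cluster_targets(points, eps=0.6, min_samples=10):
    """Euclidean (DBSCAN) clustering; returns per-cluster points and bounding boxes."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    clusters, boxes = [], []
    for lbl in np.unique(labels):
        if lbl == -1:                         # noise points
            continue
        pts = points[labels == lbl]
        clusters.append(pts)
        boxes.append((pts.min(axis=0), pts.max(axis=0)))  # axis-aligned bounding box
    return clusters, boxes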
In this embodiment of the present application, the internal parameters of the camera may be parameters related to the characteristics of the camera itself, such as its focal length, pixel size, and image width; for example, the acquired camera internal parameters may include the image width w and the focal length f_x. In an embodiment of the present application, the camera internal parameters may be determined from the image information acquired by the camera, or obtained by a checkerboard calibration method. Likewise, the radar internal parameters may be parameters related to the characteristics of the radar itself, such as its focal length, aperture, point cloud coordinates, vertical resolution, horizontal resolution, and camera center shift; for example, the acquired radar internal parameters may include a vertical resolution v_res, a horizontal resolution h_res, and point cloud coordinates (x, y, z). The radar internal parameters may be obtained directly or determined by a simple algorithm, which is not limited herein.
In one embodiment of the present application, after determining the radar internal parameters and the camera internal parameters, the camera field of view angle (Fov) may be determined, for example as Fov = 2·arctan(w/(2f_x)). In one embodiment of the present application, after each of the point cloud sets is acquired, the point cloud set may be projected into a forward-looking depth image through a projection algorithm. Specifically, in one example, a depth value of the point cloud set may first be determined from the three-dimensional coordinates of its points. Thereafter, the radian conversion factors corresponding to the vertical resolution v_res and the horizontal resolution h_res may be determined. Then, the transformed image coordinates (u, v) can be determined from the radian conversion factors and the coordinates of the point cloud sets; for example, the u coordinate of the image may be u = arctan(-y/x)/v_res, and the v coordinate of the image may be determined in a similar manner from the depth value and the vertical radian conversion factor. Finally, the depth image sequence corresponding to the point cloud sequence is drawn from the resulting image coordinates.
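For illustration only, the following NumPy sketch shows one common way to build such a front-view depth image. The exact formulas of this embodiment are only partially legible in the source, so the depth definition and the use of the horizontal resolution for the u coordinate below follow common LiDAR front-view projection practice and are assumptions, not the claimed equations.

import numpy as np

def point_cloud_to_depth_image(points, v_res_deg, h_res_deg):
    """Project an (N, 3) LiDAR point array into a front-view depth image.

    Azimuth -> u, elevation -> v, Euclidean range as the pixel value; this
    mapping is a common convention and an assumption, not the patent's formula.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    d = np.sqrt(x**2 + y**2 + z**2)                  # assumed depth value per point

    u = np.arctan2(-y, x) / np.radians(h_res_deg)    # horizontal pixel index
    v = -np.arctan2(z, np.sqrt(x**2 + y**2)) / np.radians(v_res_deg)

    u = (u - u.min()).astype(int)
    v = (v - v.min()).astype(int)

    depth_img = np.zeros((v.max() + 1, u.max() + 1), dtype=np.float32)
    depth_img[v, u] = d                              # later points overwrite earlier ones
    return depth_img

def camera_fov(w, f_x):
    """Camera field of view in degrees from image width w and focal length f_x, as in the text."""
    return 2.0 * np.degrees(np.arctan(w / (2.0 * f_x)))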
In this embodiment of the present application, object detection may be performed on each image frame included in the image sequence, and the first motion trajectory of each first target may be determined according to the position information of the target frame corresponding to that first target. For example, the first motion trajectory of a first target A may be determined as {a, b, c, d, e}. Of course, in other embodiments of the present application, in order to improve the accuracy of the first motion trajectory and thus describe the position change of each first target across different image frames more accurately, multi-target tracking may also be performed on the image sequence, and the first motion trajectory of each successfully tracked first target may be determined. Specifically, the determining of the first motion trajectories of the plurality of first targets in the image sequence includes:
s701: performing multi-target tracking on the image sequence, and determining a plurality of first targets which are successfully tracked;
s703: a first motion trajectory of the first object in the image sequence is determined based on tracking positions of the first object in a plurality of consecutive image frames.
In this embodiment of the present application, a multi-object tracking algorithm may be used to process the image sequence to determine a plurality of first targets that are successfully tracked and the first motion trajectory corresponding to each of them. In particular, the multi-target tracking algorithm may include the SORT (Simple Online And Realtime Tracking) algorithm, the DeepSORT algorithm, and the like. Since the objects processed by a multi-target tracking algorithm are the target frames produced by object detection, in one embodiment of the present application, object detection may be performed on the image sequence before multi-target tracking to determine the target frames of a plurality of first candidate targets in each image frame. Specifically, the performing multi-target tracking on the image sequence includes:
S801: respectively inputting the plurality of continuous image frames into a target detection model, and outputting the position information and the identification information of a plurality of first candidate targets in the image frames through the target detection model;
s803: and performing multi-target tracking on the image sequence based on the position information and the identification information of the first candidate targets.
In an embodiment of the present application, the target detection model may be a model trained by means of machine learning. For example, the target detection model may include convolutional neural network (CNN) models such as AlexNet, YOLO, ResNet, ResNet1001 (pre-activation), Hourglass, Inception, Xception, SENet, and so on. Before the multiple image frames are input to the target detection model, they may be subjected to image preprocessing in order to eliminate irrelevant information and improve detection efficiency and accuracy. The image preprocessing may include image standard normalization, smoothing, denoising, image enhancement, image resolution normalization, and the like. Specifically, in one example, the R, G, B channels of the image frames in the image sequence may be normalized and the frames scaled uniformly to a preset size. In one embodiment of the present application, as shown in fig. 4, the target detection model 400 may be used to determine the position information of the first targets in the consecutive image frames. The target detection model 400 may include a feature extraction network layer 401, a fully connected layer 403, and so on. The feature extraction network layer 401 may perform feature extraction on an image frame to obtain a feature map of the image frame. The feature map may include feature information of the first target such as gray scale, edge, texture, color, and gradient histogram. After that, the fully connected layer may classify and locate the feature map, and finally the target detection model 400 may output the position information and identification information of the first candidate targets in the image frame. In one embodiment of the present application, the target detection model 400 may be trained using a plurality of image samples in which the positions of first targets are marked with labeling boxes. In one embodiment of the present application, in order to obtain a more accurate target detection result, the intersection over union between output target frames may be calculated during model training. Thereafter, a non-maximum suppression algorithm may be employed to filter repeated frames of the same target, and the results may be combined with the original image to output the target category and bounding-box position more precisely.
In this embodiment of the present application, after the target detection result of each image frame is determined, multi-target tracking may be performed on the image sequence according to the target detection results. The following describes the multi-target tracking process using the SORT algorithm as an example. Specifically, the target detection model 400 may be used to perform target detection on the first image frame in the image sequence to obtain the classes and positions of all first candidate targets in the first image frame (assuming there are M targets), each labeled with unique identification information. Then, the position of each first candidate target in the next frame, namely the predicted position, can be predicted using an initialized Kalman filter. Next, the target detection model 400 may be used to perform target detection on the second image frame in the image sequence to obtain the classes and positions (i.e., the observed positions) of all first candidate targets in the second image frame (assuming there are N targets), the intersection over union (IOU) between the M first candidate targets in the first image frame and the N first candidate targets in the second image frame may be calculated, and a cost matrix may be established based on the IOU. The IOU represents the overlap ratio of two bounding boxes. The cost matrix is then solved by the Hungarian algorithm, which outputs the set of first-target matching pairs with the minimum global cost; the matched first candidate targets are the first targets that are successfully tracked. In one embodiment of the present application, the error covariance matrix and the Kalman gain of the Kalman filter may be updated using the observed positions of the first targets matched in the second image frame, and the predicted positions of the first candidate targets in the next frame may be calculated using the updated Kalman filter; for first candidate targets not matched in the second image frame, the Kalman filter may be reinitialized. It can be appreciated that each subsequent image frame in the image sequence, such as the third and fourth image frames, may be processed in the same way as the first and second image frames, which is not repeated herein.
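The IOU cost matrix and Hungarian assignment described above can be sketched as follows; the Kalman prediction and update steps are omitted for brevity, and the IOU threshold is an assumed value.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2); returns intersection over union."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_detections(predicted_boxes, detected_boxes, iou_min=0.3):
    """Hungarian matching on an IOU-based cost matrix (cost = 1 - IOU)."""
    cost = np.array([[1.0 - iou(p, d) for d in detected_boxes] for p in predicted_boxes])
    rows, cols = linear_sum_assignment(cost)          # minimises the global cost
    # Keep only pairs whose IOU clears the threshold; these are the tracked targets.
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_min]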
On this basis, in one embodiment of the present application, for a first target that is successfully tracked, its tracking position may be determined from the predicted position and the observed position; for example, a weighted combination of the observed position and the predicted position may be used as the tracking position. In one embodiment of the present application, after the tracking position of the first target is determined, the first motion trajectory of the first target may be determined from its tracking positions in a plurality of consecutive image frames. For example, the first motion trajectory may be written as the sequence of tracking positions {p_1, p_2, ..., p_L}, where p_t is the tracking position of the first target in the image frame acquired at acquisition instant t. In other embodiments of the present application, in order to describe the first motion trajectory of the first target more vividly, the first motion trajectory may instead be determined from the displacements of the first target between adjacent image frames, that is, as the sequence of displacements Δp_t = p_t - p_(t-1). Of course, in order to unify the form of the first motion trajectories of the plurality of first targets and facilitate subsequent motion feature matching, in an embodiment of the present application the first motion trajectories may further be normalized over the L frames. Here L denotes the number of consecutive frames in which the first target appears; its specific value may be set by the user based on tracking stability and the efficiency of motion feature matching, and may be set to 3 frames, 5 frames, 10 frames, and so on.
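Because some of the trajectory expressions above are given only as formula images in the source, the following sketch shows one plausible construction under stated assumptions: the trajectory as the sequence of per-frame tracking positions, its frame-to-frame displacements, and a simple normalization by the total path length over the L frames. The normalization choice is an assumption and may differ from the one used in this embodiment.

import numpy as np

def trajectory_features(tracking_positions):
    """tracking_positions: (L, 2) array of a target's per-frame (x, y) tracking positions."""
    positions = np.asarray(tracking_positions, dtype=float)
    displacements = np.diff(positions, axis=0)          # delta_p_t = p_t - p_(t-1)
    # Assumed normalization: divide each displacement by the total path length
    # over the L frames so that trajectories of different targets are comparable.
    path_length = np.linalg.norm(displacements, axis=1).sum() + 1e-9
    normalized = displacements / path_length
    return positions, displacements, normalized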
It should be noted that, the determining of the second motion track of the second object in the depth image sequence may refer to the determining method of the first motion track, that is, the object detection, the multi-object tracking, and the determination of the tracking position of the second object that is successfully tracked of the depth image sequence may refer to the processing method of the image sequence, which is not described herein.
By the embodiment, on the basis of target detection, more accurate position information of the first target in a plurality of image frames can be obtained by utilizing a multi-target tracking algorithm, and the determined first motion trail corresponding to the first target is more accurate, so that the accuracy of subsequent time alignment can be improved.
In this embodiment of the present application, after determining the first motion trajectories of the plurality of first targets in the image sequence and the second motion trajectories of the plurality of second targets in the depth image sequence, the corresponding target image frames and the target depth image frames may be determined based on the plurality of first motion trajectories and the plurality of second motion trajectories. For example, the similarity between the plurality of first motion trajectories and the plurality of second motion trajectories may be determined, and the corresponding target image frame and the corresponding target depth image frame may be determined when the similarity satisfies a preset requirement. In one embodiment of the present application, the similarity between the first motion trajectory and the second motion trajectory may include a difference in distance between tracking positions of the first object and the second object in the corresponding frame. In one embodiment of the present application, to reduce the probability of matching failure, the two sequences may be aligned first in a coarse manner and then in a fine manner, so as to improve the accuracy of time alignment. Specifically, the performing motion feature matching on the image sequence and the depth image sequence based on the first motion track and the second motion track, and determining the corresponding target image frame and the corresponding target depth image frame may include:
S901: determining a target image sequence and at least one matched target depth image sequence according to the first motion trajectories and the second motion trajectories;
s903: and selecting corresponding target image frames and target depth image frames from the target image sequence and the target depth image sequence.
In this embodiment of the present application, the matched target image sequence and target depth image sequence may be determined according to the distances between the plurality of first motion trajectories and the plurality of second motion trajectories. The number of target depth image sequences may be one or more. The distance between a first motion trajectory and a second motion trajectory may be determined using a distance metric, which may include the cosine distance, the Euclidean distance, the Chebyshev distance, the Minkowski distance, and the like.
In an embodiment of the present application, a first motion trajectory of one of the first targets may be selected as a reference first motion trajectory from the first motion trajectories corresponding to the plurality of first targets, and compared one by one with the second motion trajectories of the plurality of second targets. In one embodiment of the present application, the distance between the first motion trajectories in the target image sequence and the second motion trajectories in the target depth image sequence is less than a preset distance threshold. The preset distance threshold can be set by the user according to actual application requirements. It is understood that the first motion trajectories of the first targets in the target image sequence may be those of all or only some of the first targets, and the second motion trajectories of the second targets in the target depth image sequence may be those of all or only some of the second targets. Stated another way, the target image sequence may be the complete image sequence or a subsequence of the image sequence. In the case that the target image sequence is a subsequence of the image sequence, the first motion trajectory may be the tracking position information of the first target within the target image sequence. Likewise, the target depth image sequence may be the entire depth image sequence or a subsequence of the depth image sequence.
For a more intuitive understanding of the above description, a specific example is given below. Suppose the first motion trajectory of a first target A in the image sequence is {11, 13, 15, 18}, the second motion trajectory of a second target B in the depth image sequence is {4, 6, 8, 9, 10, 13.4, 15.1, 17.9}, and the preset distance threshold is set to 0.5. Since (13.4 - 13)^2 + (15.1 - 15)^2 + (17.9 - 18)^2 = 0.18, which is less than 0.5, it can be determined that the target image sequence constituted by the 2nd, 3rd, and 4th frames of the image sequence matches the target depth image sequence constituted by the 6th, 7th, and 8th frames of the depth image sequence.
Of course, in practical applications, the amount of point cloud data is generally larger than the amount of image data; that is, the number of image frames contained in the image sequence is smaller than the number of depth image frames contained in the depth image sequence. On this basis, in the process of matching the motion features of the two sequences, at least one target depth image sequence matching the image sequence can be determined based on the distances between the reference first motion trajectory and the plurality of second motion trajectories, where the target depth image sequence can be a subsequence of the depth image sequence.
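Using the figures from the example above, the following sketch slides the reference trajectory over the candidate trajectory and accepts the window whose sum of squared differences is below the preset distance threshold.

import numpy as np

def match_subsequence(reference, candidate, threshold=0.5):
    """Return (offset, distance) of the best matching window, or None."""
    ref = np.asarray(reference, dtype=float)
    cand = np.asarray(candidate, dtype=float)
    best = None
    for offset in range(len(cand) - len(ref) + 1):
        window = cand[offset:offset + len(ref)]
        dist = float(np.sum((window - ref) ** 2))     # squared-difference distance
        if dist < threshold and (best is None or dist < best[1]):
            best = (offset, dist)
    return best

# Example from the text: frames 2-4 of the image trajectory {11, 13, 15, 18} match
# frames 6-8 of the depth trajectory, since 0.4^2 + 0.1^2 + 0.1^2 = 0.18 < 0.5.
print(match_subsequence([13, 15, 18], [4, 6, 8, 9, 10, 13.4, 15.1, 17.9]))  # approximately (5, 0.18)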
In this embodiment of the present application, after determining the target image sequence and the at least one target depth image sequence, one of the target depth image sequences may be selected from the at least one target depth image sequence, and then the target image frame and the target depth image frame may be selected from the target image sequence and the target depth image sequence. For example, in one embodiment of the present application, the image frames and depth image frames at corresponding positions in two sequences may be selected as the target image frame and the target depth image frame. For example, a start frame image frame in the target image sequence may be selected as the target image frame and a start frame depth image frame in the target depth image sequence may be selected as the target depth image frame. In another embodiment of the present application, a frame of image from the target image sequence may be selected as the target image frame and a frame of depth image from the target depth image sequence may be selected as the target depth image frame.
Through this embodiment, the image sequence and the depth image sequence can first be coarsely matched according to the motion trajectories, so that a target image sequence and a target depth image sequence that can be aligned are determined. This amounts to selecting, from the two full sequences, a pair of shorter sub-sequences that can be aligned. The corresponding target image frame and target depth image frame can then be selected from this pair of alignable sub-sequences. The subsequent alignment result is therefore more accurate and better meets application requirements.
In practical applications, since the target environment may include multiple targets, such as vehicles, pedestrians, etc. with different orientations, there may be multiple first targets in each frame of image frames and multiple second targets in each frame of depth image frames. In order to prevent the occurrence of a small probability event that the motion trajectories of different targets are similar in different time periods from affecting the matching accuracy of the target image frames and the target depth image frames, the target image sequence and the target depth image sequence can be subjected to target matching so as to determine more accurate corresponding target image frames and target depth image frames. Specifically, in one embodiment of the present application, the selecting the corresponding target image frame and the corresponding target depth image frame from the target image sequence and the target depth image sequence may include:
S1001: performing target matching on the target image sequence and the target depth image sequence, and determining the corresponding target image frame and target depth image frame.
In this embodiment of the present application, the target image sequence and the target depth image sequence may be subjected to target matching based on the plurality of first target frames output by the target detection model 400 and the plurality of second target frames output by the other target detection model. For example, the target matching result may be determined according to the difference in position between a first target frame and a second target frame. In one embodiment of the present application, in order to obtain a clearer and more accurate target matching result, a certain frame of the target image sequence may be matched one by one with multiple depth image frames of the target depth image sequences. Specifically, the performing of target matching on the target image sequence and the at least one target depth image sequence and determining the corresponding target image frame and target depth image frame may include:
S1101: determining a target image frame in the target image sequence;
S1103: performing target matching on the target image frames and the candidate depth image frames corresponding to the target depth image sequences, and determining at least one group of target pairs; the target pair comprises a matched reference first target and a matched reference second target;
S1105: and determining a target depth image frame corresponding to the target image frame under the condition that the number of the target pairs meets the preset requirement.
In this embodiment of the present application, one image frame may be selected at random from the target image sequence as the target image frame; for example, the first image frame, the last image frame, or an intermediate image frame of the target image sequence may be selected as the target image frame. Of course, in other embodiments of the present application, in order to reduce the number of matching operations in the target matching process and improve the target matching efficiency, an image frame containing more motion information may be selected from the target image sequence as the target image frame. Specifically, the determining of the target image frame in the target image sequence may include:
S1201: respectively determining the number of first targets contained in each image frame of the target image sequence;
S1203: based on the first motion tracks, respectively determining the total displacement amount of the first targets in the image frames of each frame relative to the image frame of the previous frame;
S1205: and taking the image frames of which the number of the first targets is larger than a preset number threshold and/or the total displacement is larger than a preset total displacement threshold as target image frames.
In this embodiment of the present application, the number of first targets contained in each image frame of the target image sequence may be determined according to the output of the target detection model 400. After the first motion tracks corresponding to the plurality of first targets are determined using the methods described in the foregoing embodiments, the tracking position of each first target in every frame of the target image sequence can be read from its first motion track. On this basis, the displacement of a first target in the current image frame can be determined from the difference between its tracking position in the current image frame and its tracking position in the previous image frame. It will be appreciated that every other first target in the current image frame can determine its corresponding displacement in the same manner. In one embodiment of the present application, the total displacement of the current image frame may be determined from the displacements of all the first targets in that frame. In one embodiment of the present application, when the number of first targets in an image frame of the target image sequence is greater than a preset number threshold, that image frame may be determined to be a target image frame. The preset number threshold may be set by the user according to the alignment accuracy, and may be set to 3, 4, 5, or the like, for example.
In another embodiment of the present application, when the total displacement of an image frame in the target image sequence is greater than a preset total displacement threshold, that image frame may be determined to be the target image frame. Of course, in other embodiments of the present application, in order to determine a more reliable target image frame, an image frame may also be used as the target image frame only when the number of first targets in the frame is greater than the preset number threshold and the total displacement is greater than the preset total displacement threshold.
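As a hedged sketch of this frame-selection rule (the thresholds, the input format and the combined "both conditions" variant are illustrative assumptions, not values prescribed by this application):

```python
def candidate_target_frames(target_counts, total_displacements,
                            count_threshold=3, disp_threshold=5.0):
    """Return the indices of image frames whose number of first targets
    exceeds the count threshold and whose total displacement relative to
    the previous frame exceeds the displacement threshold."""
    return [i for i, (n, d) in enumerate(zip(target_counts, total_displacements))
            if n > count_threshold and d > disp_threshold]

# Example: frames 1 and 2 carry enough motion information.
print(candidate_target_frames([2, 4, 5, 3], [1.0, 6.2, 7.5, 2.1]))  # -> [1, 2]
```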
In this embodiment of the present application, after the target image frame is determined, target matching may be performed between the target image frame and the candidate depth image frame corresponding to each target depth image sequence, so as to determine at least one group of target pairs. For example, after the target image frame is determined to be the third image frame of the target image sequence, the third depth image frame of each of the plurality of target depth image sequences may be extracted as a candidate depth image frame. Then, target matching is performed between the target image frame and the plurality of candidate depth image frames one by one, and at least one group of target pairs is determined. Specifically, the plurality of first targets in the target image frame and the plurality of second targets in a candidate depth image frame may be matched in pairs, and a successfully matched first target and second target are used as a group of target pairs. In one embodiment of the present application, the matching result of a first target and a second target may include the similarity of the first target and the second target, and the similarity may include the distance between the first target and the second target. Of course, in other embodiments of the present application, a first target may be selected from the plurality of first targets as a reference first target; the reference first target is then matched against the plurality of second targets to determine a reference second target that matches it, and the two are used as a successfully matched group of target pairs. Specifically, in one embodiment of the present application, performing target matching on the target image frame and the candidate depth image frame corresponding to each target depth image sequence and determining at least one group of target pairs may include:
S1301: determining a difference in position of a reference first object in the object image frame and each second object in the candidate depth image frame, respectively;
S1303: and under the condition that the position difference is smaller than a preset position difference threshold, determining a group of target pairs consisting of the reference first target and the matched reference second target.
In this embodiment of the present application, the position difference may be determined according to the degree of overlap between the bounding box of the first target and the bounding box of the second target; for example, it may be the intersection over union (IOU) of the two bounding boxes, the distance intersection over union (DIOU) of the two bounding boxes, the generalized intersection over union (GIOU) of the two bounding boxes, or the like. When the position difference is smaller than a preset position difference threshold, a reference second target that matches the reference first target can be determined. The preset position difference threshold may be set by the user according to the target environment and the moving speed of the acquisition devices, and may be set to m, n, etc., for example.
In the embodiment of the present application, when the number of the target pairs is determined to meet a preset requirement, it may be determined that the candidate depth image frame is the target depth image frame corresponding to the target image frame. Meeting the preset requirement may mean that the number of target pairs formed between the second targets in the candidate depth image frame and the first targets in the target image frame is the largest among the candidates, or is greater than a preset threshold. The preset threshold may be, for example, the average or the maximum of the numbers of target pairs obtained when the target image frame is matched against the plurality of candidate depth image frames. It will be appreciated that, in the case where there is only one target depth image sequence, the candidate depth image frames may also be multiple depth image frames of that target depth image sequence.
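A minimal sketch of this pair-count check might look as follows, assuming the number of matched target pairs per candidate depth image frame has already been computed (the input format and the selection rule are assumptions for illustration):

```python
def pick_target_depth_frame(pair_counts, min_pairs=None):
    """pair_counts[i] is the number of matched target pairs obtained for the
    i-th candidate depth image frame. Return the candidate with the most
    pairs, optionally requiring a minimum count."""
    best = max(range(len(pair_counts)), key=lambda i: pair_counts[i])
    if min_pairs is not None and pair_counts[best] < min_pairs:
        return None
    return best

# Example: the third candidate (index 2) yields the most target pairs.
print(pick_target_depth_frame([2, 3, 6, 1]))  # -> 2
```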
In the following, a specific example will be described. As shown in fig. 5, take the case where the position difference is the DIOU of two bounding boxes as an example: the bounding box of the ith first target in the image sequence is M, the bounding box of the jth second target in the candidate depth image frame is N, and the circumscribed rectangle of the bounding box M and the bounding box N is C. The DIOU of the ith first target and the jth second target can then be written as

DIOU_ij = IoU(M, N) - ρ²(b_i, b_j) / c²

where b_i denotes the center point of the bounding box of the ith first target, b_j denotes the center point of the bounding box of the jth second target, ρ(b_i, b_j) denotes the Euclidean distance between the two center points, and c denotes the diagonal length of the circumscribed rectangle C. In one embodiment of the present application, a cost matrix may be constructed based on a plurality of DIOUs and optimized with the Hungarian algorithm, and at least one set of target pairs with the minimum global cost is output as the globally optimal match between the image sequence and the point cloud sequence, where the matching cost describes the correlation between the target image frame and the candidate depth image frame, and a smaller matching cost indicates a higher correlation. Thereafter, the target depth image frame corresponding to the target image frame may be determined from the plurality of candidate depth image frames according to the plurality of matching costs. Through the above embodiment, the degree of similarity between the target image frame and the plurality of candidate depth image frames can be determined according to the degree of overlap between the plurality of first targets and the plurality of second targets, so that the target depth image frame that best matches the target image frame, that is, the one closest in time, can be determined according to this similarity.
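For illustration, a minimal sketch of the DIOU-based cost matrix and the Hungarian assignment is given below; it assumes axis-aligned boxes in [x1, y1, x2, y2] form, uses 1 - DIOU as the matching cost, and relies on scipy's linear_sum_assignment, none of which is mandated by this application:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diou(box_m, box_n):
    """Distance-IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    # Intersection area
    ix1, iy1 = max(box_m[0], box_n[0]), max(box_m[1], box_n[1])
    ix2, iy2 = min(box_m[2], box_n[2]), min(box_m[3], box_n[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_m = (box_m[2] - box_m[0]) * (box_m[3] - box_m[1])
    area_n = (box_n[2] - box_n[0]) * (box_n[3] - box_n[1])
    iou = inter / (area_m + area_n - inter + 1e-9)
    # Squared distance between the two box centers
    cm = ((box_m[0] + box_m[2]) / 2, (box_m[1] + box_m[3]) / 2)
    cn = ((box_n[0] + box_n[2]) / 2, (box_n[1] + box_n[3]) / 2)
    center_dist2 = (cm[0] - cn[0]) ** 2 + (cm[1] - cn[1]) ** 2
    # Squared diagonal of the circumscribed rectangle C
    cx1, cy1 = min(box_m[0], box_n[0]), min(box_m[1], box_n[1])
    cx2, cy2 = max(box_m[2], box_n[2]), max(box_m[3], box_n[3])
    diag2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 + 1e-9
    return iou - center_dist2 / diag2

def match_targets(first_boxes, second_boxes):
    """Build a DIOU-based cost matrix and return the globally optimal
    target pairs via the Hungarian algorithm, plus the total cost."""
    cost = np.array([[1.0 - diou(m, n) for n in second_boxes]
                     for m in first_boxes])
    rows, cols = linear_sum_assignment(cost)
    pairs = [(int(r), int(c)) for r, c in zip(rows, cols)]
    return pairs, float(cost[rows, cols].sum())

# Example with one first target and two second targets (pixel coordinates):
pairs, total_cost = match_targets([[10, 10, 50, 60]],
                                  [[12, 11, 52, 62], [200, 40, 240, 90]])
print(pairs)  # -> [(0, 0)]: the first target pairs with the nearby second target
```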
S305: the image sequence and the point cloud sequence are time aligned based on the target image frame and the target depth image frame.
In the embodiment of the present application, after the corresponding target image frame and target depth image frame are determined, the target image frame and the target depth image frame may be aligned in time, after which the other image frames in the image sequence may be aligned with the other depth image frames in the depth image sequence. In one embodiment of the present application, when the acquisition frame rates of the image acquisition device 101 and the point cloud acquisition device 103 are the same, the image frames after the target image frame may be directly aligned one by one with the depth image frames after the target depth image frame.
In other embodiments of the present application, when the acquisition frame rates of the image acquisition device 101 and the point cloud acquisition device 103 are different, the image sequence and the depth image sequence may be sampled at a preset frame rate to determine the image frames and depth image frames that need to be aligned, which is not limited herein.
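A minimal index-mapping sketch for the different-frame-rate case is given below; the frame rates, the nearest-frame rule and the function name are assumptions for illustration only:

```python
def align_indices(num_images, num_depths, anchor_image_idx, anchor_depth_idx,
                  image_fps=30.0, depth_fps=10.0):
    """For each image frame from the anchor pair onwards, return the index
    of the depth image frame closest in (implied) time."""
    pairs = []
    for i in range(anchor_image_idx, num_images):
        dt = (i - anchor_image_idx) / image_fps          # seconds after the anchor
        j = anchor_depth_idx + round(dt * depth_fps)     # nearest depth frame index
        if j >= num_depths:
            break
        pairs.append((i, j))
    return pairs

# With equal frame rates this degenerates to one-by-one alignment:
# align_indices(10, 10, 2, 4, 10.0, 10.0) -> [(2, 4), (3, 5), ..., (7, 9)]
```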
According to the data time alignment method provided by the embodiments of the application, the target image frame and the target depth image frame that can be aligned are determined based on the motion information of the targets contained in the image sequence and the point cloud sequence, after which the frames of the image sequence and the point cloud sequence can be aligned one by one. Since the motion information of a target is not changed by the use of different acquisition devices, the determined target image frame and target depth image frame are more accurate. Moreover, the image sequence and the point cloud sequence can be time aligned without relying on a time stamp or a shared trigger pulse, so that resource costs are saved and the method has a wider application range.
Based on the same inventive concept, the embodiment of the application also provides a data time alignment device for implementing the data time alignment method described above. The implementation of the solution provided by the device is similar to that described in the above method, so for the specific limitations in the embodiments of the data time alignment device provided below, reference may be made to the limitations of the data time alignment method above, which are not repeated here.
Specifically, in one embodiment of the present application, as shown in fig. 6, the data time alignment apparatus 600 may include:
an acquiring module 601, configured to acquire an image sequence and a point cloud sequence of a target environment, where the image sequence includes a plurality of continuous image frames and the point cloud sequence includes a plurality of continuous point cloud data;
the feature matching module 603 is configured to perform motion feature matching on the image sequence and a depth image sequence corresponding to the point cloud sequence, and determine a corresponding target image frame and a corresponding target depth image frame;
a time alignment module 605 is configured to time align the image sequence and the point cloud sequence based on the target image frame and the target depth image frame.
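Purely for illustration, the three modules could be composed as in the following sketch; the interfaces are assumptions and do not correspond to the actual device:

```python
class DataTimeAlignmentDevice:
    """Illustrative composition of the acquisition, feature-matching and
    time-alignment modules described above; all interfaces are assumed."""

    def __init__(self, acquire_fn, match_fn, align_fn):
        self.acquire = acquire_fn   # -> (image_sequence, point_cloud_sequence)
        self.match = match_fn       # -> (target_image_frame, target_depth_frame)
        self.align = align_fn       # -> list of time-aligned (image, point cloud) pairs

    def run(self):
        images, clouds = self.acquire()
        tgt_img, tgt_depth = self.match(images, clouds)
        return self.align(images, clouds, tgt_img, tgt_depth)
```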
Optionally, in an embodiment of the present application, the matching the motion characteristics of the image sequence and the depth image sequence corresponding to the point cloud sequence to determine a corresponding target image frame and a target depth image frame includes:
determining a depth image sequence corresponding to the point cloud sequence;
respectively determining a first motion track corresponding to a plurality of first targets in the image sequence and a second motion track corresponding to a plurality of second targets in the depth image sequence;
and performing motion feature matching on the image sequence and the depth image sequence based on the first motion tracks and the second motion tracks, and determining corresponding target image frames and target depth image frames.
Optionally, in an embodiment of the present application, the performing motion feature matching on the image sequence and the depth image sequence based on the first motion track and the second motion track, and determining the corresponding target image frame and the target depth image frame includes:
determining a target image sequence and at least one matched target depth image sequence according to the first motion trajectories and the second motion trajectories;
And selecting corresponding target image frames and target depth image frames from the target image sequence and the at least one target depth image sequence.
Optionally, in an embodiment of the present application, the selecting the corresponding target image frame and the target depth image frame from the target image sequence and the at least one target depth image sequence includes:
and performing target matching on the target image sequence and the at least one target depth image sequence, and determining corresponding target image frames and target depth image frames.
Optionally, in an embodiment of the present application, the performing object matching on the object image sequence and the at least one object depth image sequence, determining a corresponding object image frame and a corresponding object depth image frame includes:
determining a target image frame in the target image sequence;
performing target matching on the target image frames and the candidate depth image frames corresponding to the target depth image sequences, and determining at least one group of target pairs; the target pair comprises a matched reference first target and a matched reference second target;
and determining a target depth image frame corresponding to the target image frame under the condition that the number of the target pairs meets the preset requirement.
Optionally, in an embodiment of the present application, the determining the target image frame in the target image sequence includes:
respectively determining the number of first targets contained in each frame of image frame in the target image sequence;
based on the first motion tracks, respectively determining the total displacement amount of the first targets in the image frames of each frame relative to the image frame of the previous frame;
and taking the image frames of which the number of the first targets is larger than a preset number threshold value and/or the total displacement is larger than a preset total displacement threshold value as target image frames.
Optionally, in an embodiment of the present application, performing object matching on the target image frame and a candidate depth image frame corresponding to each of the target depth image sequences, determining at least one group of object pairs includes:
determining a difference in position of a reference first object in the object image frame and each second object in the candidate depth image frame, respectively;
and under the condition that the position difference is smaller than a preset position difference threshold value, determining a group of target pairs consisting of the reference first target and the matched reference second target.
Optionally, in an embodiment of the present application, the determining a first motion trajectory of the plurality of first targets in the image sequence includes:
Performing multi-target tracking on the image sequence, and determining a plurality of first targets which are successfully tracked;
a first motion trajectory of the first object in the image sequence is determined based on tracking positions of the first object in a plurality of consecutive image frames.
Optionally, in an embodiment of the present application, the performing multi-object tracking on the image sequence includes:
respectively inputting the plurality of continuous image frames into a target detection model, and outputting the position information and the identification information of a plurality of first candidate targets in the image frames through the target detection model;
and performing multi-target tracking on the image sequence based on the position information and the identification information of the first candidate targets.
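As a hedged sketch of how the position information and identification information output by a target detection model could be accumulated into first motion tracks (the input format is an assumption, and keeping only targets tracked through every frame is a simplification for illustration):

```python
from collections import defaultdict

def build_motion_tracks(detections_per_frame):
    """detections_per_frame: list (one entry per frame) of dicts mapping a
    target's identification to its tracked position (x, y). Returns the
    motion track of every target that appears in all frames."""
    tracks = defaultdict(list)
    for frame in detections_per_frame:
        for target_id, position in frame.items():
            tracks[target_id].append(position)
    n_frames = len(detections_per_frame)
    # keep only targets tracked successfully through every frame
    return {tid: pos for tid, pos in tracks.items() if len(pos) == n_frames}
```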
Optionally, in an embodiment of the present application, the determining a depth image sequence corresponding to the point cloud sequence includes:
performing plane fitting on the point cloud data to determine a target point cloud;
clustering the target point clouds, and determining each point cloud set and a corresponding target boundary box;
and constructing a depth image sequence under the same view angle with the image sequence based on the target boundary box and the point cloud set.
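A minimal sketch of the plane fitting and clustering steps is given below; it uses a simple RANSAC plane fit and DBSCAN, which are illustrative choices rather than methods prescribed by this application:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def remove_ground_plane(points, n_iters=100, dist_threshold=0.2, seed=0):
    """Small RANSAC plane fit: returns the points that do NOT lie on the
    dominant plane (the 'target point cloud'). Parameters are illustrative."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue
        normal /= norm
        dists = np.abs((points - sample[0]) @ normal)
        inliers = dists < dist_threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return points[~best_inliers]

def cluster_targets(points, eps=0.7, min_samples=10):
    """Cluster the remaining points and return each point cloud set with an
    axis-aligned target bounding box (min corner, max corner)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    clusters = []
    for label in set(labels) - {-1}:          # -1 marks noise points
        cluster = points[labels == label]
        clusters.append((cluster, (cluster.min(axis=0), cluster.max(axis=0))))
    return clusters
```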
Optionally, in an embodiment of the present application, the constructing a depth image sequence at a same view angle as the image sequence based on the target bounding box and the point cloud set includes:
acquiring internal parameters of a camera and internal parameters of a radar, wherein the camera is used for acquiring the image sequence and the radar is used for acquiring the point cloud sequence;
determining a field angle of the camera according to the camera internal parameters and the radar internal parameters;
and constructing a depth image sequence under the same view angle as the image sequence based on the target bounding box, the point cloud set and the view angle.
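For illustration, a minimal sketch of projecting the clustered radar points into a depth image that shares the camera's field of view is given below; it assumes a pinhole camera model with a known 3x3 intrinsic matrix K and a known 4x4 radar-to-camera extrinsic transform obtained from calibration, neither of which is specified by this application:

```python
import numpy as np

def point_cloud_to_depth_image(points_radar, K, T_radar_to_cam,
                               image_size=(720, 1280)):
    """Project radar points (N x 3) into a depth image covering the camera's
    field of view; pixel value is the depth (z) in the camera frame."""
    h, w = image_size
    depth = np.zeros((h, w), dtype=np.float32)
    # Transform the points into the camera coordinate frame
    pts_h = np.hstack([points_radar, np.ones((len(points_radar), 1))])
    pts_cam = (T_radar_to_cam @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]               # keep points in front of the camera
    # Pinhole projection
    uv = (K @ pts_cam.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)   # inside the field of view
    # Later points overwrite earlier ones; a real implementation might keep the nearest
    depth[v[inside], u[inside]] = pts_cam[inside, 2]
    return depth
```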
The data time alignment apparatus 600 according to the embodiments of the present application may correspond to performing the methods described in the embodiments of the present application, and the foregoing and other operations and/or functions of each module in the data time alignment apparatus 600 are respectively for implementing the corresponding flows of the methods provided in the foregoing embodiments, which are not repeated herein for brevity.
It should be further noted that the embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the application, the connection relationship between modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
As shown in fig. 7, an embodiment of the present application further provides a domain controller 700. The domain controller 700 includes a processor and a memory for storing processor-executable instructions, wherein the processor is configured to implement the above-described method when executing the instructions. The domain controller 700 includes a memory 701, a processor 703, a bus 705 and a communication interface 707; the memory 701, the processor 703 and the communication interface 707 communicate via the bus 705. The bus 705 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, etc.; for ease of illustration, only one thick line is shown in fig. 7, but this does not mean there is only one bus or one type of bus. The communication interface 707 is used for communication with the outside. The processor 703 may be a central processing unit (CPU). The memory 701 may include volatile memory, such as random access memory (RAM); it may also include non-volatile memory, such as read-only memory (ROM), flash memory, an HDD, or an SSD. The memory 701 stores executable code, which the processor 703 executes to perform the methods described in the foregoing embodiments.
Embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
Embodiments of the present application provide a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
In some embodiments, the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format, or encoded on other non-transitory media or articles of manufacture. Fig. 8 schematically illustrates a conceptual partial view of an example computer program product, arranged in accordance with at least some embodiments presented herein, comprising a computer program for executing a computer process on a computing device. In one embodiment, the example computer program product 800 is provided using a signal bearing medium 801. The signal bearing medium 801 may include one or more program instructions 802 that, when executed by one or more processors, may provide the functionality, or part of the functionality, described above with respect to fig. 1. Further, the program instructions 802 in fig. 8 also describe example instructions.
In some examples, the signal bearing medium 801 may comprise a computer readable medium 803 such as, but not limited to, a hard disk drive, a compact disk (CD), a digital video disk (DVD), a digital tape, memory, read-only memory (ROM), or random access memory (RAM). In some implementations, the signal bearing medium 801 may include a computer recordable medium 804 such as, but not limited to, memory, a read/write (R/W) CD, an R/W DVD, and the like. In some implementations, the signal bearing medium 801 may include a communication medium 805 such as, but not limited to, digital and/or analog communication media (e.g., a fiber optic cable, a waveguide, a wired communication link, a wireless communication link, etc.). Thus, for example, the signal bearing medium 801 may be conveyed by a wireless form of the communication medium 805 (e.g., a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions 802 may be, for example, computer-executable instructions or logic-implemented instructions. In some examples, a computing device, such as the electronic device described with respect to fig. 6, may be configured to provide various operations, functions, or actions in response to the program instructions 802 conveyed to the computing device through one or more of the computer readable medium 803, the computer recordable medium 804, and/or the communication medium 805.

It should be understood that the arrangement described herein is for illustrative purposes only. Thus, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and that some elements may be omitted altogether depending on the desired results. In addition, many of the elements described are functional entities that may be implemented as discrete or distributed components, or in any suitable combination and location in conjunction with other components.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, systems, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by hardware (e.g., circuits or application-specific integrated circuits (ASICs)) that performs the corresponding functions or acts, or by combinations of hardware and software, such as firmware.
Although the invention is described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combined technical features, the combinations should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application; their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the present application. It should be noted that various modifications and improvements could be made by those skilled in the art without departing from the spirit of the present application, and these would fall within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (13)

1. A method of time alignment of data, the method comprising:
acquiring an image sequence and a point cloud sequence of a target environment, wherein the image sequence comprises a plurality of continuous image frames, and the point cloud sequence comprises a plurality of continuous point cloud data;
performing motion feature matching on the image sequence and a depth image sequence corresponding to the point cloud sequence, and determining a corresponding target image frame and a corresponding target depth image frame;
the image sequence and the point cloud sequence are time aligned based on the target image frame and the target depth image frame.
2. The method of claim 1, wherein the matching the motion characteristics of the image sequence to the depth image sequence corresponding to the point cloud sequence, determining the corresponding target image frame and the target depth image frame, comprises:
Determining a depth image sequence corresponding to the point cloud sequence;
respectively determining a first motion track corresponding to a plurality of first targets in the image sequence and a second motion track corresponding to a plurality of second targets in the depth image sequence;
and performing motion feature matching on the image sequence and the depth image sequence based on the first motion tracks and the second motion tracks, and determining corresponding target image frames and target depth image frames.
3. The method of claim 2, wherein the performing motion feature matching on the image sequence and the depth image sequence based on the first motion trajectory and the second motion trajectory, determining the corresponding target image frame and target depth image frame, comprises:
determining a target image sequence and at least one matched target depth image sequence according to the first motion trajectories and the second motion trajectories;
and selecting corresponding target image frames and target depth image frames from the target image sequence and the at least one target depth image sequence.
4. A method according to claim 3, wherein the selecting of the corresponding target image frame and target depth image frame from the sequence of target images and the at least one sequence of target depth images comprises:
And performing target matching on the target image sequence and the at least one target depth image sequence, and determining corresponding target image frames and target depth image frames.
5. The method of claim 4, wherein said object matching the sequence of object images with the at least one sequence of object depth images to determine corresponding object image frames and object depth image frames comprises:
determining a target image frame in the target image sequence;
performing target matching on the target image frames and the candidate depth image frames corresponding to the target depth image sequences, and determining at least one group of target pairs; the target pair comprises a matched reference first target and a matched reference second target;
and determining a target depth image frame corresponding to the target image frame under the condition that the number of the target pairs meets the preset requirement.
6. The method of claim 5, wherein the determining the target image frame in the target image sequence comprises:
respectively determining the number of first targets contained in each frame of image frame in the target image sequence;
based on the first motion tracks, respectively determining the total displacement amount of the first targets in the image frames of each frame relative to the image frame of the previous frame;
And taking the image frames of which the number of the first targets is larger than a preset number threshold value and/or the total displacement is larger than a preset total displacement threshold value as target image frames.
7. The method of claim 5, wherein object matching the object image frames with corresponding candidate depth image frames in each of the sequence of object depth images, determining at least one set of object pairs, comprises:
determining a difference in position of a reference first object in the object image frame and each second object in the candidate depth image frame, respectively;
and under the condition that the position difference is smaller than a preset position difference threshold value, determining a group of target pairs consisting of the reference first target and the matched reference second target.
8. The method of claim 2, wherein determining a first motion profile of a plurality of first objects in the sequence of images comprises:
performing multi-target tracking on the image sequence, and determining a plurality of first targets which are successfully tracked;
a first motion trajectory of the first object in the image sequence is determined based on tracking positions of the first object in a plurality of consecutive image frames.
9. The method of claim 8, wherein said multi-objective tracking of said image sequence comprises:
respectively inputting the plurality of continuous image frames into a target detection model, and outputting the position information and the identification information of a plurality of first candidate targets in the image frames through the target detection model;
and performing multi-target tracking on the image sequence based on the position information and the identification information of the first candidate targets.
10. The method of claim 2, wherein the determining the sequence of depth images corresponding to the sequence of point clouds comprises:
performing plane fitting on the point cloud data to determine a target point cloud;
clustering the target point clouds, and determining each point cloud set and a corresponding target boundary box;
and constructing a depth image sequence under the same view angle with the image sequence based on the target boundary box and the point cloud set.
11. The method of claim 10, wherein constructing a sequence of depth images at a same field angle as the sequence of images based on the target bounding box and the set of point clouds comprises:
acquiring internal parameters of a camera and internal parameters of a radar, wherein the camera is used for acquiring the image sequence and the radar is used for acquiring the point cloud sequence;
Determining a field angle of the camera according to the camera internal parameters and the radar internal parameters;
and constructing a depth image sequence under the same view angle as the image sequence based on the target bounding box, the point cloud set and the view angle.
12. A data time alignment apparatus, the apparatus comprising:
the acquisition module is used for acquiring an image sequence and a point cloud sequence of the target environment, wherein the image sequence comprises a plurality of continuous image frames, and the point cloud sequence comprises a plurality of continuous point cloud data;
the feature matching module is used for matching the motion features of the image sequence and the depth image sequence corresponding to the point cloud sequence, and determining a corresponding target image frame and a corresponding target depth image frame;
and the time alignment module is used for performing time alignment on the image sequence and the point cloud sequence based on the target image frame and the target depth image frame.
13. A domain controller comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
CN202310251493.5A 2023-03-16 2023-03-16 Data time alignment method and device and domain controller Active CN115994934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310251493.5A CN115994934B (en) 2023-03-16 2023-03-16 Data time alignment method and device and domain controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310251493.5A CN115994934B (en) 2023-03-16 2023-03-16 Data time alignment method and device and domain controller

Publications (2)

Publication Number Publication Date
CN115994934A true CN115994934A (en) 2023-04-21
CN115994934B CN115994934B (en) 2023-06-13

Family

ID=85992211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310251493.5A Active CN115994934B (en) 2023-03-16 2023-03-16 Data time alignment method and device and domain controller

Country Status (1)

Country Link
CN (1) CN115994934B (en)

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170286774A1 (en) * 2016-04-04 2017-10-05 Xerox Corporation Deep data association for online multi-class multi-object tracking
CN108280496A (en) * 2018-01-23 2018-07-13 四川精工伟达智能技术股份有限公司 Method for synchronizing time, device, medium based on RFID and electronic equipment
CN110942477A (en) * 2019-11-21 2020-03-31 大连理工大学 Method for depth map fusion by using binocular camera and laser radar
CN111538032A (en) * 2020-05-19 2020-08-14 北京数字绿土科技有限公司 Time synchronization method and device based on independent drawing tracks of camera and laser radar
CN111563487A (en) * 2020-07-14 2020-08-21 平安国际智慧城市科技股份有限公司 Dance scoring method based on gesture recognition model and related equipment
WO2022034245A1 (en) * 2020-08-14 2022-02-17 Topgolf Sweden Ab Motion based pre-processing of two-dimensional image data prior to three-dimensional object tracking with virtual time synchronization
CN114519845A (en) * 2020-10-30 2022-05-20 北京万集科技股份有限公司 Multi-sensing data fusion method and device, computer equipment and storage medium
CN112580683A (en) * 2020-11-17 2021-03-30 中山大学 Multi-sensor data time alignment system and method based on cross correlation
CN112712051A (en) * 2021-01-12 2021-04-27 腾讯科技(深圳)有限公司 Object tracking method and device, computer equipment and storage medium
CN113848696A (en) * 2021-09-15 2021-12-28 北京易航远智科技有限公司 Multi-sensor time synchronization method based on position information
CN114185059A (en) * 2021-11-08 2022-03-15 哈尔滨工业大学(威海) Multi-radar fusion-based multi-person tracking system, method, medium and terminal
CN114139370A (en) * 2021-11-29 2022-03-04 上海无线电设备研究所 Synchronous simulation method and system for optical engine and electromagnetic imaging dual-mode moving target
CN114357019A (en) * 2021-12-03 2022-04-15 同济大学 Method for monitoring data quality of road side sensing unit in intelligent networking environment
CN114926808A (en) * 2022-03-30 2022-08-19 吉林大学 Target detection and tracking method based on sensor fusion
CN114520920A (en) * 2022-04-15 2022-05-20 北京凯利时科技有限公司 Multi-machine-position video synchronization method and system and computer program product
CN115144828A (en) * 2022-07-05 2022-10-04 同济大学 Automatic online calibration method for intelligent automobile multi-sensor space-time fusion
CN115294168A (en) * 2022-07-12 2022-11-04 天翼云科技有限公司 Target tracking method and device and electronic equipment
CN115657664A (en) * 2022-09-30 2023-01-31 华南农业大学 Path planning method, system, equipment and medium based on human teaching learning
CN115562499A (en) * 2022-11-16 2023-01-03 深圳市未来感知科技有限公司 Intelligent ring-based accurate interaction control method and system and storage medium
CN115655262A (en) * 2022-12-26 2023-01-31 广东省科学院智能制造研究所 Deep learning perception-based multi-level semantic map construction method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNHUA WANG et al.: "Realtime wide-area vehicle trajectory tracking using millimeter-wave radar sensors and the open TJRD TS dataset", International Journal of Transportation Science and Technology, pages 273-290
ZHANG HENG: "Research on SLAM Technology Based on the Fusion of LiDAR and Depth Camera", China Master's Theses Full-text Database, Information Science and Technology, vol. 2021, no. 1, pages 136-1131
WANG XUE et al.: "Video Sequence Synchronization Algorithm Based on 3D Trajectory Reconstruction of Moving Targets", Acta Automatica Sinica, vol. 43, no. 10, pages 1759-1772
WANG XUE; SHI JIAN-BO; PARK HYUN-SOO; WANG QING: "Video Sequence Synchronization Algorithm Based on 3D Trajectory Reconstruction of Moving Targets", Acta Automatica Sinica, no. 10

Also Published As

Publication number Publication date
CN115994934B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN112014857B (en) Three-dimensional laser radar positioning and navigation method for intelligent inspection and inspection robot
CN110988912B (en) Road target and distance detection method, system and device for automatic driving vehicle
JP7430277B2 (en) Obstacle detection method and apparatus, computer device, and computer program
EP3581890B1 (en) Method and device for positioning
US10970871B2 (en) Estimating two-dimensional object bounding box information based on bird's-eye view point cloud
CN112149550B (en) Automatic driving vehicle 3D target detection method based on multi-sensor fusion
WO2019092418A1 (en) Method of computer vision based localisation and navigation and system for performing the same
JP4874607B2 (en) Object positioning device
CN115049700A (en) Target detection method and device
US11430199B2 (en) Feature recognition assisted super-resolution method
CN112991391A (en) Vehicle detection and tracking method based on radar signal and vision fusion
CN111913177A (en) Method and device for detecting target object and storage medium
CN115861968A (en) Dynamic obstacle removing method based on real-time point cloud data
CN115900712A (en) Information source reliability evaluation combined positioning method
CN113012215A (en) Method, system and equipment for space positioning
CN114898314A (en) Target detection method, device and equipment for driving scene and storage medium
CN106558069A (en) A kind of method for tracking target and system based under video monitoring
Du et al. Particle filter based object tracking of 3D sparse point clouds for autopilot
CN117808689A (en) Depth complement method based on fusion of millimeter wave radar and camera
Notz et al. Extraction and assessment of naturalistic human driving trajectories from infrastructure camera and radar sensors
CN117152949A (en) Traffic event identification method and system based on unmanned aerial vehicle
CN116665179A (en) Data processing method, device, domain controller and storage medium
CN116862832A (en) Three-dimensional live-action model-based operator positioning method
CN115994934B (en) Data time alignment method and device and domain controller
CN116403191A (en) Three-dimensional vehicle tracking method and device based on monocular vision and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant