CN116380132B - Sensor time offset processing method, device, equipment, vehicle and medium


Info

Publication number: CN116380132B (grant); application publication: CN116380132A
Application number: CN202310658110.6A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: image, sensor, matching image, matching, feature points
Legal status: Active (granted)
Inventors: 刘卓昊, 陆岩
Applicant/Assignee: Shanghai Yunji Yuedong Intelligent Technology Development Co., Ltd.


Classifications

    • G06V 20/56 — Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G01C 25/00 — Manufacturing, calibrating, cleaning, or repairing instruments or devices referred to in the other groups of this subclass
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/757 — Matching configurations of points or features


Abstract

The application provides a sensor time offset processing method, device, equipment, vehicle and medium. The method is used for determining the time offset between a pose sensor and a vision sensor mounted on the same carrier, and comprises the following steps: acquiring at least two frames of images from a data packet, wherein the data packet comprises a pose track of the carrier measured by the pose sensor and images captured by the vision sensor, which are offset in time from each other; respectively determining matched target feature points in the at least two frames of images, wherein the target feature points comprise the pixel points that correspond, in each of the at least two frames of images, to the same spatial point; determining, according to the pose track, pose information of the vision sensor when capturing each frame of image; determining polar errors of the at least two frames of images according to the pose information and the target feature points; and determining the time offset of the vision sensor relative to the pose sensor according to the polar errors. The scheme optimizes the time offset estimation process and improves the accuracy of the time offset estimate.

Description

Sensor time offset processing method, device, equipment, vehicle and medium
Technical Field
The application relates to the technical field of sensors and automatic driving, in particular to a method, a device, equipment, a vehicle and a medium for processing sensor time offset.
Background
In a multi-sensor system, differences in sensor function, model and the like make time offsets likely to occur. The time offset of each sensor therefore needs to be estimated, so as to improve the temporal consistency of the sensor devices, fuse the multi-sensor information better, and improve the accuracy of the multi-sensor system.
In the prior art, time offsets among multiple sensors are mainly handled by accurate hardware synchronization, which uses trigger signals to synchronize the hardware and thereby process the time offsets among the sensors.
However, with hardware synchronization, the sensor time stamps are affected by different clocks, trigger mechanisms, processing mechanisms, and transmission and processing delays, so accurate calibration is difficult and the temporal consistency of the sensor devices cannot be effectively improved.
Disclosure of Invention
The application provides a sensor time offset processing method, device, equipment, a vehicle and a medium, which are used for solving the problem that the existing time offset determination of a sensor is inaccurate.
In a first aspect, an embodiment of the present application provides a method for processing a sensor time offset, for determining a time offset between a pose sensor and a vision sensor disposed on the same carrier, the method including:
acquiring at least two frames of images in a data packet, wherein the data packet comprises a pose track of the carrier, which is measured by the pose sensor, and an image captured by the vision sensor, which are offset in time;
respectively determining matched target feature points in the at least two frames of images, wherein the target feature points comprise pixel points corresponding to the same space point on the images in the at least two frames of images;
determining, according to the pose track, pose information of the vision sensor when capturing each frame of image;
determining polar errors of the at least two frames of images according to the pose information and the target feature points;
and determining the time offset of the vision sensor relative to the pose sensor according to the polar error.
In one possible design of the first aspect, the determining the matched target feature points in the at least two frames of images includes:
respectively extracting features of a first matching image and a second matching image in the at least two frames of images, and determining feature points in the first matching image and feature points in the second matching image;
And performing feature matching on the feature points in the first matching image and the feature points in the second matching image to determine target feature points.
In another possible design of the first aspect, the feature matching the feature point in the first matching image with the feature point in the second matching image, to determine a target feature point, includes:
dynamic object filtering is carried out on a first matching image and a second matching image in the at least two frames of images, and matching feature points matched with each other in the first matching image and the second matching image are obtained;
and filtering the paired feature points to obtain target feature points.
In still another possible design of the first aspect, the feature matching the feature point in the first matching image with the feature point in the second matching image, to determine a target feature point, includes:
performing feature matching on each feature point in the first matching image and all feature points in the second matching image respectively to determine mutually matched first matching feature points, wherein the first matching feature points comprise first feature points in the first matching image and second feature points matched with the first feature points in the second matching image;
performing feature matching on each feature point in the second matching image with all feature points in the first matching image, respectively, to determine second pairing feature points, wherein each second pairing feature point comprises a third feature point in the second matching image and a fourth feature point in the first matching image matched with the third feature point;
if the first pairing feature point and the second pairing feature point are the same, the first pairing feature point or the second pairing feature point is used as the pairing feature point;
and filtering the paired feature points to obtain target feature points.
In yet another possible design of the first aspect, the feature matching the feature point in the first matching image with the feature point in the second matching image, to determine a target feature point, includes:
acquiring adjacent images in the data packet, wherein the adjacent images are images adjacent in time to a first matching image and/or a second matching image in the at least two frames of images;
and carrying out multi-frame re-projection filtering on matched characteristic points matched with each other in the first matched image and the second matched image based on the adjacent image to obtain the target characteristic points.
In yet another possible design of the first aspect, the performing multi-frame re-projection filtering on the paired feature points matched with each other in the first matching image and the second matching image based on the adjacent image to obtain the target feature point includes:
acquiring first pose information of the vision sensor when the first matching image is captured, second pose information of the vision sensor when the second matching image is captured, and third pose information of the vision sensor when the adjacent image is captured;
repositioning the spatial point in the space according to the first pose information, the second pose information, the third pose information, the image plane of the adjacent image, the image plane of the first matching image, and the image plane of the second matching image;
acquiring pixel points corresponding to the relocated space points on the first matching image and pixel points corresponding to the relocated space points on the second matching image;
calculating a first distance between a pixel point corresponding to the relocated space point on the first matching image and a matching characteristic point in the first matching image, and a second distance between a pixel point corresponding to the relocated space point on the second matching image and a matching characteristic point in the second matching image;
And if the first distance and the second distance are smaller than a preset distance threshold, determining the pairing feature point as a target feature point.
In yet another possible design of the first aspect, the acquiring at least two frames of images in the data packet includes:
determining a desired time interval according to the motion rate and/or the frame rate of the vision sensor arranged on the same carrier;
and aiming at a first matched image in the data packet, acquiring a second matched image with a time stamp which is spaced from the first matched image by the expected time interval from the data packet.
In yet another possible design of the first aspect, the determining a time offset of the vision sensor relative to the pose sensor based on the polar error includes:
acquiring time offsets corresponding to N different polar line errors of the vision sensor, wherein N is a positive integer greater than 1;
and determining the time offset corresponding to the minimum polar line error in the N polar line errors according to the time offsets corresponding to the N different polar line errors of the visual sensor, and taking the time offset as the time offset of the visual sensor.
In yet another possible design of the first aspect, the method further comprises:
And according to the time offset, performing time offset compensation on the visual sensor.
In a second aspect, an embodiment of the present application provides a processing apparatus for sensor time offset, including:
the data acquisition module is used for acquiring at least two frames of images in a data packet, wherein the data packet comprises a pose track of a carrier which is measured by a pose sensor and is offset in time and an image captured by a visual sensor;
the characteristic point determining module is used for determining matched target characteristic points in the at least two frames of images respectively, wherein the target characteristic points comprise the pixel points corresponding to the same spatial point in the at least two frames of images;
the pose determining module is used for determining pose information of the visual sensor when capturing each frame of image according to the pose track;
the error determining module is used for determining polar errors of the at least two frames of images according to the pose information and the target feature points;
and the offset determining module is used for determining the time offset of the vision sensor relative to the pose sensor according to the polar error.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the method as described above.
In a fourth aspect, an embodiment of the present application provides a vehicle, at least including a pose sensor, a vision sensor, and a processor, where the pose sensor and the vision sensor are connected to the processor, and the processor is configured to implement the above method.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium having stored therein computer instructions which, when executed by a processor, are adapted to carry out a method as described above.
According to the sensor time offset processing method, device, equipment, vehicle and medium provided by the embodiments of the application, the correlation between polar errors and time offsets is exploited: the temporally offset pose track and images are used to obtain the corresponding polar errors, and statistical analysis of these polar errors estimates the real time offset between the vision sensor and the pose sensor. The whole time offset estimation process is thus optimized, the time offset can be computed in natural scenes, and the accuracy of the time offset result is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application;
FIG. 1 is a schematic illustration of an epipolar line according to one embodiment of the present application;
FIG. 2 is a schematic diagram of polar error provided by an embodiment of the present application;
fig. 3 is a schematic view of a scenario of a method for processing sensor time offset according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for processing a sensor time offset according to an embodiment of the present application;
FIG. 5 is a graph showing the relationship between epipolar line error and time shift according to the present application;
FIG. 6 is a schematic diagram illustrating filtering of paired feature points according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a time offset processing method according to another embodiment of the present application;
FIG. 8 is a schematic flow chart of feature matching according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a sensor time offset processing device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In a multi-sensor system, time offsets easily arise between different sensors because of differences in function, model and the like between the sensors. For example, with the time of sensor A as the reference time, the time of sensor B deviates from the reference time; such a time offset means that the information of the individual sensors in the multi-sensor system is not aligned, so errors easily appear in the fused multi-sensor result. To deal with the time offset phenomenon in a multi-sensor system, three methods are mainly available at present: (1) Accurate hardware synchronization. This is achieved by precisely controlling sensor triggering and the actual acquisition time, and depends on open interfaces and specifications from the hardware manufacturer. (2) Visual comparison of sensor data. Visualizing the sensor data is an intuitive means, but it is difficult to analyze and quantify the data as a whole, and it is affected by factors such as the sensors' data acquisition and calibration. (3) Calibration in a calibration scene. Computing the sensor time offset in a calibration scene requires a calibration board and is difficult to use in natural scenes; moreover, in practical applications the sensors are affected by the surrounding environment, such as illumination, so the result obtained in the calibration scene is not entirely consistent with the result during actual use.
In view of the above problems, the present application provides a method, apparatus, device, vehicle, and medium for processing a sensor time offset, which can optimize the accuracy of calculation of the time offset. Specifically, taking a visual sensor and a pose sensor on the same carrier as an example, by utilizing the correlation between polar errors (the polar errors are explained later in the application) and time offsets, using pose tracks (the pose tracks are acquired by the pose sensor) and images (the images are acquired by the visual sensor), calculating to obtain corresponding polar errors, and carrying out statistical analysis on the polar errors to estimate the real time offsets between the visual sensor and the pose sensor, so that the time offset calculation can be carried out very conveniently in a natural scene, and the calculation accuracy of the time offsets is improved.
Wherein, the nouns referred to in the technical scheme of the application need to be interpreted:
Polar error (epipolar error): the distance from a target feature point in an image to the corresponding polar line in that image.
Polar line (epipolar line): the intersection of the polar (epipolar) plane, formed by a target point in space and at least two pose points, with the image plane of the image.
For example, fig. 1 is a schematic polar diagram provided in the embodiment of the present application, as shown in fig. 1, for a pose sensor and a vision sensor on the same carrier, when the carrier is moving, the pose point of the vision sensor at time t1 is O1, and the pose point of the vision sensor at time t2 is O2. The image plane of the vision sensor at the time t1 is I1, and the image plane of the vision sensor at the time t2 is I2. For a point P in space, the corresponding pixels on the image plane I1 and the image plane I2 are P1 and P2. The plane PO1O2 forms an epipolar plane, and the line O1O2 intersects with the image plane I1 and the image plane I2 at poles (epipoles) e1 and e2, respectively. At this time, the intersecting line of the polar plane and the image plane is called a polar line, i.e., the p1e1 line and the p2e2 line in fig. 1, and are denoted by line segments L1 and L2, respectively, in fig. 1.
For example, fig. 2 is a schematic diagram of the polar error provided by an embodiment of the present application. When there is no time offset between the sensors, referring to fig. 1, the pixel point P1 and the pixel point P2 should fall on the polar lines L1 and L2, respectively. However, when there is a time offset (for example, between the vision sensor and the pose sensor), as shown in fig. 2, the position O1 at the original time t1 shifts to O1', so that the pixel points P1 and P2 lie at some error distance from their corresponding polar lines (in fig. 2 the polar line corresponding to P1 is L1 and the polar line corresponding to P2 is L2'); this error distance is called the polar error. The polar error reflects the overall error of the current multi-sensor system (including image time offset, intrinsic and extrinsic calibration, pose track, and the like), and, with other variables controlled, the correlation between the polar error and the time offset can be found by data analysis of the two.
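To make the geometry of figs. 1 and 2 concrete, the following is a minimal numpy sketch, not taken from the patent, of how a polar (epipolar) line and the point-to-line distance could be computed from the relative pose between the two viewpoints; the symbols K, R, t and the function names are assumptions.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_distance(p1, p2, R, t, K):
    """Distance from pixel p2 to the epipolar line induced by pixel p1.

    R, t: relative pose of view 2 with respect to view 1 (assumed to come
    from the pose trajectory); K: camera intrinsic matrix."""
    E = skew(t) @ R                                   # essential matrix
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)     # fundamental matrix
    x1 = np.array([p1[0], p1[1], 1.0])                # homogeneous pixels
    x2 = np.array([p2[0], p2[1], 1.0])
    line = F @ x1                                     # a*x + b*y + c = 0 in image 2
    a, b, _ = line
    return abs(x2 @ line) / np.hypot(a, b)            # point-to-line distance
```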
The technical scheme of the application is described in detail through specific embodiments. It should be noted that the following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 3 is a schematic view of a scenario of a sensor time offset processing method according to an embodiment of the present application, as shown in fig. 3, a pose sensor and a vision sensor may be integrated on a multi-sensor system 301, taking a carrier as an example of a vehicle 30, where the vehicle may be a vehicle with an intelligent driving system and may implement automatic driving. With the time of the pose sensor as the reference time, the time of the vision sensor is offset from the reference time, when the vehicle 30 runs, the vision sensor can capture images of the static obstacle T in front, multiple frames of images at different moments are acquired, and the pose sensor can record corresponding pose tracks. According to the application, based on the pose track and the acquired multi-frame image, the time offset of the visual sensor relative to the pose sensor can be determined, so that the visual sensor is corrected, and the consistency of the visual sensor and the pose sensor in time is ensured.
In addition, in other embodiments, the multi-sensor system may further include other sensors besides the vision sensor and the pose sensor, for example, millimeter wave radar, etc., and the method provided by the present application may also be applied to determination and correction of time offset between other sensors. The description of the examples is not repeated here.
Example 1
Fig. 4 is a flowchart of a method for processing a sensor time offset according to an embodiment of the present application, where the method may be applied to the vehicle, as shown in fig. 4, and the method may specifically include the following steps: step S401, at least two frames of images in a data packet are acquired, where the data packet includes a pose track of the carrier that is measured by the pose sensor and an image captured by the vision sensor that are offset in time.
In this embodiment, there is a time offset between the pose sensor and the vision sensor, i.e., the data acquired by the two are not synchronized in time. Illustratively, with the time at which the pose sensor collects data as a reference time, the vision sensor is considered to have a time offset. In other embodiments, the time when the vision sensor collects data may be used as the reference time, and the pose sensor may be regarded as having a time offset.
The pose sensor determines the corresponding pose points at different moments to form the pose track. Meanwhile, the vision sensor captures an image at each pose point, giving an image frame for that pose point. For example, when there is no time offset, the POSE point corresponding to time t1 is POSE1 and the image frame collected by the vision sensor is Image1, while the POSE point corresponding to time t2 is POSE2 and the corresponding image frame is Image2. In practice, under the influence of software and hardware, a time offset causes a deviation between the acquired pose track and the image frames; taking a time offset of the vision sensor as an example, the pose sensor records pose point POSE1 at time t1, while the time of the vision sensor actually has the time offset added to it.
The carrier may be, for example, a vehicle as described above, or another self-propelled device such as a boat, an airplane, a space vehicle, or a robot (including surgical robots, warehouse handling robots, sweeping robots, etc.). The vision sensor may comprise a monocular camera, a binocular camera, etc., and the image may correspondingly comprise a 2D image, a 3D image, etc. The pose sensor may include any sensor capable of determining the pose of the carrier, such as an inertial measurement unit (IMU), IMU + Global Positioning System (GPS), or IMU + real-time kinematic (RTK) positioning. When the pose sensor and the vision sensor are time-synchronized, the vehicle can drive automatically based on the information acquired by both sensors.
In this embodiment, the data packet may be data acquired in real time, and by calculating the time offset between the pose sensor and the vision sensor, the time of the pose sensor and the vision sensor may be calibrated in real time. Of course, the data packet may also include collected historical data, and by calculating a time offset between the pose sensor and the vision sensor, time calibration post-processing of the collected historical data of the pose sensor and the vision sensor may be implemented.
Step S402, respectively determining matched target feature points in at least two frames of images. The target feature points comprise pixel points corresponding to the same space point on the image in at least two frames of images.
In this embodiment, taking a vehicle as a carrier of a POSE sensor and a vision sensor as an example, during a running process of the vehicle, at least one frame of Image frame, for example, a POSE point POSE1, can be captured correspondingly by the vision sensor at each POSE point, and the corresponding Image frame is Image1. It should be noted that, at least two frames of images mentioned above include at least image frames captured by the vision sensor at two different POSE points respectively, for example, at least one frame of image is an image frame corresponding to POSE point POSE1, and another frame of image is an image frame corresponding to POSE point POSE 2.
In the running process of the vehicle, a certain point P in the space corresponds to different pixel points on image frames corresponding to different pose points. For example, referring to fig. 1, the image frame corresponding to the pose point O1 is I1, the image frame corresponding to the pose point O2 is I2, the pixel point corresponding to a certain point P in space on the image frame I1 is P1, and the pixel point corresponding to the image frame I2 is P2.
It will be appreciated that in this embodiment, each frame of image may contain a plurality of feature points (which can be regarded as pixel points); for example, Image1 contains feature points C1, C2, ..., Cn (n is a positive integer), and likewise Image2 may contain feature points D1, D2, ..., Dn. Taking Image1 and Image2 as the at least two frames of images, the matched target feature points in Image1 and Image2 are the pixel points that correspond to the same point P in space on Image1 and Image2 respectively. For example, if the pixel point corresponding to a point P in space on Image1 is feature point C1 and the pixel point corresponding to the same point P on Image2 is feature point D1, then C1 and D1 are a pair of matched target feature points. If the pixel point corresponding to a point P on Image1 is C1 but P has no corresponding pixel point on Image2, then C1 has no matching feature point and no target feature point is obtained.
For example, when determining the target feature point, the method may determine by comparing frame by frame, that is, first determining a pixel point X1 of a certain point P in a space in one frame of image, then finding another frame of image adjacent to the frame of image, determining whether a pixel point X2 exists in the another frame of image at the certain point P in the space, and if so, determining that the pixel point X1 and the pixel point X2 are the matched target feature points.
Step S403, according to the pose track, pose information of the visual sensor when capturing each frame of image is determined.
In this embodiment, the pose track is generated based on information captured by a pose sensor having a corresponding timestamp at each pose point. For example, the pose information may include a pose of the pose point and a timestamp. The vision sensor may also have a corresponding time stamp when capturing each frame of image. It should be noted that, the time offset may cause that the time stamp of the pose point and the time stamp of the captured Image cannot be matched, for example, when the visual sensor captures the Image1 at the pose point O1, the corresponding time stamp recorded by the visual sensor is t1, at this time, the pose point determined from the pose track is O1' according to the time stamp t1 recorded by the visual sensor, and if the visual sensor has no time offset, the pose point determined from the pose track should be O1.
When pose information of the visual sensor when capturing each frame of image is determined, a time stamp of the visual sensor when capturing the frame of image can be acquired first, and then corresponding pose information is acquired from a pose track based on the time stamp.
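A minimal sketch of this timestamp-based lookup is shown below, assuming the pose track is stored as arrays of timestamps, translations and quaternions; the linear interpolation of the translation and the normalized quaternion lerp are a simplification of the spline fitting mentioned later in the text, and all names are assumptions.

```python
import numpy as np

def interpolate_pose(traj_ts, traj_xyz, traj_quat, image_ts, time_offset):
    """Pose of the vision sensor at (image_ts + time_offset), read off the
    pose trajectory.  traj_ts must be sorted; translation is interpolated
    linearly and the quaternion with a normalized lerp (a simplification
    of spline fitting, valid only for nearby, same-sign quaternions)."""
    t = image_ts + time_offset
    i = int(np.clip(np.searchsorted(traj_ts, t), 1, len(traj_ts) - 1))
    w = (t - traj_ts[i - 1]) / (traj_ts[i] - traj_ts[i - 1])
    xyz = (1 - w) * traj_xyz[i - 1] + w * traj_xyz[i]
    q = (1 - w) * traj_quat[i - 1] + w * traj_quat[i]
    return xyz, q / np.linalg.norm(q)
```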
And step S404, determining polar errors of at least two frames of images according to the pose information and the target feature points.
In this embodiment, taking at least two frames of images comprising a first frame image and a second frame image as an example, the polar line in the second frame image can be determined from the fundamental matrix (F matrix) and the target feature point in the first frame image. Similarly, the polar line in the first frame image can be determined from the F matrix and the target feature point in the second frame image. The distance from the target feature point in the first frame image to the polar line in the first frame image and the distance from the target feature point in the second frame image to the polar line in the second frame image are then computed, and the polar error is obtained by averaging the two.
For example, the polar error may be calculated as follows:
epi_err = (dis1 + dis2) / 2
where epi_err denotes the polar error, dis1 denotes the distance from the target feature point in the first frame image to the polar line in the first frame image, and dis2 denotes the distance from the target feature point in the second frame image to the polar line in the second frame image.
For example, with continued reference to fig. 2, suppose the pose point of the vision sensor when capturing the first frame image I1 is O1', its pose point when capturing the second frame image I2 is O2, the target feature point in image I1 is P1, and the target feature point in image I2 is P2. Based on the pose point O1', the pose point O2 and the spatial point P, the polar line L2' can be determined, and the polar error is the distance from P2 to the polar line L2'.
Step S405, determining a time offset of the vision sensor relative to the pose sensor according to the polar error.
In this embodiment, there is a correlation between the polar error and the time offset, and when the polar error is minimal, the time offset of the vision sensor relative to the pose sensor is closest to the true value. Based on the method, mass data can be counted, a two-dimensional coordinate system of polar errors corresponding to time offset is constructed, then a relation curve between the time offset and the polar errors is fitted based on points in the two-dimensional coordinate system, and then a corresponding time offset with the minimum polar error value is selected based on the relation curve, wherein the time offset can be used as a time offset true value of a vision sensor relative to a pose sensor.
In addition, in other embodiments, after calculating the time offset of the vision sensor with respect to the pose sensor, the time offset compensation may be performed on the vision sensor based on the time offset.
Fig. 5 is a schematic diagram of the relationship between the polar error and the time offset. As shown in fig. 5, the time offset corresponding to the minimum polar error is 0.1 seconds, and this value may be taken as the time offset of the vision sensor relative to the pose sensor. The time of the vision sensor can then be adjusted based on this time offset. Specifically, when compensating the vision sensor, still taking a time offset of 0.1 seconds at the minimum polar error as an example, if the time of the vision sensor before compensation is t3, the time of the vision sensor may be adjusted to t3 + 0.1 after compensation. In this way the vision sensor and the pose sensor become consistent in time, effective fusion of the information of each sensor in the multi-sensor system is ensured, and the accuracy of the multi-sensor system is improved.
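As a trivial sketch of the compensation step described above (the function name is hypothetical and the 0.1 s value follows the example in the text):

```python
def compensate_timestamps(image_timestamps, time_offset):
    """Shift the vision-sensor timestamps onto the pose-sensor time base.
    With the example above, time_offset = 0.1 s, so an image stamped t3
    is re-stamped t3 + 0.1."""
    return [ts + time_offset for ts in image_timestamps]
```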
According to the embodiment of the application, the correlation between the polar errors and the time offset is utilized, the pose track and the image which are offset in time are utilized to obtain the corresponding polar errors, and the statistical analysis is carried out on the polar errors to estimate the real time offset of the visual sensor, so that the whole time offset estimation process is optimized, the time offset calculation can be carried out in a natural scene, and the time offset estimation accuracy is improved.
Example two
In this embodiment, the step S402 may be specifically implemented by the following steps: respectively extracting features of a first matching image and a second matching image in at least two frames of images, and determining feature points in the first matching image and feature points in the second matching image; and performing feature matching on the feature points in the first matching image and the feature points in the second matching image to determine target feature points.
In this embodiment, the first matching image and the second matching image need to be determined in at least two frames of images. For example, two frames of images that are temporally adjacent may be selected as the first matching image and the second matching image, respectively.
In this embodiment, for feature extraction, scale-invariant feature transform (SIFT), Oriented FAST and Rotated BRIEF (ORB), or deep-learning-based feature extraction (SuperPoint) and feature matching (SuperGlue) methods may be used. Taking SIFT as an example, the SIFT method detects and describes image features, finds extremum points in scale space, and extracts their positions, scales (gradient magnitudes) and rotation-invariant orientations. The procedure is as follows:
Step A1: and carrying out Gaussian blur to obtain pictures with different ambiguities. Step A2: and (5) performing direct downsampling to obtain a multi-resolution picture. Step A3: and subtracting the Gaussian blurred pictures to obtain a differential pyramid. Step A4: and comparing the upper pyramid with the lower pyramid of the differential pyramid to obtain an extreme point. Step A5: using taylor expansion, an accurate extreme point is obtained. Step A6: using the herrian formula, boundary points are eliminated by feature vector variation. Step A7: and calculating the gradient size and the gradient direction of the feature points by using a sobel operator. Step A8: and counting gradient directions of adjacent positions of the feature points, making a histogram, and solving a main direction of the feature points. Step A9: the rotation is performed corresponding to the main direction of the feature, and the direction of the feature point size is kept unchanged. Step A10: counting feature points of the domain, according to the number of 4*4, 8 directions are generated for each region, that is, the number of occurrences of each direction is used as features, and 16 regions, that is, 16×8=128 features are generally used.
In this embodiment, after feature extraction of the first matching image and the second matching image is completed and feature points in the first matching image and the second matching image are determined, feature points in the two frames of images may be compared one by one. For example, each feature point in the first matching image is traversed and compared with all feature points in the second matching image, and if the feature point N1 in the first matching image is found to match a certain feature point M1 in the second matching image, the feature point N1 and the feature point M1 can be used as target feature points.
According to the embodiment of the application, the first matching image and the second matching image are subjected to feature extraction and feature matching, the target feature points in the first matching image and the second matching image are found, and the polar errors can be determined based on the target feature points, so that the time offset can be conveniently and rapidly determined, the method and the device can be applied to various natural scenes in daily life, the time offset of the sensor is not required to be calculated under a specific calibration scene, the problem of inconsistent calculation results of the calibration scene and the actual use is avoided, and the calculation accuracy of the time offset is further improved.
Example III
Further, on the basis of the second embodiment, in this embodiment, when performing feature matching, the method may specifically be implemented by the following steps: filtering dynamic objects of a first matching image and a second matching image in at least two frames of images to obtain matched feature points matched with each other in the first matching image and the second matching image; and filtering the paired feature points to obtain target feature points.
In this embodiment, since the dynamic object itself moves, the imaging points (i.e., pixel points) of the same spatial point of the dynamic object in the adjacent image frames include not only the epipolar errors but also the displacements of the dynamic object, and therefore, the target feature points need to be feature points on the static object. When the carrier of the vision sensor is a vehicle, the first matching image and the second matching image captured by the vision sensor in the motion process of the vehicle need to be subjected to dynamic object filtering. By taking the first matching image and the second matching image as two images adjacent in time as an example, by analyzing the two images at adjacent moments, it can be determined which part of the images is a dynamic object and which part is a static object, so that dynamic object filtering can be performed, the number of feature points contained in the first matching image and the second matching image is reduced, and matching feature points matched with each other in the first matching image and the second matching image can be found more efficiently.
In this embodiment, the pairing feature point may refer to a certain pixel point on one of the static objects in the first matching image and the second matching image. For example, if a certain pixel point of the static object exists in the first matching image, and the pixel point also exists in the second matching image, the pixel point can be used as a pairing feature point. In addition, in other embodiments, when the static object includes a plurality of paired feature points, further filtering and screening may be performed on the paired feature points, and a more stable paired feature point may be selected as the target feature point.
According to the embodiment of the application, the first matching image and the second matching image are subjected to dynamic object filtering, so that the interference of dynamic objects in the images is avoided, more stable and accurate target characteristic points can be screened, and the accuracy of subsequent polar error calculation is further improved.
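The patent does not specify the filtering mechanism; one common approach, sketched below as an assumption, is to drop keypoints whose pixels fall inside a semantic mask of dynamic classes (vehicles, pedestrians, etc.) produced by a detector.

```python
def filter_dynamic_keypoints(keypoints, descriptors, dynamic_mask):
    """Keep only keypoints on static structure.  dynamic_mask is a binary
    H x W image whose non-zero pixels belong to dynamic classes (assumed to
    come from a semantic detector); descriptors is an N x 128 numpy array."""
    keep = []
    for i, kp in enumerate(keypoints):
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if dynamic_mask[v, u] == 0:          # pixel is not on a dynamic object
            keep.append(i)
    return [keypoints[i] for i in keep], descriptors[keep]
```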
Example IV
Further, on the basis of the second embodiment, in this embodiment, the determination of the target feature point may also be achieved by: performing feature matching on each feature point in the first matching image and all feature points in the second matching image respectively to determine mutually matched first matching feature points, wherein the first matching feature points comprise first feature points in the first matching image and second feature points matched with the first feature points in the second matching image; each feature point in the second matching image is respectively matched with all feature points in the first matching image, and second matched feature points are determined, wherein each second matched feature point comprises a third feature point in the second matching image and a fourth feature point matched with the third feature point in the first matching image; if the first pairing feature point and the second pairing feature point are the same, the first pairing feature point or the second pairing feature point is used as the pairing feature point; and filtering the paired feature points to obtain target feature points.
For example, taking the example that the first matching image includes the feature points N2 and N3, and the second matching image includes the feature points M2 and M3, the feature points N2 and N3 in the first matching image may be traversed first, and the feature point N2 may be compared with all the feature points (that is, including the feature points M2 and M3) in the second matching image first, and if the feature point N2 matches the feature point M2, the first matching feature point is used as the first matching feature point. If the feature point N3 in the first matching image does not match all the feature points in the second matching image, the feature point N3 cannot be the first pairing feature point. After that, the feature points M2 and M3 in the second matching image are traversed, the feature point M2 may be compared with all the feature points in the first matching image (i.e. including the feature points N2 and N3), and if the feature point M2 matches the feature point N2, the feature point M2 and the feature point N2 may be used as second matching feature points. If the feature point M3 matches the feature point N3, the feature point M3 and the feature point N3 may also be the second paired feature points.
For convenience of distinction, a first paired feature point including the feature point N2 and the feature point M2 may be referred to as a first feature pair (N2, M2), a second paired feature point including the feature point M2 and the feature point N2 may be referred to as a second feature pair (N2, M2), and a second paired feature point including the feature point M3 and the feature point N3 may be referred to as a third feature pair (N3, M3). By contrast, the first feature pair (N2, M2) is found to be identical to the second feature pair (N2, M2), and the feature points N2 and M2 can be regarded as the paired feature points described above.
According to the embodiment of the application, the characteristic points contained in the first matching image are traversed and compared with all the characteristic points in the second matching image, then the characteristic points contained in the second matching image are traversed and compared with all the characteristic points in the first matching image, and finally cross verification is carried out, so that more accurate and stable paired characteristic points can be obtained, and the calculation accuracy of the follow-up polar errors is further improved.
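The forward/backward matching with cross verification described in this example could be sketched as follows; this is a brute-force nearest-neighbour version on L2 distance and is not taken from the patent.

```python
import numpy as np

def cross_check_matches(des1, des2):
    """Match each descriptor of image 1 to its nearest neighbour in image 2
    and vice versa; keep a pair only when both directions agree (the
    'first pairing feature point equals second pairing feature point' test)."""
    d = np.linalg.norm(des1[:, None, :] - des2[None, :, :], axis=2)  # L2 distances
    fwd = d.argmin(axis=1)   # best match in image 2 for each feature of image 1
    bwd = d.argmin(axis=0)   # best match in image 1 for each feature of image 2
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]
```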
Example five
On the basis of the above embodiment, in this embodiment, the determination of the target feature point may be achieved by the following steps, specifically: acquiring adjacent images in the data packet, wherein the adjacent images are images adjacent to a first matching image and/or a second matching image in at least two frames of images in time; and carrying out multi-frame re-projection filtering on matched feature points matched with each other in the first matched image and the second matched image based on the adjacent images to obtain target feature points.
In this embodiment, the neighboring image may be one or more frames, for example, there is an image ImageA that is adjacent in time to the first matching image, and there is an image ImageB that is adjacent in time to the second matching image, and then both the image ImageA and the image ImageB may be used as neighboring images.
For example, when the paired feature points are filtered by multi-frame re-projection, it may be determined whether the paired feature points are included in the neighboring image, and if the paired feature points are also included in the neighboring image, the paired feature points may be used as target feature points.
Further, when the number of neighboring images is two or more frames, it may be determined whether the paired feature point is the target feature point according to the number of neighboring images including the paired feature point. For example, when the neighboring images are three frames (including image ImageA, image ImageB, and image ImageC, for example), if two neighboring images (including image ImageA and image ImageB, for example) contain the pairing feature point, and the other neighboring image (including image ImageC, for example) does not contain the pairing feature point, the pairing feature point can still be used as the target feature point. If there are two adjacent images (including image ImageA and image ImageB, for example) that do not include the pairing feature point, only the pairing feature point is included in the other adjacent image (including image ImageC, for example), then the pairing feature point cannot be used as the target feature point. The number threshold of the neighboring images including the pairing feature points may be configured according to actual situations to determine whether the pairing feature points are target feature points, and the number is not limited here.
For example, in other embodiments, a distance threshold may be preset, and whether the pairing feature point is the target feature point may be determined through the distance threshold. The method comprises the following steps: acquiring first pose information of a visual sensor when capturing a first matching image, second pose information of the visual sensor when capturing a second matching image, and third pose information of the visual sensor when capturing an adjacent image; repositioning a spatial point in space according to the first pose information, the second pose information, the third pose information, the image plane of the adjacent image, the image plane of the first matching image, and the image plane of the second matching image; acquiring pixel points corresponding to the relocated space points on the first matching image and pixel points corresponding to the relocated space points on the second matching image; calculating a first distance between a pixel point corresponding to the repositioned spatial point on the first matching image and a matching characteristic point in the first matching image, and a second distance between a pixel point corresponding to the repositioned spatial point on the second matching image and a matching characteristic point in the second matching image; and if the first distance and the second distance are smaller than the preset distance threshold, determining the pairing feature point as the target feature point.
For example, fig. 6 is a schematic diagram of pairing feature point filtering provided by an embodiment of the present application. As shown in fig. 6, the initial position of the spatial point P is determined first; the point P1 on the first matching image I1 is a pairing feature point, and the point P2 on the second matching image I2 is also a pairing feature point. The spatial point P is then repositioned by combining the neighbouring image I3 and the third pose information O3, and the repositioned spatial point is denoted P'. The pixel point of the spatial point P' on the first matching image I1 is P1', and its pixel point on the second matching image is P2'. The distance from P1 to P1' can be computed as the first distance and the distance from P2 to P2' as the second distance; if both distances are smaller than the preset distance threshold, the pairing feature points P1 and P2 can be determined to be target feature points.
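A hedged sketch of this check for a single candidate pair is given below. The exact repositioning procedure is not specified in the text, so re-triangulating the point from the first matching image and the neighbouring image is only one possible reading; the projection-matrix inputs and the threshold value are assumptions.

```python
import cv2
import numpy as np

def reprojection_filter(p1, p2, p_adj, P1, P2, P_adj, dist_thresh=2.0):
    """Multi-frame re-projection check for one candidate pairing (p1, p2).

    P1, P2, P_adj are 3x4 projection matrices K[R|t] built from the first,
    second and third pose information; p_adj is the pixel of the same
    feature track in the neighbouring image."""
    p1, p2, p_adj = (np.asarray(p, dtype=np.float32) for p in (p1, p2, p_adj))
    X = cv2.triangulatePoints(P1, P_adj, p1.reshape(2, 1), p_adj.reshape(2, 1))
    X = (X[:3] / X[3]).ravel()                       # repositioned 3D point P'

    def project(P, X):
        x = P @ np.append(X, 1.0)
        return x[:2] / x[2]

    d1 = np.linalg.norm(project(P1, X) - p1)         # first distance  (P1 vs P1')
    d2 = np.linalg.norm(project(P2, X) - p2)         # second distance (P2 vs P2')
    return d1 < dist_thresh and d2 < dist_thresh     # keep as target feature point?
```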
According to the embodiment of the application, the paired characteristic points are filtered again by adopting a multi-frame re-projection filtering method to obtain the target characteristic points, so that the stability and the accuracy of the target characteristic points can be ensured, and the accuracy in the follow-up calculation of the polar errors is further improved.
Example six
On the basis of the above embodiment, in this embodiment, when determining at least two frames of images in a data packet, the following steps may be specifically implemented: determining a desired time interval according to the motion rate and/or the frame rate of the vision sensor arranged on the same carrier; for a first matching image in the data packet, a second matching image with a time stamp spaced from the first matching image by a desired time interval is acquired from the data packet.
In this embodiment, when the carrier of the vision sensor is a vehicle, the movement speed of the vehicle can be regarded as the movement speed of the vision sensor in general. In addition, the frame rate of the vision sensor may be configured in connection with practical situations, such as 20 frames/second, 30 frames/second. Wherein the frame rate is positively correlated with the desired time interval and the motion rate is negatively correlated with the desired time interval, the higher the frame rate, the larger the desired time interval can be, which can reduce the number of images extracted from the data packet and reduce the amount of data. While if the speed is faster, the expected time interval needs to be smaller, so that the accuracy of the time offset calculation can be ensured. For example, a calculation formula for a desired time interval may be configured to take the motion rate and/or the frame rate as input to the calculation formula. For example, the desired time interval = (frame rate x n)/(motion rate x k). Wherein n and k are preset coefficients.
For example, the desired time interval may be set to be a times the frame interval (a is a preset coefficient), that is, an image is selected every few frames based on the desired time interval, to form an image matching pair. In addition, in other embodiments, the data packet may include tens of thousands of image frames, if the expected time interval is set to be too small, the amount of the selected image data will be too large, for this purpose, parameters of frame extraction may be set, and 1 group of image matching pairs may be set to be determined every n frames, so as to cope with the problem that the amount of the data is too large.
For example, any one of the at least two frames of the data packet may be selected as the first matching image, and the second matching image may be determined based on the desired time interval when the second matching image is determined. For example, if one image is selected at five frames per interval, then when the first matching image is the sixth frame image with a rank of 6, then the eleventh frame image with a rank of 11 is selected as the second matching image.
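The pair selection described above might be sketched as follows; the interval formula simply follows the example given in the text, and the coefficients n and k (and the conversion of the interval into a frame gap) are assumptions.

```python
def select_image_pairs(num_frames, frame_rate, motion_rate, n=1.0, k=1.0):
    """Select (first, second) matching-image index pairs spaced roughly by
    the desired time interval, which is positively correlated with the
    frame rate and negatively correlated with the motion rate."""
    desired_interval = (frame_rate * n) / (motion_rate * k)   # example formula from the text
    step = max(1, int(round(desired_interval * frame_rate)))  # interval expressed as a frame gap
    # e.g. with step == 5, frame 6 is paired with frame 11, as in the text
    return [(i, i + step) for i in range(0, num_frames - step, step)]
```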
According to the embodiment of the application, the expected time interval is calculated by utilizing the motion rate and/or the frame rate, so that the first matching image and the second matching image are selected, and the situation that the polar error is inaccurate in the follow-up determination based on the target feature point due to the fact that the selected first matching image and second matching image are improper caused by the fact that the expected time interval is too large or too small is avoided.
Example seven
On the basis of the above embodiment, in this embodiment, the polar error may be determined by: acquiring time offsets corresponding to N different polar line errors of the vision sensor, wherein N is a positive integer greater than 1; and determining the time offset corresponding to the minimum polar line error in the N polar line errors according to the time offsets corresponding to the N different polar line errors of the vision sensor, and taking the time offset as the time offset of the vision sensor.
In this embodiment, data statistics may be performed to obtain the polar errors corresponding to N time offsets, which are marked in a two-dimensional coordinate system (i.e. a time offset versus polar error coordinate system); a relationship curve between time offset and polar error is then fitted to the marked points, and the minimum polar error is selected from the curve. The time offset corresponding to the minimum polar error can be taken as the optimal time offset of the vision sensor, and compensating the vision sensor with it makes the vision sensor and the pose sensor consistent in time.
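A minimal sketch of this grid search is shown below; the candidate grid, the callback that re-evaluates the mean polar error for a given offset, and the function names are all assumptions rather than the patent's implementation.

```python
import numpy as np

def estimate_time_offset(candidate_offsets, mean_epipolar_error):
    """Evaluate the mean polar (epipolar) error at N candidate offsets and
    pick the offset that minimises it.  mean_epipolar_error(dt) is assumed
    to shift the image timestamps by dt, re-read the poses from the
    trajectory and average the errors over all target feature points."""
    errors = np.array([mean_epipolar_error(dt) for dt in candidate_offsets])
    return candidate_offsets[int(np.argmin(errors))]

# Example: 61 candidates from -0.3 s to +0.3 s in 10 ms steps
# offset = estimate_time_offset(np.arange(-0.3, 0.31, 0.01), err_fn)
```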
According to this embodiment of the application, multiple groups of data are statistically analyzed and the time offset corresponding to the minimum epipolar error is selected as the time offset of the vision sensor, which ensures the accuracy of the time offset calculation.
Example eight
Fig. 7 is a flowchart of a time offset processing method according to another embodiment of the present application. As shown in fig. 7, the method specifically includes the following steps: step S701, image acquisition. Step S702, feature extraction. Step S703, feature matching. Step S704, six-degree-of-freedom pose of the device sensor. Step S705, six-degree-of-freedom pose of the sensor. Step S706, relative pose of adjacent frames. Step S707, epipolar error. Step S708, statistical analysis. Step S709, determination of the sensor time offset.
In this embodiment, the input of the system includes the sensor whose time offset is to be measured (e.g., the vision sensor described above) and a reference image; when the sensor to be measured itself provides image information, its own images may also serve as the reference images, and the finally measured time offset is relative to the reference image. Processing of the reference images is divided into two steps, feature extraction and feature matching. Feature detection may use methods such as SIFT or ORB, or a deep-learning-based method such as SuperPoint. Taking SIFT as an example, SIFT detects and describes image features by finding extreme points in scale space and extracting their positions, scales and orientations (rotation invariants).
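As one possible realization of the feature extraction step (the embodiment only names SIFT, ORB and SuperPoint as candidate detectors), a SIFT-based sketch using OpenCV could look as follows; the function name and image handling are illustrative assumptions.

import cv2

def extract_features(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # keypoints carry position, scale and orientation; descriptors are 128-dimensional
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors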
In other embodiments, fig. 8 is a schematic flowchart of feature matching provided by an embodiment of the present application. As shown in fig. 8, it includes the following steps: step S801, adjacent-frame image features. Step S802, dynamic object filtering. Step S803, L2_NORM nearest-neighbor matching. Step S804, ratio test. Step S805, cross-check. Step S806, geometric filtering. Step S807, track filtering. Step S808, stable feature matches.
This embodiment designs a robust adjacent-frame feature matching method. First, dynamic objects are filtered out based on image semantic detection information; then exhaustive matching is performed using the L2_NORM similarity measure, followed by a ratio test, cross-check and geometric filtering, which further improve the stability of feature matching; the cross-check ensures that the descriptors of the two images match each other mutually. Finally, track filtering is performed, so that a feature point is used for the subsequent error calculation only if it appears stably across multiple frames.
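A hedged sketch of this matching chain using OpenCV is given below; it covers the L2_NORM nearest-neighbor matching, the ratio test, the cross-check and RANSAC-based geometric filtering, while the dynamic-object filtering and track filtering steps are omitted and the thresholds are illustrative assumptions.

import cv2
import numpy as np

def robust_match(kpts1, desc1, kpts2, desc2, ratio=0.8, ransac_thresh=1.0):
    """Ratio test + mutual cross-check + RANSAC fundamental-matrix filtering."""
    bf = cv2.BFMatcher(cv2.NORM_L2)

    def ratio_filter(knn_matches):
        good = {}
        for pair in knn_matches:
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                good[pair[0].queryIdx] = pair[0]
        return good

    fwd = ratio_filter(bf.knnMatch(desc1, desc2, k=2))   # image 1 -> image 2
    bwd = ratio_filter(bf.knnMatch(desc2, desc1, k=2))   # image 2 -> image 1

    # cross-check: keep a match only if both directions agree on the pairing
    matches = [m for m in fwd.values()
               if m.trainIdx in bwd and bwd[m.trainIdx].trainIdx == m.queryIdx]

    # geometric filtering with a RANSAC-estimated fundamental matrix
    if len(matches) >= 8:
        p1 = np.float32([kpts1[m.queryIdx].pt for m in matches])
        p2 = np.float32([kpts2[m.trainIdx].pt for m in matches])
        _, mask = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, ransac_thresh, 0.999)
        if mask is not None:
            matches = [m for m, keep in zip(matches, mask.ravel()) if keep]
    return matches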
In this embodiment, a rough 6DoF pose estimate of the sensor to be measured is required first; for laser point clouds and images, this pose estimation can be obtained by an odometry method. After spline fitting is applied to these poses to obtain time-continuous poses, the sensor poses corresponding to the adjacent image frames are acquired.
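The following Python sketch illustrates one way to obtain time-continuous poses and query them at image timestamps; it interpolates rotation with SLERP and translation linearly, which is a simplification of the spline fitting mentioned above, and the assumed data layout (timestamps, xyzw quaternions, translations) is not taken from the embodiment.

import numpy as np
from scipy.interpolate import interp1d
from scipy.spatial.transform import Rotation, Slerp

def make_pose_interpolator(times, quats_xyzw, translations):
    slerp = Slerp(times, Rotation.from_quat(quats_xyzw))
    lerp = interp1d(times, np.asarray(translations), axis=0)
    def pose_at(t):
        return slerp(t).as_matrix(), lerp(t)   # 3x3 rotation, 3-vector translation
    return pose_at

# pose_at(image_timestamp + candidate_offset) yields the pose used when evaluating
# the epipolar error for that candidate time offset.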
Finally, according to epipolar geometry, when the relative pose between two images and the intrinsic and extrinsic parameters are accurate, corresponding feature points should fall on each other's epipolar lines. In practice, the presence of a time offset changes the epipolar error: the larger the time offset, the more inaccurate the relative pose of the adjacent frames and the larger the corresponding epipolar error. The technique of the invention performs statistical analysis on the epipolar errors obtained from the feature matching and pose calculation of adjacent frame images, yielding a time offset result that minimizes the epipolar error.
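A minimal sketch of the epipolar-error statistic is shown below; it assumes a pinhole model with shared intrinsics K and the convention that (R, t) maps points from the first camera frame to the second, so it illustrates the general epipolar constraint rather than the embodiment's exact formulation.

import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def epipolar_error(pts1, pts2, R, t, K):
    pts1 = np.asarray(pts1, dtype=float)
    pts2 = np.asarray(pts2, dtype=float)
    E = skew(t) @ R                                   # essential matrix
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)     # fundamental matrix
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])                      # homogeneous pixels in image 1
    x2 = np.hstack([pts2, ones])                      # homogeneous pixels in image 2
    lines2 = (F @ x1.T).T                             # epipolar lines in image 2
    num = np.abs(np.sum(lines2 * x2, axis=1))
    den = np.linalg.norm(lines2[:, :2], axis=1)
    return float(np.mean(num / den))                  # mean point-to-epipolar-line distance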
The embodiment of the application does not depend on the internal triggering details of the sensor and does not need aids such as a calibration plate, so it can compute the time offset in natural scenes. In the calculation flow, the feature matches of adjacent frames are strictly filtered, which makes the result stable and robust, and the epipolar-error-statistics-based method ensures the efficiency of the time offset calculation.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 9 is a schematic structural diagram of a sensor time offset processing device provided in an embodiment of the present application; the device may be integrated on a vehicle. As shown in fig. 9, the sensor time offset processing device 900 may specifically include a data acquisition module 910, a feature point determining module 920, a pose determining module 930, an error determining module 940, and an offset determining module 950. The data acquisition module 910 is configured to acquire at least two frames of images in a data packet, where the data packet includes a pose track of the carrier measured by the pose sensor and images captured by the vision sensor, which are offset in time. The feature point determining module 920 is configured to determine matched target feature points in the at least two frames of images, where the target feature points include the pixel points corresponding to the same spatial point in the at least two frames of images. The pose determining module 930 is configured to determine, according to the pose track, pose information of the vision sensor when capturing each frame of image. The error determining module 940 is configured to determine epipolar errors of the at least two frames of images according to the pose information and the target feature points. The offset determining module 950 is configured to determine a time offset of the vision sensor relative to the pose sensor according to the epipolar error.
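A schematic composition of these modules is sketched below; the class and method names, the packet interface and the candidate-offset loop are purely illustrative assumptions, not the patent's implementation.

class SensorTimeOffsetProcessor:
    def __init__(self, acquire, match, pose_at, error, select):
        self.acquire = acquire      # pulls image matching pairs from the data packet
        self.match = match          # yields target feature points of a pair
        self.pose_at = pose_at      # pose track -> pose at a given timestamp
        self.error = error          # epipolar error of one image pair
        self.select = select        # picks the offset with the minimum error

    def run(self, packet, candidate_offsets):
        errors = []
        for dt in candidate_offsets:
            pair_errors = []
            for img_a, img_b in self.acquire(packet):
                pts_a, pts_b = self.match(img_a, img_b)
                pose_a = self.pose_at(img_a.timestamp + dt)
                pose_b = self.pose_at(img_b.timestamp + dt)
                pair_errors.append(self.error(pts_a, pts_b, pose_a, pose_b))
            errors.append(sum(pair_errors) / max(len(pair_errors), 1))
        return self.select(candidate_offsets, errors)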
Optionally, the feature point determining module may specifically be configured to: respectively extracting features of a first matching image and a second matching image in at least two frames of images, and determining feature points in the first matching image and feature points in the second matching image; and performing feature matching on the feature points in the first matching image and the feature points in the second matching image to determine target feature points.
Optionally, the feature point determining module may specifically be configured to: filtering dynamic objects of a first matching image and a second matching image in at least two frames of images to obtain matched feature points matched with each other in the first matching image and the second matching image; and filtering the paired feature points to obtain target feature points.
Optionally, the feature point determining module may specifically be configured to: performing feature matching on each feature point in the first matching image and all feature points in the second matching image respectively to determine mutually matched first matching feature points, wherein the first matching feature points comprise first feature points in the first matching image and second feature points matched with the first feature points in the second matching image; each feature point in the second matching image is respectively matched with all feature points in the first matching image, and second matched feature points are determined, wherein each second matched feature point comprises a third feature point in the second matching image and a fourth feature point matched with the third feature point in the first matching image; if the first pairing feature point and the second pairing feature point are the same, the first pairing feature point or the second pairing feature point is used as the pairing feature point; and filtering the paired feature points to obtain target feature points.
Optionally, the feature point determining module may specifically be configured to: acquiring adjacent images in the data packet, wherein the adjacent images are images adjacent to a first matching image and/or a second matching image in at least two frames of images in time; and carrying out multi-frame re-projection filtering on matched feature points matched with each other in the first matched image and the second matched image based on the adjacent images to obtain target feature points.
Optionally, the feature point determining module may specifically be configured to: acquiring first pose information of a visual sensor when capturing a first matching image, second pose information of the visual sensor when capturing a second matching image, and third pose information of the visual sensor when capturing an adjacent image; repositioning a spatial point in space according to the first pose information, the second pose information, the third pose information, the image plane of the adjacent image, the image plane of the first matching image, and the image plane of the second matching image; acquiring pixel points corresponding to the relocated space points on the first matching image and pixel points corresponding to the relocated space points on the second matching image; calculating a first distance between a pixel point corresponding to the repositioned spatial point on the first matching image and a matching characteristic point in the first matching image, and a second distance between a pixel point corresponding to the repositioned spatial point on the second matching image and a matching characteristic point in the second matching image; and if the first distance and the second distance are smaller than the preset distance threshold, determining the pairing feature point as the target feature point.
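A hedged sketch of this multi-frame re-projection check is given below; it triangulates a paired feature point from the two matching images, re-projects the recovered spatial point into both, and keeps the pair only if both distances stay below the threshold. The adjacent image could be included as a third view in the same way; the names and the default threshold are assumptions.

import cv2
import numpy as np

def reprojection_filter(pt1, pt2, P1, P2, threshold_px=2.0):
    """P1 and P2 are the 3x4 projection matrices K[R|t] of the two matching images."""
    P1 = np.asarray(P1, dtype=np.float64)
    P2 = np.asarray(P2, dtype=np.float64)
    X = cv2.triangulatePoints(P1, P2,
                              np.float64(pt1).reshape(2, 1),
                              np.float64(pt2).reshape(2, 1))
    X = (X[:3] / X[3]).ravel()                         # homogeneous -> 3D spatial point

    def reproject(P):
        x = P @ np.append(X, 1.0)
        return x[:2] / x[2]

    d1 = np.linalg.norm(reproject(P1) - np.asarray(pt1, dtype=float))
    d2 = np.linalg.norm(reproject(P2) - np.asarray(pt2, dtype=float))
    return d1 < threshold_px and d2 < threshold_px     # keep as a target feature point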
Optionally, the data acquisition module may specifically be configured to: determining a desired time interval according to the motion rate and/or the frame rate of the vision sensor arranged on the same carrier; for a first matching image in the data packet, a second matching image with a time stamp spaced from the first matching image by a desired time interval is acquired from the data packet.
Optionally, the offset determining module may specifically be configured to: acquiring time offsets corresponding to N different epipolar errors of the vision sensor, wherein N is a positive integer greater than 1; and determining, from the time offsets corresponding to the N different epipolar errors of the vision sensor, the time offset corresponding to the minimum epipolar error among the N epipolar errors as the time offset of the vision sensor.
Optionally, the sensor time offset processing device further includes a compensation module, configured to perform time offset compensation on the vision sensor according to the time offset.
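As a minimal illustration of such compensation (the sign convention depends on how the offset was defined upstream and is an assumption here), the image timestamps can simply be shifted onto the pose sensor's time base:

def compensate_timestamps(image_timestamps, time_offset):
    # shift every image timestamp by the estimated offset of the vision sensor
    return [t + time_offset for t in image_timestamps]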
The device provided by the embodiment of the application can be used for executing the method in the embodiment, and the implementation principle and the technical effect are similar, and are not repeated here.
It should be noted that the division of the modules of the above apparatus is merely a division of logical functions; in actual implementation they may be fully or partially integrated into one physical entity or physically separated. These modules may all be implemented in the form of software called by a processing element, or all in hardware; or some modules may be implemented as software called by a processing element and others in hardware. For example, the data acquisition module may be a separately arranged processing element, may be integrated in a chip of the above apparatus, or may be stored in the memory of the above apparatus in the form of program code whose function is called and executed by a processing element of the above apparatus. The implementation of the other modules is similar. In addition, all or part of these modules may be integrated together or implemented independently. The processing element here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device 1000 includes: at least one processor 1001, a memory 1002, a bus 1003 and a communication interface 1004, where the processor 1001, the communication interface 1004 and the memory 1002 communicate with each other via the bus 1003. The communication interface 1004 is used to communicate with other devices and includes a communication interface for data transmission, a display interface or an operation interface for human-computer interaction, and the like. The processor 1001 is configured to execute the computer-executable instructions stored in the memory, and may specifically perform the relevant steps of the methods described in the above embodiments.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the electronic device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs. The memory is used to store the computer-executable instructions and may comprise high-speed RAM, and may also comprise non-volatile memory, such as at least one disk memory.
The present embodiment further provides a vehicle, and by way of example, reference may be made to fig. 3 above, where the vehicle includes a pose sensor, a vision sensor, and a processor, where the pose sensor and the vision sensor are connected to the processor, and where the processor is configured to implement the method described above.
The present embodiment also provides a computer-readable storage medium having stored therein computer instructions which, when executed by at least one processor of an electronic device, perform the methods provided by the various embodiments described above.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship; in a formula, the character "/" indicates that the associated objects before and after it are in a "division" relationship. "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or plural.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application. In the embodiment of the present application, the sequence number of each process does not mean the sequence of the execution sequence, and the execution sequence of each process should be determined by the function and the internal logic, and should not limit the implementation process of the embodiment of the present application in any way.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (12)

1. A method of processing a sensor time offset for determining a time offset between a pose sensor and a vision sensor disposed on a same carrier, the method comprising:
acquiring at least two frames of images in a data packet, wherein the data packet comprises a pose track of the carrier measured by the pose sensor and images captured by the vision sensor, which are offset in time;
respectively determining matched target feature points in the at least two frames of images, wherein the target feature points comprise pixel points corresponding to the same spatial point on the images in the at least two frames of images;
determining, according to the pose track, pose information of the vision sensor when capturing each frame of image;
determining, according to the pose information and the target feature points, epipolar errors of the at least two frames of images, wherein the epipolar errors are used for representing the distances from the target feature points in the images to the epipolar lines in the images;
determining a time offset of the vision sensor relative to the pose sensor according to the epipolar error;
wherein the determining, according to the epipolar error, a time offset of the vision sensor relative to the pose sensor specifically includes:
acquiring time offsets corresponding to N different epipolar errors of the vision sensor, wherein N is a positive integer greater than 1;
and determining, from the time offsets corresponding to the N different epipolar errors of the vision sensor, the time offset corresponding to the minimum epipolar error among the N epipolar errors as the time offset of the vision sensor.
2. The method of claim 1, wherein determining matching target feature points in the at least two frames of images, respectively, comprises:
respectively extracting features of a first matching image and a second matching image in at least two frames of images, and determining feature points in the first matching image and feature points in the second matching image;
and performing feature matching on the feature points in the first matching image and the feature points in the second matching image to determine target feature points.
3. The method according to claim 2, wherein the feature matching the feature points in the first matching image with the feature points in the second matching image, and determining the target feature point, includes:
filtering dynamic objects of a first matching image and a second matching image in at least two frames of images to obtain matched feature points matched with each other in the first matching image and the second matching image;
and filtering the paired feature points to obtain target feature points.
4. The method according to claim 2, wherein the feature matching the feature points in the first matching image with the feature points in the second matching image, and determining the target feature point, includes:
performing feature matching on each feature point in the first matching image and all feature points in the second matching image respectively to determine mutually matched first matching feature points, wherein the first matching feature points comprise first feature points in the first matching image and second feature points matched with the first feature points in the second matching image;
each feature point in the second matching image is respectively matched with all feature points in the first matching image, and second matched feature points are determined, wherein each second matched feature point comprises a third feature point in the second matching image and a fourth feature point matched with the third feature point in the first matching image;
if the first pairing feature point and the second pairing feature point are the same, the first pairing feature point or the second pairing feature point is used as the pairing feature point;
and filtering the paired feature points to obtain target feature points.
5. The method according to claim 2, wherein the feature matching the feature points in the first matching image with the feature points in the second matching image, and determining the target feature point, includes:
acquiring adjacent images in the data packet, wherein the adjacent images are images adjacent to a first matching image and/or a second matching image in at least two frames of images in time;
and carrying out multi-frame re-projection filtering on matched feature points matched with each other in the first matched image and the second matched image based on the adjacent images to obtain the target feature points.
6. The method according to claim 5, wherein performing multi-frame re-projection filtering on the matched feature points in the first matching image and the second matching image based on the adjacent image to obtain the target feature point includes:
acquiring first pose information of the vision sensor when the first matching image is captured, second pose information of the vision sensor when the second matching image is captured, and third pose information of the vision sensor when the adjacent image is captured;
repositioning the spatial point in the space according to the first pose information, the second pose information, the third pose information, the image plane of the adjacent image, the image plane of the first matching image, and the image plane of the second matching image;
acquiring corresponding pixel points of the relocated space points on the first matching image and corresponding pixel points of the relocated space points on the second matching image;
calculating a first distance between a pixel point corresponding to the relocated space point on the first matching image and a matching characteristic point in the first matching image, and a second distance between a pixel point corresponding to the relocated space point on the second matching image and a matching characteristic point in the second matching image;
and if the first distance and the second distance are smaller than the preset distance threshold, determining the pairing feature point as a target feature point.
7. The method of claim 1, wherein the acquiring at least two frames of images in a data packet comprises:
determining a desired time interval according to the motion rate and/or the frame rate of the vision sensor arranged on the same carrier;
and, for a first matching image in the data packet, acquiring from the data packet a second matching image whose time stamp is spaced from the first matching image by the desired time interval.
8. The method according to claim 1, wherein the method further comprises:
and according to the time offset, performing time offset compensation on the vision sensor.
9. A sensor time offset processing apparatus, comprising:
the data acquisition module is used for acquiring at least two frames of images in a data packet, wherein the data packet comprises a pose track of a carrier measured by a pose sensor and images captured by a vision sensor, which are offset in time;
the feature point determining module is used for respectively determining matched target feature points in the at least two frames of images, wherein the target feature points comprise pixel points corresponding to the same spatial point in the at least two frames of images;
the pose determining module is used for determining, according to the pose track, pose information of the vision sensor when capturing each frame of image;
the error determining module is used for determining epipolar errors of the at least two frames of images according to the pose information and the target feature points, wherein the epipolar errors are used for representing the distances from the target feature points in the images to the epipolar lines in the images;
the offset determining module is used for determining the time offset of the vision sensor relative to the pose sensor according to the epipolar error;
wherein the offset determining module is specifically configured to:
acquiring time offsets corresponding to N different epipolar errors of the vision sensor, wherein N is a positive integer greater than 1;
and determining, from the time offsets corresponding to the N different epipolar errors of the vision sensor, the time offset corresponding to the minimum epipolar error among the N epipolar errors as the time offset of the vision sensor.
10. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-8.
11. A vehicle comprising at least a pose sensor, a vision sensor and a processor, the pose sensor, the vision sensor being coupled to the processor, the processor being configured to implement the method of any of claims 1-8.
12. A computer readable storage medium having stored therein computer instructions which, when executed by a processor, are adapted to carry out the method of any one of claims 1-8.
CN202310658110.6A 2023-06-06 2023-06-06 Sensor time offset processing method, device, equipment, vehicle and medium Active CN116380132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310658110.6A CN116380132B (en) 2023-06-06 2023-06-06 Sensor time offset processing method, device, equipment, vehicle and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310658110.6A CN116380132B (en) 2023-06-06 2023-06-06 Sensor time offset processing method, device, equipment, vehicle and medium

Publications (2)

Publication Number Publication Date
CN116380132A CN116380132A (en) 2023-07-04
CN116380132B true CN116380132B (en) 2023-08-22

Family

ID=86981013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310658110.6A Active CN116380132B (en) 2023-06-06 2023-06-06 Sensor time offset processing method, device, equipment, vehicle and medium

Country Status (1)

Country Link
CN (1) CN116380132B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017206179A1 (en) * 2016-06-03 2017-12-07 SZ DJI Technology Co., Ltd. Simple multi-sensor calibration
FR3091777B1 (en) * 2019-01-11 2020-12-18 Thales Sa Method for determining a protection radius of a vision-based navigation system
US11269066B2 (en) * 2019-04-17 2022-03-08 Waymo Llc Multi-sensor synchronization measurement device
US11250051B2 (en) * 2019-09-19 2022-02-15 Here Global B.V. Method, apparatus, and system for predicting a pose error for a sensor system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111551191A (en) * 2020-04-28 2020-08-18 浙江商汤科技开发有限公司 Sensor external parameter calibration method and device, electronic equipment and storage medium
CN112598757A (en) * 2021-03-03 2021-04-02 之江实验室 Multi-sensor time-space calibration method and device
CN115272494A (en) * 2022-09-29 2022-11-01 腾讯科技(深圳)有限公司 Calibration method and device for camera and inertial measurement unit and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
David D. Diel et al., "Epipolar Constraints for Vision-Aided Inertial Navigation," Proceedings of the IEEE Workshop on Motion and Video Computing, pp. 1-8. *

Also Published As

Publication number Publication date
CN116380132A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
US11668571B2 (en) Simultaneous localization and mapping (SLAM) using dual event cameras
US10275649B2 (en) Apparatus of recognizing position of mobile robot using direct tracking and method thereof
US10260862B2 (en) Pose estimation using sensors
WO2021072696A1 (en) Target detection and tracking method and system, and movable platform, camera and medium
CN109752003B (en) Robot vision inertia point-line characteristic positioning method and device
CN111210477B (en) Method and system for positioning moving object
EP3159121A1 (en) Device for updating map of mobile robot and method therefor
EP3159122A1 (en) Device and method for recognizing location of mobile robot by means of search-based correlation matching
CN105043350A (en) Binocular vision measuring method
CN108090921A (en) Monocular vision and the adaptive indoor orientation method of IMU fusions
CN104539934A (en) Image collecting device and image processing method and system
CN113888639B (en) Visual odometer positioning method and system based on event camera and depth camera
CN112348889B (en) Visual positioning method, and related device and equipment
CN111340922A (en) Positioning and mapping method and electronic equipment
AliAkbarpour et al. Fast structure from motion for sequential and wide area motion imagery
Cvišić et al. Recalibrating the KITTI dataset camera setup for improved odometry accuracy
El Bouazzaoui et al. Enhancing RGB-D SLAM performances considering sensor specifications for indoor localization
CN113450334B (en) Overwater target detection method, electronic equipment and storage medium
Hayakawa et al. Ego-motion and surrounding vehicle state estimation using a monocular camera
CN112666550B (en) Moving object detection method and device, fusion processing unit and medium
CN104200456A (en) Decoding method for linear structure-light three-dimensional measurement
CN116380132B (en) Sensor time offset processing method, device, equipment, vehicle and medium
Liu et al. Outdoor camera calibration method for a GPS & camera based surveillance system
Shao A Monocular SLAM System Based on the ORB Features
Estilo et al. Obstacle detection and localization in an automated vehicle using binocular stereopsis and motion field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant