CN114565777A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN114565777A
Authority
CN
China
Prior art keywords
feature point
feature
map
descriptor
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210192315.5A
Other languages
Chinese (zh)
Inventor
陈一平 (Chen Yiping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202210192315.5A
Publication of CN114565777A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a data processing method and apparatus, belonging to the field of simultaneous localization and mapping. The method comprises the following steps: extracting a plurality of first feature point sets from video data comprising multiple frames of images, where each first feature point set comprises a plurality of feature points corresponding to the same feature object and each feature point is extracted from a corresponding frame of image; obtaining a descriptor of each feature point, and merging the plurality of first feature point sets according to the obtained descriptors to obtain a plurality of second feature point sets; generating a corresponding map point according to each second feature point set, and determining at least one pair of map points to be judged among the generated map points, where the distance between each pair of map points to be judged is smaller than a first distance threshold; and merging each pair of map points to be judged according to the reprojection error corresponding to that pair, so as to obtain a corresponding merged map point.

Description

Data processing method and device
Technical Field
The application belongs to the field of simultaneous positioning and map construction, and particularly relates to a data processing method and device.
Background
SLAM (Simultaneous Localization and Mapping) refers to a robot localizing itself while moving through an unknown environment and building an incremental map from the acquired environment information. SLAM that acquires environment information through a camera is called visual SLAM; it has the advantages of low cost, rich image information, and the like, and has attracted increasing attention.
However, as the camera moves, a feature object in the environment may disappear from the camera's field of view and later reappear. In the environment information collected by the camera, a feature object that reappears after disappearing is treated as a completely new feature object, so the collected environment information contains a large amount of redundant data. Redundant data occupies a large amount of memory, which increases data processing time and reduces data processing efficiency.
Disclosure of Invention
The embodiments of the application aim to provide a data processing method and apparatus, which can solve the problem of excessive redundant data in collected environment information.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes:
extracting a plurality of first feature point sets from video data including a plurality of frames of images; each first feature point set comprises a plurality of feature points corresponding to the same feature object; wherein each feature point is extracted from a corresponding frame of image;
obtaining a descriptor of each feature point, and combining the plurality of first feature point sets according to the obtained descriptor to obtain a plurality of second feature point sets;
generating a corresponding map point according to each second feature point set, and determining at least one pair of map points to be judged in the generated plurality of map points; the distance between each pair of map points to be judged is smaller than a first distance threshold value;
and combining each pair of map points to be judged according to the reprojection errors corresponding to each pair of map points to be judged to obtain corresponding combined map points.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
the extraction module is used for extracting a plurality of first feature point sets from video data comprising a plurality of frames of images; each first feature point set comprises a plurality of feature points corresponding to the same feature object; wherein each feature point is extracted from a corresponding frame of image;
the first merging module is used for acquiring a descriptor of each feature point, and merging the plurality of first feature point sets according to the acquired descriptor to obtain a plurality of second feature point sets;
the determining module is used for generating a corresponding map point according to each second feature point set, and determining at least one pair of map points to be judged in the generated map points; the distance between each pair of map points to be judged is smaller than a first distance threshold value;
and the second merging module is used for merging each pair of map points to be judged according to the reprojection errors corresponding to each pair of map points to be judged to obtain the corresponding merged map points.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or an instruction stored on the memory and executable on the processor, and when executed by the processor, the program or the instruction implements the steps of the data processing method according to the first aspect.
In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the data processing method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the data processing method according to the first aspect.
In the embodiment of the application, a plurality of first feature point sets are extracted from video data comprising multiple frames of images, where each first feature point set comprises a plurality of feature points corresponding to the same feature object and each feature point is extracted from a corresponding frame of image; a descriptor of each feature point is obtained, and the plurality of first feature point sets are merged according to the obtained descriptors to obtain a plurality of second feature point sets; a corresponding map point is generated according to each second feature point set, and at least one pair of map points to be judged is determined among the generated map points, where the distance between each pair of map points to be judged is smaller than a first distance threshold; and each pair of map points to be judged is merged according to the reprojection error corresponding to that pair, so as to obtain a corresponding merged map point. Through the embodiment of the application, when a feature object is observed many times, the plurality of first feature point sets can be merged into a plurality of second feature point sets, which reduces the redundant data caused by identifying the same feature object as corresponding to a plurality of different first feature point sets; in addition, a map point can be generated for each second feature point set, and each pair of nearby map points to be judged can be merged, which reduces the redundant data caused by identifying the same feature object as corresponding to a plurality of different map points, thereby improving data processing efficiency when the environment information is used for data processing.
Drawings
Fig. 1 is a schematic flowchart of a first data processing method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of merging a first feature point set according to an embodiment of the present application;
fig. 3 is a schematic diagram of a reprojection error provided by an embodiment of the present application;
fig. 4 is a schematic flowchart of merging map points provided in an embodiment of the present application;
fig. 5 is a schematic flowchart of a second data processing method provided in an embodiment of the present application;
fig. 6 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic block diagram of an electronic device provided in an embodiment of the present application;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application may be practiced in sequences other than those illustrated or described herein. The terms "first", "second", and the like do not limit the number of objects; for example, the first object may be one or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and following objects.
The data processing method and apparatus provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Fig. 1 is a schematic flowchart of a first data processing method according to an embodiment of the present application.
Step 102, extracting a plurality of first feature point sets from video data comprising a plurality of frames of images; each first feature point set comprises a plurality of feature points corresponding to the same feature object; wherein each feature point is extracted from a corresponding frame of image.
The video data may be obtained by shooting with a camera, which may be a monocular camera, a binocular camera, or a depth camera. The embodiment of the present application does not particularly limit the kind of the camera.
The camera can change the pose in the process of shooting to obtain video data. The pose of the camera may include position information of the camera and a shooting angle of the camera.
The feature object may be an object that appears in a plurality of frames of images included in the video data, for example, a tree, a stool, and a person. If a feature object appears in multiple frames of images, it can be considered that the feature object is observed multiple times by a camera that captures video data, and the number of frames of images containing the same feature object in a piece of video data can be considered as the number of times the feature object is observed.
Each first feature point set includes a plurality of feature points corresponding to the same feature object, and each feature point is extracted from a corresponding frame of image. For example, a piece of video data includes 60 frames of images, of which the 10 frames from the 11th frame to the 20th frame each include a stool, and the position of the stool differs among the 10 frames because the pose of the camera changes over time. The feature point corresponding to the stool is extracted from the 11th frame image, from the 12th frame image, and so on up to the 20th frame image, so that the 10 extracted feature points form a first feature point set: the 10 feature points correspond to the same stool, and each feature point is extracted from a corresponding frame of image.
Extracting the first feature point sets from the video data may include: extracting at least one feature point from each frame of image of the video data, and performing feature matching or feature tracking on each feature point between two adjacent frames of images, so as to determine the first feature point sets corresponding to the same feature objects.
The at least one feature point extracted from each frame of image of the video data may be an ORB (Oriented FAST and Rotated BRIEF) feature point, a SIFT (Scale-Invariant Feature Transform) feature point, a SURF (Speeded-Up Robust Features) feature point, or the like.
The feature match may be a descriptor match. The feature tracking may be optical flow tracking.
A descriptor can be understood as follows: for each feature point, the feature information of a circle of pixel points around the feature point can be described by a set of binary numbers, where the feature information includes, but is not limited to, brightness information, color information, and the like.
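As an illustration of binary descriptors and descriptor distances, the following Python sketch uses OpenCV's ORB as an assumed descriptor type; it is not part of the patent, and all names are illustrative.

```python
# Illustrative sketch only: the patent does not prescribe ORB or OpenCV.
import cv2
import numpy as np

def extract_features(frame_gray, n_features=500):
    # Detect feature points and compute 256-bit (32-byte) binary descriptors;
    # each descriptor summarizes the pixels around its feature point.
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    return keypoints, descriptors  # descriptors: (n, 32) uint8 array

def descriptor_distance(d1, d2):
    # Hamming distance between two binary descriptors: count of differing bits.
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())
```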
In particular embodiments, each frame of image in the video data may be pre-processed before step 102 is performed. The image preprocessing mode can be distortion correction of each frame of image according to the calibration parameters of the camera. Calibration parameters include, but are not limited to, camera focal length, camera offset, and camera distortion parameters. The image preprocessing mode may also be to perform an adjustment operation on the brightness of the image, or to perform a motion blur processing operation on the image, and so on. The image with better quality is obtained by preprocessing the image, so that the characteristic extraction error is reduced in the process of extracting the characteristic points.
And 104, acquiring a descriptor of each feature point, and merging the plurality of first feature point sets according to the acquired descriptor to obtain a plurality of second feature point sets.
After the feature extraction and the feature matching are completed, the similarity judgment can be performed on the first feature point set, and the descriptor distance can be adopted to judge the similarity degree of the first feature point set in the embodiment of the application.
Optionally, according to the obtained descriptor, merging the multiple first feature point sets to obtain multiple second feature point sets, including: calculating the descriptor distance of any two feature points in each first feature point set according to the descriptor of each feature point; according to the descriptor distance of any two feature points obtained by calculation, counting a descriptor distance set corresponding to each feature point; the descriptor distance set comprises descriptor distances of other feature points which belong to the same first feature point set with the corresponding feature points; determining a target descriptor corresponding to each first feature point set according to the descriptor distance set corresponding to each feature point; and merging the plurality of first feature point sets according to the target descriptor corresponding to each first feature point set to obtain a plurality of second feature point sets.
Calculating the descriptor distance of any two feature points in each first feature point set according to the descriptor of each feature point; according to the descriptor distance of any two feature points obtained by calculation, counting a descriptor distance set corresponding to each feature point; the descriptor distance set comprises descriptor distances of other feature points belonging to the same first feature point set with the corresponding feature points.
For example, the first feature point set 1 includes 3 feature points, which are feature point 1, feature point 2, and feature point 3. For feature point 1, feature point 2 and feature point 3 are the other feature points belonging to the same first feature point set as feature point 1. For feature point 2, feature point 1 and feature point 3 are the other feature points belonging to the same first feature point set as feature point 2. For feature point 3, feature point 1 and feature point 2 are the other feature points belonging to the same first feature point set as feature point 3.
The descriptor distance of any two feature points in the first feature point set 1 can be calculated: the descriptor distance r1 between the descriptor of feature point 1 and the descriptor of feature point 2, the descriptor distance r2 between the descriptor of feature point 1 and the descriptor of feature point 3, and the descriptor distance r3 between the descriptor of feature point 2 and the descriptor of feature point 3 are calculated.
Then, the descriptor distance set corresponding to feature point 1 includes: r1 and r2; the descriptor distance set corresponding to feature point 2 includes: r1 and r3; and the descriptor distance set corresponding to feature point 3 includes: r2 and r3.
Optionally, determining a target descriptor corresponding to each first feature point set according to the descriptor distance set corresponding to each feature point includes: averaging the descriptor distances in the descriptor distance set corresponding to each feature point to obtain an average descriptor distance corresponding to each feature point; and in each first feature point set, determining the descriptor of the feature point with the minimum corresponding average descriptor distance as the target descriptor corresponding to each first feature point set.
The descriptor distances in the descriptor distance set corresponding to each feature point are averaged to obtain the average descriptor distance corresponding to each feature point. For example, the first feature point set 1 includes 3 feature points: feature point 1, feature point 2, and feature point 3. The descriptor distance set corresponding to feature point 1 includes r1 and r2; the descriptor distance set corresponding to feature point 2 includes r1 and r3; the descriptor distance set corresponding to feature point 3 includes r2 and r3. Averaging the descriptor distances in the set corresponding to feature point 1 gives the average descriptor distance corresponding to feature point 1, namely (r1 + r2)/2; averaging those corresponding to feature point 2 gives (r1 + r3)/2; and averaging those corresponding to feature point 3 gives (r2 + r3)/2.
In each first feature point set, the descriptor of the feature point whose corresponding average descriptor distance is the smallest is determined as the target descriptor corresponding to that first feature point set. For example, in the foregoing first feature point set 1, if the average descriptor distance (r1 + r2)/2 corresponding to feature point 1 is the smallest, the average descriptor distance (r1 + r3)/2 corresponding to feature point 2 is the largest, and the average descriptor distance (r2 + r3)/2 corresponding to feature point 3 lies between them, then the descriptor of feature point 1 may be determined as the target descriptor corresponding to the first feature point set 1.
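A minimal sketch of this selection, continuing the descriptor sketch above (the helper names are assumptions, not from the patent):

```python
def target_descriptor(descriptors):
    # descriptors: (n, 32) uint8 array for one first feature point set, n >= 2.
    n = len(descriptors)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = descriptor_distance(descriptors[i], descriptors[j])
            dist[i, j] = dist[j, i] = d
    # Average each feature point's distance to the other members of its set
    # and keep the descriptor whose average distance is smallest.
    avg = dist.sum(axis=1) / (n - 1)
    return descriptors[int(np.argmin(avg))]
```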
Optionally, the plurality of second feature point sets include at least one merged feature point set and at least one non-merged feature point set; according to the target descriptor corresponding to each first feature point set, merging the multiple first feature point sets to obtain multiple second feature point sets, and the method comprises the following steps: determining at least one pair of feature point sets to be judged in the plurality of first feature point sets; calculating the descriptor distance of each pair of feature point sets to be judged according to the target descriptor corresponding to each first feature point set to obtain the target descriptor distance; and merging each pair of feature point sets to be judged, of which the target descriptor distance is smaller than the second distance threshold value, so as to obtain a corresponding merged feature point set.
The second feature point set may be a corresponding merged feature point set obtained by merging at least two first feature point sets, or may be a non-merged feature point set corresponding to the first feature point set. In the plurality of first feature point sets, a part of the first feature point sets can be subjected to merging processing, and the other part of the first feature point sets cannot be subjected to merging processing.
Two first feature point sets that can be subjected to the merging process can be regarded as two first feature point sets corresponding to the same feature object. For example, in 60 seconds of video data, a first feature point set 1 corresponding to stone a is extracted from the 10th frame to the 20th frame, and a first feature point set 2 corresponding to stone b is extracted from the 45th frame to the 52nd frame; if the first feature point set 1 and the first feature point set 2 can be merged, stone a and stone b can be considered to be the same stone.
At least one pair of feature point sets to be judged is determined among the plurality of first feature point sets. For example, the plurality of first feature point sets include a first feature point set 1 corresponding to stone a and a first feature point set 2 corresponding to stone b; when the similarity between stone a and stone b is greater than a preset similarity threshold, the first feature point set 1 and the first feature point set 2 are determined as a pair of feature point sets to be judged.
And calculating the descriptor distance of each pair of feature point sets to be judged according to the target descriptor corresponding to each first feature point set, and obtaining the target descriptor distance. For example, a pair of feature point sets to be determined includes a first feature point set 1 and a first feature point set 2. The first feature point set 1 includes feature points 1, feature points 2, and feature points 3, the first feature point set 2 includes feature points 4, feature points 5, feature points 6, and feature points 7, a target descriptor corresponding to the first feature point set 1 is a descriptor of the feature point 1, and a target descriptor corresponding to the first feature point set 2 is a descriptor of the feature point 6. The descriptor distance between the descriptor of the feature point 1 and the descriptor of the feature point 6 is calculated as the descriptor distance of the feature point set to be judged, i.e. the target descriptor distance.
And merging each pair of feature point sets to be judged, of which the target descriptor distance is smaller than the second distance threshold value, so as to obtain a corresponding merged feature point set.
The second distance threshold may be a preset descriptor distance threshold.
And under the condition that the target descriptor distance of any pair of feature point sets to be judged is smaller than a second distance threshold, merging the feature point sets to be judged to obtain a merged feature point set corresponding to the feature point set to be judged.
And under the condition that the target descriptor distance of any pair of feature point sets to be judged is not smaller than the second distance threshold, it is determined that the pair of feature point sets to be judged does not need to be merged.
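Putting the two pieces together, a hedged sketch of the merge decision follows; the threshold value is a placeholder, since the patent does not specify it.

```python
def merge_if_similar(set1, set2, second_distance_threshold=40):
    # Merge a pair of feature point sets to be judged when the distance
    # between their target descriptors is below the second distance threshold.
    d = descriptor_distance(target_descriptor(set1), target_descriptor(set2))
    if d < second_distance_threshold:
        return np.vstack([set1, set2])  # the merged feature point set
    return None  # the pair does not need to be merged
```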
Step 104 may be described in more detail below in conjunction with fig. 2. Fig. 2 is a schematic flowchart of merging a first feature point set according to an embodiment of the present application.
In specific implementation, the first feature point set 1 and the first feature point set 2 to be compared may be determined from the multiple first feature point sets according to the similarity determination result of the feature object corresponding to the first feature point set, or any two first feature point sets in the multiple first feature point sets may be used as the first feature point set 1 and the first feature point set 2 to be compared.
As shown in fig. 2, in step 202, descriptor distances of the first feature point set 1 are calculated.
And step 204, calculating the descriptor distance of the first feature point set 2.
Step 202 and step 204 may be performed simultaneously, or step 202 may be performed first, then step 204 may be performed, or vice versa.
In step 206, a target descriptor of the first feature point set 1 is obtained.
After step 202 is performed, step 206 is performed.
In step 208, a target descriptor of the first feature point set 2 is obtained.
After step 204 is performed, step 208 is performed.
Step 210, calculating the descriptor distance between the target descriptor of the first feature point set 1 and the target descriptor of the first feature point set 2.
In step 212, it is determined whether the descriptor distance is less than the second distance threshold.
If so, go to step 214.
Step 214, merging the first feature point set 1 and the first feature point set 2 to obtain a second feature point set 1.
Step 106, generating a corresponding map point according to each second feature point set, and determining at least one pair of map points to be judged in the generated map points; the distance between each pair of map points to be judged is smaller than a first distance threshold value.
The map points are used to reflect the specific position information of the feature points of each frame of image in three-dimensional space. For a monocular camera, the depth information corresponding to a feature point is generally acquired by triangulation; for a binocular camera, the depth information corresponding to the feature points can be acquired by calculating the disparity between the left-eye and right-eye images; for a depth camera, the depth information corresponding to the feature points can be acquired directly.
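For example, for a binocular camera the depth of a rectified stereo pair can be recovered from the disparity; a minimal sketch, with placeholder intrinsic values that are not from the patent:

```python
def stereo_depth(disparity_px, focal_px=700.0, baseline_m=0.12):
    # depth = focal length (pixels) * baseline (meters) / disparity (pixels)
    if disparity_px <= 0:
        return float("inf")  # no usable disparity: point effectively at infinity
    return focal_px * baseline_m / disparity_px
```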
Determining at least one pair of map points to be judged from the generated plurality of map points may include calculating the map point distance between two adjacent map points and then determining whether the map point distance is smaller than the first distance threshold; if so, the two adjacent map points are determined as a pair of map points to be judged.
The first distance threshold may be a preset map point distance threshold.
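A brute-force sketch of this pairing step (in practice a spatial index such as a k-d tree would replace the double loop; the threshold value is illustrative):

```python
import numpy as np

def pairs_to_judge(map_points, first_distance_threshold=0.05):
    # map_points: (m, 3) array of map point coordinates in space.
    pairs = []
    for i in range(len(map_points)):
        for j in range(i + 1, len(map_points)):
            if np.linalg.norm(map_points[i] - map_points[j]) < first_distance_threshold:
                pairs.append((i, j))  # a pair of map points to be judged
    return pairs
```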
Optionally, before generating a corresponding map point according to each second feature point set, the data processing method further includes: acquiring a target camera pose of a camera for acquiring the video data; generating a corresponding map point according to each second feature point set, wherein the generating of the map point comprises the following steps: determining coordinate information and target depth information of the initial map point corresponding to each second feature point set according to the coordinate information and the depth information of each feature point in each second feature point set; and correcting the coordinate information of the initial map points according to the pose of the target camera and the target depth information to obtain target map points corresponding to each second feature point set.
The target camera pose may be position information and pose information of a camera that captures video data.
The visual SLAM system can use the reprojection error to construct a least squares problem to optimize the results of the system, namely the pose, the depth, and the like.
Fig. 3 is a schematic diagram of a reprojection error according to an embodiment of the present disclosure. As shown in fig. 3, let the coordinates of a certain spatial point be P = [X, Y, Z]^T and its projected pixel coordinates be u = [u, v]^T. The correspondence between the pixel position and the spatial position is as follows:

s·[u, v, 1]^T = K·T·[X, Y, Z, 1]^T    (1)

written in matrix form as:

s·u = K·T·P    (2)

wherein s is the depth information corresponding to the feature point in the camera coordinate system, K is the camera intrinsic matrix, and T is the camera pose.
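A sketch of formula (2) in code, assuming K is the 3x3 intrinsic matrix and T the 4x4 world-to-camera pose:

```python
import numpy as np

def project(P_world, K, T):
    # Homogenize the 3-D point, move it into the camera frame, and divide by
    # the depth s to obtain the pixel coordinates u = [u, v].
    P_cam = (T @ np.append(P_world, 1.0))[:3]
    s = P_cam[2]
    u_h = K @ P_cam / s
    return u_h[:2], s
```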
As shown in fig. 3, the point P may be a spatial point, P1 may be the projection of P when the camera is in pose 1, and P2 may be the projection of P when the camera is in pose 2. In practice, however, the spatial point P calculated from the coordinate information of P1, the depth information, and camera pose 1 projects at pose 2 to an inaccurate position, such as P2'.
Because of the system noise of the visual SLAM system, errors exist in the above formula. The errors are summed to construct a least squares problem, and the target camera pose and the target depth are determined so that the value of the following formula (3) is minimized:

Σ_i ||u_i - (1/s_i)·K·T·P_i||^2    (3)
And correcting the coordinate information of the initial map point according to the pose of the target camera and the target depth information to obtain the target map point corresponding to each second feature point set, wherein the coordinate information of the target map point corresponding to each second feature point set can be determined by substituting the pose of the target camera and the target depth information into the formula (1).
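Continuing the projection sketch above, one summand of formula (3) can be written as follows (an illustration, not the patent's implementation):

```python
def reprojection_error(u_obs, P_world, K, T):
    # Squared pixel distance between the observed feature point and the
    # reprojection of the spatial point under pose T.
    u_proj, _ = project(P_world, K, T)
    return float(np.sum((np.asarray(u_obs) - u_proj) ** 2))
```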
Because the feature points are observed by the camera a plurality of times, constraints are constructed from the multiple observations to participate in correcting the coordinates of the map points, so as to obtain the optimal value of the system. When the feature point sets are merged, the map points corresponding to the feature points have already obtained depth values; therefore, in addition to merging the attributes of the feature points, the map points corresponding to the feature points also need to be merged, and the depth is updated by taking the number of feature points as the weight. If the first feature point set 1 has w1 feature points with depth d1 and the first feature point set 2 has w2 feature points with depth d2, the merged depth d is given by the following formula (4):

d = (w1·d1 + w2·d2) / (w1 + w2)    (4)
It can be understood that, in the case of merging the first feature point sets, the greater the number of feature points included in the first feature point set, the higher the weight of the target depth information corresponding to the first feature point set having the greater number of feature points.
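Formula (4) transcribes directly into code:

```python
def merge_depth(w1, d1, w2, d2):
    # Weighted average of the two depths, weighted by the number of feature
    # points in each first feature point set.
    return (w1 * d1 + w2 * d2) / (w1 + w2)
```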
In the technical scheme of observing the feature points through a monocular camera or observing the feature points through a binocular camera, a large amount of more accurate depth information and a small amount of less accurate depth information may be acquired. In this case, after the depth of the feature point set is combined in the above manner, the reliability of the depth information is improved to some extent, and further, it is not necessary to reacquire the depth information by other acquisition means. In addition, in the subsequent step 108, the map points need to be merged by using the depth information of the feature points, and the number of depth information is reduced by merging the depth information, so that the amount of calculation can be effectively reduced in the process of merging the map points, and the influence of sensor noise on the system precision can also be avoided.
And 108, combining each pair of map points to be judged according to the reprojection errors corresponding to each pair of map points to be judged to obtain corresponding combined map points.
Optionally, the merging the map points to be determined according to the reprojection error corresponding to each map point to be determined to obtain corresponding merged map points, including: in each pair of map points to be judged, calculating a reprojection error of each feature point according to the depth information and the coordinate information of each feature point in a second feature point set corresponding to each map point to be judged to obtain a first reprojection error of each feature point; determining a target feature point corresponding to each map point to be judged according to the median of the first reprojection error of each feature point; according to the depth information and the coordinate information of the target feature points corresponding to each map point to be judged, calculating the reprojection error of each target feature point in a crossed manner to obtain a second reprojection error of each target feature point; and combining each pair of map points to be judged, of which the sum of the two second reprojection errors is smaller than a preset error threshold value, to obtain a corresponding combined map point.
In each pair of map points to be judged, the reprojection error of each feature point is calculated according to the depth information and the coordinate information of each feature point in the second feature point set corresponding to each map point to be judged, so as to obtain a first reprojection error R1 for each feature point. For example, a pair of map points to be judged includes map point 1 and map point 2. The formula for calculating the reprojection error can be referred to the aforementioned formula (3), that is:

R1 = ||u - (1/s)·K·T·P||^2    (5)

The second feature point set corresponding to map point 1 includes feature point 1, feature point 2, and feature point 3; the second feature point set corresponding to map point 2 includes feature point 4, feature point 5, and feature point 6. The reprojection errors of feature points 1 to 6 can be calculated respectively, so as to obtain six first reprojection errors R1.
And determining the target feature point corresponding to each map point to be judged according to the median of the first reprojection error of each feature point. For example, for map point 1, the R1 value of feature point 1 is the smallest, the R1 value of feature point 2 is slightly larger, and the R1 value of feature point 3 is the largest, so the target feature point corresponding to map point 1 is feature point 2.
And according to the depth information and the coordinate information of the target feature points corresponding to each map point to be judged, calculating the reprojection error of each target feature point in a crossed manner to obtain a second reprojection error of each target feature point.
For example, the reprojection error between the target feature point of the map point 1 and the map point 2 is calculated as the second reprojection error R2 of the target feature point of the map point 1; and calculating the reprojection error of the target characteristic point of the map point 2 and the map point 1 in a crossed manner to be used as a second reprojection error R2' of the target characteristic point of the map point 2.
And combining each pair of map points to be judged, of which the sum of the two second reprojection errors is smaller than a preset error threshold value, to obtain a corresponding combined map point.
The preset error threshold may be a pre-set reprojection error threshold r. For example, when R2+ R2' is smaller than R, the map point 1 and the map point 2 may be merged to obtain a corresponding merged map point; in the case where R2+ R2' is equal to or greater than R, it can be determined that the map point 1 and the map point 2 do not need to be subjected to the merging process.
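A hedged sketch of this cross-check, reusing reprojection_error above; the MapPoint fields and the threshold value are assumptions made for illustration.

```python
from collections import namedtuple

# position: 3-D coordinates of the map point; target_uv / target_pose: the
# pixel observation and camera pose of its target feature point (hypothetical).
MapPoint = namedtuple("MapPoint", ["position", "target_uv", "target_pose"])

def should_merge(mp1, mp2, K, preset_error_threshold=4.0):
    # Cross-calculate: project each map point into the other's target view.
    r2 = reprojection_error(mp1.target_uv, mp2.position, K, mp1.target_pose)
    r2_prime = reprojection_error(mp2.target_uv, mp1.position, K, mp2.target_pose)
    return (r2 + r2_prime) < preset_error_threshold
```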
Fig. 4 is a schematic flowchart of a merged map point provided in an embodiment of the present application.
As shown in fig. 4, in step 402, a feature point reprojection error associated with map point 1 is calculated.
And step 404, calculating a re-projection error of the map point 2 associated feature point.
And 406, acquiring a target feature point corresponding to the map point 1.
And step 408, acquiring a target feature point corresponding to the map point 2.
Step 410, calculating the reprojection error with map point 2 in a crossed manner to obtain r_1.
Step 412, calculating the reprojection error with map point 1 in a crossed manner to obtain r_2.
Step 414, determining whether the sum of r_1 and r_2 is less than the preset error threshold.
If yes, go to step 416.
Step 416, merging map point 1 and map point 2.
Optionally, the merging the map points to be determined according to the reprojection error corresponding to each map point to be determined to obtain corresponding merged map points, further comprising: determining whether the number of the feature points corresponding to each merged map point is greater than a preset number threshold; if the number of the feature points corresponding to each merged map point is larger than a preset number threshold, determining the feature points corresponding to the merged map points as feature points to be screened, and calculating corresponding association scores according to the first reprojection error, the average descriptor distance and the observed times of the corresponding feature objects of each feature point to be screened; and screening the feature points to be screened corresponding to the combined map points according to the association scores corresponding to the feature points to be screened to obtain at least one representative feature point corresponding to the combined map points.
After step 108 is executed, the number of feature points associated with the map points may also be analyzed to determine whether the number is greater than a set number threshold, and an optimal feature point set is screened out. The number of feature points associated with the map point may be the number of feature points included in the second feature point set corresponding to the map point.
The scheme calculates the association score of each feature point associated with a map point through the following formula (6), sorts the feature points according to the association scores, and determines at least one representative feature point corresponding to the merged map point according to the sorting result:

s = a·E_rep + b·D_avg + c·N_ob    (6)

wherein s is the association score, E_rep is the reprojection error and a is its coefficient, D_avg is the average descriptor distance and b is its coefficient, and N_ob is the number of times the feature point is observed and c is its coefficient.
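A sketch of formula (6) and the screening step; the coefficient values and the assumption that a lower score ranks a feature point better are illustrative, since the patent leaves them unspecified.

```python
def representative_points(features, a=1.0, b=1.0, c=-1.0, keep=5):
    # features: list of (point_id, E_rep, D_avg, N_ob) tuples. With c < 0,
    # frequently observed points score lower, i.e. better, under ascending sort.
    scored = sorted((a * e + b * d + c * n, pid) for pid, e, d, n in features)
    return [pid for _, pid in scored[:keep]]
```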
In the embodiment shown in fig. 1, a plurality of first feature point sets are extracted from video data comprising multiple frames of images, where each first feature point set comprises a plurality of feature points corresponding to the same feature object and each feature point is extracted from a corresponding frame of image; a descriptor of each feature point is obtained, and the plurality of first feature point sets are merged according to the obtained descriptors to obtain a plurality of second feature point sets; a corresponding map point is generated according to each second feature point set, and at least one pair of map points to be judged is determined among the generated map points, where the distance between each pair of map points to be judged is smaller than a first distance threshold; and each pair of map points to be judged is merged according to the reprojection error corresponding to that pair, so as to obtain a corresponding merged map point. Through the embodiment of the application, when a feature object is observed many times, the plurality of first feature point sets can be merged into a plurality of second feature point sets, which reduces the redundant data caused by identifying the same feature object as corresponding to a plurality of different first feature point sets; in addition, a map point can be generated for each second feature point set, and each pair of nearby map points to be judged can be merged, which reduces the redundant data caused by identifying the same feature object as corresponding to a plurality of different map points, thereby improving data processing efficiency when the environment information is used for data processing.
Fig. 5 is a schematic flowchart of a second data processing method according to an embodiment of the present application.
Step 502, input image data preprocessing.
Step 504, image feature extraction and matching.
And step 506, fusing the similar characteristic points.
At step 508, map points are generated.
In step 510, map points are corrected and updated.
And step 512, map point fusion.
For the above data processing method embodiment, since it is basically similar to the foregoing data processing method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the foregoing data processing method embodiments.
It should be noted that, in the data processing method provided in the embodiment of the present application, the execution main body may be a data processing apparatus, or a control module in the data processing apparatus for executing the data processing method. In the embodiment of the present application, a data processing apparatus executes a data processing method as an example, and the data processing apparatus provided in the embodiment of the present application is described.
Fig. 6 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application.
As shown in fig. 6, the data processing apparatus includes:
an extracting module 601, configured to extract a plurality of first feature point sets from video data including multiple frames of images; each first feature point set comprises a plurality of feature points corresponding to the same feature object; wherein each feature point is extracted from a corresponding frame of image;
a first merging module 602, configured to obtain a descriptor of each feature point, and merge multiple first feature point sets according to the obtained descriptor to obtain multiple second feature point sets;
a first determining module 603, configured to generate a corresponding map point according to each second feature point set, and determine at least one pair of map points to be determined in the generated plurality of map points; the distance between each pair of map points to be judged is smaller than a first distance threshold value;
and a second merging module 604, configured to merge each pair of map points to be determined according to the reprojection error corresponding to each pair of map points to be determined, so as to obtain corresponding merged map points.
Optionally, the first merging module 602 includes:
the calculating unit is used for calculating the descriptor distance of any two feature points in each first feature point set according to the descriptor of each feature point;
the statistical unit is used for counting a descriptor distance set corresponding to each feature point according to the descriptor distance of any two feature points obtained through calculation; the descriptor distance set comprises descriptor distances of other feature points which belong to the same first feature point set with the corresponding feature points;
the determining unit is used for determining a target descriptor corresponding to each first feature point set according to the descriptor distance set corresponding to each feature point;
and the merging unit is used for merging the plurality of first feature point sets according to the target descriptor corresponding to each first feature point set to obtain a plurality of second feature point sets.
Optionally, the determining unit is specifically configured to:
averaging the descriptor distances in the descriptor distance set corresponding to each feature point to obtain an average descriptor distance corresponding to each feature point;
and in each first feature point set, determining the descriptor of the feature point with the minimum corresponding average descriptor distance as the target descriptor corresponding to each first feature point set.
Optionally, the plurality of second feature point sets include at least one merged feature point set and at least one non-merged feature point set; the merging unit is specifically configured to:
determining at least one pair of feature point sets to be judged in the plurality of first feature point sets;
calculating the descriptor distance of each pair of feature point sets to be judged according to the target descriptor corresponding to each first feature point set to obtain the target descriptor distance;
and merging each pair of feature point sets to be judged, of which the target descriptor distance is smaller than the second distance threshold value, so as to obtain a corresponding merged feature point set.
Optionally, the data processing apparatus further comprises:
the acquisition module is used for acquiring the pose of a target camera of the camera for acquiring the video data;
the first determining module 603 is specifically configured to:
determining coordinate information and target depth information of the initial map point corresponding to each second feature point set according to the coordinate information and the depth information of each feature point in each second feature point set;
and correcting the coordinate information of the initial map points according to the pose of the target camera and the target depth information to obtain target map points corresponding to each second feature point set.
Optionally, the second merging module 604 is specifically configured to:
in each pair of map points to be judged, calculating a reprojection error of each feature point according to the depth information and the coordinate information of each feature point in a second feature point set corresponding to each map point to be judged to obtain a first reprojection error of each feature point;
determining a target feature point corresponding to each map point to be judged according to the median of the first reprojection error of each feature point;
according to the depth information and the coordinate information of the target feature points corresponding to each map point to be judged, calculating the reprojection error of each target feature point in a crossed manner to obtain a second reprojection error of each target feature point;
and combining each pair of map points to be judged, of which the sum of the two second reprojection errors is smaller than a preset error threshold value, to obtain a corresponding combined map point.
Optionally, the data processing apparatus further comprises:
a second determining module, configured to determine whether the number of feature points corresponding to each merged map point is greater than a preset number threshold;
if the number of the feature points corresponding to each merged map point is larger than a preset number threshold, operating a calculation module, wherein the calculation module is used for determining the feature points corresponding to the merged map points as feature points to be screened, and calculating corresponding association scores according to the first reprojection error, the average descriptor distance and the observed times of the corresponding feature objects of each feature point to be screened;
and the screening module is used for screening the feature points to be screened corresponding to the combined map points according to the association scores corresponding to the feature points to be screened to obtain at least one representative feature point corresponding to the combined map points.
In the embodiment of the application, a plurality of first feature point sets are extracted from video data comprising multiple frames of images, where each first feature point set comprises a plurality of feature points corresponding to the same feature object and each feature point is extracted from a corresponding frame of image; a descriptor of each feature point is obtained, and the plurality of first feature point sets are merged according to the obtained descriptors to obtain a plurality of second feature point sets; a corresponding map point is generated according to each second feature point set, and at least one pair of map points to be judged is determined among the generated map points, where the distance between each pair of map points to be judged is smaller than a first distance threshold; and each pair of map points to be judged is merged according to the reprojection error corresponding to that pair, so as to obtain a corresponding merged map point. Through the embodiment of the application, when a feature object is observed many times, the plurality of first feature point sets can be merged into a plurality of second feature point sets, which reduces the redundant data caused by identifying the same feature object as corresponding to a plurality of different first feature point sets; in addition, a map point can be generated for each second feature point set, and each pair of nearby map points to be judged can be merged, which reduces the redundant data caused by identifying the same feature object as corresponding to a plurality of different map points, thereby improving data processing efficiency when the environment information is used for data processing.
The data processing device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The data processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The data processing apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiments in fig. 1 to fig. 5, and is not described here again to avoid repetition.
Fig. 7 is a schematic block diagram of an electronic device provided in an embodiment of the present application. Optionally, as shown in fig. 7, an electronic device 700 is further provided in this embodiment of the present application, and includes a processor 701, a memory 702, and a program or an instruction stored in the memory 702 and executable on the processor 701, where the program or the instruction is executed by the processor 701 to implement each process of the data processing method embodiment, and can achieve the same technical effect, and no further description is provided here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
The electronic device 800 includes, but is not limited to: a radio frequency unit 801, a network module 802, an audio output unit 803, an input unit 804, a sensor 805, a display unit 806, a user input unit 807, an interface unit 808, a memory 809, and a processor 810.
Those skilled in the art will appreciate that the electronic device 800 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 810 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system. The electronic device structure shown in fig. 8 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
The processor 810 is configured to extract a plurality of first feature point sets from video data including a plurality of frames of images; each first feature point set comprises a plurality of feature points corresponding to the same feature object; wherein each feature point is extracted from a corresponding frame of image;
obtaining a descriptor of each feature point, and combining the plurality of first feature point sets according to the obtained descriptor to obtain a plurality of second feature point sets;
generating a corresponding map point according to each second feature point set, and determining at least one pair of map points to be judged in the generated plurality of map points; the distance between each pair of map points to be judged is smaller than a first distance threshold value;
and combining each pair of map points to be judged according to the reprojection error corresponding to each pair of map points to be judged to obtain corresponding combined map points.
In the embodiment of the application, a plurality of first feature point sets are extracted from video data comprising a plurality of frames of images, each first feature point set comprising a plurality of feature points corresponding to the same feature object, wherein each feature point is extracted from a corresponding frame of image; a descriptor of each feature point is obtained, and the plurality of first feature point sets are merged according to the obtained descriptors to obtain a plurality of second feature point sets; a corresponding map point is generated according to each second feature point set, and at least one pair of map points to be judged is determined among the generated map points, the distance between each pair of map points to be judged being smaller than a first distance threshold; and each pair of map points to be judged is merged according to its corresponding reprojection error to obtain a corresponding merged map point. Through the embodiment of the application, when a feature object is observed many times, the plurality of first feature point sets can be merged into the plurality of second feature point sets, which reduces the redundant data caused by identifying the same feature object as a plurality of different first feature point sets; a map point can then be generated for each second feature point set, and each pair of nearby map points to be judged can be merged, which reduces the redundant data caused by identifying the same feature object as a plurality of different map points. Data processing efficiency is thereby improved when processing data using environment information.
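For illustration only, the determination of pairs of map points to be judged might be sketched as follows. This is a minimal example, not the patent's implementation: it assumes map points are 3-D coordinates stored in a NumPy array, and the name first_distance_threshold is hypothetical.

```python
import numpy as np

def find_pairs_to_judge(map_points, first_distance_threshold):
    """Return index pairs of map points whose mutual distance is below the
    first distance threshold, i.e. the candidate pairs to be judged.

    map_points: (N, 3) array of 3-D map point coordinates.
    """
    pairs = []
    n = len(map_points)
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between the two generated map points.
            if np.linalg.norm(map_points[i] - map_points[j]) < first_distance_threshold:
                pairs.append((i, j))
    return pairs
```

An O(n²) scan is shown for clarity; a k-d tree or voxel grid would typically replace it when the map holds many points.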
Optionally, the processor 110 is further configured to: according to the obtained descriptor, merging the multiple first feature point sets to obtain multiple second feature point sets, including:
calculating the descriptor distance of any two feature points in each first feature point set according to the descriptor of each feature point;
according to the descriptor distance of any two feature points obtained by calculation, counting a descriptor distance set corresponding to each feature point; the descriptor distance set comprises descriptor distances of other feature points which belong to the same first feature point set with the corresponding feature points;
determining a target descriptor corresponding to each first feature point set according to the descriptor distance set corresponding to each feature point;
and merging the plurality of first feature point sets according to the target descriptor corresponding to each first feature point set to obtain a plurality of second feature point sets.
Optionally, the processor 810 is further configured such that the determining of the target descriptor corresponding to each first feature point set according to the descriptor distance set corresponding to each feature point includes:
averaging the descriptor distances in the descriptor distance set corresponding to each feature point to obtain an average descriptor distance corresponding to each feature point;
and in each first feature point set, determining the descriptor of the feature point with the minimum corresponding average descriptor distance as the target descriptor corresponding to each first feature point set.
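A minimal sketch of this selection step, assuming descriptors are fixed-length vectors compared with a Euclidean descriptor distance (a Hamming distance would play the same role for binary descriptors); all identifiers are illustrative, not taken from the patent:

```python
import numpy as np

def select_target_descriptor(descriptors):
    """Return the descriptor whose average distance to the other descriptors
    in the same first feature point set is smallest.

    descriptors: (N, D) array, one row per feature point in the set.
    """
    n = len(descriptors)
    if n == 1:
        return descriptors[0]
    avg_distances = np.empty(n)
    for i in range(n):
        # Descriptor distances from point i to every point in the set.
        dists = np.linalg.norm(descriptors - descriptors[i], axis=1)
        # Exclude the zero self-distance when averaging.
        avg_distances[i] = dists.sum() / (n - 1)
    # The most central descriptor represents the whole feature point set.
    return descriptors[int(np.argmin(avg_distances))]
```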
Optionally, the plurality of second feature point sets include at least one merged feature point set and at least one non-merged feature point set; the processor 810 is further configured such that the merging of the plurality of first feature point sets according to the target descriptor corresponding to each first feature point set to obtain the plurality of second feature point sets includes:
determining at least one pair of feature point sets to be judged in the plurality of first feature point sets;
calculating the descriptor distance of each pair of feature point sets to be judged according to the target descriptor corresponding to each first feature point set to obtain the target descriptor distance;
and merging each pair of feature point sets to be judged, of which the target descriptor distance is smaller than the second distance threshold value, so as to obtain a corresponding merged feature point set.
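Continuing the same illustrative sketch, the merging of first feature point sets by target descriptor distance might look like this; the greedy pairing strategy and the name second_distance_threshold are assumptions, not specified by the patent:

```python
import numpy as np

def merge_first_sets(point_sets, target_descriptors, second_distance_threshold):
    """Merge first feature point sets whose target descriptors are closer
    than the second distance threshold; unmerged sets pass through as-is.

    point_sets: list of lists of feature point records.
    target_descriptors: (N, D) array, one target descriptor per set.
    """
    merged, used = [], [False] * len(point_sets)
    for i in range(len(point_sets)):
        if used[i]:
            continue
        current = list(point_sets[i])
        for j in range(i + 1, len(point_sets)):
            if not used[j] and np.linalg.norm(
                    target_descriptors[i] - target_descriptors[j]) < second_distance_threshold:
                # Treated as observations of the same feature object.
                current.extend(point_sets[j])
                used[j] = True
        merged.append(current)  # one resulting second feature point set
    return merged
```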
Optionally, the processor 810 is further configured to:
before generating a corresponding map point according to each second feature point set, acquiring a target camera pose of the camera used to capture the video data;
generating a corresponding map point according to each second feature point set, wherein the generating of the map point comprises the following steps:
determining coordinate information and target depth information of the initial map point corresponding to each second feature point set according to the coordinate information and the depth information of each feature point in each second feature point set;
and correcting the coordinate information of the initial map points according to the pose of the target camera and the target depth information to obtain target map points corresponding to each second feature point set.
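As one possible reading of this step, a sketch follows. It assumes a pinhole camera with intrinsic matrix K and a single 4x4 camera-to-world target pose; the patent does not fix these details, and the averaging used to form the initial map point is an assumption:

```python
import numpy as np

def generate_target_map_point(pixels, depths, K, T_world_cam):
    """Form an initial map point from averaged feature point coordinates and
    depths, then correct it into world coordinates with the target pose.

    pixels: (N, 2) pixel coordinates of the second feature point set.
    depths: (N,) depth of each feature point.
    K: 3x3 camera intrinsic matrix (assumed known).
    T_world_cam: 4x4 camera-to-world target camera pose.
    """
    u, v = pixels.mean(axis=0)           # coordinate information of the initial map point
    target_depth = float(depths.mean())  # target depth information
    # Back-project the averaged pixel at the target depth into the camera frame.
    p_cam = target_depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Correct to world coordinates using the target camera pose.
    p_world = T_world_cam @ np.append(p_cam, 1.0)
    return p_world[:3]
```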
Optionally, the processor 110 is further configured to: according to the reprojection error corresponding to each pair of map points to be judged, combining the map points to be judged to obtain corresponding combined map points, comprising the following steps:
in each pair of map points to be judged, calculating a reprojection error of each feature point according to the depth information and the coordinate information of each feature point in a second feature point set corresponding to each map point to be judged, and obtaining a first reprojection error of each feature point;
determining a target feature point corresponding to each map point to be judged according to the median of the first reprojection error of each feature point;
according to the depth information and the coordinate information of the target feature points corresponding to each map point to be judged, calculating the reprojection error of each target feature point in a crossed manner to obtain a second reprojection error of each target feature point;
and combining each pair of map points to be judged, of which the sum of the two second reprojection errors is smaller than a preset error threshold value, to obtain a corresponding combined map point.
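A hedged sketch of this decision, under the simplifying assumption that each feature point observation carries its pixel coordinates and a 4x4 world-to-camera pose; the median selection and the cross check mirror the steps above, while every identifier is hypothetical:

```python
import numpy as np

def reprojection_error(map_point, pixel, K, T_cam_world):
    """Pixel distance between a projected 3-D map point and its observation."""
    p_cam = (T_cam_world @ np.append(map_point, 1.0))[:3]
    proj = K @ p_cam
    return float(np.linalg.norm(proj[:2] / proj[2] - pixel))

def should_merge(mp_a, obs_a, mp_b, obs_b, K, error_threshold):
    """obs_a / obs_b: lists of (pixel, T_cam_world) observations backing each
    map point to be judged."""
    def target_obs(map_point, observations):
        # First reprojection error of every feature point; the observation at
        # the median error is taken as the target feature point.
        errors = [reprojection_error(map_point, px, K, T) for px, T in observations]
        return observations[int(np.argsort(errors)[len(errors) // 2])]

    pix_a, T_a = target_obs(mp_a, obs_a)
    pix_b, T_b = target_obs(mp_b, obs_b)
    # Cross check: each map point is reprojected against the other's target
    # feature point, giving the two second reprojection errors.
    e_ab = reprojection_error(mp_a, pix_b, K, T_b)
    e_ba = reprojection_error(mp_b, pix_a, K, T_a)
    return (e_ab + e_ba) < error_threshold
```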
Optionally, the processor 110 is further configured to: according to the reprojection error corresponding to each pair of map points to be judged, merging each pair of map points to be judged to obtain corresponding merged map points, and the method further comprises the following steps:
determining whether the number of the feature points corresponding to each merged map point is greater than a preset number threshold;
if the number of the feature points corresponding to each merged map point is greater than the preset number threshold, determining the feature points corresponding to the merged map point as feature points to be screened, and calculating a corresponding association score for each feature point to be screened according to its first reprojection error, its average descriptor distance, and the number of times the corresponding feature object is observed;
and screening the feature points to be screened corresponding to the combined map points according to the association scores corresponding to the feature points to be screened to obtain at least one representative feature point corresponding to the combined map points.
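The association score itself is not specified by the claims, so the weighting below is a stated assumption: it rewards feature points whose feature object was observed often and penalizes large first reprojection errors and large average descriptor distances.

```python
def screen_representatives(features, number_threshold):
    """Keep at most number_threshold representative feature points per merged
    map point. Each feature is a dict with illustrative keys:
    'reproj_error', 'avg_desc_dist', 'observed_count'."""
    if len(features) <= number_threshold:
        return features

    def association_score(f):
        # Hypothetical weighting; any monotone combination would fit the text.
        return f['observed_count'] / (1.0 + f['reproj_error'] + f['avg_desc_dist'])

    return sorted(features, key=association_score, reverse=True)[:number_threshold]
```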
Through the embodiment of the application, the first feature point sets can be merged and the map points to be judged can be merged, which removes redundant data from the environment information collected by the camera and improves data processing efficiency; in addition, the number of feature points corresponding to each map point can be reduced.
It should be understood that, in the embodiment of the present application, the input unit 804 may include a Graphics Processing Unit (GPU) 8041 and a microphone 8042; the graphics processing unit 8041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 806 may include a display panel 8061, and the display panel 8061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 807 includes a touch panel 8071, also referred to as a touch screen, and other input devices 8072. The touch panel 8071 may include two portions: a touch detection device and a touch controller. The other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 809 may be used to store software programs as well as various data, including but not limited to application programs and an operating system. The processor 810 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 810.
The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the program or the instruction implements the processes of the data processing method embodiment, and can achieve the same technical effects, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the data processing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (14)

1. A method of data processing, comprising:
extracting a plurality of first feature point sets from video data including a plurality of frames of images; each first feature point set comprises a plurality of feature points corresponding to the same feature object; wherein each feature point is extracted from a corresponding frame of image;
obtaining a descriptor of each feature point, and combining the plurality of first feature point sets according to the obtained descriptor to obtain a plurality of second feature point sets;
generating a corresponding map point according to each second feature point set, and determining at least one pair of map points to be judged in the generated plurality of map points; the distance between each pair of map points to be judged is smaller than a first distance threshold value;
and combining each pair of map points to be judged according to the reprojection error corresponding to each pair of map points to be judged to obtain corresponding combined map points.
2. The method according to claim 1, wherein the merging the plurality of first feature point sets according to the obtained descriptor to obtain a plurality of second feature point sets comprises:
calculating the descriptor distance of any two feature points in each first feature point set according to the descriptor of each feature point;
according to the descriptor distance of any two feature points obtained through calculation, counting a descriptor distance set corresponding to each feature point; the descriptor distance set comprises descriptor distances of other feature points belonging to the same first feature point set with the corresponding feature points;
determining a target descriptor corresponding to each first feature point set according to the descriptor distance set corresponding to each feature point;
and merging the plurality of first feature point sets according to the target descriptor corresponding to each first feature point set to obtain a plurality of second feature point sets.
3. The method according to claim 2, wherein the determining a target descriptor corresponding to each first feature point set according to the descriptor distance set corresponding to each feature point comprises:
averaging the descriptor distances in the descriptor distance set corresponding to each feature point to obtain an average descriptor distance corresponding to each feature point;
and in each first feature point set, determining the descriptor of the feature point with the minimum corresponding average descriptor distance as the target descriptor corresponding to each first feature point set.
4. The method of claim 2, wherein the plurality of second feature point sets comprises at least one merged feature point set and at least one non-merged feature point set; the merging the plurality of first feature point sets according to the target descriptor corresponding to each first feature point set to obtain a plurality of second feature point sets includes:
determining at least one pair of feature point sets to be judged in the plurality of first feature point sets;
calculating the descriptor distance of each pair of feature point sets to be judged according to the target descriptor corresponding to each first feature point set to obtain the target descriptor distance;
and merging each pair of feature point sets to be judged, of which the target descriptor distance is smaller than a second distance threshold value, so as to obtain a corresponding merged feature point set.
5. The method according to claim 1, wherein before generating a corresponding map point according to each of the second feature point sets, further comprising:
acquiring a target camera pose of a camera for acquiring the video data;
generating a corresponding map point according to each second feature point set, including:
determining the coordinate information and the target depth information of the initial map point corresponding to each second feature point set according to the coordinate information and the depth information of each feature point in each second feature point set;
and correcting the coordinate information of the initial map point according to the pose of the target camera and the target depth information to obtain a target map point corresponding to each second feature point set.
6. The method according to claim 1, wherein the merging each pair of the map points to be judged according to the reprojection error corresponding to each pair of the map points to be judged to obtain corresponding merged map points comprises:
in each pair of map points to be judged, calculating a reprojection error of each feature point according to the depth information and the coordinate information of each feature point in a second feature point set corresponding to each map point to be judged, and obtaining a first reprojection error of each feature point;
determining a target feature point corresponding to each map point to be judged according to the median of the first reprojection error of each feature point;
according to the depth information and the coordinate information of the target feature points corresponding to each map point to be judged, calculating the reprojection error of each target feature point in a crossed manner to obtain a second reprojection error of each target feature point;
and combining each pair of the map points to be judged, of which the sum of the two second reprojection errors is smaller than a preset error threshold value, to obtain a corresponding combined map point.
7. The method according to claim 1, wherein after the merging of each pair of the map points to be judged according to the reprojection error corresponding to each pair of the map points to be judged to obtain corresponding merged map points, the method further comprises:
determining whether the number of the feature points corresponding to each merged map point is greater than a preset number threshold;
if the number of the feature points corresponding to each merged map point is greater than the preset number threshold, determining the feature points corresponding to the merged map point as feature points to be screened, and calculating a corresponding association score according to the first reprojection error, the average descriptor distance, and the number of times the corresponding feature object is observed for each feature point to be screened;
and screening the feature points to be screened corresponding to the combined map points according to the association scores corresponding to the feature points to be screened to obtain at least one representative feature point corresponding to the combined map points.
8. A data processing apparatus, comprising:
the extraction module is used for extracting a plurality of first feature point sets from video data comprising a plurality of frames of images; each first feature point set comprises a plurality of feature points corresponding to the same feature object; wherein each feature point is extracted from a corresponding frame of image;
the first merging module is used for acquiring a descriptor of each feature point, and merging the plurality of first feature point sets according to the acquired descriptor to obtain a plurality of second feature point sets;
the first determining module is used for generating a corresponding map point according to each second feature point set, and determining at least one pair of map points to be judged in the generated map points; the distance between each pair of map points to be judged is smaller than a first distance threshold value;
and the second merging module is used for merging each pair of map points to be judged according to the reprojection errors corresponding to each pair of map points to be judged to obtain the corresponding merged map points.
9. The apparatus of claim 8, wherein the first combining module comprises:
a calculating unit, configured to calculate a descriptor distance between any two feature points in each first feature point set according to the descriptor of each feature point;
the statistical unit is used for counting a descriptor distance set corresponding to each feature point according to the descriptor distances of any two feature points obtained through calculation; the descriptor distance set comprises descriptor distances of other feature points belonging to the same first feature point set with the corresponding feature points;
a determining unit, configured to determine, according to the descriptor distance set corresponding to each feature point, a target descriptor corresponding to each first feature point set;
and the merging unit is used for merging the plurality of first feature point sets according to the target descriptor corresponding to each first feature point set to obtain a plurality of second feature point sets.
10. The apparatus according to claim 9, wherein the determining unit is specifically configured to:
averaging the descriptor distances in the descriptor distance set corresponding to each feature point to obtain an average descriptor distance corresponding to each feature point;
and in each first feature point set, determining the descriptor of the feature point with the minimum corresponding average descriptor distance as the target descriptor corresponding to each first feature point set.
11. The apparatus of claim 9, wherein the plurality of second feature point sets comprises at least one merged feature point set and at least one non-merged feature point set; the merging unit is specifically configured to:
determining at least one pair of feature point sets to be judged in the plurality of first feature point sets;
calculating the descriptor distance of each pair of feature point sets to be judged according to the target descriptor corresponding to each first feature point set to obtain the target descriptor distance;
and merging each pair of feature point sets to be judged, of which the target descriptor distance is smaller than a second distance threshold value, so as to obtain a corresponding merged feature point set.
12. The apparatus of claim 8, further comprising:
the acquisition module is used for acquiring the target camera pose of the camera for acquiring the video data;
the first determining module is specifically configured to:
determining the coordinate information and the target depth information of the initial map point corresponding to each second feature point set according to the coordinate information and the depth information of each feature point in each second feature point set;
and correcting the coordinate information of the initial map point according to the pose of the target camera and the target depth information to obtain a target map point corresponding to each second feature point set.
13. The apparatus of claim 8, wherein the second merging module is specifically configured to:
in each pair of map points to be judged, calculating a reprojection error of each feature point according to the depth information and the coordinate information of each feature point in a second feature point set corresponding to each map point to be judged, and obtaining a first reprojection error of each feature point;
determining a target feature point corresponding to each map point to be judged according to the median of the first reprojection error of each feature point;
according to the depth information and the coordinate information of the target feature points corresponding to each map point to be judged, calculating the reprojection error of each target feature point in a crossed manner to obtain a second reprojection error of each target feature point;
and combining each pair of the map points to be judged, of which the sum of the two second reprojection errors is smaller than a preset error threshold value, to obtain a corresponding combined map point.
14. The apparatus of claim 8, further comprising:
a second determining module, configured to determine whether the number of feature points corresponding to each merged map point is greater than a preset number threshold;
a calculation module, configured to: if the number of the feature points corresponding to each merged map point is greater than the preset number threshold, determine the feature points corresponding to the merged map point as feature points to be screened, and calculate a corresponding association score for each feature point to be screened according to the first reprojection error, the average descriptor distance, and the number of times the corresponding feature object is observed;
and the screening module is used for screening the feature points to be screened corresponding to the combined map points according to the association scores corresponding to the feature points to be screened to obtain at least one representative feature point corresponding to the combined map points.
CN202210192315.5A 2022-02-28 2022-02-28 Data processing method and device Pending CN114565777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210192315.5A CN114565777A (en) 2022-02-28 2022-02-28 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210192315.5A CN114565777A (en) 2022-02-28 2022-02-28 Data processing method and device

Publications (1)

Publication Number Publication Date
CN114565777A true CN114565777A (en) 2022-05-31

Family

ID=81715475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210192315.5A Pending CN114565777A (en) 2022-02-28 2022-02-28 Data processing method and device

Country Status (1)

Country Link
CN (1) CN114565777A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024032101A1 (en) * 2022-08-08 2024-02-15 腾讯科技(深圳)有限公司 Feature map generation method and apparatus, storage medium, and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination