CN115272463A - Target positioning method, device, storage medium and terminal
- Publication number: CN115272463A
- Application number: CN202210843784.9A
- Authority: CN (China)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
Abstract
A target positioning method, a target positioning device, a storage medium and a terminal are provided. The method includes: acquiring a plurality of color images obtained by shooting a feature point to be positioned at different positions, and a depth image corresponding to each color image; performing error screening on the plurality of color images to obtain a plurality of target images; extracting the depth value of the feature point to be positioned from the depth image corresponding to each target image, and performing projection transformation on the depth values based on an image selected from the plurality of target images to obtain a plurality of target depth values; calculating the average of the target depth values, and determining the coordinate value of the feature point to be positioned in the selected image; and positioning the feature point to be positioned according to the selected image, the average depth value and the coordinate value. The method can effectively improve the efficiency of positioning the feature point, and thus the efficiency of positioning the object.
Description
Technical Field
The application relates to the technical field of robot positioning and mapping, in particular to a target positioning method, a target positioning device, a storage medium and a terminal.
Background
Simultaneous Localization and Mapping (SLAM) is a technology that realizes environment mapping and self-localization by fusing multi-sensor data. By combining an Inertial Measurement Unit (IMU) with image data acquired by a camera, the technology can position an object accurately.
In recent years, the positioning speed and accuracy of SLAM technology have improved considerably. The open-source visual-inertial navigation system OpenVINS is widely used because of its high speed, strong compatibility and modular design.
However, in some scenarios, positioning an object with OpenVINS is not accurate enough.
Disclosure of Invention
The application provides a target positioning method, a target positioning device, a storage medium and a terminal, and the method can improve positioning accuracy.
In a first aspect, the present application provides a method for locating a target, the method comprising:
acquiring a plurality of color images obtained by shooting feature points to be positioned at different positions and a depth image corresponding to each color image;
error screening is carried out on the multiple color images to obtain multiple target images;
extracting the depth value of the feature point to be positioned in the depth image corresponding to each target image, and performing projection transformation on the depth value based on the selected image in the multiple target images to obtain multiple target depth values;
calculating an average depth value of the target depth values, and determining a coordinate value of the feature point to be positioned in the selected image;
and positioning the characteristic point to be positioned according to the selected image, the average depth value and the coordinate value.
Accordingly, a second aspect of the present application provides an object localization apparatus, the apparatus comprising:
the acquisition module is used for acquiring a plurality of color images obtained by shooting feature points to be positioned at different positions and a depth image corresponding to each color image;
the screening module is used for carrying out error screening on the multiple color images to obtain multiple target images;
the extraction module is used for extracting the depth value of the characteristic point to be positioned in the depth image corresponding to each target image, and performing projection transformation on the depth value based on the selected image in the multiple target images to obtain multiple target depth values;
the calculation module is used for calculating the average depth value of the multiple target depth values and determining the coordinate value of the feature point to be positioned in the selected image;
and the first positioning module is used for positioning the characteristic point to be positioned according to the selected image, the average depth value and the coordinate value.
In a third aspect, the present application further provides a target positioning method, where the method includes:
sampling a plurality of characteristic points to be positioned in an object to be positioned;
determining feature points to be positioned whose number of observation image frames is greater than a preset threshold as target feature points to be positioned;
positioning the feature points to be positioned of the targets according to the target positioning method provided by the first aspect to obtain positioning information of each feature point to be positioned of the targets;
and positioning the object to be positioned based on the positioning information of each target feature point to be positioned.
Accordingly, a fourth aspect of the present application provides an object localization apparatus, comprising:
the sampling module is used for sampling a plurality of characteristic points to be positioned in an object to be positioned;
the determining module is used for determining the feature points to be positioned, of which the number of the observation image frames is greater than a preset threshold value, as target feature points to be positioned;
the second positioning module is used for positioning the characteristic points to be positioned of the targets according to the target positioning method provided by the first aspect to obtain positioning information of each characteristic point to be positioned of the target;
and the third positioning module is used for positioning the object to be positioned based on the positioning information of each target feature point to be positioned.
In a fifth aspect, the present application provides a storage medium having stored thereon a computer program which, when loaded by a processor of an electronic device, performs the steps of any of the object localization methods as provided herein.
In a sixth aspect, the present application further provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the processor executes the steps in any of the object localization methods provided in the present application by loading the computer program stored in the memory.
By adopting the technical scheme provided by the application, a plurality of color images obtained by shooting the feature point to be positioned at different positions and a depth image corresponding to each color image are acquired; error screening is performed on the plurality of color images to obtain a plurality of target images; the depth value of the feature point to be positioned is extracted from the depth image corresponding to each target image, and projection transformation is performed on the depth values based on an image selected from the plurality of target images to obtain a plurality of target depth values; the average of the target depth values is calculated, and the coordinate value of the feature point to be positioned in the selected image is determined; and the feature point to be positioned is positioned according to the selected image, the average depth value and the coordinate value. In this way, by acquiring the depth image of the feature point to be positioned, its depth value can be read directly in the observation frame. The depth is then converted, through pose transformation, into a depth under the selected image; compared with solving a linear equation system constructed from multiple frames, this yields more accurate depth information, so the target positioning accuracy can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a target positioning method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a plurality of observation cameras observing feature points.
Fig. 3 is a comparison of positioning effect.
Fig. 4 is a block diagram of a target positioning apparatus according to an embodiment of the present application.
Fig. 5 is a block diagram of a terminal according to an embodiment of the present disclosure.
Detailed Description
It should be noted that the terms "first", "second", and "third", etc. in this application are used for distinguishing different objects, and are not used for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
An embodiment of the present application provides a target positioning method, an apparatus, a storage medium, and a terminal, where the execution subject of the target positioning method may be the target positioning apparatus provided in the embodiment of the present application, or an electronic device integrated with the target positioning apparatus, and the target positioning apparatus may be implemented in hardware or software. The electronic device may be a mobile terminal, such as a smartphone, a tablet, a vehicle-mounted terminal or a smart wearable device.
Referring to fig. 1, fig. 1 is a schematic flowchart of a target positioning method provided in an embodiment of the present application, and as shown in fig. 1, a flowchart of the target positioning method provided in the embodiment of the present application may be as follows:
at 110, a plurality of color images of the feature point to be located and a depth image corresponding to the color images are obtained.
In the embodiment of the present application, the target positioning method may be applied to a target positioning apparatus, and the target positioning apparatus may specifically be loaded in a terminal. The terminal may be a smartphone, a tablet, a vehicle-mounted terminal or a smart wearable device.
In the related art, an important part of the positioning task is to acquire the three-dimensional (3D) depth of an observed feature point. Usually one feature point is observed by a plurality of observation cameras with different viewing angles, and its 3D depth can be solved by triangulation using the relative relationship between the camera poses. In SLAM, positioning of the object itself relies on the 3D spatial information of many feature points, so the accuracy of the feature points' 3D spatial information is crucial to the accuracy of the object's own positioning.
In current SLAM positioning schemes, OpenVINS is widely used in industry because of its high positioning speed and strong adaptability. In OpenVINS, a 3D Cartesian coordinate triangulation method is used, and feature points are triangulated in an anchor frame. This process is described in detail below.
Fig. 2 is a schematic diagram of a feature point observed by a plurality of observation cameras. As shown in the figure, the feature point is observed by m+1 cameras; the image frame observed by camera A may be selected as the anchor frame, or the first observation frame corresponding to the feature point (ordered by the time at which the feature point is observed) may be selected as the anchor frame.
The first step is to calculate an initial value of the feature point depth.
The feature point $p_f$ is observed by camera A, and the poses of the other observation cameras are denoted $C_i$, where $i = 1, 2, \ldots, m$. The coordinates of the feature point $p_f$ in the anchor frame are ${}^{A}p_f$, and its coordinates in observation camera frame $C_i$ are ${}^{C_i}p_f$. The rotation from camera frame $C_i$ to the anchor frame is ${}^{A}_{C_i}R$, and the translation from camera frame $C_i$ to the anchor frame is ${}^{A}p_{C_i}$. The observations of the feature point $p_f$ in the camera frame and the anchor frame therefore satisfy

$${}^{A}p_f = {}^{A}_{C_i}R\,{}^{C_i}p_f + {}^{A}p_{C_i}$$

In the absence of noise, with normalized image coordinates ${}^{C_i}b_f$ and depth ${}^{C_i}z_f$ in camera frame $C_i$, and normalized coordinates ${}^{A}b_f$ and depth ${}^{A}z_f$ of the feature point in the anchor frame, the mapping from the camera-frame feature point to the normalized image coordinates in the anchor frame is

$${}^{A}z_f\,{}^{A}b_f = {}^{C_i}z_f\,{}^{A}b_{C_i \to f} + {}^{A}p_{C_i} \qquad \text{(3)}$$

where ${}^{C_i}z_f$ is the depth of the feature point in camera frame $C_i$, ${}^{C_i}b_f$ is the normalized coordinate of the feature point in camera frame $C_i$, and ${}^{A}b_{C_i \to f} = {}^{A}_{C_i}R\,{}^{C_i}b_f$ is the camera-frame bearing mapped into the anchor frame.

To eliminate the irrelevant degrees of freedom, a matrix ${}^{A}N_i = \lfloor {}^{A}b_{C_i \to f} \times \rfloor$ is defined; all of its rows are perpendicular to ${}^{A}b_{C_i \to f}$, i.e. ${}^{A}N_i\,{}^{A}b_{C_i \to f} = 0$. Left-multiplying the transformation relation of formula (3) by ${}^{A}N_i$ constructs equations that depend only on the depth ${}^{A}z_f$:

$${}^{A}N_i\,{}^{A}b_f\,{}^{A}z_f = {}^{A}N_i\,{}^{A}p_{C_i}$$

Putting the constraints observed by the multiple cameras together constructs the following depth constraint equation:

$$\mathbf{A}\,{}^{A}z_f = \mathbf{b}, \qquad \mathbf{A} = \begin{bmatrix} {}^{A}N_1\,{}^{A}b_f \\ \vdots \\ {}^{A}N_m\,{}^{A}b_f \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} {}^{A}N_1\,{}^{A}p_{C_1} \\ \vdots \\ {}^{A}N_m\,{}^{A}p_{C_m} \end{bmatrix}$$

Applying the normal equations to this stacked system yields a 1-by-1 system that can be solved quickly for ${}^{A}z_f$ by a scalar division, namely:

$${}^{A}z_f = \frac{\mathbf{A}^{\top}\mathbf{b}}{\mathbf{A}^{\top}\mathbf{A}}$$
among them, openVINS also adds a validity check of triangulated feature, i.e. the feature point needs to be right in front of the camera and cannot be too far away from the camera.
The second step is to perform inverse-depth nonlinear optimization on the feature point.
After a preliminary depth estimate of the feature point is obtained, the estimate is further optimized by nonlinear least squares. Using the inverse-depth representation of the feature point gives better numerical stability and helps convergence; in most indoor scenes the optimization converges within 2 to 3 iterations. The least-squares problem is constructed as follows. Define the anchor-frame normalized image coordinates and the inverse depth

$$u_A = \frac{{}^{A}x_f}{{}^{A}z_f}, \qquad v_A = \frac{{}^{A}y_f}{{}^{A}z_f}, \qquad \rho_A = \frac{1}{{}^{A}z_f}$$

The mapping of the feature into observation camera frame $C_i$ is rewritten in the following form:

$${}^{C_i}p_f = {}^{C_i}_{A}R\,\frac{1}{\rho_A}\begin{bmatrix} u_A \\ v_A \\ 1 \end{bmatrix} + {}^{C_i}p_{A}$$

A measurement equation is then constructed by projecting this point onto the normalized image plane of camera $C_i$:

$$h(u_A, v_A, \rho_A) = \begin{bmatrix} h_1/h_3 \\ h_2/h_3 \end{bmatrix}, \qquad \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix} = {}^{C_i}_{A}R \begin{bmatrix} u_A \\ v_A \\ 1 \end{bmatrix} + \rho_A\,{}^{C_i}p_{A}$$

Because the feature point can be observed under a plurality of camera poses, one such observation equation is obtained for each observing camera; stacking the residuals between the predicted and measured normalized coordinates over all observations yields the least-squares problem and its Jacobian.

The least-squares problem is then solved iteratively with the Gauss-Newton method or the Levenberg-Marquardt method, finally yielding the optimized depth.
From the above description it can be seen that locating feature points in OpenVINS requires complex computation: each feature point needs observations from more than 5 frames, and in rapidly changing scenes such as large rotations the number of observation frames of the same feature point may be fewer than 5, so the feature point cannot be triangulated and positioning performance in such scenes degrades. In that case the extracted feature points contribute nothing to pose estimation, and the positioning accuracy and robustness of the system are reduced. Moreover, the initial depth of each feature point has to be solved by constructing a complex linear system. Optimizing the depth of the 3D feature point with the Gauss-Newton method requires first constructing a least-squares problem, then constructing a Jacobian matrix, and computing increments iteratively until convergence or the maximum number of iterations is reached; this is slow and the result is easily affected by noise, and if a single observation has a large error the final optimization result is affected. In short, feature point localization in OpenVINS is neither accurate nor efficient enough, so OpenVINS performs poorly in SLAM in these cases.
To address the problem that locating feature points in this way is neither accurate nor efficient enough, the present application provides a target positioning method that improves the accuracy and efficiency of locating feature points, and can therefore further improve the accuracy and efficiency of locating the object itself in SLAM. The scheme provided by the present application is described in detail below.
In the embodiment of the present application, observation frames from multiple camera views still need to be acquired for the same feature point, except that in each observation view a color image and a corresponding depth image of the feature point are acquired at the same time. The color image is an RGB image, and the camera that collects it may be a monocular or a binocular camera. The camera that acquires the depth image may be a Time-of-Flight (TOF) depth camera or a structured-light depth camera.
When the feature point to be positioned is positioned, the target positioning apparatus may acquire a plurality of RGB images and a plurality of depth images obtained by shooting the feature point to be positioned from different positions. The RGB images and the depth images correspond one to one, that is, each RGB image and its corresponding depth image are captured at the same position and at the same time. To ensure positioning accuracy, at least three image pairs may be used.
In 120, the depth values of the feature points to be positioned in the depth image corresponding to the color image are extracted, and projection conversion is performed on the depth values based on the selected image in the multiple color images to obtain multiple target depth values.
In the embodiment of the application, after the color image and the corresponding depth image of the feature point to be positioned are obtained, the depth value of the feature point to be positioned can be calculated based on the color image and the depth image of the feature point to be positioned. Specifically, the depth value of the feature point to be located in each depth image may be extracted first, and then the depth value is projected into the selected observation frame (such as the aforementioned anchor frame), so as to obtain the projection depth of each depth value in the anchor frame.
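As a minimal sketch of this step, under the assumptions that the depth map is metric and aligned with the color image, that the camera intrinsics K are known, and that the pose of the observation frame relative to the anchor frame is available, the depth at a feature pixel could be read and projected into the anchor frame roughly as follows (using the anchor-frame z component as the projection depth is an assumption about the convention here):

```python
import numpy as np

def projected_depth_in_anchor(pixel_uv, depth_map, K, R_A_Ci, p_A_Ci):
    """Read the depth of a feature pixel in observation frame C_i and express
    it as a depth in the anchor frame.

    pixel_uv : (u, v) pixel location of the feature in the color image of C_i
    depth_map: HxW depth image aligned with that color image (in metres)
    K        : (3,3) camera intrinsic matrix
    R_A_Ci   : (3,3) rotation from C_i to the anchor frame
    p_A_Ci   : (3,) position of C_i expressed in the anchor frame
    """
    u, v = pixel_uv
    z = float(depth_map[int(round(v)), int(round(u))])   # depth at the feature pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])       # back-project the pixel
    p_Ci = z * ray                                        # 3D point in frame C_i
    p_A = R_A_Ci @ p_Ci + p_A_Ci                          # same point in the anchor frame
    return p_A[2]                                         # taken here as the projection depth
```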
To further ensure the accuracy of target positioning and avoid interference from abnormal images, after the target positioning apparatus acquires the plurality of RGB images of the feature point to be positioned, it may further screen these RGB images to remove some interfering images; specifically, the images remaining after removing those with large reprojection errors are obtained as the target images. Accurate target positioning is then performed based on the target images. In other words, the multiple observation frames of the same feature point can be screened by reprojection error, and the observed depth values with smaller reprojection errors are retained.
In some embodiments, the error screening of the multiple color images to obtain multiple target images includes:
1. determining a selected image in a plurality of color images;
2. carrying out pose transformation on images except for the selected image in the multiple color images based on the selected image to obtain multiple transformation images;
3. determining a plurality of target conversion images with the error smaller than a preset threshold value from the selected image in the plurality of conversion images;
4. and determining a plurality of target images according to the color images corresponding to the plurality of target conversion images and the selected image.
In the embodiment of the present application, the screening may be performed on the plurality of RGB images. Specifically, a selected image may first be determined among the plurality of RGB images, where the selected image may be the aforementioned anchor frame; an anchor frame may also be understood as a reference frame. For example, the first observation frame of the feature point to be positioned may be selected as the anchor frame. Pose transformation is then performed on the other RGB images based on the anchor frame to obtain a converted image corresponding to each image. A converted image whose error from the anchor frame after the pose transformation is smaller than a preset threshold is determined to be a qualified image, and one whose error is larger than the preset threshold is determined to be an unqualified image. The qualified images and the anchor frame together form the plurality of target images.
In some embodiments, determining a plurality of target conversion images with an error smaller than a preset threshold from the selected image in the plurality of conversion images includes:
3.1, calculating a reprojection error between each converted image and the selected image;
and 3.2, determining the converted image with the reprojection error smaller than a preset threshold value as a target converted image.
In the embodiment of the present application, a specific way to determine the error between a pose-transformed image and the anchor frame is to calculate the reprojection error between the converted image and the anchor frame. The reprojection error refers to the difference between the projection of a real three-dimensional space point on the image plane (that is, a pixel observed in the image) and its reprojection (a virtual pixel computed from estimated values). For various reasons the computed values never match reality exactly, so this difference cannot be exactly zero; the sum of these differences therefore needs to be minimized to obtain the optimal camera pose parameters and the coordinates of the three-dimensional space point.
After the reprojection error between each converted image and the selected image is calculated, a converted image whose reprojection error is smaller than the preset threshold is determined to be a target converted image, and the RGB images corresponding to the target converted images together with the anchor frame form the target images. The depth values of the feature point to be positioned in the depth images corresponding to the target images can then be extracted to calculate the depth of the feature point to be positioned. The feature point observed in an observation frame is converted into the anchor frame coordinate system according to the pose transformation relation between the observation frame and the anchor frame to obtain a projected pixel position, and the error value (namely the reprojection error) is calculated against the position at which the feature point is observed in the anchor frame. Because the pose information cannot be perfectly accurate and the feature point extraction contains errors, the projected position and the position of the feature point in the anchor frame cannot coincide exactly, so a reprojection error exists.
In some embodiments, calculating the reprojection error between each converted image and the selected image comprises:
3.1.1, acquiring a first coordinate value of the feature point to be positioned in the color image corresponding to each converted image, and acquiring a first depth value of the feature point to be positioned in the depth image corresponding to each converted image;
3.1.2, acquiring a rotation parameter and a translation parameter corresponding to the pose transformation of each transformation image;
3.1.3, calculating a characterization parameter of each converted image based on the first coordinate value, the first depth value, the rotation parameter and the translation parameter;
and 3.1.4, calculating the difference value between the characteristic parameter of each converted image and the target characteristic parameter of the selected image to obtain the reprojection error between each converted image and the selected image.
In this embodiment of the present application, calculating the reprojection error between a converted image and the anchor frame may specifically start with acquiring a first coordinate value of the feature point to be positioned in the color image corresponding to each converted image and a first depth value of the feature point to be positioned in the depth image corresponding to each converted image. The depth value in the depth image corresponding to a converted image is the depth value obtained by projecting, into the anchor frame, the depth value of the feature point to be positioned in the depth image matched with the RGB image corresponding to that converted image.
And then, acquiring a rotation parameter and a translation parameter corresponding to pose transformation of each transformed image, further calculating a characterization parameter for characterizing each transformed image based on the first coordinate value, the first depth value, the rotation parameter and the translation parameter, and then calculating a difference value between the characterization parameter of each transformed image and a target characterization parameter corresponding to the anchor frame to obtain a reprojection error between each transformed image and the selected image.
Specifically, determining the reprojection error between each observation and the anchor-frame observation includes: first, extracting feature points from the observation images, matching the feature points of different frames, and determining the correspondence between them; acquiring the rotation and translation parameters between the frames; projecting the observed points into the anchor frame based on the feature-point correspondences and the inter-frame rotation and translation to obtain projection coordinates; and calculating the difference between the coordinates of an observed point projected into the anchor frame and the coordinates of that point as observed in the anchor frame, which gives the reprojection error from the observation frame to the anchor frame.
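A hedged sketch of this comparison follows, assuming the characterization parameter of an observation is its depth times its normalized coordinate mapped into the anchor frame with the rotation and translation of the pose transformation, and the anchor's target characterization parameter is its own depth times its own normalized coordinate; the function names and the norm used are illustrative.

```python
import numpy as np

def characterization(b_norm, depth, R=None, t=None):
    """Characterization parameter: depth times the normalized coordinate,
    optionally mapped into the anchor frame with the pose (R, t)."""
    p = float(depth) * np.asarray(b_norm, dtype=float)
    if R is not None:
        p = R @ p + t
    return p

def reprojection_error(b_i, d_i, R_A_Ci, p_A_Ci, b_A, d_A):
    """Reprojection error between one observation and the anchor frame,
    taken as the difference between the two characterization parameters."""
    proj = characterization(b_i, d_i, R_A_Ci, p_A_Ci)  # observation mapped to the anchor
    target = characterization(b_A, d_A)                # anchor's own parameter
    return np.linalg.norm(proj - target)
```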
In some embodiments, calculating a difference between the characterization parameter of each transformed image and the target characterization parameter of the selected image to obtain a reprojection error between each transformed image and the selected image includes:
3.1.4.1. acquiring a second coordinate value of the feature point to be positioned in the selected image, and acquiring a second depth value of the feature point to be positioned in the depth image corresponding to the selected image;
3.1.4.2, calculating the product of the second coordinate value and the second depth value to obtain a target characterization parameter corresponding to the selected image;
and 3.1.4.3, calculating the difference value between the characteristic parameter of each converted image and the target characteristic parameter to obtain the reprojection error between each converted image and the selected image.
In this embodiment, the target characterization parameter of the anchor frame can likewise be obtained from the coordinate value of the feature point to be positioned in the anchor frame and its depth value in the depth image corresponding to the anchor frame. Specifically, the target positioning apparatus may first acquire a second coordinate value of the feature point to be positioned in the anchor frame and a second depth value of the feature point to be positioned in the depth image corresponding to the anchor frame, and then calculate the product of the second coordinate value and the second depth value to obtain the target characterization parameter corresponding to the anchor frame. The difference between the characterization parameter of each converted image and the target characterization parameter is then calculated to obtain the reprojection error between each converted image and the selected image.
Specifically, assume that m frames of RGB images are captured of the feature point to be positioned, where m is greater than or equal to 3. The first observation frame in which the feature point is observed is selected as the anchor frame, and the depth value $d_A$ of the feature point in the anchor frame is read. The depth value $d_i$ of the feature point in each of the other observation frames is also read. The feature point is transformed into the anchor frame through the pose transformation relation and the reprojection error is calculated; if the reprojection error $e$ is greater than a certain threshold, the observation in that frame image is discarded. If the reprojection error $e$ is within the threshold range, the projection depth ${}^{A}d_i$ of the feature point is recorded, giving $n$ frames in total ($n < m$). The reprojection error is calculated as follows:

$$e = {}^{A}d_i\,{}^{A}b_f - d_A\,{}^{A}b_f \qquad \text{(15)}$$
After the plurality of target images are determined (namely the anchor frame and the RGB images corresponding to the converted images whose reprojection error is smaller than the preset threshold), the depth value of the feature point to be positioned can be further extracted from the depth image corresponding to each target image. The depth values corresponding to the target images other than the anchor frame are then projection-transformed based on the anchor frame to obtain the projection depths ${}^{A}d_i$. It will be appreciated that the projection depth ${}^{A}d_i$ here is the depth value of the feature point to be positioned extracted from the depth image corresponding to each converted image and projected into the anchor frame. The $n-1$ projection depths ${}^{A}d_i$ together with the depth value $d_A$ of the feature point to be positioned in the anchor frame form the plurality of target depth values.
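To illustrate the screening and collection just described, a minimal sketch follows; the data layout, the error metric and the threshold value are assumptions, and the projection depth is taken as the anchor-frame z component of the transformed point.

```python
import numpy as np

def screen_and_collect_depths(anchor, observations, error_threshold=0.05):
    """Keep only observations whose reprojection error with respect to the
    anchor frame is below the threshold, and collect their projection depths
    together with the anchor-frame depth d_A.

    anchor          : dict with 'b' (normalized coordinate) and 'd' (depth d_A)
    observations    : list of dicts with 'b', 'd', 'R_A_Ci', 'p_A_Ci'
    error_threshold : illustrative value; not specified by the application
    """
    target_depths = [anchor['d']]                             # d_A is always kept
    for obs in observations:
        p_anchor = obs['R_A_Ci'] @ (obs['d'] * obs['b']) + obs['p_A_Ci']
        e = np.linalg.norm(p_anchor - anchor['d'] * anchor['b'])
        if e <= error_threshold:
            target_depths.append(float(p_anchor[2]))          # projection depth Adi
        # observations with a large reprojection error are discarded
    return target_depths
```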
At 130, an average depth value of the plurality of target depth values is calculated, and coordinate values of the feature point to be located in the selected image are determined.
After the plurality of target depth values are determined, their average depth value $d_{avg}$ can be further calculated and used as the depth value of the feature point to be positioned:

$$d_{avg} = \frac{1}{n}\sum_{i=1}^{n} {}^{A}d_i$$

where the anchor-frame depth $d_A$ is counted among the $n$ target depth values.
furthermore, coordinate values of the feature points to be positioned in the anchor frame can be obtained.
At 140, the feature point to be located is located according to the selected image, the average depth value and the coordinate value.
After the anchor frame and the depth of the feature point to be positioned relative to the anchor frame are determined, namely the average depth value and the coordinate value of the feature point to be positioned in the anchor frame, the feature point can be accurately positioned according to the anchor frame, the average depth value and that coordinate value, so that accurate 3D spatial position information of the feature point is determined:

$${}^{A}p_f = d_{avg}\,{}^{A}b_f \qquad \text{(17)}$$
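A minimal sketch of formula (17), assuming the screened target depth values and the anchor-frame normalized coordinate of the feature point are already available (names and sample numbers are illustrative only):

```python
import numpy as np

def locate_in_anchor(target_depths, b_A):
    """Average the screened target depth values and form the feature's 3D
    position in the anchor frame, Apf = d_avg * Abf, as in formula (17)."""
    d_avg = float(np.mean(target_depths))
    return d_avg * np.asarray(b_A, dtype=float), d_avg

# example with three screened depths and an anchor-frame normalized coordinate
Apf, d_avg = locate_in_anchor([1.98, 2.02, 2.00], np.array([0.10, -0.05, 1.0]))
```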
In some embodiments, the target positioning method provided by the present application further includes:
1. acquiring a target pose of the selected image in a world coordinate system;
2. and performing coordinate transformation based on the target pose, the average depth value and the coordinate value to obtain the positioning information of the feature point to be positioned in the world coordinate system.
In the embodiment of the present application, after the 3D position of the feature point to be positioned relative to the anchor frame is determined, this position can be further converted into the world coordinate system to obtain the positioning information of the feature point to be positioned in the world coordinate system. Specifically, the pose of the anchor frame in the world coordinate system may be obtained first, and coordinate transformation is then performed based on that target pose, the depth of the feature point to be positioned relative to the anchor frame, and the coordinates of the feature point to be positioned in the anchor frame, to obtain the positioning information of the feature point in the world coordinate system, as follows:

$${}^{G}p = {}^{G}_{A}R\,{}^{A}p_f + {}^{G}p_A$$

where ${}^{G}p$ is the 3D information of the feature point to be positioned in the world coordinate system, ${}^{G}_{A}R$ is the rotation parameter of the anchor frame, and ${}^{G}p_A$ is the translation parameter of the anchor frame.
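A one-line sketch of this world-frame conversion, assuming the anchor-frame point and the anchor pose in the world frame are already known (the example values are illustrative only):

```python
import numpy as np

def to_world(Apf, R_G_A, p_G_A):
    """Transform the anchor-frame feature position into the world frame:
    Gp = R_G_A @ Apf + p_G_A."""
    return R_G_A @ np.asarray(Apf, dtype=float) + np.asarray(p_G_A, dtype=float)

# trivial example with an identity anchor pose
Gp = to_world([0.2, -0.1, 2.0], np.eye(3), np.zeros(3))
```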
With the target positioning method, even when a feature point appears in fewer than 5 observation frames, its depth can be read directly from the depth map of an observation frame; that is, more accurate depth information of the feature point can be obtained from a limited number of observations. This improves the utilization rate of feature points, enhances robustness in scenes such as large rotations, and improves the positioning accuracy of the system. As long as the depth image is valid at the feature point, the depth value of the 3D feature point in an observation frame can be read directly from the depth image acquired by the depth sensor. The depth under the anchor frame is obtained through pose transformation, which is faster than solving a linear equation system constructed from multiple frames. Verification by reprojection error removes frames with large depth errors and ensures the accuracy of the final depth solution of the 3D feature point. In addition, statistically averaging the depth information of multiple frames to obtain the final depth of the 3D feature point filters measurement noise and guarantees the precision of the observed depth value. Both speed and precision are thus ensured, and the construction and solution of a complex least-squares problem are avoided.
OpenVINS is sensitive to IMU noise, and the pose drifts easily when IMU noise is large. Under large rotation, the observation frames of a feature point are limited, and it is difficult to guarantee that enough 3D feature points can be initialized by triangulation. After depth information is added, the 3D feature points have an extra constraint, so enough feature points can be initialized even when the number of observation frames is insufficient, such as under large rotation; drift caused by insufficient feature points is reduced, and positioning accuracy and stability are improved. By introducing the depth map, the initialization of 3D feature points no longer requires constructing a linear equation system and a least-squares problem to solve for depth, which improves speed. At the same time, the additional reprojection-error constraint and the statistical-averaging optimization improve the accuracy of the depth solution.
In some embodiments, after coordinate transformation is performed based on the target pose, the average depth value, and the coordinate value to obtain the positioning information of the feature point to be positioned in the world coordinate system, the method further includes:
and carrying out space positioning on the object to be positioned according to the positioning information of the plurality of characteristic points to be positioned in the world coordinate system to obtain the position information of the object to be positioned.
The SLAM is used for positioning and mapping the object, and the positioning and mapping of the object are realized based on accurate feature point 3D information. The target positioning method provided by the application can be used for quickly and accurately positioning the characteristic points, namely, the accurate 3D information of the characteristic points can be quickly obtained. Furthermore, each feature point in the positioning space can be traversed, so that accurate 3D information of each feature point in the positioning space can be determined, and further, the object in the positioning space can be accurately positioned and mapped based on the accurate 3D information in the positioning space. For example, when an autonomous vehicle needs to be located, only the 3D information of a plurality of feature points associated with the autonomous vehicle needs to be obtained, and then the autonomous vehicle can be accurately located based on the relative position relationship between the autonomous vehicle and the plurality of feature points and the 3D information of the feature points.
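The object-level step is described only at this high level here; as an illustrative assumption (not a procedure specified in this section), one simple way to use several located feature points is to combine per-feature estimates of the object position given known relative offsets, for example:

```python
import numpy as np

def locate_object(feature_points_world, offsets_from_object):
    """Estimate the object's world position from located feature points,
    assuming the offset of each feature from the object is known; averaging
    the per-feature estimates filters independent noise (illustrative only)."""
    estimates = [np.asarray(p, dtype=float) - np.asarray(o, dtype=float)
                 for p, o in zip(feature_points_world, offsets_from_object)]
    return np.mean(estimates, axis=0)
```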
The target positioning method provided by the application can greatly improve the accuracy of positioning the object in the SLAM task.
In addition, Table 1 and Fig. 3 compare the target positioning method provided by the present application with conventional OpenVINS on data acquired with a robot dog.
Algorithm | Average time consumption (ms) | Proportion of runs with obvious track drift
OpenVINS | 13.031 | 4/5
Method of the present application | 12.188 | 1/5

TABLE 1 Comparison of positioning effect
As shown in fig. 3, when the robot dog walks one circle around the conference table and returns to its starting position, the track drift that occurs when it is positioned by the method provided by the present application is significantly reduced. Moreover, as shown in Table 1, the positioning time of the method provided by the present application is clearly shorter, so the positioning speed is effectively improved.
According to the above description, the target positioning method provided by the application obtains a plurality of color images of the feature points to be positioned and depth images corresponding to the color images; extracting the depth value of the feature point to be positioned in the depth image corresponding to each color image, and performing projection transformation on the depth value based on the selected image in the multiple color images to obtain multiple target depth values; calculating the average depth value of the depth values of the multiple targets, and determining the coordinate value of the feature point to be positioned in the selected image; and positioning the characteristic points to be positioned according to the selected image, the average depth value and the coordinate value. Therefore, according to the depth value of the depth image, the pose constraint and the feature point mapping relation among the color image features, the depth value of the undetermined feature point in the observation frame can be acquired more quickly and accurately. Compared with a solution mode of constructing a linear equation set by adopting multiple frames, more accurate depth information can be obtained, so that the target positioning precision can be improved.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a target positioning device 200 according to an embodiment of the present disclosure. The target positioning device 200 is applied to the electronic equipment provided by the application. As shown in fig. 4, the object locating device 200 may include:
the obtaining module 210 is configured to obtain a plurality of color images of the feature point to be located and depth images corresponding to the color images;
the extraction module 220 is configured to extract depth values of the feature points to be located in the depth image corresponding to each color image, and perform projection transformation on the depth values based on a selected image in the multiple target images to obtain multiple target depth values;
a calculating module 230, configured to calculate an average depth value of multiple target depth values, and determine a coordinate value of the feature point to be located in the selected image;
and the first positioning module 240 is configured to position the feature point to be positioned according to the selected image, the average depth value, and the coordinate value.
Optionally, in some embodiments, the extraction module includes:
the screening submodule is used for screening the color images to obtain a plurality of target images;
and the extraction submodule is used for extracting the depth values of the characteristic points to be positioned in the depth images corresponding to the multiple target images, and performing projection conversion on the depth values based on the images selected from the multiple target images to obtain multiple target depth values.
Optionally, in some embodiments, the screening submodule comprises:
the first determining unit is used for determining the selected image in the color images;
the transformation unit is used for carrying out pose transformation on the images except the selected image in the multiple color images based on the selected image to obtain multiple transformation images;
the second determining unit is used for determining a plurality of target conversion images with the image error smaller than a preset threshold value from the plurality of conversion images;
and the third determining unit is used for determining a plurality of target images according to the color images corresponding to the plurality of target conversion images and the selected image.
Optionally, in some embodiments, the second determining unit includes:
the calculating subunit is used for calculating a reprojection error between each converted image and the selected image;
and the determining subunit is used for determining the converted image with the reprojection error smaller than the preset threshold value as the target converted image.
Optionally, in some embodiments, the computing subunit is further configured to:
acquiring a first coordinate value of the feature point to be positioned in the color image corresponding to each converted image, and acquiring a first depth value of the feature point to be positioned in the depth image corresponding to each converted image;
acquiring a rotation parameter and a translation parameter corresponding to pose transformation of each transformation image;
calculating a characterization parameter of each converted image based on the first coordinate value, the first depth value, the rotation parameter and the translation parameter;
and calculating the difference value between the characterization parameter of each converted image and the target characterization parameter of the selected image to obtain the reprojection error between each converted image and the selected image.
Optionally, in some embodiments, the computing subunit is further configured to:
acquiring a second coordinate value of the feature point to be positioned in the selected image, and acquiring a second depth value of the feature point to be positioned in the depth image corresponding to the selected image;
calculating the product of the second coordinate value and the second depth value to obtain a target characterization parameter corresponding to the selected image;
and calculating the difference between the characterization parameter of each converted image and the target characterization parameter to obtain the reprojection error between each converted image and the selected image.
Optionally, in some embodiments, the target positioning device provided by the present application further includes:
the acquisition sub-module is used for acquiring the target pose of the selected image in a world coordinate system;
and the transformation submodule is used for carrying out coordinate transformation based on the target pose, the average depth value and the coordinate value to obtain the positioning information of the characteristic point to be positioned in the world coordinate system.
In some embodiments, the object localization apparatus provided herein further comprises:
and the positioning submodule is used for carrying out space positioning on the object to be positioned according to the positioning information of the plurality of characteristic points to be positioned in the world coordinate system to obtain the position information of the object to be positioned.
It should be noted that the target positioning apparatus 200 provided in the embodiment of the present application and the target positioning method shown in fig. 1 in the foregoing embodiment belong to the same concept, and the specific implementation process thereof is described in the foregoing related embodiments, and is not described herein again.
According to the above description, the target positioning device provided in the present application obtains, by the obtaining module 210, a plurality of color images of the feature point to be positioned and depth images corresponding to the color images; the extraction module 220 extracts depth values of the feature points to be positioned in the depth image corresponding to each color image, and performs projection conversion on the depth values based on the selected image in the multiple color images to obtain multiple target depth values; the calculating module 230 calculates an average depth value of the plurality of target depth values, and determines a coordinate value of the feature point to be positioned in the selected image; the positioning module 240 positions the feature points to be positioned according to the selected image, the average depth value and the coordinate values. Therefore, according to the depth value of the depth image, the pose constraint and the feature point mapping relation among the color image features, the depth value of the undetermined feature point in the observation frame can be acquired more quickly and accurately. Compared with a solution mode of constructing a linear equation set by adopting multiple frames, more accurate depth information can be obtained, so that the target positioning precision can be improved.
Embodiments of the present application further provide a storage medium, on which a computer program is stored, and when the computer program stored in the storage medium is executed on a processor of an electronic device provided in an embodiment of the present application, the processor of the electronic device is caused to perform any of the steps in the above target location method suitable for the electronic device. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
The present application further provides a terminal, please refer to fig. 5, in which the terminal 300 includes a processor 310 and a memory 320.
The processor 310 in this embodiment may be a general-purpose processor, such as an ARM-architecture processor. The memory 320 stores a computer program; the memory may be a high-speed random access memory, or a non-volatile memory such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 320 may also include a memory controller to provide the processor 310 with access to the memory 320. The processor 310 is adapted to perform any of the above target positioning methods, such as:
acquiring a plurality of color images of the characteristic point to be positioned and depth images corresponding to the color images; extracting depth values of the feature points to be positioned in the depth image corresponding to each color image, and performing projection transformation on the depth values based on the image selected from the multiple color images to obtain multiple target depth values; calculating the average depth value of the depth values of the multiple targets, and determining the coordinate value of the feature point to be positioned in the selected image; and positioning the characteristic points to be positioned according to the selected image, the average depth value and the coordinate values.
The above detailed description is given to a target positioning method, apparatus, storage medium and terminal provided by the present application, and specific examples are applied herein to explain the principle and implementation of the present application, and the description of the above embodiments is only used to help understanding the method and core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, the specific implementation manner and the application scope may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (11)
1. A method of locating an object, the method comprising:
acquiring a plurality of color images of the characteristic points to be positioned and depth images corresponding to the color images;
extracting depth values of the feature points to be positioned in the depth image corresponding to the color image, and performing projection conversion on the depth values based on the image selected from the multiple color images to obtain multiple target depth values;
calculating an average depth value of the multiple target depth values, and determining a coordinate value of the feature point to be positioned in the selected image;
and positioning the characteristic point to be positioned according to the selected image, the average depth value and the coordinate value.
2. The method according to claim 1, wherein the extracting depth values of the feature points to be located in the depth image corresponding to the color image, and performing projection transformation on the depth values based on the image selected from the plurality of color images to obtain a plurality of target depth values comprises:
screening the multiple color images to obtain multiple target images;
and extracting depth values of the feature points to be positioned in the depth images corresponding to the multiple target images, and performing projection conversion on the depth values based on the images selected from the multiple target images to obtain multiple target depth values.
3. The method of claim 2, wherein the screening the plurality of color images to obtain a plurality of target images comprises:
determining a selected image in the plurality of color images;
carrying out pose transformation on images except the selected image in the multiple color images based on the selected image to obtain multiple transformation images;
determining a plurality of target conversion images with errors smaller than a preset threshold value from the selected image in the plurality of conversion images;
and determining a plurality of target images according to the color images corresponding to the plurality of target conversion images and the selected image.
4. The method according to claim 3, wherein the determining a plurality of target conversion images among the plurality of conversion images with an error from the selected image smaller than a preset threshold comprises:
calculating a reprojection error between each transformed image and the selected image;
and determining the converted image with the reprojection error smaller than a preset threshold value as a target converted image.
5. The method according to claim 4, wherein the calculating a reprojection error between each transformed image and the selected image comprises:
acquiring a first coordinate value of the feature point to be positioned in the color image corresponding to each transformed image, and acquiring a first depth value of the feature point to be positioned in the depth image corresponding to each transformed image;
acquiring a rotation parameter and a translation parameter corresponding to the pose transformation of each transformed image;
calculating a characterization parameter of each transformed image based on the first coordinate value, the first depth value, the rotation parameter and the translation parameter;
and calculating a difference between the characterization parameter of each transformed image and a target characterization parameter of the selected image to obtain the reprojection error between each transformed image and the selected image.
6. The method according to claim 5, wherein the calculating a difference between the characterization parameter of each transformed image and a target characterization parameter of the selected image to obtain the reprojection error between each transformed image and the selected image comprises:
acquiring a second coordinate value of the feature point to be positioned in the selected image, and acquiring a second depth value of the feature point to be positioned in the depth image corresponding to the selected image;
calculating a product of the second coordinate value and the second depth value to obtain the target characterization parameter corresponding to the selected image;
and calculating the difference between the characterization parameter of each transformed image and the target characterization parameter to obtain the reprojection error between each transformed image and the selected image.
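Claims 3 to 6 together describe screening the transformed images by a reprojection error computed from a depth-scaled coordinate (the "characterization parameter"). A hedged sketch of one way to read these steps is given below; the pose convention (R, t mapping a point into the selected image's frame), the error norm and the threshold value are assumptions for illustration, not the claimed implementation.

```python
import numpy as np

def characterization(uv, depth, R=np.eye(3), t=np.zeros(3)):
    """Depth-scaled homogeneous coordinate, optionally moved by a pose transform (R, t)."""
    return R @ (depth * np.array([uv[0], uv[1], 1.0])) + t

def screen_by_reprojection_error(observations, selected_obs, threshold=0.05):
    """Keep observations whose error with respect to the selected image is below the threshold.

    observations: list of dicts with 'uv' (first coordinate value), 'depth' (first depth
                  value), 'R' and 't' (rotation and translation of the pose transformation).
    selected_obs: dict with 'uv' (second coordinate value) and 'depth' (second depth value).
    """
    target_param = characterization(selected_obs['uv'], selected_obs['depth'])
    kept = []
    for obs in observations:
        param = characterization(obs['uv'], obs['depth'], obs['R'], obs['t'])
        error = np.linalg.norm(param - target_param)   # difference used as the reprojection error
        if error < threshold:
            kept.append(obs)                           # corresponds to a target transformed image
    return kept
```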
7. The method of claim 1, further comprising:
acquiring a target pose of the selected image in a world coordinate system;
and performing coordinate transformation based on the target pose, the average depth value and the coordinate value to obtain the positioning information of the feature point to be positioned in the world coordinate system.
8. The method according to claim 7, wherein after the coordinate transformation is performed based on the target pose, the average depth value and the coordinate value to obtain the positioning information of the feature point to be positioned in the world coordinate system, the method further comprises:
and spatially positioning an object to be positioned according to the positioning information of a plurality of feature points to be positioned in the world coordinate system, to obtain position information of the object to be positioned.
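As a hedged illustration of claims 7 and 8, the sketch below assumes that the target pose of the selected image is given as a world-from-camera rotation R_wc and translation t_wc, and that the object position is taken as the centroid of the located feature points; these conventions and names are assumptions for illustration only.

```python
import numpy as np

def feature_point_to_world(pixel_uv, mean_depth, K, R_wc, t_wc):
    """Lift a feature point (coordinate value plus average depth value) into the world frame."""
    uv1 = np.array([pixel_uv[0], pixel_uv[1], 1.0])
    point_cam = mean_depth * np.linalg.inv(K) @ uv1    # point in the selected camera's frame
    return R_wc @ point_cam + t_wc                     # positioning information in the world coordinate system

def locate_object(world_points):
    """Spatially position the object from several located feature points (centroid as an example)."""
    return np.asarray(world_points).mean(axis=0)
```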
9. A target positioning device, the device comprising:
an acquisition module, configured to acquire a plurality of color images of a feature point to be positioned and a depth image corresponding to each color image;
an extraction module, configured to extract a depth value of the feature point to be positioned from the depth image corresponding to each color image, and perform projection transformation on the depth values based on an image selected from the plurality of color images to obtain a plurality of target depth values;
a calculation module, configured to calculate an average depth value of the plurality of target depth values and determine a coordinate value of the feature point to be positioned in the selected image;
and a first positioning module, configured to position the feature point to be positioned according to the selected image, the average depth value and the coordinate value.
10. A storage medium having a computer program stored thereon, wherein the computer program, when loaded by a processor of an electronic device, performs the steps of the target positioning method according to any one of claims 1 to 8.
11. A terminal comprising a processor and a memory, the memory storing a computer program, wherein the processor performs the steps of the target positioning method according to any one of claims 1 to 8 by loading the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210843784.9A CN115272463A (en) | 2022-07-18 | 2022-07-18 | Target positioning method, device, storage medium and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115272463A (en) | 2022-11-01 |
Family
ID=83768432
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210843784.9A Pending CN115272463A (en) | 2022-07-18 | 2022-07-18 | Target positioning method, device, storage medium and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115272463A (en) |
- 2022
  - 2022-07-18 CN CN202210843784.9A patent/CN115272463A/en active Pending
Similar Documents
Publication | Title |
---|---|
CN110322500B (en) | Optimization method and device for instant positioning and map construction, medium and electronic equipment | |
CN107292949B (en) | Three-dimensional reconstruction method and device of scene and terminal equipment | |
WO2018049581A1 (en) | Method for simultaneous localization and mapping | |
CN110111388B (en) | Three-dimensional object pose parameter estimation method and visual equipment | |
CN108776976B (en) | Method, system and storage medium for simultaneously positioning and establishing image | |
Saurer et al. | Homography based visual odometry with known vertical direction and weak manhattan world assumption | |
EP3028252A1 (en) | Rolling sequential bundle adjustment | |
CN112686877B (en) | Binocular camera-based three-dimensional house damage model construction and measurement method and system | |
CN110349212B (en) | Optimization method and device for instant positioning and map construction, medium and electronic equipment | |
CN108537214B (en) | Automatic construction method of indoor semantic map | |
WO2009035183A1 (en) | Method for self localization using parallel projection model | |
WO2023005457A1 (en) | Pose calculation method and apparatus, electronic device, and readable storage medium | |
CN111829522B (en) | Instant positioning and map construction method, computer equipment and device | |
CN114187589A (en) | Target detection method, device, equipment and storage medium | |
CN117132737B (en) | Three-dimensional building model construction method, system and equipment | |
CN113763466B (en) | Loop detection method and device, electronic equipment and storage medium | |
CN116894876A (en) | 6-DOF positioning method based on real-time image | |
CN105339981A (en) | Method for registering data using set of primitives | |
CN110853098A (en) | Robot positioning method, device, equipment and storage medium | |
CN115272463A (en) | Target positioning method, device, storage medium and terminal | |
CN111223139A (en) | Target positioning method and terminal equipment | |
CN115937002A (en) | Method, apparatus, electronic device and storage medium for estimating video rotation | |
Ababsa et al. | Robust camera pose estimation combining 2D/3D points and lines tracking | |
CN117057086B (en) | Three-dimensional reconstruction method, device and equipment based on target identification and model matching | |
CN116481515B (en) | Map generation method, map generation device, computer equipment and storage medium |
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination