CN112802112B - Visual positioning method, device, server and storage medium - Google Patents


Info

Publication number
CN112802112B
Authority
CN
China
Prior art keywords
information
feature
image pickup
characteristic
point pair
Prior art date
Legal status
Active
Application number
CN202110387773.XA
Other languages
Chinese (zh)
Other versions
CN112802112A
Inventor
聂琼
申浩
夏华夏
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202110387773.XA
Publication of CN112802112A
Application granted
Publication of CN112802112B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30241 Trajectory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30244 Camera pose
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle

Abstract

The application discloses a visual positioning method, a visual positioning device, a server and a storage medium, and belongs to the technical field of automatic driving. In the visual positioning method provided by the embodiments of the application, when a target object is visually positioned, the feature information of moving objects is determined from the feature information of multiple frames of images collected by the image pickup devices, and the feature information of the moving objects is then filtered out of the feature information of the multi-frame images to obtain the feature information of stationary objects. The target object can therefore be visually positioned according to the feature information of the stationary objects, the influence of moving objects is eliminated, and the accuracy of visual positioning is improved.

Description

Visual positioning method, device, server and storage medium
Technical Field
The present application relates to the field of automatic driving technologies, and in particular, to a visual positioning method, an apparatus, a server, and a storage medium.
Background
Currently, in the field of automatic driving, the pose of an unmanned vehicle is mainly estimated by visual odometry, and the unmanned vehicle is navigated according to the pose. Visual odometry determines the pose of the unmanned vehicle mainly from visual information between consecutive frames.
In the related art, an image pickup device is installed on the unmanned vehicle and captures images of the surrounding environment in real time while the vehicle is driving; feature points are extracted from two adjacent images and matched to obtain feature point pairs, and the pose of the unmanned vehicle is determined according to the matched feature point pairs.
Disclosure of Invention
The embodiment of the application provides a visual positioning method, a visual positioning device, a server and a storage medium, which can improve the accuracy of visual positioning. The technical scheme is as follows.
According to an aspect of embodiments of the present application, there is provided a visual positioning method, including:
determining first feature information corresponding to a plurality of image pickup devices, wherein the plurality of image pickup devices are all installed on a target object to be positioned, and the first feature information corresponding to any one image pickup device is feature information of multiple frames of images collected by that image pickup device;
for each image pickup device, determining second feature information of the image pickup device based on first feature information corresponding to the image pickup device, wherein the second feature information comprises feature information corresponding to a moving object;
filtering the second feature information from the first feature information to obtain third feature information corresponding to the image pickup device, wherein the third feature information is feature information corresponding to a stationary object;
determining first pose information of the target object based on third feature information corresponding to the plurality of image pickup devices, wherein the first pose information is relative pose information of the target object between the current frame and the historical frame.
In one possible implementation manner, the determining second feature information of the image capturing apparatus based on the first feature information corresponding to the image capturing apparatus includes:
determining second pose information of the camera equipment based on first feature information corresponding to the camera equipment, wherein the second pose information is relative pose information of the camera equipment between two adjacent frames of images;
and determining second feature information of the image pickup apparatus based on the second pose information.
In another possible implementation manner, the second feature information further includes at least one of position information of the misdetected feature point pair and position information of the mismatched feature point pair.
In another possible implementation manner, the determining second feature information of the image capturing apparatus based on the second pose information includes:
determining first reprojection error information of each matched feature point pair in two adjacent frames of images of the camera device and second reprojection error information of each matched feature point pair in two adjacent frames of images of any other camera device based on the second pose information;
determining the second feature information based on the first and second reprojection error information.
In another possible implementation manner, the first feature information includes position information of a matched feature point pair in the two adjacent frames of images;
the determining, based on the second pose information, first reprojection error information of each matched feature point pair in two adjacent frames of images by the image capturing apparatus and second reprojection error information of each matched feature point pair in two adjacent frames of images by any other image capturing apparatus includes:
projecting a first feature point in a first feature point pair based on the second pose information and position information of the first feature point pair corresponding to the image pickup device to obtain first re-projection error information, wherein the first feature point pair is any one feature point pair in the feature point pair matched between two adjacent frames of images corresponding to the image pickup device, and the first feature point is any one feature point in the first feature point pair;
and projecting second feature points in the second feature point pairs based on the second pose information and position information of the second feature point pairs corresponding to any other image pickup equipment to obtain second re-projection error information, wherein the second feature point pairs are any feature point pairs matched between two adjacent frames of images corresponding to any other image pickup equipment, and the second feature points are any feature points in the second feature point pairs.
In another possible implementation manner, the determining first pose information of the target object based on third feature information corresponding to the plurality of image capturing apparatuses includes:
determining stability coefficients of the plurality of image pickup apparatuses based on third feature information corresponding to the plurality of image pickup apparatuses;
determining a target image pickup apparatus from the plurality of image pickup apparatuses based on stability coefficients of the plurality of image pickup apparatuses, the target image pickup apparatus being an image pickup apparatus having a highest stability coefficient;
and determining first pose information of the target object based on third feature information corresponding to the target image pickup apparatus.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
the determining the stability coefficients of the plurality of image capturing apparatuses based on the third feature information corresponding to the plurality of image capturing apparatuses includes:
determining a first number of the characteristic point pairs in the static state corresponding to each image pickup device based on the position information of the characteristic point pairs in the static state corresponding to each image pickup device, wherein the size of the first number is used for representing a stability coefficient of the image pickup device.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
the determining the stability coefficients of the plurality of image capturing apparatuses based on the third feature information corresponding to the plurality of image capturing apparatuses includes:
determining third reprojection error information of each characteristic point pair in the static state corresponding to each image pickup apparatus based on the position information of the characteristic point pair in the static state corresponding to each image pickup apparatus;
and summing the reprojection errors corresponding to the third reprojection error information of each feature point pair in the static state to obtain a sum value, wherein the sum value is used for representing a stability coefficient of the image pickup device.
In another possible implementation manner, the determining the first feature information corresponding to the plurality of image capturing apparatuses includes:
for each frame of image collected by each camera device, carrying out feature point detection on the image to obtain a plurality of feature points;
matching a plurality of characteristic points of two adjacent frames of images corresponding to each camera device to obtain matched characteristic point pairs;
and using the position information of the matched characteristic point pair as the first characteristic information corresponding to each image pickup device.
According to an aspect of embodiments of the present application, there is provided a visual positioning apparatus, the apparatus including:
the first determining module is used for determining first feature information corresponding to a plurality of image pickup devices, wherein the plurality of image pickup devices are all installed on a target object to be positioned, and the first feature information corresponding to any one image pickup device is feature information of multiple frames of images collected by that image pickup device;
the second determining module is used for determining second characteristic information of the image pickup equipment based on first characteristic information corresponding to the image pickup equipment for each image pickup equipment, wherein the second characteristic information comprises characteristic information corresponding to a moving object;
the filtering module is used for filtering the second feature information from the first feature information to obtain third feature information corresponding to the image pickup device, wherein the third feature information is feature information corresponding to a stationary object;
and a third determining module, configured to determine, based on third feature information corresponding to the multiple image capturing apparatuses, first pose information of the target object, where the first pose information is relative pose information of the target object between the current frame and a history frame.
In a possible implementation manner, the second determining module is configured to determine second pose information of the image capturing apparatus based on first feature information corresponding to the image capturing apparatus, where the second pose information is relative pose information of the image capturing apparatus between two adjacent frames of images, and to determine second feature information of the image pickup apparatus based on the second pose information.
In another possible implementation manner, the second feature information further includes at least one of position information of the misdetected feature point pair and position information of the mismatched feature point pair.
In another possible implementation manner, the second determining module is configured to determine, based on the second pose information, first reprojection error information of each matched feature point pair in two adjacent frames of images of the image capturing apparatus, and second reprojection error information of each matched feature point pair in two adjacent frames of images of any other image capturing apparatus; determining the second feature information based on the first and second reprojection error information.
In another possible implementation manner, the first feature information includes position information of a matched feature point pair in the two adjacent frames of images;
the second determining module is configured to project a first feature point in a first feature point pair to obtain first re-projection error information, based on the second position and orientation information and position information of the first feature point pair corresponding to the image capturing apparatus, where the first feature point pair is any one of feature point pairs matched between two adjacent frames of images corresponding to the image capturing apparatus, and the first feature point is any one of feature points in the first feature point pair; and projecting second feature points in the second feature point pairs based on the second position and orientation information and position information of the second feature point pairs corresponding to any other image pickup equipment to obtain second re-projection error information, wherein the second feature point pairs are any feature point pairs matched between two adjacent frames of images corresponding to any other image pickup equipment, and the second feature points are any feature points in the second feature point pairs.
In another possible implementation manner, the third determining module is configured to determine the stability coefficients of the plurality of image capturing apparatuses based on third feature information corresponding to the plurality of image capturing apparatuses; determine a target image pickup apparatus from the plurality of image pickup apparatuses based on the stability coefficients of the plurality of image pickup apparatuses, the target image pickup apparatus being the image pickup apparatus having the highest stability coefficient; and determine first pose information of the target object based on third feature information corresponding to the target image pickup apparatus.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
the third determining module is configured to determine, based on the position information of the feature point pair in the stationary state corresponding to each image capturing apparatus, a first number of the feature point pairs in the stationary state corresponding to each image capturing apparatus, where a size of the first number is used to represent a stability factor of the image capturing apparatus.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
the third determining module is configured to determine, based on the position information of the feature point pair in the stationary state corresponding to each image capturing apparatus, third reprojection error information of each feature point pair in the stationary state corresponding to each image capturing apparatus; and summing the reprojection errors corresponding to the third reprojection error information of each feature point pair in the static state to obtain a sum value, wherein the sum value is used for representing a stability coefficient of the image pickup device.
In another possible implementation manner, the first determining module is configured to perform feature point detection on each frame of image acquired by each camera device to obtain a plurality of feature points; matching a plurality of characteristic points of two adjacent frames of images corresponding to each camera device to obtain matched characteristic point pairs; and using the position information of the matched characteristic point pair as the first characteristic information corresponding to each image pickup device.
According to an aspect of embodiments of the present application, there is provided a server including one or more processors and one or more memories, in which at least one program code is stored, the at least one program code being loaded by the one or more processors and executed to implement the operations performed by the visual positioning method according to any one of the above possible implementations.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded into and executed by a processor to implement the operations performed by the visual positioning method according to any one of the above-mentioned possible implementations.
According to an aspect of embodiments of the present application, there is provided a computer program or a computer program product comprising: computer program code which, when executed by a server, causes the server to carry out operations performed by the visual positioning method of any one of the possible implementations as described above.
In the visual positioning method provided by the embodiments of the application, when the target object is visually positioned, the feature information of moving objects is determined from the feature information of multiple frames of images collected by the image pickup devices, and the feature information of the moving objects is then filtered out of the feature information of the multi-frame images to obtain the feature information of stationary objects. The target object can therefore be visually positioned according to the feature information of the stationary objects, the influence of moving objects is eliminated, and the accuracy of visual positioning is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a visual positioning method provided in an embodiment of the present application;
FIG. 2 is a flow chart of a visual positioning method provided by an embodiment of the present application;
FIG. 3 is a flow chart of a visual positioning method provided by an embodiment of the present application;
fig. 4 is a schematic diagram of determining a target image capturing apparatus provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a visual positioning apparatus according to an embodiment of the present disclosure;
fig. 6 is a block diagram of a server according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It will be understood that the terms "first," "second," and the like may be used herein to describe various concepts, but these concepts are not limited by the terms unless otherwise specified. The terms are only used to distinguish one concept from another. For example, a first apparatus may be termed a second apparatus, and, similarly, a second apparatus may be termed a first apparatus, without departing from the scope of the present application.
Fig. 1 is a diagram of an implementation environment of a visual positioning method provided in an embodiment of the present application, and referring to fig. 1, the implementation environment includes: a plurality of image pickup apparatuses 101 and a server 102, each of the image pickup apparatuses 101 and the server 102 being connectable via a wireless or wired network.
The plurality of image pickup apparatuses 101 are each mounted on a target object to be positioned, and there is no common-view relationship among the plurality of image pickup apparatuses 101. For example, the target object may be an unmanned vehicle, the number of image pickup apparatuses 101 is 3, and the 3 image pickup apparatuses 101 may be installed at the front, the left and the right of the unmanned vehicle. The server 102 is disposed in the target object. For any image pickup apparatus 101, after the image pickup apparatus 101 captures an image, the captured image can be sent to the server 102, and the server 102 visually positions the target object according to the image and determines the relative pose information of the target object between the current frame and the historical frame. The relative pose information includes relative position information and relative attitude information.
In the embodiment of the application, the visual positioning method can be applied to various scenes. For example, the visual positioning method can be applied to unmanned driving scenes, scenes in which a robot grabs or carries an object, robot cleaning scenes, unmanned aerial vehicle flying scenes and the like.
For example, when the visual positioning method is applied to an unmanned scene, the target object may be an unmanned vehicle. The unmanned vehicle is provided with a plurality of camera devices 101 at different positions, the plurality of camera devices 101 capture images and send the images to the server 102 during the driving process of the unmanned vehicle, and the server 102 visually locates the unmanned vehicle according to the images. The server 102 may determine the movement trajectory of the unmanned vehicle according to the relative pose information obtained by the visual positioning, thereby navigating the unmanned vehicle. As another example, when the visual positioning method is applied to a scene in which a robot grabs or carries an object, the target object may be a robot. The robot is provided with a plurality of camera devices 101 at different positions, when the robot grabs an object, the camera devices 101 shoot images and send the images to the server 102, and the server 102 performs visual positioning on the robot according to the images. The server 102 may control the robot according to the relative pose information obtained by the visual positioning, so that the robot can accurately grab or transport the object, thereby improving the accuracy of grabbing or transporting the object.
The image pickup apparatus 101 may be a monocular depth camera that can not only photograph a color image but also measure a distance from an object to the camera, that is, a depth, so that a change in the surrounding environment can be sensed more conveniently and accurately. The server 102 may be at least one of a server, a server cluster composed of a plurality of servers, a cloud server, a cloud computing platform, and a virtualization center.
Fig. 2 is a flowchart of a visual positioning method provided in an embodiment of the present application, and referring to fig. 2, the method includes the following steps.
Step 201: determining first feature information corresponding to a plurality of image pickup devices, wherein the plurality of image pickup devices are all installed on a target object to be positioned, and the first feature information corresponding to any one image pickup device is feature information of multiple frames of images collected by that image pickup device.
Step 202: for each image pickup apparatus, second feature information of the image pickup apparatus is determined based on first feature information corresponding to the image pickup apparatus, and the second feature information includes feature information corresponding to a moving object.
Step 203: filtering the second feature information from the first feature information to obtain third feature information corresponding to the image pickup device, wherein the third feature information is feature information corresponding to a stationary object.
Step 204: and determining first pose information of the target object based on the third feature information corresponding to the plurality of camera devices, wherein the first pose information is relative pose information of the target object between the current frame and the historical frame.
In one possible implementation manner, determining second feature information of the image capturing apparatus based on the first feature information corresponding to the image capturing apparatus includes:
determining second pose information of the image pickup device based on the first feature information corresponding to the image pickup device, wherein the second pose information is relative pose information of the image pickup device between two adjacent frames of images;
and determining second feature information of the image pickup apparatus based on the second pose information.
In another possible implementation manner, the second characteristic information further includes at least one of position information of the misdetected characteristic point pair and position information of the mismatched characteristic point pair.
In another possible implementation manner, determining second feature information of the image capturing apparatus based on the second pose information includes:
determining first reprojection error information of each matched feature point pair in the two adjacent frames of images of the image pickup device and second reprojection error information of each matched feature point pair in the two adjacent frames of images of any other image pickup device based on the second pose information;
second feature information is determined based on the first reprojection error information and the second reprojection error information.
In another possible implementation manner, the first feature information includes position information of matched feature point pairs in two adjacent frames of images;
determining first reprojection error information of each matched characteristic point pair in the two adjacent frame images of the image pickup device and second reprojection error information of each matched characteristic point pair in the two adjacent frame images of any other image pickup device based on the second pose information, comprising:
projecting a first feature point in a first feature point pair based on the second pose information and position information of the first feature point pair corresponding to the image pickup device to obtain first re-projection error information, wherein the first feature point pair is any one feature point pair matched between two adjacent frames of images corresponding to the image pickup device, and the first feature point is any one feature point in the first feature point pair;
and projecting the second feature points in the second feature point pair based on the second pose information and the position information of the second feature point pair corresponding to any other image pickup device to obtain second re-projection error information, wherein the second feature point pair is any one feature point pair matched between two adjacent frames of images corresponding to that other image pickup device, and the second feature point is any one feature point in the second feature point pair.
In another possible implementation manner, determining the first pose information of the target object based on the third feature information corresponding to the plurality of image capturing apparatuses includes:
determining stability coefficients of the plurality of image pickup apparatuses based on third feature information corresponding to the plurality of image pickup apparatuses;
determining a target image pickup apparatus from the plurality of image pickup apparatuses based on the stability coefficients of the plurality of image pickup apparatuses, the target image pickup apparatus being an image pickup apparatus having the highest stability coefficient;
and determining first pose information of the target object based on the third feature information corresponding to the target image pickup device.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
determining stability coefficients of the plurality of image capturing apparatuses based on third feature information corresponding to the plurality of image capturing apparatuses, including:
and determining a first number of the characteristic point pairs in the static state corresponding to each image pickup device based on the position information of the characteristic point pairs in the static state corresponding to each image pickup device, wherein the size of the first number is used for representing a stability coefficient of the image pickup device.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
determining stability coefficients of the plurality of image capturing apparatuses based on third feature information corresponding to the plurality of image capturing apparatuses, including:
determining third reprojection error information of each characteristic point pair in the static state corresponding to each image pickup device based on the position information of the characteristic point pair in the static state corresponding to each image pickup device;
and summing the reprojection errors corresponding to the third reprojection error information of each feature point pair in the static state to obtain a sum value, wherein the sum value is used for representing a stability coefficient of the image pickup equipment.
In another possible implementation manner, determining first feature information corresponding to a plurality of image capturing apparatuses includes:
for each frame of image collected by each camera device, carrying out feature point detection on the image to obtain a plurality of feature points;
matching a plurality of characteristic points of two adjacent frames of images corresponding to each camera device to obtain matched characteristic point pairs;
and taking the position information of the matched characteristic point pair as the first characteristic information corresponding to each image pickup device.
In the visual positioning method provided by the embodiments of the application, when the target object is visually positioned, the feature information of moving objects is determined from the feature information of multiple frames of images collected by the image pickup devices, and the feature information of the moving objects is then filtered out of the feature information of the multi-frame images to obtain the feature information of stationary objects. The target object can therefore be visually positioned according to the feature information of the stationary objects, the influence of moving objects is eliminated, and the accuracy of visual positioning is improved.
Fig. 3 is a flowchart of a visual positioning method provided by an embodiment of the present application, where the method is executed by a server, and referring to fig. 3, the method includes the following steps.
Step 301: the server determines first characteristic information corresponding to the plurality of image pickup apparatuses.
In this step, the plurality of image pickup devices are used for visually positioning a target object to be positioned, and the first feature information corresponding to any one image pickup device is feature information of multiple frames of images acquired by that image pickup device. The target object to be positioned may be an unmanned vehicle, a robot, an unmanned aerial vehicle, or the like, which is not specifically limited in the embodiments of the application. The plurality of image pickup devices may be installed at different positions of the target object without a common-view relationship between them. In addition, the image pickup devices are depth cameras, that is, cameras that not only collect color images but also measure the distance from an object to the camera, that is, the depth.
In this step, the determination can be realized through the following steps (1) to (3).
(1) And for each frame of image acquired by each camera device, the server detects the characteristic points of the image to obtain a plurality of characteristic points.
In this step, the server may extract feature points by detecting the difference in gray value between a pixel and the pixels in its surrounding neighborhood. When the gray-value difference between the pixel and the pixels in its surrounding neighborhood exceeds a preset gray threshold, indicating that the pixel is distinctly brighter or darker than its neighborhood, the pixel is extracted as a feature point. Alternatively, the server may extract the feature points by a Scale Invariant Feature Transform (SIFT) algorithm or a Speeded Up Robust Features (SURF) algorithm. In the embodiments of the present application, the manner of extracting the feature points is not particularly limited.
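The passage above leaves the concrete detector open (gray-value comparison, SIFT, or SURF). As a minimal, non-authoritative sketch, assuming OpenCV is available and the frames can be read from disk, feature point detection for one frame might look like this; the function name and the choice of SIFT are illustrative assumptions rather than part of the described method:

```python
import cv2

def detect_feature_points(image_path):
    # Step (1): detect feature points on one frame of one image pickup device.
    # SIFT is only one of the detectors mentioned above; ORB or a corner detector
    # based on gray-value differences would fit the same slot.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.SIFT_create()                 # available in opencv-python >= 4.4
    keypoints, descriptors = detector.detectAndCompute(img, None)
    return keypoints, descriptors
```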
In a possible implementation manner, the number of the plurality of image capturing apparatuses may be set and changed as needed, and is not particularly limited in the embodiment of the present application. For example, the number of the camera devices is 3, and for the 3 camera devices, the server respectively performs feature point detection on the image acquired by each camera device by using three threads, so as to obtain feature points on each frame of image of each camera device.
(2) And the server matches a plurality of characteristic points of two adjacent frames of images corresponding to each camera device to obtain matched characteristic point pairs.
In a possible implementation manner, for two adjacent frames of images acquired by each image capturing device, the server may determine the distance between each feature point in the first frame image and each feature point in the second frame image, and take two feature points whose distance is smaller than a preset distance threshold as a matched feature point pair, thereby obtaining a plurality of matched feature point pairs. The first frame image and the second frame image are two adjacent frames, and the first frame image may be the frame before or the frame after the second frame image. The distance between feature points may be a Hamming distance or a Euclidean distance; this is not particularly limited in the embodiments of the present application.
In another possible implementation manner, the server may also perform optical flow tracking on the feature points by using an LK (Lucas-Kanade) algorithm, so as to determine matching feature point pairs.
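Putting the two matching options above together, a minimal sketch assuming OpenCV is used could be the following; the distance threshold and array shapes are illustrative assumptions, not values prescribed by the text.

```python
import cv2

def match_by_descriptor_distance(desc_a, desc_b, distance_threshold=150.0):
    # Step (2), first variant: keep pairs whose descriptor distance is below a
    # preset threshold.  NORM_L2 (Euclidean) suits SIFT descriptors; NORM_HAMMING
    # would be used for binary descriptors.  The threshold is an assumption.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.match(desc_a, desc_b)
    return [m for m in matches if m.distance < distance_threshold]

def track_by_optical_flow(prev_img, next_img, prev_pts):
    # Step (2), second variant: LK (Lucas-Kanade) optical flow tracking.
    # prev_pts must be an (N, 1, 2) float32 array of feature point coordinates.
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_img, next_img, prev_pts, None)
    ok = status.ravel() == 1
    return prev_pts[ok], next_pts[ok]            # matched feature point pairs
```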
(3) And the server takes the position information of the matched characteristic point pairs as first characteristic information corresponding to each camera device.
In this step, the position information of the characteristic point pair may be coordinate information of the characteristic point pair.
Step 302: for each image pickup device, the server determines second pose information of the image pickup device based on the first feature information corresponding to the image pickup device.
The second pose information is relative pose information of the camera shooting equipment between two adjacent frames of images. In this step, the server may determine the second position and orientation information of the image capturing apparatus according to the position information of the feature point pair, which may be implemented by the following steps (4) to (5).
(4) The server extracts a second number of characteristic point pairs from the matched characteristic point pairs in the two adjacent frames of images corresponding to the camera device.
The server may randomly extract a second number of pairs of feature points from the plurality of pairs of feature points. For example, the number of matched pairs of feature points is 100, and the server may randomly extract 10 pairs of feature points from the 100 pairs of feature points.
(5) The server determines the second pose information based on the position information of the second number of feature point pairs.
The second pose information may be represented by a first rotation matrix and a first translation matrix. The first rotation matrix is a matrix of three rows and three columns, and the first translation matrix is a matrix of three rows and one column. The server may fit the first rotation matrix and the first translation matrix based on the position information of the second number of feature point pairs, so that the two matrices are applicable to the position information of those feature point pairs, and then use the first rotation matrix and the first translation matrix as the second pose information.
In one possible implementation manner, the server may further verify the first rotation matrix and the first translation matrix based on the position information of a third number of feature point pairs, namely the matched feature point pairs other than the second number of feature point pairs. If the verification passes, the second pose information is represented by the first rotation matrix and the first translation matrix. If the verification fails, feature point pairs are extracted again to determine a new first rotation matrix and first translation matrix.
In this implementation, the server may determine whether the verification passes based on how many of the third number of feature point pairs the first rotation matrix and the first translation matrix are applicable to. If this number exceeds a preset number threshold, the first rotation matrix and the first translation matrix fit the position information of most feature point pairs, and the verification is determined to pass. If the number is smaller than the preset number threshold, the first rotation matrix and the first translation matrix do not fit the position information of most feature point pairs, and the verification is determined to fail.
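A hedged sketch of this sample-fit-verify procedure is shown below, assuming the depth cameras yield 3-D coordinates for the matched feature points so that the rotation and translation can be fitted by an SVD-based (Kabsch) alignment; the solver choice and all numeric thresholds are assumptions for illustration only.

```python
import numpy as np

def fit_rotation_translation(src, dst):
    # Fit the first rotation matrix (3x3) and first translation matrix (3x1)
    # from a sampled subset of 3-D feature point pairs (step (5)).  The text
    # does not prescribe a particular fitting method; Kabsch/SVD is one option.
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c.reshape(3, 1) - R @ src_c.reshape(3, 1)
    return R, t

def estimate_second_pose(src_pts, dst_pts, sample_size=10, tol=0.05,
                         count_threshold=50, seed=0):
    # Steps (4)-(5) plus the verification described above: randomly extract a
    # second number of pairs, fit R and t, then check how many of the remaining
    # (third number of) pairs the fit applies to.  All numeric values are
    # illustrative assumptions.
    rng = np.random.default_rng(seed)
    n = src_pts.shape[0]
    sample = rng.choice(n, size=sample_size, replace=False)
    R, t = fit_rotation_translation(src_pts[sample], dst_pts[sample])
    rest = np.setdiff1d(np.arange(n), sample)
    residuals = np.linalg.norm((R @ src_pts[rest].T + t).T - dst_pts[rest], axis=1)
    if np.count_nonzero(residuals < tol) >= count_threshold:
        return R, t                              # verification passed: second pose information
    return None                                  # verification failed: re-extract and retry
```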
Step 303: the server determines second feature information of the image pickup apparatus based on the second pose information.
In this step, the second feature information may include feature information of moving objects, namely the position information of feature point pairs in a moving state, and may further include at least one of the position information of misdetected feature point pairs and the position information of mismatched feature point pairs. A misdetected feature point pair is one in which a pixel was mistakenly taken as a feature point and then matched with another feature point; a mismatched feature point pair is one in which two feature points were incorrectly matched to each other.
The server may determine second feature information by reprojection error based on the second pose information of the image pickup apparatus. This process can be realized by the following steps (6) to (7).
(6) The server determines, based on the second pose information, first reprojection error information of each matched feature point pair in the two adjacent frames of images of the image pickup device and second reprojection error information of each matched feature point pair in the two adjacent frames of images of any other image pickup device.
In this step, the server may determine the first and second re-projection error information by the following steps (6-1) and (6-2), respectively.
(6-1) The server projects a first feature point in a first feature point pair based on the second pose information and the position information of the first feature point pair corresponding to the image pickup device to obtain first re-projection error information.
The first characteristic point pair is any one of characteristic point pairs matched between two adjacent frames of images corresponding to the image pickup device, and the first characteristic point is any one of the characteristic points in the first characteristic point pair.
In this step, the server may project the first feature point according to the pose corresponding to the second pose information to obtain a third feature point. The server then determines the distance between the third feature point and the fourth feature point and takes the distance as the first re-projection error, thereby obtaining the first re-projection error information. The fourth feature point is the feature point matched with the first feature point in the first feature point pair.
(6-2) The server projects a second feature point in a second feature point pair based on the second pose information and the position information of the second feature point pair corresponding to any other image pickup device to obtain second re-projection error information.
The first reprojection error information and the second reprojection error information are both used for representing the state of a matched feature point pair, that is, whether the feature point pair is in a moving state or a stationary state.
The second characteristic point pair is any one of characteristic point pairs matched between two adjacent frames of images corresponding to any other image pickup device, and the second characteristic point is any one of characteristic points in the second characteristic point pair.
In this step, the server may project the second feature point according to the pose corresponding to the second pose information to obtain a fifth feature point. The server then determines the distance between the fifth feature point and the sixth feature point and takes the distance as the second re-projection error, thereby obtaining the second re-projection error information. The sixth feature point is the feature point matched with the second feature point in the second feature point pair.
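The projection and distance computation described in steps (6-1) and (6-2) can be sketched as follows, assuming a pinhole camera model with known intrinsics K (the text itself does not specify the camera model); the helper name is hypothetical.

```python
import numpy as np

def reprojection_error(pt3d_prev, pt2d_cur, R, t, K):
    # Project one feature point of a matched pair into the other frame using the
    # second pose information (R, t), then measure the distance to the feature
    # point it was matched with.  K is an assumed 3x3 pinhole intrinsic matrix.
    p = R @ np.asarray(pt3d_prev, dtype=float).reshape(3, 1) + t   # point in the current frame
    uvw = K @ p
    projected = (uvw[:2] / uvw[2]).ravel()       # the "third"/"fifth" feature point above
    return float(np.linalg.norm(projected - np.asarray(pt2d_cur, dtype=float)))
```

Applying this function with an image pickup device's own second pose information to its own matched pairs yields the first reprojection error information; applying the same pose to the matched pairs of any other image pickup device yields the second reprojection error information.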
For example, the number of the image capturing apparatuses is 3, and for the sake of convenience of distinction, the image capturing apparatuses are referred to as a first image capturing apparatus, a second image capturing apparatus, and a third image capturing apparatus, respectively, and then for the first image capturing apparatus, the server may determine second position and orientation information of the first image capturing apparatus based on first feature information corresponding to the first image capturing apparatus, determine a first reprojection error of each matched pair of feature points in two adjacent frames of images by the first image capturing apparatus based on the second position and orientation information of the first image capturing apparatus, and determine a second reprojection error of each matched pair of feature points in two adjacent frames of images by the second image capturing apparatus and the third image capturing apparatus, thereby obtaining the first reprojection error information and the second reprojection error information, respectively.
It should be noted that if only stationary objects are present in the surrounding environment while the target object moves, the server can directly perform visual positioning on the target object according to the feature point pairs of the stationary objects. When moving objects exist in the surrounding environment, they affect the accuracy of visual positioning, and therefore need to be filtered out before the target object is visually positioned. Moreover, false detection and false matching may occur during feature point detection and feature point matching, and misdetected and mismatched feature point pairs also affect the accuracy of visual positioning. Therefore, in the embodiments of the application, the server filters according to the reprojection errors of the feature point pairs, which removes not only moving objects but also misdetected and mismatched feature point pairs, so that the target object can subsequently be visually positioned according to stationary objects, improving the robustness of visual positioning.
(7) The server determines second feature information based on the first reprojection error information and the second reprojection error information.
The second feature information includes not only the position information of the feature point pair in the moving state but also the position information of the erroneously detected feature point pair and the position information of the erroneously matched feature point pair.
In a possible implementation manner, the server may sum the first re-projection error corresponding to the first re-projection error information and the second re-projection error corresponding to the second re-projection error information to obtain a sum. If the sum is larger than a first preset threshold, the server determines, from the matched feature point pairs corresponding to the image pickup device, the feature point pairs whose first re-projection error exceeds a preset error threshold, and takes the position information of those feature point pairs as the second feature information.
In another possible implementation manner, the server may sum the first re-projection error corresponding to the first re-projection error information and the second re-projection error corresponding to the second re-projection error information to obtain a sum and then determine the average of the sum. If the average is greater than a second preset threshold, the server determines, from the matched feature point pairs corresponding to the image pickup device, the feature point pairs whose first re-projection error exceeds the preset error threshold, and takes the position information of those feature point pairs as the second feature information.
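One possible, non-authoritative reading of these two implementations is sketched below; both threshold values are assumptions chosen only for illustration.

```python
import numpy as np

def select_second_feature_pairs(first_errors, second_errors, matched_pairs,
                                sum_threshold=100.0, error_threshold=2.0):
    # If the sum of the first and second reprojection errors exceeds a first
    # preset threshold, the pairs of this image pickup device whose first
    # reprojection error exceeds a preset error threshold are taken as second
    # feature information (moving, misdetected or mismatched pairs).  The
    # average-based variant only replaces the sum with its mean.
    total = float(np.sum(first_errors)) + float(np.sum(second_errors))
    if total <= sum_threshold:
        return []                                # nothing to filter out for this device
    return [pair for pair, err in zip(matched_pairs, first_errors)
            if err > error_threshold]
```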
In the embodiments of the application, when the server determines the second feature information of an image pickup device, the second reprojection error information corresponding to the other image pickup devices mainly serves to judge whether the first rotation matrix and the first translation matrix are correct, and does not take part in the selection of feature point pairs. Therefore, the feature point pairs used in the subsequent visual positioning of the target object all come from the same image pickup device, so calibration and sensor synchronization among the plurality of image pickup devices are not needed; no accurate calibration result or hard sensor synchronization is required, and the visual positioning is more accurate.
For example, for the first image capturing apparatus, the second image capturing apparatus, and the third image capturing apparatus, when determining the second feature information of the first image capturing apparatus, the second reprojection error information corresponding to the second image capturing apparatus and the third image capturing apparatus mainly plays a role in determining whether the rotation matrix and the translation matrix corresponding to the first image capturing apparatus are correct, and is not added to the selection of the feature point pair in the motion state, the feature point pair detected by mistake, and the feature point pair matched by mistake corresponding to the first image capturing apparatus. Correspondingly, when the second characteristic information of the second camera device is determined, the second reprojection error information corresponding to the first camera device and the third camera device also plays a role in judging whether the rotation matrix and the translation matrix corresponding to the second camera device are correct, and is not added to the selection of the characteristic point pair in the motion state, the characteristic point pair detected by mistake and the characteristic point pair matched by mistake corresponding to the second camera device.
Step 304: and the server filters the second characteristic information from the first characteristic information to obtain third characteristic information corresponding to the camera equipment.
In this step, the first feature information includes position information of a feature point pair matched in two adjacent frames of images, and the third feature information is feature information corresponding to a stationary object, that is, position information of the feature point pair in a stationary state.
If the second characteristic information includes characteristic information corresponding to the moving object, that is, the position information of the characteristic point pair in the moving state, the server filters the position information of the characteristic point pair in the moving state from the position information of the matched characteristic point pair, so as to obtain the position information of the characteristic point pair in the static state. If the second characteristic information further comprises at least one of the position information of the misdetected characteristic point pairs and the position information of the mismatched characteristic point pairs, the server can also filter the misdetected characteristic point pairs and the mismatched characteristic point pairs, so that the accuracy of subsequent visual positioning is further improved.
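As a simple illustration of this filtering step, assuming feature point pairs are represented as hashable coordinate tuples, the third feature information can be obtained by a set difference:

```python
def filter_to_third_feature_info(first_info, second_info):
    # Step 304: removing the second feature information (moving, misdetected or
    # mismatched pairs) from the first feature information leaves the third
    # feature information, i.e. the position information of stationary pairs.
    second = set(second_info)
    return [pair for pair in first_info if pair not in second]
```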
Step 305: the server determines the stability coefficients of the plurality of image pickup apparatuses based on third feature information corresponding to the plurality of image pickup apparatuses.
A higher stability coefficient indicates that more feature point pairs of the image pickup apparatus are in a stationary state in the two adjacent frames of images.
In this step, the server may determine the stability coefficients of the plurality of image capturing apparatuses by the following two implementations.
In a first implementation manner, the server determines a first number of pairs of feature points in the static state corresponding to each image pickup apparatus based on the position information of the pairs of feature points in the static state corresponding to each image pickup apparatus, and the size of the first number is used for representing a stability coefficient of the image pickup apparatus.
The larger the first number, the higher the stability coefficient, which indicates that more feature point pairs of the image pickup apparatus are in a stationary state in the two adjacent frames of images and that the accuracy of subsequent visual positioning is higher.
In this implementation, for each image capturing apparatus, the server may count the first number of pairs of feature points in the stationary state based on the position information of each pair of feature points in the stationary state. And determining the stability coefficients corresponding to the first number according to the corresponding relation between the number range and the stability coefficients established in advance and the number range where the first number is located.
In a second implementation, the server determines third reprojection error information of each feature point pair in the stationary state corresponding to each image capturing apparatus based on the position information of the feature point pairs in the stationary state corresponding to that apparatus, and sums the reprojection errors corresponding to the third reprojection error information of those feature point pairs to obtain a sum value, where the sum value characterizes the stability coefficient of the apparatus.
The larger the sum value, the higher the stability coefficient, indicating that more feature point pairs observed by the image capturing apparatus in the two adjacent frames of images are in a stationary state and that the accuracy of the subsequent visual positioning is higher.
In this implementation, for each feature point pair in the stationary state, the server may project one feature point of the pair based on the second pose information and the position information of the pair, determine the distance between the projected position and the matching feature point, and take that distance as the reprojection error of the pair; the reprojection errors of the plurality of stationary feature point pairs are then summed. The stability coefficient corresponding to the sum value is determined according to a pre-established correspondence between ranges of reprojection-error sums and stability coefficients and the range in which the sum value falls.
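For illustration, a sketch of this second implementation under additional assumptions the patent does not state: each stationary pair is assumed to carry the 3D position X of the point in the previous camera frame (obtained, for example, by triangulation or a depth sensor), u is the matched pixel in the current frame, (R, t) comes from the second pose information, and K is the camera intrinsic matrix:

    import numpy as np

    def reprojection_error_sum(pairs_3d_2d, R, t, K):
        total = 0.0
        for X, u in pairs_3d_2d:
            Xc = R @ np.asarray(X, dtype=float) + np.ravel(t)   # point in the current camera frame
            p = K @ Xc
            projected = p[:2] / p[2]                            # projected pixel position
            total += np.linalg.norm(projected - np.asarray(u, dtype=float))
        return total   # sum value used to characterize the stability coefficient

A pre-established table mapping ranges of this sum to stability coefficients would then be consulted, as in the counting variant above.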
In a third implementation, the server may further determine the stability coefficient of the image capturing apparatus based on the average of the reprojection errors corresponding to the third reprojection error information.
The smaller the average value, the higher the stability coefficient, indicating that the feature point pairs observed by the image capturing apparatus in the two adjacent frames of images are more reliably stationary and that the accuracy of the subsequent visual positioning is higher.
In this implementation, the server determines the average of the reprojection errors of the plurality of feature point pairs in the stationary state, and determines the stability coefficient corresponding to the average according to a pre-established correspondence between ranges of reprojection-error averages and stability coefficients and the range in which the average falls.
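The third implementation differs only in the final reduction; a sketch reusing the assumptions and the reprojection_error_sum helper from the sketch above:

    def reprojection_error_mean(pairs_3d_2d, R, t, K):
        n = len(pairs_3d_2d)
        # A smaller average reprojection error maps to a higher stability coefficient.
        return reprojection_error_sum(pairs_3d_2d, R, t, K) / n if n else float("inf")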
Step 306: the server determines a target image capturing apparatus from the plurality of image capturing apparatuses based on the stability coefficients of the plurality of image capturing apparatuses.
The target image capturing apparatus is the image capturing apparatus with the highest stability coefficient. In this step, the server may take, as the target image capturing apparatus, the image capturing apparatus with the largest first number, the image capturing apparatus with the smallest sum of reprojection errors, or the image capturing apparatus with the smallest average reprojection error.
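A one-line sketch of step 306, assuming the stability coefficients have already been computed per apparatus (the camera identifiers below are hypothetical):

    # stability maps a camera identifier to its stability coefficient.
    def select_target_camera(stability):
        return max(stability, key=stability.get)

    print(select_target_camera({"cam_front": 0.8, "cam_left": 0.5, "cam_rear": 0.2}))   # cam_front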
Referring to fig. 4, the description takes as an example the case where the target object carries a first image capturing apparatus, a second image capturing apparatus, and a third image capturing apparatus. As shown in fig. 4, for each of the three image capturing apparatuses the server extracts feature points from the images and matches the feature points of two adjacent frames of images to obtain matched feature point pairs. For each image capturing apparatus, the server determines the first reprojection error information and the second reprojection error information of the matched feature point pairs of that apparatus and of the other two apparatuses in the two adjacent frames of images; fig. 4 illustrates this with the second image capturing apparatus only. For the second image capturing apparatus, the server first extracts a certain number of feature point pairs from the matched feature point pairs in the two adjacent frames of images corresponding to that apparatus and determines the second pose information based on the position information of these pairs. The second pose information is then verified with the remaining feature point pairs, and when the verification passes, the first reprojection error information corresponding to the second image capturing apparatus is determined based on the second pose information. The second reprojection error information corresponding to the first image capturing apparatus and the second reprojection error information corresponding to the third image capturing apparatus are also determined based on the second pose information. The second feature information is then filtered out of the first feature information corresponding to the second image capturing apparatus according to the first reprojection error information of the second image capturing apparatus and the second reprojection error information of the first and third image capturing apparatuses; this second feature information includes not only the position information of the feature point pairs in a moving state but also the position information of misdetected feature point pairs and of mismatched feature point pairs, and the filtering yields the position information of the feature point pairs in a stationary state corresponding to the second image capturing apparatus. The position information of the feature point pairs in the stationary state corresponding to the first image capturing apparatus and to the third image capturing apparatus is determined in the same manner. Finally, according to the third feature information corresponding to the plurality of image capturing apparatuses, the image capturing apparatus with the highest stability coefficient is determined from the plurality of image capturing apparatuses as the target image capturing apparatus.
Step 307: The server determines first pose information of the target object based on the third feature information corresponding to the target image capturing apparatus.
The first pose information is the relative pose information of the target object between the current frame and the historical frame. The historical frame may be any frame before the current frame, that is, it may be the first frame or any intermediate frame preceding the current frame. For example, if the current frame is the tenth frame and the historical frame is the second frame, the server determines the relative pose information of the target object between the second frame and the tenth frame; if the current frame is the ninth frame and the historical frame is the first frame, the server determines the relative pose information of the target object between the first frame and the ninth frame.
In this step, the server may determine a second rotation matrix and a second translation matrix based on the position information of the feature point pairs in the stationary state in the two adjacent frames of images captured by the target image capturing apparatus, and then determine the first pose information of the target object based on the second rotation matrix and the second translation matrix.
In one possible implementation, the server may determine the second rotation matrix and the second translation matrix by the following process: the server randomly extracts a fourth number of feature point pairs from the feature point pairs in the stationary state in the two adjacent frames of images, fits a second rotation matrix and a second translation matrix to the position information of the fourth number of feature point pairs, and then verifies the second rotation matrix and the second translation matrix with the remaining feature point pairs in the stationary state. If the verification passes, the third pose information of the target image capturing apparatus between the two adjacent frames of images is represented by the second rotation matrix and the second translation matrix.
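As a rough analogue only, OpenCV's RANSAC-based essential-matrix estimation performs the same "sample a subset, fit a rotation and translation, verify with the remaining pairs" loop; it is not the patent's exact procedure, and the translation it returns is known only up to scale:

    import cv2
    import numpy as np

    def relative_pose_from_static_pairs(pts_prev, pts_curr, K):
        # pts_prev, pts_curr: Nx2 float arrays of stationary feature point pairs; K: 3x3 intrinsics.
        E, inlier_mask = cv2.findEssentialMat(pts_prev, pts_curr, K,
                                              method=cv2.RANSAC, prob=0.999, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K, mask=inlier_mask)
        return R, t   # play the role of the second rotation matrix and second translation matrix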
The server may successively determine the second rotation matrix and the second translation matrix for each pair of adjacent frames of the target image capturing apparatus between the historical frame and the current frame, multiply the second rotation matrices corresponding to the pairs of adjacent frames to obtain a third rotation matrix, multiply the second translation matrices corresponding to the pairs of adjacent frames to obtain a third translation matrix, and combine the third rotation matrix and the third translation matrix into the first pose information.
For example, if the current frame is the fourth frame and the historical frame is the first frame, the server may first determine the second rotation matrix and second translation matrix between the first and second frames, between the second and third frames, and between the third and fourth frames, then multiply the second rotation matrices to obtain the third rotation matrix and multiply the second translation matrices to obtain the third translation matrix. The third rotation matrix and the third translation matrix represent the relative pose information of the target object between the first frame and the fourth frame.
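One consistent reading of this chaining, sketched below, composes each per-step rotation and translation as a homogeneous transform; whether the patent intends exactly this composition for the translations is an assumption of the sketch:

    import numpy as np

    def chain_poses(rotations, translations):
        T_total = np.eye(4)
        for R, t in zip(rotations, translations):
            T = np.eye(4)
            T[:3, :3] = R
            T[:3, 3] = np.ravel(t)
            T_total = T @ T_total          # accumulate the frame-to-frame transforms
        # Third rotation matrix and third translation, together the first pose information.
        return T_total[:3, :3], T_total[:3, 3]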
In the embodiment of the application, the server can thus filter the feature points both across image capturing apparatuses and within a single image capturing apparatus, determine the most stable of the plurality of image capturing apparatuses, and then determine the relative pose information of the target object using the feature point pairs corresponding to that most stable apparatus.
According to the visual positioning method provided by the embodiment of the application, when the target object is visually positioned, the feature information of moving objects is determined from the feature information of the multiple frames of images collected by the image capturing apparatuses and is then filtered out of that feature information to obtain the feature information of stationary objects, so that the target object is visually positioned according to the feature information of stationary objects. The influence of moving objects is thereby eliminated and the accuracy of the visual positioning is improved.
Fig. 5 is a schematic view of a visual positioning apparatus provided in an embodiment of the present application, and referring to fig. 5, the apparatus includes:
the first determining module 501 is configured to determine first feature information corresponding to multiple image capturing apparatuses, where the multiple image capturing apparatuses are all installed on a target object to be positioned, and the first feature information corresponding to any one image capturing apparatus is the feature information of the multiple frames of images acquired by that apparatus;
a second determining module 502, configured to determine, for each image capturing apparatus, second feature information of the image capturing apparatus based on the first feature information corresponding to the image capturing apparatus, where the second feature information includes feature information corresponding to a moving object;
the filtering module 503 is configured to filter the second feature information from the first feature information to obtain third feature information corresponding to the image capturing apparatus, where the third feature information is feature information corresponding to a stationary object;
a third determining module 504, configured to determine, based on third feature information corresponding to the multiple image capturing apparatuses, first pose information of the target object, where the first pose information is relative pose information of the target object between the current frame and the history frame.
In a possible implementation manner, the second determining module 502 is configured to determine second pose information of the image capturing apparatus based on the first feature information corresponding to the image capturing apparatus, where the second pose information is the relative pose information of the image capturing apparatus between two adjacent frames of images; the second feature information of the image capturing apparatus is determined based on the second pose information.
In another possible implementation manner, the second characteristic information further includes at least one of position information of the misdetected characteristic point pair and position information of the mismatched characteristic point pair.
In another possible implementation manner, the second determining module 502 is configured to determine, based on the second pose information, first reprojection error information of each matched feature point pair of the image capturing apparatus in the two adjacent frames of images and second reprojection error information of each matched feature point pair of any other image capturing apparatus in the two adjacent frames of images; the second feature information is determined based on the first reprojection error information and the second reprojection error information.
In another possible implementation manner, the first feature information includes position information of matched feature point pairs in two adjacent frames of images;
a second determining module 502, configured to project a first feature point in a first feature point pair based on the second pose information and the position information of the first feature point pair corresponding to the image capturing apparatus to obtain the first reprojection error information, where the first feature point pair is any one of the feature point pairs matched between two adjacent frames of images corresponding to the image capturing apparatus, and the first feature point is either feature point in the first feature point pair; and to project a second feature point in a second feature point pair based on the second pose information and the position information of the second feature point pair corresponding to any other image capturing apparatus to obtain the second reprojection error information, where the second feature point pair is any one of the feature point pairs matched between two adjacent frames of images corresponding to that other image capturing apparatus, and the second feature point is either feature point in the second feature point pair.
In another possible implementation manner, the third determining module 504 is configured to determine the stability coefficients of the multiple image capturing apparatuses based on the third feature information corresponding to the multiple image capturing apparatuses; determine a target image capturing apparatus from the plurality of image capturing apparatuses based on the stability coefficients of the plurality of image capturing apparatuses, the target image capturing apparatus being the image capturing apparatus with the highest stability coefficient; and determine the first pose information of the target object based on the third feature information corresponding to the target image capturing apparatus.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
a third determining module 504, configured to determine, based on the position information of the feature point pairs in the stationary state corresponding to each image capturing apparatus, a first number of the feature point pairs in the stationary state corresponding to each image capturing apparatus, where the size of the first number is used to characterize the stability coefficient of the image capturing apparatus.
In another possible implementation manner, the third feature information includes position information of a feature point pair in a static state in two adjacent frame images;
a third determining module 504, configured to determine third reprojection error information of each feature point pair in the stationary state corresponding to each image capturing apparatus based on the position information of the feature point pairs in the stationary state corresponding to each image capturing apparatus; and to sum the reprojection errors corresponding to the third reprojection error information of each feature point pair in the stationary state to obtain a sum value, where the sum value is used to characterize the stability coefficient of the image capturing apparatus.
In another possible implementation manner, the first determining module 501 is configured to: perform feature point detection on each frame of image acquired by each image capturing apparatus to obtain a plurality of feature points; match the plurality of feature points of two adjacent frames of images corresponding to each image capturing apparatus to obtain matched feature point pairs; and take the position information of the matched feature point pairs as the first feature information corresponding to each image capturing apparatus.
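For illustration, a sketch of this detection-and-matching step using ORB features and brute-force Hamming matching; the patent does not prescribe a particular detector or matcher, so these choices are assumptions:

    import cv2

    def match_adjacent_frames(img_prev, img_curr):
        orb = cv2.ORB_create(nfeatures=1000)
        kp1, des1 = orb.detectAndCompute(img_prev, None)
        kp2, des2 = orb.detectAndCompute(img_curr, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        # Position information of the matched feature point pairs (the first feature information).
        return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]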
When the above visual positioning apparatus performs visual positioning on the target object, the feature information of moving objects is determined from the feature information of the multiple frames of images collected by the image capturing apparatuses and is then filtered out of that feature information to obtain the feature information of stationary objects, so that the target object is visually positioned according to the feature information of stationary objects; the influence of moving objects is thereby eliminated and the accuracy of the visual positioning is improved.
It should be noted that the visual positioning apparatus provided in the above embodiment is described with the above division of functional modules merely as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the server may be divided into different functional modules to complete all or part of the functions described above. In addition, the visual positioning apparatus and the visual positioning method provided by the above embodiments belong to the same concept, and the specific implementation process is detailed in the method embodiments and is not described again here.
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application. The server 600 may vary considerably in configuration and performance, and may include one or more processors (CPUs) 601 and one or more memories 602, where the memory 602 stores at least one program code that is loaded and executed by the processor 601 to implement the methods provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described again here.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including program code that is executable by a processor in a server to perform the visual positioning method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program or a computer program product is also provided, which comprises computer program code, which, when executed by a server, causes the server to implement the visual positioning method in the above-described embodiments.
In an exemplary embodiment, the computer program according to the embodiments of the present application may be deployed to be executed on one server, on a plurality of servers located at one site, or on a plurality of servers distributed over a plurality of sites and interconnected by a communication network; the plurality of servers distributed over a plurality of sites and interconnected by a communication network may constitute a blockchain system.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (9)

1. A visual positioning method, characterized in that the method comprises:
determining first feature information corresponding to a plurality of image pickup apparatuses, wherein the plurality of image pickup apparatuses are all installed on a target object to be positioned, and the first feature information corresponding to any one image pickup apparatus is the feature information of a plurality of frames of images collected by that image pickup apparatus;
for each image pickup apparatus, determining second feature information of the image pickup apparatus based on the first feature information corresponding to the image pickup apparatus, wherein the second feature information comprises feature information corresponding to a moving object;
filtering the second feature information from the first feature information to obtain third feature information corresponding to the image pickup apparatus, wherein the third feature information comprises position information of feature point pairs in a stationary state in two adjacent frames of images;
determining third reprojection error information of each feature point pair in the stationary state corresponding to each image pickup apparatus based on the position information of the feature point pairs in the stationary state corresponding to each image pickup apparatus;
summing the reprojection errors corresponding to the third reprojection error information of each feature point pair in the stationary state to obtain a sum value, wherein the sum value is used for representing a stability coefficient of the image pickup apparatus;
determining a target image pickup apparatus from the plurality of image pickup apparatuses based on the stability coefficients of the plurality of image pickup apparatuses, the target image pickup apparatus being the image pickup apparatus with the highest stability coefficient;
and determining first pose information of the target object based on the third feature information corresponding to the target image pickup apparatus, wherein the first pose information is the relative pose information of the target object between the current frame and the historical frame.
2. The method according to claim 1, wherein the determining second feature information of the image pickup apparatus based on the first feature information corresponding to the image pickup apparatus comprises:
determining second pose information of the camera equipment based on first feature information corresponding to the camera equipment, wherein the second pose information is relative pose information of the camera equipment between two adjacent frames of images;
second feature information of the image pickup apparatus is determined based on the second pose information.
3. The method according to claim 2, wherein the second characteristic information further includes at least one of position information of the misdetected characteristic point pair and position information of the mismatched characteristic point pair.
4. The method of claim 3, wherein the determining second feature information of the image pickup apparatus based on the second pose information comprises:
determining, based on the second pose information, first reprojection error information of each matched feature point pair of the image pickup apparatus in two adjacent frames of images and second reprojection error information of each matched feature point pair of any other image pickup apparatus in two adjacent frames of images;
determining the second feature information based on the first and second reprojection error information.
5. The method according to claim 4, wherein the first feature information includes position information of matched pairs of feature points in the two adjacent frame images;
the determining, based on the second pose information, first reprojection error information of each matched feature point pair in two adjacent frames of images by the image capturing apparatus and second reprojection error information of each matched feature point pair in two adjacent frames of images by any other image capturing apparatus includes:
projecting a first feature point in a first feature point pair based on the second pose information and position information of the first feature point pair corresponding to the image pickup device to obtain first re-projection error information, wherein the first feature point pair is any one feature point pair in the feature point pair matched between two adjacent frames of images corresponding to the image pickup device, and the first feature point is any one feature point in the first feature point pair;
and projecting second feature points in the second feature point pairs based on the second pose information and position information of the second feature point pairs corresponding to any other image pickup apparatus to obtain second reprojection error information, wherein the second feature point pairs are any feature point pairs matched between two adjacent frames of images corresponding to any other image pickup apparatus, and the second feature points are any feature points in the second feature point pairs.
6. The method according to claim 1, wherein the determining first feature information corresponding to a plurality of image pickup apparatuses includes:
for each frame of image collected by each image pickup apparatus, carrying out feature point detection on the image to obtain a plurality of feature points;
matching a plurality of feature points of two adjacent frames of images corresponding to each image pickup apparatus to obtain matched feature point pairs;
and using the position information of the matched feature point pairs as the first feature information corresponding to each image pickup apparatus.
7. A visual positioning device, the device comprising:
the first determining module is used for determining first feature information corresponding to a plurality of image pickup apparatuses, wherein the plurality of image pickup apparatuses are all installed on a target object to be positioned, and the first feature information corresponding to any one image pickup apparatus is the feature information of a plurality of frames of images collected by that image pickup apparatus;
the second determining module is used for determining, for each image pickup apparatus, second feature information of the image pickup apparatus based on the first feature information corresponding to the image pickup apparatus, wherein the second feature information comprises feature information corresponding to a moving object;
the filtering module is configured to filter the second feature information from the first feature information to obtain third feature information corresponding to the image pickup apparatus, wherein the third feature information includes position information of feature point pairs in a stationary state in two adjacent frames of images;
a third determining module, configured to determine, based on the position information of the feature point pairs in the stationary state corresponding to each image pickup apparatus, third reprojection error information of each feature point pair in the stationary state corresponding to each image pickup apparatus; sum the reprojection errors corresponding to the third reprojection error information of each feature point pair in the stationary state to obtain a sum value, wherein the sum value is used for representing a stability coefficient of the image pickup apparatus; determine a target image pickup apparatus from the plurality of image pickup apparatuses based on the stability coefficients of the plurality of image pickup apparatuses, the target image pickup apparatus being the image pickup apparatus with the highest stability coefficient; and determine first pose information of the target object based on the third feature information corresponding to the target image pickup apparatus, wherein the first pose information is the relative pose information of the target object between the current frame and the historical frame.
8. A server, characterized in that the server comprises one or more processors and one or more memories having stored therein at least one program code, which is loaded and executed by the one or more processors to implement the operations performed by the visual positioning method according to any one of claims 1 to 6.
9. A computer-readable storage medium, having stored therein at least one program code, which is loaded and executed by a processor to perform operations performed by the visual localization method of any one of claims 1 to 6.




