CN112785705B - Pose acquisition method and device and mobile equipment

Pose acquisition method and device and mobile equipment

Info

Publication number
CN112785705B
Authority
CN
China
Prior art keywords: current frame, frame image, matched, points, image
Prior art date
Legal status
Active
Application number
CN202110082125.3A
Other languages
Chinese (zh)
Other versions
CN112785705A (en)
Inventor
秦家虎
刘晨昕
余雷
王帅
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN202110082125.3A
Publication of CN112785705A
Application granted
Publication of CN112785705B
Legal status: Active
Anticipated expiration


Classifications

    • G06T17/05: Three dimensional [3D] modelling; geographic models
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T7/33: Image registration using feature-based methods
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods


Abstract

The application discloses a pose acquisition method, a pose acquisition device and a mobile device. The pose acquisition method comprises the following steps: obtaining a current frame image; obtaining the feature points in the current frame image that have matching feature points in the history images contained in a sliding window; respectively obtaining a first pixel model for each matched feature point in the current frame image; respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object, and the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image; screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points do not belong to a moving object; and obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points.

Description

Pose acquisition method and device and mobile equipment
Technical Field
The application relates to the technical field of smart home devices, and in particular to a pose acquisition method, a pose acquisition device, and a mobile device.
Background
Simultaneous Localization and Mapping (SLAM) has become a hot topic in current robotics research. According to the sensors used, SLAM can be divided into laser SLAM and visual SLAM. Visual SLAM estimates the pose of the robot and reconstructs the surrounding environment from a continuous sequence of images captured during motion. Specifically, a robot carrying a specific sensor builds an environment model during motion, without any prior environment information, while estimating its own motion pose.
Current SLAM positioning algorithms are based on the assumption that the environment is static or quasi-static, i.e. that the whole scene does not move. However, most real robot operating scenes are dynamic, and objects that move relative to the scene interfere with pose acquisition, resulting in low positioning accuracy.
Therefore, a technical solution capable of improving the positioning accuracy of SLAM is needed.
Disclosure of Invention
In view of the above, the present application provides a pose acquisition method, a pose acquisition device, and a mobile device, so as to solve the technical problem in the prior art of low positioning accuracy of mobile devices in dynamic scenes.
The application provides a pose acquisition method, which comprises the following steps:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the history images contained in a sliding window corresponding to the current frame image; the sliding window contains multiple frames of history images, and the history images are key frame images preceding the current frame image;
respectively obtaining a first pixel model for each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image and further carries a direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image;
respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image, corresponds to a plurality of neighbor feature points of the matched feature point in the history image, and further carries a direction vector of the second pixel model and the mean value of the reprojection errors of the matched feature point in the history image;
screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image.
According to the above method, preferably, the obtaining the current pose of the mobile device according to the reprojection error corresponding to the target feature point in the current frame image includes:
obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point;
obtaining, according to the weight value corresponding to the target feature point, the pose transformation matrix that minimizes the reprojection error of the target feature point, wherein the reprojection error is obtained from the three-dimensional coordinate value corresponding to the target feature point in the history image and the two-dimensional coordinate value of the target feature point in the current frame image;
and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
In the above method, preferably, the respectively obtaining a first pixel model for each matched feature point in the current frame image includes:
taking each matched feature point in the current frame image as the center, obtaining a plurality of neighbor feature points of that matched feature point;
wherein the depth value of each neighbor feature point differs from that of the matched feature point, the neighbor feature points are feature points whose depth values are larger than a target depth and whose distances to the matched feature point satisfy a distance sorting rule, and the target depth is related to the mean depth value of the pixel points neighboring the matched feature point in the current frame image;
and establishing the first pixel model of the matched feature point at least according to the plurality of neighbor feature points, so that the first pixel model corresponds to the plurality of neighbor feature points of the matched feature point and further carries the direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image.
In the above method, preferably, the direction vector of the first pixel model is obtained from the three-dimensional coordinate value of the spatial center point of the neighbor feature points in the first pixel model and the three-dimensional coordinate value of the space point corresponding to the matched feature point in the current frame image, and the reprojection error of the matched feature point in the current frame image is obtained based on the history images.
In the above method, preferably, the respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result includes:
obtaining the number of matched neighbor feature points between the first pixel model and the corresponding second pixel model;
obtaining the modulus of the difference between the direction vector of the first pixel model and the direction vector of the corresponding second pixel model;
judging whether the reprojection error, in the first pixel model, of the matched feature point in the current frame image is smaller than or equal to the mean value, in the second pixel model, of the reprojection errors of the matching feature point in the history image, so as to obtain a judgment result; the mean value is the mean of the accumulated reprojection errors of the matching feature point over the history images in the sliding window;
and obtaining the model comparison result according to the number, the modulus and the judgment result.
The above method, preferably, further comprises:
obtaining an update request for a pixel model of a feature point in the history image;
obtaining a newly added image in the sliding window;
extracting feature points from the newly added image to obtain the feature points in the newly added image;
matching the feature points in the newly added image with the feature points in the history images in the sliding window, to obtain matching feature points in the newly added image that match feature points in the history images and non-matching feature points in the newly added image that do not;
updating, according to the matching feature points, the pixel models of the feature points in the history images that they match;
and obtaining the pixel models of the non-matching feature points, so that when the newly added image serves as a history image in the sliding window, the pixel models of the non-matching feature points serve as the pixel models of the new feature points in that history image.
In the above method, preferably, after the target feature point in the current frame image is screened according to the model comparison result, before the current pose of the mobile device is obtained according to the reprojection error corresponding to the target feature point in the current frame image, the method further includes:
screening out, from the target feature points in the current frame image, the target feature points that satisfy the epipolar constraint with the feature points in the history images.
In the above method, preferably, the obtaining the matched feature points in the current frame image includes:
extracting the feature points in the current frame image;
and matching the feature points extracted from the current frame image with the feature points extracted from the history images, to obtain the feature points in the current frame image that match feature points in the history images.
The application also provides a pose acquisition device, comprising:
an image obtaining unit, configured to obtain a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
a feature point processing unit, configured to obtain the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the history images contained in a sliding window corresponding to the current frame image; the sliding window contains multiple frames of history images, and the history images are key frame images preceding the current frame image;
a model building unit, configured to respectively obtain a first pixel model for each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image and further carries a direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image;
a model comparison unit, configured to respectively compare the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, the model comparison result representing whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image, corresponds to a plurality of neighbor feature points of the matched feature point in the history image, and further carries a direction vector of the second pixel model and the mean value of the reprojection errors of the matched feature point in the history image;
a feature point screening unit, configured to screen out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points in the current frame image do not belong to a moving object;
and a pose obtaining unit, configured to obtain the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image.
The application also provides a mobile device comprising:
a memory for storing an application program and data generated by the operation of the application program;
a processor, configured to execute the application program to implement:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the history images contained in a sliding window corresponding to the current frame image; the sliding window contains multiple frames of history images, and the history images are key frame images preceding the current frame image;
respectively obtaining a first pixel model for each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image and further carries a direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image;
respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the history image that matches the feature point in the current frame image, corresponds to a plurality of neighbor feature points of the matched feature point in the history image, and further carries a direction vector of the second pixel model and the mean value of the reprojection errors of the matched feature point in the history image;
screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image.
According to the pose acquisition method, the pose acquisition device and the mobile device disclosed in the application, after the current frame image is acquired by the image acquisition device on the mobile device, the feature points in the current frame image that match feature points in the history images contained in the preceding sliding window are obtained. After the pixel models corresponding to the plurality of neighbor feature points of these matched feature points are built, they are compared with the pixel models of the feature points in the history images contained in the sliding window, and the target feature points whose corresponding space points do not belong to a moving object can then be screened out of the current frame image according to the model comparison results. On this basis, the current pose of the mobile device can be obtained according to the reprojection errors of the target feature points. The pixel models of the matched feature points in consecutive images are thus used to reject feature points belonging to moving objects, so that the mobile device is positioned using only the screened feature points that do not belong to moving objects. Interference of moving-object feature points with pose acquisition is thereby avoided, and the positioning accuracy of the mobile device in dynamic scenes is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a pose acquisition method according to a first embodiment of the present application;
FIGS. 2-6 are respectively exemplary diagrams of embodiments of the present application;
fig. 7 to 8 are partial flowcharts of a pose acquisition method according to a first embodiment of the present application;
FIG. 9 is another exemplary diagram of an embodiment of the present application;
FIGS. 10-11 are respectively a flowchart of another part of a pose acquisition method according to the first embodiment of the present application;
fig. 12 is another flowchart of a pose acquisition method according to the first embodiment of the present application;
fig. 13 is a schematic structural diagram of a pose acquisition device according to a second embodiment of the present application;
fig. 14 is another schematic structural diagram of a pose acquisition device according to the second embodiment of the present application;
fig. 15 is a schematic structural diagram of a mobile device according to a third embodiment of the present application;
Fig. 16 to 19 are respectively exemplary diagrams of embodiments of the present application applicable to a mobile robot.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 1, a flowchart of an implementation of a pose acquisition method according to the first embodiment of the present application is shown. The method is applicable to an electronic device capable of processing images, such as a mobile device equipped with an image acquisition device (for example, a mobile robot), or a device connected to the mobile device, such as a computer or a server. The technical solution in this embodiment is used to improve the accuracy of pose acquisition for the mobile device.
Specifically, the method in this embodiment may include the following steps:
step 101: and obtaining the current frame image.
The current frame image is the latest image acquired by an image acquisition device on the mobile device. The image acquisition device may be a camera, a video recorder or a similar device that captures images within the acquisition range of the mobile device. As shown in fig. 2, a camera is mounted at the top of the mobile robot, and while the mobile robot moves, the camera captures images of the area in front of it.
It should be noted that, in this embodiment, the current frame image acquired by the image acquisition device may be obtained through a connection interface with the image acquisition device, or the current frame image acquired by the image acquisition device transmitted by the mobile device may be obtained through a communication interface with the mobile device.
In addition, the current frame image in this embodiment can be understood as an image that needs to be processed at the current time.
Step 102: and obtaining the matched characteristic points in the current frame image.
The matched feature points in the current frame image obtained in step 102 are the feature points in the current frame image that match feature points in a history image, where the history images are the images in the sliding window preceding the current frame image. As shown in fig. 3, after the history images are collected, one or more frames of history images form the sliding window corresponding to the current frame image and are stored in an image library. On this basis, after the current frame image is obtained, the stored history images in the sliding window can be read from the image library. A feature point in the current frame image matches a feature point in a history image when the two feature points correspond to the same space point in the map.
It should be noted that, in this embodiment, the history images added to the sliding window are key frame images among the images acquired by the image acquisition device. A key frame is a representative image frame, which can be understood as an image frame that captures a key action in the movement or change of a character or object, such as an I frame.
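As an illustration only, a minimal sketch of one way such a sliding window of key frames could be kept in memory is given below; the window size, the KeyFrame fields and the names used here are assumptions, not taken from the patent:

```python
from collections import deque
from dataclasses import dataclass, field

import numpy as np


@dataclass
class KeyFrame:
    """One history (key frame) image kept in the sliding window."""
    image: np.ndarray                                   # grayscale or RGB key frame image
    depth: np.ndarray                                   # aligned depth image used to recover 3D points
    keypoints: list = field(default_factory=list)       # extracted feature points
    descriptors: np.ndarray = None                      # e.g. ORB descriptors
    pixel_models: dict = field(default_factory=dict)    # feature point id -> second pixel model


class SlidingWindow:
    """Fixed-length buffer of the most recent key frames (the size is an assumed parameter)."""

    def __init__(self, max_size: int = 10):
        self.frames = deque(maxlen=max_size)

    def add_keyframe(self, kf: KeyFrame) -> None:
        # the oldest key frame is dropped automatically once the window is full
        self.frames.append(kf)

    def latest(self) -> KeyFrame:
        return self.frames[-1]
```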
In one implementation, step 102, obtaining the feature points in the current frame image that match feature points in the history images, may be implemented as follows:
First, the feature points in the current frame image are extracted. Feature points are representative points in an image and constitute another digital representation of the image information. The feature points in this embodiment may be ORB (Oriented FAST and Rotated BRIEF) features of the current frame image, consisting of key points and descriptors, where the key points may be FAST corner points with scale and rotation invariance, and the descriptors may be binary BRIEF descriptors. In a specific implementation, a preset feature extraction algorithm may be used to extract the feature points in the current frame image, so as to obtain the feature points that satisfy the extraction condition. To further improve accuracy, the current frame image may first be divided into a number of small grid cells, and feature points are then extracted per cell; this increases the uniformity of the extraction so that the feature points are evenly distributed over the whole field of view, and the extracted feature points then constitute the feature points of the current frame image.
Then, the feature points extracted from the current frame image are matched with the feature points extracted from the history images. The feature points in the history images are obtained using the same feature extraction method as for the current frame image.
Specifically, a preset matching algorithm may be used to match the feature points in the current frame image with the feature points in the history images, for example by matching their ORB descriptors. To further improve the matching rate, a matching algorithm such as the DBoW2 bag-of-words technique may be used to accelerate matching between the feature points in the current frame image and the features in the history images, so as to obtain the matched feature points in the current frame image, i.e. feature points that correspond to the same space points as feature points in the history images. As shown in fig. 4, each matched feature point in the current frame image has a matching feature point in a history image.
Specifically, the feature points in the current frame image may first be matched against the feature points extracted from the key frame image in the sliding window that is closest to the current frame image. If the number of feature points in the current frame image matched in this way does not reach the required number, the feature points in the current frame image are then matched against the feature points of all key frame images in the sliding window until the required number is met.
Further, to improve reliability, it is determined whether the number of matched feature points in the current frame image reaches a preset required number. If it does not, the matching algorithm may be adjusted, for example by relaxing its matching condition, and the feature points extracted from the current frame image are matched again against the feature points of the history images in the sliding window using the relaxed algorithm, for example accelerated matching with the DBoW2 bag-of-words technique under the relaxed condition. This increases the number of matched feature points in the current frame image until the required number is satisfied.
Since the history images in the sliding window are key frame images, matching the features of the current frame image always uses the feature points of the key frames in the sliding window; the matched feature points in the current frame image are therefore those matched to feature points of the key frames in the sliding window.
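The patent does not name a particular library; the following sketch assumes OpenCV and shows grid-based ORB extraction plus brute-force Hamming matching with a ratio test. The grid size, feature counts and ratio value are illustrative assumptions, and DBoW2-style bag-of-words acceleration is not shown:

```python
import cv2
import numpy as np


def extract_orb_grid(image: np.ndarray, rows: int = 4, cols: int = 4, per_cell: int = 50):
    """Detect ORB feature points cell by cell so they spread over the whole field of view."""
    orb = cv2.ORB_create(nfeatures=per_cell)
    h, w = image.shape[:2]
    keypoints = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell_kps = orb.detect(image[y0:y1, x0:x1], None)
            # shift the cell-local keypoints back to full-image coordinates
            keypoints.extend(cv2.KeyPoint(kp.pt[0] + x0, kp.pt[1] + y0, kp.size)
                             for kp in cell_kps)
    # compute the binary BRIEF descriptors for all keypoints on the full image
    keypoints, descriptors = orb.compute(image, keypoints)
    return keypoints, descriptors


def match_features(desc_cur: np.ndarray, desc_hist: np.ndarray, ratio: float = 0.75):
    """Brute-force Hamming matching with Lowe's ratio test (the ratio value is an assumption)."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(desc_cur, desc_hist, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return good  # each match links a current-frame feature point to a history-image one
```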
Further, after the feature points in the current frame image that match feature points in the history images are obtained in step 102, reprojection errors may be computed from these matched feature points, and the current pose of the mobile device may be estimated preliminarily from the computed reprojection errors. For example, the depth values of the space points corresponding to the matching feature points in the history images are obtained from the depth images corresponding to those history images, giving the three-dimensional coordinate values of the space points; these three-dimensional coordinates are transformed into the current coordinate system of the current frame image using an initial pose transformation matrix and projected to two-dimensional coordinates; the errors between these projected coordinates and the two-dimensional coordinates of the corresponding matched feature points in the current frame image are then computed, and pose estimation is performed on the resulting reprojection errors, so that the current pose of the mobile device is preliminarily estimated.
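A hedged sketch of this reprojection-error computation for one matched point pair is given below, assuming a pinhole camera model with intrinsics K and a 4x4 initial pose transformation matrix; the function names are illustrative:

```python
import numpy as np


def back_project(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Recover the 3D space point of pixel (u, v) from its depth value and intrinsics K."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])


def project(P: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project a 3D point expressed in the current camera frame to 2D pixel coordinates."""
    p = K @ P
    return p[:2] / p[2]


def reprojection_error(p_hist_uv, depth_hist, p_cur_uv, T_init, K) -> float:
    """Geometric error of a history feature point reprojected into the current frame image."""
    P_hist = back_project(p_hist_uv[0], p_hist_uv[1], depth_hist, K)  # 3D point from history frame
    P_cur = T_init[:3, :3] @ P_hist + T_init[:3, 3]                   # transform with the initial pose
    return float(np.linalg.norm(project(P_cur, K) - np.asarray(p_cur_uv)))
```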
Step 103: and respectively obtaining a first pixel model of the matched characteristic points in each current frame image.
The first pixel model may also be referred to as a feature point model. The first pixel model corresponds to a plurality of neighbor feature points, a direction vector and a reprojection error of the matched feature point in the current frame image, and the depth values of the neighbor feature points differ from the depth value of the matched feature point itself. That is, in this embodiment one first pixel model is built for each matched feature point in the current frame image, and the first pixel model of each matched feature point is built from a plurality of neighbor feature points whose depth values differ from that of the matched feature point; the neighbor feature points themselves may or may not share the same depth values. On this basis, the first pixel model of a matched feature point in the current frame image may be obtained as follows:
firstly, a plurality of neighbor feature points of the feature points matched in the current frame image are obtained, and the neighbor feature points are distributed around the feature points matched in the current frame image, as shown in fig. 5;
then, the redundant neighbor feature points with repeated depth values are removed from these neighbor feature points;
and then, the first pixel model of the matched feature point in the current frame image is established from the remaining neighbor feature points.
Specifically, the first pixel model may include the feature data of the plurality of neighbor feature points, such as their pixel values and coordinate values, and further includes the direction vector of the first pixel model and the reprojection error of the matched feature point in the current frame image. The direction vector is obtained from the three-dimensional coordinate value of the spatial center point of the neighbor feature points in the first pixel model and the three-dimensional coordinate value of the space point corresponding to the matched feature point in the current frame image. The reprojection error of the matched feature point in the current frame image is obtained from a history image, such as a history image in the sliding window; specifically, it is the geometric error obtained when the 3D space point from the history image is projected into the current frame image. For example, the depth value of the space point corresponding to the matching feature point in the history image is obtained from the depth image corresponding to that history image, giving the three-dimensional coordinate value of the space point; this three-dimensional coordinate is transformed into the current coordinate system of the current frame image using the initial pose transformation matrix and projected to a two-dimensional coordinate; the error between this coordinate and the two-dimensional coordinate of the corresponding matched feature point in the current frame image is then computed.
Specifically, the first pixel model may be written as {p'_1, p'_2, p'_3, ..., p'_N, n', r'}, where p'_1, ..., p'_N are the neighbor feature points of the matched feature point p' in the current frame image whose depths in space differ from that of p'. The direction vector is denoted n' and is given by n' = P_c - P_p', where P_c is the three-dimensional coordinate of the spatial center point corresponding to the neighbor feature points of p' and P_p' = (x, y, z) is the three-dimensional coordinate of the space point corresponding to the matched feature point in the current frame image; these three-dimensional coordinates can be obtained by mapping the feature points in the current frame image, together with their depth values in the depth image, into space. The reprojection error of the matched feature point p' in the current frame image is denoted r', and N is the number of neighbor feature points of p', which may be a preset value determined from empirical data.
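One possible way to hold such a pixel model in code is sketched below; the field names, and the choice of orienting the direction vector from the feature point's space point toward the neighbors' spatial center, are assumptions consistent with the description above:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PixelModel:
    """Pixel model of one matched feature point: N neighbor feature points whose depth
    differs from the center point, a direction vector, and a reprojection error term."""
    neighbors_uv: np.ndarray        # (N, 2) pixel coordinates of the neighbor feature points
    neighbor_points3d: np.ndarray   # (N, 3) space points of the neighbor feature points
    direction: np.ndarray           # n': vector from the feature point's space point to the
                                    # spatial center of the neighbors' space points (assumed orientation)
    reproj_error: float             # r': reprojection error (first model) or its mean (second model)


def build_pixel_model(point3d: np.ndarray, neighbor_points3d: np.ndarray,
                      neighbors_uv: np.ndarray, reproj_error: float) -> PixelModel:
    center = neighbor_points3d.mean(axis=0)        # spatial center point of the neighbors
    return PixelModel(neighbors_uv=neighbors_uv,
                      neighbor_points3d=neighbor_points3d,
                      direction=center - point3d,
                      reproj_error=reproj_error)
```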
It should be noted that, since the matched feature points obtained in step 102 are the feature points in the current frame image that match feature points in the history images in the sliding window, the first pixel model is the pixel model of feature points that have been screened from the original feature points of the current frame image according to the feature points of the preceding history images. In addition, corresponding pixel models also need to be built for the other feature points in the current frame image, i.e. those without matching feature points in the history images; these models are built in the same way as the first pixel model, and they are mainly used when updating the second pixel models of the feature points in the history images, as described in detail below. The first pixel model of a matched feature point in the current frame image is thus built on the premise that the feature point matches a feature point in a history image, so in this embodiment the first pixel model is built on the basis of the history images.
Step 104: and respectively comparing the first pixel model of the matched characteristic points in each current frame image with the corresponding second pixel model to obtain a model comparison result.
The model comparison result represents whether the space point corresponding to the matched feature point in the current frame image belongs to a moving object. The second pixel model is the pixel model of a feature point in the history image; it corresponds to a plurality of neighbor feature points and a direction vector of that feature point, together with the mean value of its reprojection errors, and the depth values of the neighbor feature points differ from the depth value of the feature point in the history image.
It should be noted that the second pixel model of a matched feature point in a history image may be built in the same way as the first pixel model, which is not described again here. For a sliding window containing at least two frames of history images, the second pixel models of the matched feature points in a later frame are built on the basis of the preceding frame. The feature points of the first frame in the sliding window may be preset through initialization, so the pixel models of the feature points in the first frame are built from the initialized preset feature points. Each time a new image is added to the sliding window, the second pixel models of the matched feature points in the earlier frames in the window are updated using the pixel models of the feature points extracted from the newly added image.
Of course, the content of the second pixel model is consistent with that of the first pixel model: the first pixel model includes the feature data of a plurality of neighbor feature points of the feature point in the current frame image, together with the direction vector of the first pixel model and the reprojection error of the feature point in the current frame image; the second pixel model includes the feature data of a plurality of neighbor feature points of the feature point in the history image, together with the direction vector of the second pixel model and the mean value of the reprojection errors of the feature point in the history image.
Based on this, the first pixel model of each matched feature point in the current frame image is compared with the second pixel model of the feature point in the history image that it matches, i.e. a one-to-one comparison is performed, as shown in fig. 6, so as to obtain a model comparison result for each matched feature point in the current frame image.
Specifically, comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model may mean comparing each of their items, namely the plurality of neighbor feature points, the direction vectors and the reprojection errors, so as to obtain a comparison result for each item; the model comparison result for the matched feature point is then obtained from these item-wise comparison results.
Step 105: and screening out target feature points in the current frame image according to the model comparison result.
The target feature points are the feature points in the current frame image whose corresponding space points do not belong to a moving object.
Specifically, if the difference between a matched feature point in the current frame image and the feature point it matches in the history image exceeds a certain standard, the matched feature point has moved relative to the history feature point, and it can be determined that the space point corresponding to the matched feature point belongs to a moving object; in that case the model comparison result indicates that the space point belongs to a moving object. If the difference does not exceed that standard, the matched feature point can be regarded as stationary relative to the history feature point, and the corresponding space point does not belong to a moving object. On this basis, the feature points whose space points do not belong to moving objects, i.e. the target feature points, can be screened out of the current frame image.
Step 106: and obtaining the current pose of the mobile equipment according to the reprojection error corresponding to the target feature point in the current frame image.
Specifically, the reprojection error may be minimized to obtain the pose transformation matrix for the target feature points at the minimum of the reprojection error, where the reprojection error is obtained from the three-dimensional coordinate value of the space point corresponding to the feature point in the history image that matches the target feature point and the two-dimensional coordinate value of the target feature point in the current frame image; the current pose of the mobile device can then be obtained from this pose transformation matrix.
Furthermore, the preliminarily estimated current pose of the mobile device can be taken as the initial value of the optimization; by minimizing the reprojection errors corresponding to the target feature points in the current frame image, the pose transformation matrix at the minimum of the reprojection error is obtained, and a more accurate current pose of the mobile device is then obtained from this pose transformation matrix.
As can be seen from the above, in the pose acquisition method provided in the first embodiment of the present application, after the current frame image is acquired by the image acquisition device on the mobile device, the feature points in the current frame image that match feature points in the history images contained in the preceding sliding window are obtained; after the pixel models corresponding to the plurality of neighbor feature points of these matched feature points are built, they are compared with the pixel models of the feature points in the history images contained in the sliding window; the target feature points whose corresponding space points do not belong to a moving object can then be screened out of the current frame image according to the model comparison results, and the current pose of the mobile device is obtained from the reprojection errors of the target feature points. Thus, in this embodiment, the pixel models of the matched feature points in consecutive images are used to reject the feature points belonging to moving objects, so that the mobile device is positioned using the screened feature points that do not belong to moving objects; interference of moving-object feature points with pose acquisition is avoided, and the positioning accuracy of the mobile device in dynamic scenes is improved.
In order to further improve the positioning accuracy, in step 106, when the current pose of the mobile device is obtained according to the reprojection error corresponding to the target feature point in the current frame image, the method may specifically be implemented by the following steps, as shown in fig. 7:
step 701: and obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point.
The depth value of the space point corresponding to a target feature point may be obtained from the depth image corresponding to the current frame image in which the target feature point lies, and the weight value is then derived from this depth value. For example, the weight value of the i-th target feature point is denoted ω_i and is computed from a scale constant λ and d_i, where d_i is the depth value, in the depth image corresponding to the current frame image, of the space point corresponding to the feature point in the history image that matches the i-th target feature point. The reason is that the moving objects that interfere with positioning are generally closer to the mobile device, while the displacement produced on the pixel plane in a short time by a farther moving object is relatively small; therefore, in this embodiment a weight value is assigned to each target feature point based on the depth value of its corresponding space point, which improves the accuracy of the subsequent pose acquisition.
Step 702: and obtaining a pose transformation matrix corresponding to the target feature point when the re-projection error of the target feature point is minimum according to the weight value corresponding to the target feature point.
The reprojection error is obtained from the three-dimensional coordinate value corresponding to the target feature point in the history image and the two-dimensional coordinate value of the target feature point in the current frame image. For example, the depth value of the space point corresponding to the target feature point in the history image is obtained from the depth image corresponding to that history image, giving the three-dimensional coordinate value of the space point; this three-dimensional coordinate is transformed into the current coordinate system of the current frame image using the initial pose transformation matrix and projected to a two-dimensional coordinate; the error between this coordinate and the two-dimensional coordinate of the corresponding matched feature point in the current frame image gives the reprojection error. On this basis, the reprojection errors are weighted by the weight values of the corresponding target feature points, so as to obtain the optimal pose transformation matrix.
Step 703: and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
In this embodiment, a preset positioning algorithm may be used to calculate the current pose of the mobile device according to the pose transformation matrix.
Specifically, in this embodiment, the depth values of the feature points in the history images that match the target feature points may be obtained from the depth images corresponding to those history images, and the three-dimensional coordinate values in space of those matching feature points can then be recovered. On this basis, the three-dimensional coordinate value of the matching feature point in the history image and the two-dimensional coordinate value of the corresponding target feature point in the current frame image form a three-dimensional-two-dimensional point pair, i.e. a 3D-2D point pair, and a reprojection error equation with the corresponding pose transformation matrix can be constructed as

T* = argmin_T Σ_{i=1...n} ω_i F( || u_i - π(T P_i) ||^2 )

where F(·) is the Huber kernel function: when feature points are mismatched the reprojection error becomes large, and the kernel function limits the growth of the error; since the Huber kernel is smooth, it is easy to differentiate, which facilitates the minimization of the reprojection error equation. π(·) is the projection transformation based on the image acquisition device, such as the camera model, which maps 3D coordinates in the camera coordinate system to 2D pixel coordinates in the camera image. i is the index of the 3D-2D point pairs, of which there are n in total. u_i = [x_i, y_i]^T is the two-dimensional coordinate of the 2D point of the pair in the pixel plane, i.e. in the current frame image, and P_i = [X_i, Y_i, Z_i]^T is the three-dimensional coordinate in space of the 3D point of the pair, i.e. of the corresponding space point in the history image.
Based on this, the most suitable pose transformation matrix is obtained by minimizing the reprojection error function. Specifically, the reprojection error function may be solved by nonlinear optimization, for example with the EPnP algorithm, to obtain the pose transformation matrix. With the preliminarily estimated current pose of the mobile device as the initial value, the pose transformation matrix then yields a more accurate current pose of the mobile device.
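A sketch of the weighted, Huber-robustified minimization is given below, using SciPy's robust least squares as a stand-in for the EPnP-plus-nonlinear-optimization pipeline described here; applying the weights as sqrt(w) residual scaling and the f_scale value are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def estimate_pose(points3d, points2d, weights, K, rvec0, t0):
    """Minimise the weighted, robustified reprojection error over the pose (R, t).

    points3d: (n, 3) space points recovered from the history images
    points2d: (n, 2) pixel coordinates of the target feature points in the current frame image
    weights : (n,)   depth-based weights ω_i for each 3D-2D point pair
    rvec0,t0: initial pose, e.g. the preliminarily estimated current pose
    """
    sw = np.sqrt(weights)

    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        t = x[3:]
        P_cam = points3d @ R.T + t              # transform into the current camera frame
        proj = P_cam @ K.T
        proj = proj[:, :2] / proj[:, 2:3]       # perspective projection to pixel coordinates
        return (sw[:, None] * (points2d - proj)).ravel()

    x0 = np.concatenate([rvec0, t0])
    # loss='huber' plays the role of the Huber kernel F(.) that bounds mismatched points
    res = least_squares(residuals, x0, loss='huber', f_scale=1.0)
    R = Rotation.from_rotvec(res.x[:3]).as_matrix()
    return R, res.x[3:]
```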
Further, in order to improve the accuracy of the first pixel model, the neighbor feature points that best represent the character of the matched feature point need to be screened out of its neighbor feature points. Based on this, when the first pixel model of each matched feature point in the current frame image is obtained in step 103, the following steps may be used, as shown in fig. 8:
step 801: and respectively taking the matched characteristic points in each current frame image as the center to obtain a plurality of neighbor characteristic points of the matched characteristic points in the current frame image.
The depth value of each neighbor feature point of the matched feature point differs from the depth value of the matched feature point taken as the center. The neighbor feature points of a matched feature point are the feature points whose depth values are larger than the target depth and whose distances to the matched feature point satisfy the distance sorting rule, and the target depth is related to the mean depth value of the 24 neighborhood pixel points of the matched feature point in the current frame image.
Taking N neighbor feature points as an example, the specific screening method is as follows:
firstly, taking each matched feature point in the current frame image as the center, the pixel points in its 24 neighborhood or 8 neighborhood are obtained, as shown in fig. 9;
then, the mean of the depth values of the pixel points in the 24 neighborhood (or 8 neighborhood) of the matched feature point p' in the current frame image is computed, denoted d_avg, for example d_avg = (1/24) Σ_{i=1...24} d_i, where d_i is the depth value of the i-th pixel point in the 24 neighborhood; on this basis, the feature points whose depth values are larger than the target depth, e.g. larger than d_avg, are selected;
then, the distance, for example the Euclidean distance, between the matched feature point p' in the current frame image and each candidate neighbor feature point is computed, and the feature points whose depth values are larger than d_avg and whose distances to p', sorted from small to large, are among the first N are taken as the neighbor feature points used to establish the first pixel model.
This screening scheme excludes the case where a neighbor feature point belongs to the same rigid body, i.e. the same object, as the matched feature point p' and would otherwise be added to the first pixel model. Pixel points with similar depths are most likely located on the same rigid body, and even when the motion changes, the relationship between such points does not change much; therefore, removing the neighbor feature points that are likely to belong to the same rigid body as the matched feature point improves the accuracy of the first pixel model.
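A sketch of this neighbor-selection rule follows, under the assumption that the target depth is the mean depth of the 24-neighborhood pixels; N, the patch size and the zero-depth handling are illustrative choices:

```python
import numpy as np


def select_neighbors(center_uv, center_depth, candidates_uv, candidates_depth,
                     depth_image, N: int = 8, patch: int = 2):
    """Pick the N neighbor feature points used to build a pixel model.

    candidates_* : pixel coordinates / depths of the other feature points in the frame.
    patch=2 gives the 5x5 block around the center, i.e. its 24-neighborhood pixels.
    """
    u, v = int(round(center_uv[0])), int(round(center_uv[1]))
    block = depth_image[max(v - patch, 0):v + patch + 1, max(u - patch, 0):u + patch + 1]
    valid = block[block > 0]
    target_depth = valid.mean() if valid.size else center_depth  # mean depth of the neighborhood

    # keep candidates deeper than the target depth and with a depth different from the
    # center point (they are unlikely to lie on the same rigid body)
    mask = (candidates_depth > target_depth) & (candidates_depth != center_depth)
    kept_uv = np.asarray(candidates_uv)[mask]

    # sort the remaining points by Euclidean distance to the center and keep the first N
    dists = np.linalg.norm(kept_uv - np.asarray(center_uv), axis=1)
    return kept_uv[np.argsort(dists)[:N]]
```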
Step 802: and establishing a first pixel model of the matched feature points in the current frame image at least according to the plurality of neighbor feature points of the matched feature points in the current frame image.
Specifically, in this embodiment, a first pixel model may be established according to feature data of the plurality of neighboring feature points, where the first pixel model may include feature data of each of the plurality of neighboring feature points, and further include direction vectors corresponding to the plurality of neighboring feature points and a reprojection error of a feature point matched in the current frame image.
Further, in step 104, when the first pixel model of the feature point matched in each current frame image is compared with the corresponding second pixel model, the following steps may be specifically implemented, as shown in fig. 10:
step 1001: and obtaining the number of the matched neighbor feature points in the first pixel model and the corresponding second pixel model.
Matched neighbor feature points are those for which the difference between a neighbor feature point of the first pixel model and the corresponding neighbor feature point of the second pixel model is smaller than or equal to a difference threshold. Taking the first pixel model {p'_1, p'_2, p'_3, ..., p'_N, n', r'} of the matched feature point p' in the current frame image and the second pixel model {p_1, p_2, p_3, ..., p_N, n, r_avg} of the matching feature point p in the history image as an example, each neighbor feature point of the first pixel model may be compared with the corresponding neighbor feature point of the second pixel model to obtain one or more difference values in dimensions such as distance, pixel value and depth value. When the difference value is smaller than the corresponding difference threshold, the neighbor feature point of the first pixel model and the corresponding neighbor feature point of the second pixel model are considered similar, i.e. matched; if the difference value is greater than or equal to the corresponding difference threshold, they are considered unmatched.
Based on this, in this embodiment, the number of matched neighbor feature points is obtained, that is, the number of neighbor feature points in the first pixel model whose difference values from the corresponding neighbor feature points in the second pixel model are smaller than or equal to the difference threshold.
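The counting of matched neighbor feature points can be sketched as follows, assuming each model stores, for every neighbor, its distance to the center point, pixel value and depth; the concrete thresholds are placeholders for the "difference thresholds" mentioned in the text.

```python
def count_matched_neighbors(model_cur, model_hist, thr_dist=3.0, thr_pix=15.0, thr_depth=0.2):
    """Count neighbor pairs whose per-dimension differences stay within the thresholds.

    model_cur / model_hist : first / second pixel model; each neighbor is assumed to
                             store its distance to the center point, pixel value and depth.
    """
    count = 0
    for q_cur, q_hist in zip(model_cur["neighbors"], model_hist["neighbors"]):
        d_dist  = abs(q_cur["dist_to_p"] - q_hist["dist_to_p"])
        d_pix   = abs(q_cur["intensity"] - q_hist["intensity"])
        d_depth = abs(q_cur["depth"]     - q_hist["depth"])
        if d_dist <= thr_dist and d_pix <= thr_pix and d_depth <= thr_depth:
            count += 1
    return count
```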
Step 1002: a modulo length of a difference between the direction vector in the first pixel model and its corresponding direction vector in the second pixel model is obtained.
In the case where a moving object moves perpendicular to the camera plane, the similarity between the neighbor feature points in the two pixel models can be extremely high, i.e. the difference values are extremely low. Therefore, in order to further improve accuracy, a comparison condition on the direction vectors is added in this embodiment; based on this, the modulo length of the difference between the direction vectors of the two pixel models, i.e. ||n − n'||_2, needs to be obtained, where n is the direction vector in the second pixel model and n' is the direction vector in the first pixel model.
Step 1003: and judging whether the re-projection error of the matched characteristic points in the current frame image in the first pixel model is smaller than or equal to the average value of the re-projection errors corresponding to the matched characteristic points in the historical image in the second pixel model, so as to obtain a judging result.
The average value is the average of the cumulative reprojection errors of the matched feature points in the history images within the sliding window.
It should be noted that, in order to judge, through the accumulation of reprojection errors, the possibility that the feature point is a dynamic point, the average value in this embodiment is calculated as r̄_p = (1/m)·Σ_{i=1}^{m} r_p^i, where m is the number of history images in the sliding window and r_p^i is the reprojection error of the matched feature point p in the i-th frame history image in the sliding window. Based on this, in this embodiment, the reprojection error r_p' of the feature point matched in the current frame image in the first pixel model is compared with the mean value r̄_p, thereby obtaining a judgment result of whether the mean value is larger than the reprojection error of the feature point matched in the current frame image in the first pixel model.
The execution order among step 1001, step 1002 and step 1003 may be changed; for example, step 1002 may be executed first and the other steps afterwards, or the three steps may be executed simultaneously. The different technical solutions thus obtained belong to the same inventive concept and all fall within the protection scope of the present application.
Step 1004: and obtaining a model comparison result according to the number, the module length and the judgment result.
In the case that the number exceeds a preset value, the modulo length is smaller than a preset value and the reprojection error is smaller than the mean value, the obtained model comparison result indicates that the spatial point corresponding to the matched feature point in the current frame image does not belong to a moving object, i.e. it is a static point.
That is, at least in the case that the number exceeds a preset value such as 3 or 5, in order to further improve accuracy, it is further determined whether the modulo length of the direction vector difference is within a certain range, e.g. less than a preset value δ, and whether the reprojection error is less than the mean value, so as to obtain the model comparison result.
Taking as an example obtaining the model comparison result according to whether the number exceeds the preset value, whether the modulo length of the direction vector difference is within a certain range, and whether the reprojection error is smaller than the mean value:

if the number exceeds the preset value, the modulo length of the direction vector difference is within the certain range, and the reprojection error is smaller than the corresponding mean value in the sliding window, a model comparison result indicating that the spatial point corresponding to the matched feature point in the current frame image is a static point, i.e. does not belong to a moving object, can be obtained;

if the number does not exceed the preset value, or the modulo length of the direction vector difference exceeds the preset range, or the reprojection error is larger than the corresponding mean value in the sliding window, a model comparison result determining that the spatial point corresponding to the matched feature point in the current frame image is a dynamic point, i.e. belongs to a moving object, can be obtained.
That is, among the above three conditions: the number exceeds the preset value; the modulo length of the direction vector difference is within a certain range, i.e. ||n − n'||_2 < δ; and the reprojection error is smaller than the corresponding mean value in the sliding window, i.e. r_p' ≤ r̄_p. Only when all three conditions are met can the spatial point corresponding to the matched feature point in the current frame image be determined to be a static point; as long as one condition is not met, the spatial point corresponding to the matched feature point in the current frame image is determined to be a dynamic point.
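A compact sketch of this three-condition judgment is given below; the preset number, the threshold δ and the model field names are assumptions for the example.

```python
import numpy as np

def is_static_point(model_cur, model_hist, n_matched, min_matches=3, delta=0.1):
    """Apply the three judgment conditions (thresholds and field names illustrative).

    model_cur  : first pixel model of the matched feature point in the current frame.
    model_hist : second pixel model of the matched feature point in the history image;
                 assumed to store the cumulative reprojection error and the number of
                 frames over which it was accumulated.
    n_matched  : number of matched neighbor feature points between the two models.
    """
    # Condition 1: enough neighbor feature points are matched.
    cond_count = n_matched > min_matches
    # Condition 2: the direction vectors stay close, i.e. ||n - n'||_2 < delta.
    cond_dir = np.linalg.norm(model_hist["direction"] - model_cur["direction"]) < delta
    # Condition 3: the current reprojection error does not exceed the sliding-window mean.
    mean_err = model_hist["reproj_error_sum"] / model_hist["num_frames"]
    cond_err = model_cur["reproj_error"] <= mean_err
    # The point is judged static only when all three conditions hold simultaneously.
    return cond_count and cond_dir and cond_err
```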
In a specific implementation, each feature point in the history images within the sliding window has a pixel model. In this embodiment, the pixel model of the feature point in the history image that matches the feature point in the current frame image is recorded as the second pixel model, so as to distinguish it from the first pixel model of the matched feature point in the current frame image and from the pixel models of the feature points that have no matched feature point.
Based on this, further, in order to increase the diversity of the feature points in the pixel model of each feature point in the history images within the sliding window, and to ensure that the feature points in the pixel models generated by old images in the sliding window are not easily replaced, the pixel models in the history images may be updated in this embodiment, as specifically shown in fig. 11:
step 1101: an update request for a pixel model of a feature point in a history image is obtained.
The update request may be generated automatically or by a user operation, and the update request may include an identifier of one or more feature points in the history image that needs to be updated.
Step 1102: a newly added image within the sliding window is obtained.
The newly added image may be the current frame image which has been subjected to pose positioning processing and is the key frame image, and at this time, the current frame image is already added as a history image into the sliding window and is recorded as the newly added image. It should be noted that, in the case that the current frame image is a key frame image, the current frame image may be added to the sliding window.
It should be noted that the newly added image is a key frame image that has undergone positioning processing and that is added to the sliding window as a history image when a new image is obtained as the new current frame image.
Step 1103: and extracting the characteristic points of the newly added image to obtain the characteristic points in the newly added image.
Various feature points can be extracted from the newly added image by a feature extraction algorithm or the like. Each feature point in the newly added image may have a matched feature point in the history image, for example, a feature point in the history image whose descriptor matches that of the feature point in the newly added image, or it may have no matched feature point in the history image.
Step 1104: and matching the characteristic points in the newly added image with the characteristic points in the historical image in the sliding window to obtain matching characteristic points matched with the characteristic points in the historical image in the newly added image and non-matching characteristic points not matched with the characteristic points in the historical image in the newly added image.
The matching feature points refer to feature points in the newly added image, wherein the feature points are matched with feature points in the historical image, and the non-matching feature points refer to feature points in the newly added image, wherein the feature points are not matched with the feature points in the historical image.
Specifically, in this embodiment, feature points in the newly added image may be matched with feature point ORB descriptors in the history image in the sliding window, so as to obtain matched feature points and non-matched feature points.
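For illustration, a simple OpenCV sketch of this descriptor matching step is given below; brute-force Hamming matching over ORB descriptors is used here, and the distance threshold separating matched from non-matched feature points is an assumption.

```python
import cv2

def match_orb_descriptors(desc_new, desc_hist, max_hamming=50):
    """Split the new image's feature points into matched and non-matched ones.

    desc_new, desc_hist : uint8 ORB descriptor matrices of the newly added image and of
                          a history image in the sliding window.
    max_hamming         : illustrative Hamming-distance threshold for accepting a match.
    """
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = [m for m in matcher.match(desc_new, desc_hist) if m.distance <= max_hamming]
    matched_idx = {m.queryIdx for m in matches}
    # Feature points of the new image with no counterpart become non-matching feature points.
    unmatched_idx = sorted(set(range(len(desc_new))) - matched_idx)
    return matches, unmatched_idx
```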
Step 1105: and updating the pixel model of the feature points matched with the matched feature points in the historical image according to the matched feature points.
Specifically, in this embodiment, a pixel model may be first established for the matching feature points, where the pixel model of the matching feature points corresponds to a plurality of neighboring feature points, direction vectors, and reprojection errors of the matching feature points, and then, in this embodiment, the neighboring feature points in the pixel model of the matching feature points are used to update the pixel model of the feature points matched with the matching feature points in the history image.
For example, for the second pixel model M(p) of the feature point p in the history image, if the matching feature point p_F is found in the newly added image F within the sliding window, and this feature point has a corresponding pixel model M(p_F), each point in M(p) is given an equal update probability of 1/N; thereafter, the feature point in M(p_F) with the minimum distance, e.g. Euclidean distance, to p_F can be selected to update the second pixel model M(p).
Based on this, after the pixel feature points in the second pixel model are updated, the direction vector and the re-projection error in the second pixel model can be recalculated. And the updated second pixel model can be used for model comparison, so that a more accurate model comparison result is obtained, and the accuracy of pose acquisition is improved.
Step 1106: the pixel model of the non-matching feature point is obtained such that when the newly added image is taken as a history image in the sliding window, the pixel model of the non-matching feature point is taken as the pixel model of the new feature point in the history image.
That is, after the newly added image is added to the sliding window, the newly added image is treated as a history image in the sliding window, so the pixel model of a non-matching feature point in the newly added image is taken directly as the pixel model of a feature point in the history image. Based on this, when a new current frame image is used to position the current pose of the mobile device, the feature points extracted from the new current frame image are matched with the feature points extracted from the history images, including the newly added image, in the sliding window. After the feature points matched with the feature points in the history images are obtained in the new current frame image, the pixel models of the matched feature points in the new current frame image are compared with the pixel models of the corresponding feature points in the history images, the target feature points, i.e. the static points, in the new current frame image are screened out, and the current pose of the mobile device is then accurately estimated based on the reprojection errors corresponding to these static points.
It should be noted that the execution order of step 1105 and step 1106 is not limited to the order shown in the drawings; for example, step 1106 may be executed first and then step 1105, or the two steps may be executed simultaneously. The different technical solutions formed by the different execution orders of step 1105 and step 1106 belong to the same inventive concept and all fall within the protection scope of the present application.
In order to further improve accuracy, after the target feature point in the current frame image is screened out according to the model comparison result in step 105 in this embodiment, before the current pose of the mobile device is obtained in step 106 according to the reprojection error corresponding to the target feature point in the current frame image, the method in this embodiment may further include the following steps, as shown in fig. 12:
step 107: and screening out target characteristic points which meet epipolar constraint with the characteristic points in the historical image from the target characteristic points in the current frame image, and then executing step 106.
That is, considering that there may be missed detection or feature points for which no match was found in the previous processing, a multi-view geometric constraint is introduced in the method for further processing: the feature points that do not satisfy the epipolar constraint are removed from the target feature points, and only the target feature points that satisfy the epipolar constraint are retained, thereby improving the accuracy of subsequent pose acquisition.
Referring to fig. 13, a schematic structural diagram of a pose acquisition device according to a second embodiment of the present application may be configured in an electronic device capable of processing an image, such as a mobile device with an image acquisition device, such as a mobile robot, or may be a device connected to the mobile device, such as a computer or a server. The technical scheme in the embodiment is used for improving the accuracy of pose acquisition of the mobile equipment.
Specifically, the apparatus in this embodiment may include the following units:
an image obtaining unit 1301, configured to obtain a current frame image, where the current frame image is an image acquired by an image acquisition device on a mobile device;
a feature point processing unit 1302, configured to obtain feature points matched in the current frame image, where the feature points matched in the current frame image are feature points with matched feature points in a history image included in a sliding window corresponding to the current frame image; the sliding window comprises a plurality of frames of historical images, wherein the historical images are key frame images before the current frame image;
a model building unit 1303, configured to obtain a first pixel model of a feature point matched in each current frame image, where the first pixel model corresponds to a plurality of neighboring feature points of the feature point matched in the current frame image, and the first pixel model further has a direction vector of the first pixel model and a re-projection error of the feature point matched in the current frame image;
A model comparison unit 1304, configured to compare a first pixel model of the feature points matched in each current frame image with a corresponding second pixel model, so as to obtain a model comparison result, where the model comparison result characterizes whether a spatial point corresponding to the feature points matched in the current frame image belongs to a moving object; the second pixel model is a pixel model of a feature point matched with the current frame image in the historical image, the second pixel model corresponds to a plurality of neighbor feature points of the feature point matched with the historical image, and the second pixel model also has a direction vector of the second pixel model and a mean value of reprojection errors of the feature point matched with the historical image;
a feature point screening unit 1305, configured to screen out a target feature point in the current frame image according to the model comparison result, where the target feature point is a feature point that a corresponding spatial point in the current frame image does not belong to a moving object;
and a pose obtaining unit 1306, configured to obtain a current pose of the mobile device according to a reprojection error corresponding to the target feature point in the current frame image.
As can be seen from the foregoing, in the pose acquisition device provided in this embodiment, after a current frame image is collected by the image acquisition device on the mobile device, the feature points in the current frame image that match feature points in the history images contained in the preceding sliding window are obtained. After the pixel models corresponding to the plurality of neighbor feature points of these matched feature points are obtained, each is compared with the pixel model of the corresponding feature point in the history images contained in the sliding window, and according to the model comparison result, the target feature points whose corresponding spatial points do not belong to a moving object can be screened out of the current frame image; based on this, the current pose of the mobile device can be obtained according to the reprojection errors of the target feature points. Therefore, in this embodiment, the pixel models of the matched feature points in the preceding and current images are used to reject the feature points belonging to moving objects, so that the mobile device is positioned according to the screened feature points that do not belong to moving objects; interference of moving-object feature points with pose acquisition can thus be avoided, and the positioning accuracy of the mobile device in dynamic scenes is improved.
In one implementation, the pose obtaining unit 1306 is specifically configured to: obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point; obtaining a pose transformation matrix corresponding to the target feature point when the reprojection error of the target feature point is minimum according to the weight value corresponding to the target feature point, wherein the reprojection error is obtained according to the three-dimensional coordinate value corresponding to the target feature point in the historical image and the two-dimensional coordinate value of the target feature point in the current frame image; and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
In one implementation, the model building unit 1303 is specifically configured to: respectively taking the matched feature points in each current frame image as the center to obtain a plurality of neighbor feature points of the matched feature points in the current frame image; the method comprises the steps that a feature point matched in a current frame image and a neighbor feature point corresponding to the feature point are feature points with depth values larger than target depth, and the distance between the feature point matched in the current frame image and the feature point meets a distance sorting rule, wherein the target depth is related to the depth average value of neighbor pixel points of the feature point matched in the current frame image; and establishing a first pixel model of the matched feature points in the current frame image at least according to the plurality of neighbor feature points of the matched feature points in the current frame image, so that the first pixel model corresponds to the plurality of neighbor feature points of the matched feature points in the current frame image and also has a direction vector of the first pixel model and a re-projection error of the matched feature points in the current frame image.
The direction vector of the first pixel model is obtained based on the three-dimensional coordinate value of the spatial center point corresponding to the neighbor feature point in the first pixel model and the three-dimensional coordinate value of the spatial point corresponding to the feature point matched in the current frame image, and the reprojection error of the feature point matched in the current frame image is obtained based on the historical image.
Based on this, the model comparison unit 1304 is specifically configured to: obtain the number of the neighbor feature points matched in the first pixel model and the corresponding second pixel model; obtain a modulo length of a difference between the direction vector in the first pixel model and the direction vector in the corresponding second pixel model; judge whether the reprojection error of the matched feature point in the current frame image in the first pixel model is smaller than or equal to the mean value of the reprojection errors corresponding to the matched feature point in the history image in the second pixel model, so as to obtain a judgment result, where the mean value is the mean of the cumulative reprojection errors of the feature points matched in the history images in the sliding window; and obtain a model comparison result according to the number, the modulo length and the judgment result.
In one implementation, the apparatus in this embodiment may further include the following units, as shown in fig. 14:
a model updating unit 1307 for: obtaining an update request for a pixel model of a feature point in the history image; obtaining a newly added image in the sliding window; extracting feature points of the newly added image to obtain feature points in the newly added image; matching the characteristic points in the newly added image with the characteristic points in the historical image in the sliding window to obtain matching characteristic points matched with the characteristic points in the historical image in the newly added image and non-matching characteristic points not matched with the characteristic points in the historical image in the newly added image; updating a pixel model of the feature points matched with the matching feature points in the historical image according to the matching feature points; and obtaining the pixel model of the non-matching characteristic point, so that when the newly added image is used as a historical image in the sliding window, the pixel model of the non-matching characteristic point is used as the pixel model of the new characteristic point in the historical image.
In one implementation, before obtaining the current pose of the mobile device according to the reprojection error corresponding to the target feature point in the current frame image, the feature point filtering unit 1305 is further configured to: and screening out target characteristic points which meet epipolar constraint with the characteristic points in the historical image from the target characteristic points in the current frame image.
In one implementation, the feature point processing unit 1302 is specifically configured to: extracting characteristic points in the current frame image; and matching the characteristic points extracted from the current frame image with the characteristic points extracted from the historical image to obtain the characteristic points matched with the characteristic points in the historical image in the current frame image.
In one implementation, the feature point processing unit 1302 is further configured to: and carrying out re-projection error calculation based on the matched characteristic points in the current frame image, and primarily estimating the current pose of the mobile equipment based on the calculated re-projection error.
It should be noted that, the specific implementation of each unit in this embodiment may refer to the corresponding content in the foregoing and will not be described in detail herein.
Referring to fig. 15, a schematic structural diagram of a mobile device according to a third embodiment of the present application may be an electronic device capable of processing an image, such as a mobile device with an image capturing device, such as a mobile robot, or may be a device connected to the mobile device, such as a computer or a server. The technical scheme in the embodiment is used for improving the accuracy of pose acquisition of the mobile equipment.
Specifically, the mobile device in this embodiment at least includes the following structures:
a memory 1501 for storing an application program and data generated by the operation of the application program;
a processor 1502 for executing the application program to implement:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on mobile equipment;
obtaining the matched characteristic points in the current frame image, wherein the matched characteristic points in the current frame image are characteristic points with matched characteristic points in a historical image contained in a sliding window corresponding to the current frame image; the sliding window comprises a plurality of frames of historical images, wherein the historical images are key frame images before the current frame image;
respectively obtaining a first pixel model of the matched feature points in each current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature points in the current frame image, and the first pixel model also has a direction vector of the first pixel model and a re-projection error of the matched feature points in the current frame image;
respectively comparing the first pixel model of the matched characteristic points in each current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial points corresponding to the matched characteristic points in the current frame image belong to a moving object or not; the second pixel model is a pixel model of a feature point matched with the current frame image in the historical image, the second pixel model corresponds to a plurality of neighbor feature points of the feature point matched with the historical image, and the second pixel model also has a direction vector of the second pixel model and a mean value of reprojection errors of the feature point matched with the historical image;
Screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points of which the corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile equipment according to the reprojection error corresponding to the target feature point in the current frame image.
As can be seen from the above technical solution, in the mobile device according to the third embodiment of the present application, after the current frame image is acquired by the image acquisition device on the mobile device, the feature points in the current frame image, which are matched with the feature points in the history image included in the previous sliding window, are acquired, and then after the pixel models corresponding to the multiple neighbor feature points of the feature points in the current frame image are respectively obtained, the pixel model is compared with the pixel model of the feature points in the history image included in the sliding window, and then the target feature points, which do not belong to the moving object, of the corresponding space points in the current frame image can be screened out according to the model comparison result, based on this, the current pose of the mobile device can be obtained according to the reprojection errors of the target feature points. Therefore, in the embodiment, the pixel models of the matched characteristic points in the front and rear images are utilized to reject the characteristic points belonging to the moving object, so that the mobile device is positioned according to the screened characteristic points not belonging to the moving object, interference of the characteristic points of the moving object on pose acquisition can be avoided, and positioning accuracy of the mobile device in a dynamic scene is improved.
Under the condition that the electronic equipment is mobile equipment, the electronic equipment further comprises an image acquisition device such as a camera.
It should be noted that, the specific implementation of the processor in this embodiment may refer to the corresponding content in the foregoing and will not be described in detail herein.
Taking the positioning of a robot as an example, the following describes in detail the implementation of SLAM according to the technical scheme of the present application:
first, the inventors of the present application found that the following drawbacks exist in the robot positioning process using SLAM:
in the adopted visual SLAM framework, it is necessary to assume that the environment is static, or that static features occupy the majority of the scene (a quasi-static scene). A visual SLAM system based on the direct method or the feature point method has difficulty coping with the interference of moving objects; during SLAM operation, if the solution of the camera pose relies entirely on minimizing an energy function or on the random sample consensus (RANSAC) method, then when static feature points do not occupy the majority of the whole image, or occupy only a small part, the correct camera motion model is often difficult to obtain within limited iterative calculations;
the inventors of the present application continued to test the positioning scheme based on the above drawbacks, and also found that: for SLAM algorithm based on inertial measurement unit IMU (Inertial Measurement Unit) and solving dynamic environment, the method does not solve the problem of dynamic environment from essence, but only provides a motion prior for a system, increases the accuracy of feature point matching, and can cause a large number of mismatching in scenes with larger environmental similarity; the additional sensor can increase the complexity of a positioning algorithm and the hardware cost of the sensor;
Since a predefined dynamic object is not necessarily a moving object, blindly excluding predefined dynamic objects easily causes the information in the image to be greatly reduced; meanwhile, adopting current semantic dynamic SLAM to identify and reject dynamic objects makes its applicability worse in scenes where feature points are rare, and introducing a deep learning network places high requirements on the computing power of the system: on a hardware platform without a graphics processing unit GPU (Graphics Processing Unit), the algorithm can hardly achieve real-time performance, while adding a GPU increases the hardware cost, so that product application is greatly limited. On the other hand, a target detection algorithm based on deep learning needs a large amount of data for pre-training, which to a certain extent limits the exploration of unknown environments by robots based on visual SLAM positioning algorithms;
the inventors of the present application also found that: for an algorithm based on motion segmentation, the motion of a camera and a rigid dynamic object in a scene can be estimated at the same time, and the method does not depend on semantic information of images, and performs motion estimation by clustering points with the same motion into a motion model, so that different motion rigid bodies in the dynamic scene are segmented. However, the method has higher requirements on conditions of a scene, and under the condition that the depth of a measurable 3D point in the scene is too small, the error of motion estimation of a camera and a dynamic object is larger, and due to inaccuracy of cluster segmentation, a certain cluster can contain a plurality of motion rigid bodies.
Therefore, in order to solve the problem that the accuracy of the traditional visual SLAM positioning algorithm is reduced in a dynamic environment, the inventor of the application provides a method for eliminating a dynamic region and abnormal feature points in a scene based on a feature point modeling and multi-view geometry method. The method takes the visual SLAM as a main body, a model is built in a sliding window and is used as a characteristic point, the characteristic point corresponding to a dynamic object in a scene on an image is effectively removed by comparing the model with a current frame, and the robust estimation of the camera pose under the dynamic environment is realized. The method comprises the following steps:
in connection with the technical framework of the present application shown in fig. 16, the present application can be divided into five parts: calculating a rough pose transformation matrix, dynamic region judgment, feature point modeling, judging outliers by epipolar constraint, and accurate pose estimation based on minimizing the reprojection error. These five parts are described in detail below:
1) Computing pose estimates based on minimizing reprojection errors
The pose estimation flow is shown in fig. 17. The pose estimation in the present application adopts a coarse-to-fine approach. The pose estimation in this step is the rough pose estimation; its core is to calculate the camera motion from 3D-2D point pairs by constructing a reprojection error. The steps and principles of the algorithm are as follows:
(1) Feature points are extracted from the current frame image. Feature points are representative points in an image and are another digital representation of the image information. In the invention, we extract ORB features from the image, which consist of key points and descriptors. The key point is called "Oriented FAST", a modified FAST corner with scale and rotation invariance; the descriptor is the binary BRIEF descriptor. In the feature extraction process, feature points may become concentrated in one region because the texture is rich somewhere in the image, which makes the pose estimation tend toward that local motion. To improve system accuracy, we increase the uniformity of feature extraction: the image is divided into small squares of size S, feature points are extracted in each square, and the image features are thus distributed over the whole image as much as possible.
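A possible way to realize this uniform extraction with OpenCV is sketched below; the grid size and the per-cell feature budget are assumptions, not values taken from the application.

```python
import cv2

def extract_orb_uniform(gray, grid=8, per_cell=30):
    """Extract ORB key points cell by cell so that features spread over the whole image.

    gray     : grayscale input image.
    grid     : the image is split into grid x grid cells (the cell size S is assumed square).
    per_cell : maximum number of key points kept per cell (illustrative).
    """
    orb = cv2.ORB_create(nfeatures=per_cell)
    h, w = gray.shape
    keypoints = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            for kp in orb.detect(gray[y0:y1, x0:x1], None):
                # Shift the key point back to full-image coordinates.
                keypoints.append(cv2.KeyPoint(kp.pt[0] + x0, kp.pt[1] + y0, kp.size,
                                              kp.angle, kp.response, kp.octave, kp.class_id))
    # Compute the binary (BRIEF-style) descriptors on the full image.
    keypoints, descriptors = orb.compute(gray, keypoints)
    return keypoints, descriptors
```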
(2) ORB descriptor matching is carried out between the feature points extracted from the current frame and the corresponding feature points in the last key frame, and the DBOW2 bag-of-words technique is adopted to accelerate the matching. After the matching is finished, it is judged whether the number of matched feature points reaches the required number; if so, the next step of constructing the reprojection error is carried out directly and the camera pose is estimated. If the matching condition is not met, the matching condition is relaxed, and the DBOW2 bag-of-words technique is used again to accelerate the matching between the feature points extracted from the current frame and the local map points constructed by the previous frame.
(3) The depth values of the feature points of the previous key frame are obtained using the depth image data input from the sensor at the previous key frame, and the 3D coordinates of those feature points are recovered. The previous 3D points and the 2D pixel points of the current frame form 3D-2D point pairs, and a reprojection error equation of the form T* = argmin_T Σ_i ω_i·F(||u_i − π(T·P_i)||²) is constructed,

where ω_i is a predefined weight whose calculation formula depends on a scale constant λ and the depth value d_i of the 3D point in the i-th feature point pair. Adding this weight is a novel angle and is considered from two aspects: in an indoor scene, moving objects tend to be located close to the robot; and the displacement on the pixel plane caused by the movement of a farther 3D point within a short time is negligible. F(·) is a Huber kernel function; when feature points are mismatched the reprojection error is large, and the kernel function limits the growth of the reprojection error, while the Huber kernel is smooth and easy to differentiate, which facilitates the solution of the reprojection error equation. π(·) is the projective transformation based on the camera model, which converts 3D coordinates in the camera coordinate system into 2D pixel coordinates in the camera image. u_i = [u_i, v_i]^T are the coordinates of the 2D point of the 3D-2D point pair in the pixel plane, and P_i = [X_i, Y_i, Z_i]^T are the coordinates of the 3D point of the pair in space. Finally, a nonlinear optimization solution is carried out through the EPnP algorithm to obtain the preliminarily estimated current pose of the robot.
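A hedged sketch of such a depth-weighted, Huber-robust reprojection minimization is shown below using SciPy; since the exact weight formula is not reproduced here, the weight w_i is a stand-in that merely preserves the idea that farther points receive larger weight, and the optimizer refines an initial pose rather than implementing EPnP itself.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def refine_pose(pts3d, pts2d, depths, K, rvec0, tvec0):
    """Depth-weighted, Huber-robust reprojection minimization (sketch).

    pts3d  : Nx3 3D points recovered from the previous key frame.
    pts2d  : Nx2 matched 2D pixel points in the current frame.
    depths : depth d_i of each 3D point, used only to build the weights.
    K      : 3x3 camera intrinsic matrix.
    rvec0, tvec0 : initial pose, e.g. from cv2.solvePnP with flags=cv2.SOLVEPNP_EPNP.
    The weight w_i = sqrt(d_i / d_max) is an assumption standing in for the patent's
    weight formula; it only keeps the idea that farther points get larger weight.
    """
    w = np.sqrt(depths / depths.max())

    def residuals(x):
        rvec, tvec = x[:3], x[3:]
        proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)  # pi(T * P_i)
        err = (proj.reshape(-1, 2) - pts2d) * w[:, None]         # weighted pixel residuals
        return err.ravel()

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    # loss='huber' plays the role of the Huber kernel F(.) in the text.
    res = least_squares(residuals, x0, loss="huber", f_scale=1.0)
    return res.x[:3], res.x[3:]
```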
2) Feature point modeling
This step is the core of the present application. It mainly establishes a staged model for the feature points, i.e. the pixel model described above; it is called staged because the model is replaced and updated within the sliding window. The flowchart is shown in fig. 18. The purpose of establishing the model is to judge whether a feature point is dynamic by comparing the feature point of the current frame with the matched feature point model and examining the difference between them.
The model is built and initialized as follows: first, for each feature point p, i.e. a feature point in the history image, a pixel model M(p), i.e. the second pixel model, is established, where M(p) = {p_1, p_2, ..., p_N, n, r_p}; p_1, ..., p_N are the selected neighbor feature points whose depth in space is inconsistent with that of the feature point p, and N is the number of selected neighbor feature points. The selection of the neighbor feature points is not random. First, the average of the depth values of the 24-neighborhood pixels of the feature point p is calculated, d̄ = (1/24)·Σ_{i=1}^{24} d_i. Then, the Euclidean distances between the feature point p and the other feature points are calculated, and among the feature points whose depth values are larger than d̄, the first N feature points sorted by Euclidean distance to p from small to large are taken as the depth-inconsistent neighbor feature points.
This step is designed to exclude feature points belonging to the same rigid body as the feature point p from being added to the model: when the depths are close, the pixel points are most likely on the same rigid body, and even if the motion changes, the relationship between them does not change much. n is the direction vector of the model, which is the difference between the position of the spatial center point of the 3D map points corresponding to all the neighbor feature points in the model and the position (x, y, z) of the 3D map point corresponding to the feature point p. r_p is the cumulative sum of the reprojection errors, r_p = Σ_{i=1}^{m} r_p^i, which is set to 0 at initialization, where m is the number of frames in the sliding window. The model of this step is built from previous frames, i.e. frames within the sliding window; it is initialized by the first frame and then updated by subsequent frames.
3) Dynamic feature point determination and model update
Before judging the feature points of the current frame, a pixel feature point model, i.e. the first pixel model, also needs to be built for the valid feature points of the current frame (i.e. the feature points that have matched feature points in the history image). Similar to the steps above, for a feature point p' in the current frame image, its N depth-inconsistent neighbor feature points are first searched, then the direction vector n' of the model is calculated, and finally the reprojection error r_p' calculated in the first step is added. The inventors of the present application set three judgment conditions to determine whether a certain feature point of the current frame is static; if the following three conditions are satisfied at the same time, the feature point is a static point:
(1) Using the result of the feature point matching in step 1), the feature point model (i.e. the second pixel model) matched with the feature point p' of the current frame is found among the feature point pixel models, and the neighbor feature points in that feature point model are matched with the neighbor feature points {p'_1, p'_2, p'_3, ..., p'_N} of the current frame feature point p'. If at least a preset number N_th of these neighbor feature points are matched successfully, the condition is satisfied.
(2) Considering that when a dynamic object moves perpendicular to the camera plane, the depth-inconsistent neighbor feature points in the feature point pixel model and the N depth-inconsistent neighbor feature points of the current frame feature point p' may be extremely similar, a direction vector comparison condition is further added: if the modulo length of the difference between the direction vector n in the feature point model and the direction vector n' in the model of the current frame feature point p' is within a certain range, the requirement for the feature point to be static is met: ||n − n'||_2 < δ.
(3) This term can be regarded as an additional constraint, mainly comparing the likelihood that a feature point is a dynamic point through the accumulation of reprojection errors, where the reprojection error calculation uses the result of the first step. The mean is expressed as r̄_p = (1/m)·Σ_{i=1}^{m} r_p^i, where m is the number of frames in the sliding window; it is the average of the accumulated reprojection errors of the point p in the feature point model within the sliding window. If the reprojection error of the current frame feature point p' is smaller than this mean, it can be believed more firmly that the feature point p' is a static point, and the judgment condition is satisfied.
Further, as the robot acquires images, it takes the processed and positioned current frame image as a history image, e.g. moves it into the sliding window, and at the same time acquires a new current frame image. Based on this, when the history images in the sliding window are updated, the model of a static feature point p in the history images in the sliding window, i.e. the second pixel model, is updated. In the present application, a random update strategy is adopted, mainly to ensure that the model pixel points generated by old frames in the sliding window are not easily replaced, so as to increase the diversity of the pixel feature points in the model:
first, for the feature point model M(p) of a feature point p in the sliding window, if a matched feature point p_F is found among the feature points of the current frame F and that feature point is judged to be a static feature point, a feature point model M(p_F) is established for it. Based on this, each point in the model M(p) of the feature point p in the sliding window is given an equal update probability, that is, every point has the same probability of being updated, namely 1/N.
Next, whether the model M(p_F) is used to update the feature point model may first be decided by an update probability, that is, the model M(p_F) updates the model of the feature point p only with probability η. When the model M(p_F) does update the model of the feature point p, the pixel feature point in M(p_F) with the smallest Euclidean distance to p_F is selected to update the model M(p). Likewise, after the pixel feature points in the model M(p) are updated, the direction vector n of the model can be recalculated and the reprojection error r_p can be updated. For the feature points of the current frame for which no matching point is found in the sliding window, the present application adds them to the feature point models according to the initialization step, to facilitate the next round of feature point matching and model comparison.
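The random update strategy can be sketched as follows; the probability η, the model field names and the helper structure are illustrative assumptions.

```python
import random
import numpy as np

def maybe_update_model(M_p, p_F, M_pF, eta=0.3, recompute=None):
    """Randomly update the model M(p) of a static feature point (a sketch; eta and the
    field layout are assumptions, not values fixed by the application)."""
    # M(p_F) updates the model of p only with probability eta.
    if random.random() > eta:
        return
    # Every neighbor slot in M(p) has the same 1/N chance of being the one replaced.
    idx = random.randrange(len(M_p["neighbors"]))
    # Replace it with the neighbor of M(p_F) closest (Euclidean) to p_F in the image.
    closest = min(M_pF["neighbors"],
                  key=lambda q: np.hypot(q["uv"][0] - p_F[0], q["uv"][1] - p_F[1]))
    M_p["neighbors"][idx] = closest
    # Afterwards the direction vector n and reprojection error r_p of M(p) are recomputed.
    if recompute is not None:
        recompute(M_p)
```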
4) Judging outliers by epipolar constraint
After dynamic point removal (screening) with the feature point models, and considering that there may be missed detections or feature points for which no matching point was found in the feature point models in the previous step, the present application handles them by introducing the constraints of multi-view geometry. The fact underlying the present application is that dynamic features violate the standard constraints of multi-view geometry that hold in a static scene, as shown in fig. 19.
Here x_1, x_2 and x'_2 are three points in pixel coordinates. If the 3D point P_1 is a static point, then x_1 and x_2 are a pair of matching points and satisfy the equation x_2^T·F·x_1 = 0, where the fundamental matrix F can be obtained from the camera rotation matrix R and translation vector t. l_1 and l_2 are called epipolar lines, and this geometric constraint is also called the epipolar constraint. If the 3D point P_1 is a dynamic point, i.e. it moves to the point P_2 at the next moment, then the projection of P_2 on the camera plane becomes x'_2; obviously x_1 and x'_2 do not satisfy the epipolar constraint, i.e. x'_2^T·F·x_1 ≠ 0 and is far greater than 0. The present application uses this multi-view constraint to further reject the missed dynamic 3D points.
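For illustration, the epipolar residual test can be sketched as follows; the residual threshold is an assumption, and F is taken as given, e.g. composed from the estimated R, t and the camera intrinsics.

```python
import numpy as np

def epipolar_inliers(pts1, pts2, F, thresh=1.0):
    """Keep matches whose epipolar residual |x2^T F x1| is small (threshold illustrative).

    pts1, pts2 : Nx2 matched pixel points in the earlier and current frames.
    F          : 3x3 fundamental matrix.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([np.asarray(pts1, float), ones])  # homogeneous pixel coordinates
    x2 = np.hstack([np.asarray(pts2, float), ones])
    # For static points x2^T F x1 should be (near) zero; dynamic points violate this.
    residual = np.abs(np.sum(x2 * (F @ x1.T).T, axis=1))
    return residual < thresh
```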
5) Accurate pose estimation
In this step, the feature points of the current frame surviving the previous steps, i.e. the target feature points, together with the static feature points in the local map, are locally optimized, and the pose estimate of the current frame is refined by minimizing the reprojection error function. The reprojection error function takes the same form as the one constructed above, where P_i is the 3D point in the local map matched with a static feature point of the current frame, and u_i are the coordinates of that static feature point of the current frame in the pixel plane.
Therefore, the inventors of the present application propose the concept of modeling feature points when dealing with the positioning problem in dynamic environments, and eliminate dynamic points by comparing the feature points with their models. When the reprojection error function is calculated, a pre-weighting concept is introduced: the error values are pre-weighted using the idea that a far dynamic point cannot produce a large displacement in the pixel plane across consecutive frames, and that a static point is more likely to be part of the distant background.
Furthermore, the application provides a modeling method using depth inconsistent feature points, the idea is to model the feature points by using surrounding environment information instead of modeling the surrounding information of pixels, and the dynamic and static feature points can be effectively distinguished by judging the three conditions of the model and the feature points to be judged. In the method, a sliding window method is adopted to judge a plurality of continuous frames, and the accumulated information is used for judging the characteristic points, so that the defect of unobvious movement between two short frames is avoided, and the accuracy of static and dynamic distinction of the characteristic points is effectively improved.
Therefore, in the method, different from the existing dynamic processing environment mode, a characteristic point modeling mode is adopted, a pixel model is established by utilizing information of characteristic points in time and space, pixels are represented by adopting depth-inconsistent neighbor characteristic points in space, a random updating strategy is adopted in time, and the amount of past information contained in the model can be increased. And three conditions are set to judge the characteristic points, only the characteristic points which simultaneously meet the three conditions can be judged to be static points, and then the characteristic points are utilized to accurately estimate the pose. Furthermore, the method for utilizing the information of the surrounding environment of the pixels is different from a method for pre-training and motion segmentation by deep learning, does not need to know the environment of a scene in advance, does not need to segment different motion rigid bodies in the scene, reduces the requirement on the level of hardware, and is a SLAM algorithm for processing the dynamic environment with strong practicability.
It should be noted that, in terms of the sensor, the method for screening the static feature points in the application is not only applicable to the RGB-D camera, but also applicable to the monocular and binocular cameras, and only the RGB-D camera can directly obtain the depth information of the pixel points from the sensor, and no additional algorithm is needed for calculation. Moreover, the pixel modeling method used in the application models the feature points based on the characteristic that the surrounding pixel feature points of the dynamic feature points also change, and if other schemes can be found to distinguish the same and different places of the dynamic and static feature points along with the change of time, the method is still within the protection scope of the application. In addition, the method is based on an algorithm of visual SLAM, but is also based on environmental characteristic point sampling, but is also applicable to SLAM algorithm using a laser sensor, only other characteristic points for modeling the laser point are needed to be found, then static and dynamic characteristic point judgment conditions are set according to the characteristics of the sampling points obtained by the laser sensor, dynamic and static characteristic points are distinguished, and the pose of the robot is estimated by using the static points.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The pose acquisition method is characterized by comprising the following steps of:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on mobile equipment;
obtaining the matched characteristic points in the current frame image, wherein the matched characteristic points in the current frame image are characteristic points with matched characteristic points in a historical image contained in a sliding window corresponding to the current frame image; the sliding window comprises a plurality of frames of historical images, wherein the historical images are key frame images before the current frame image;
respectively obtaining a first pixel model of the matched feature points in each current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature points in the current frame image, the first pixel model also has a direction vector of the first pixel model and a re-projection error of the matched feature points in the current frame image, and the first pixel model is established based on the matched feature points in the current frame image according to the rest neighbor feature points after redundant neighbor feature points with the same depth values in the plurality of neighbor feature points are removed;
Respectively comparing the first pixel model of the matched characteristic points in each current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial points corresponding to the matched characteristic points in the current frame image belong to a moving object or not; the second pixel model is a pixel model of a feature point in the historical image, which is matched with the feature point matched with the current frame image, the second pixel model corresponds to a plurality of neighbor feature points of the feature point matched with the historical image, the second pixel model also has a mean value of a direction vector of the second pixel model and a re-projection error of the feature point matched with the historical image, the comparison result is a comparison result of each neighbor feature point, direction vector and re-projection error of the feature point matched with each current frame image, which are respectively compared with a corresponding second pixel model in each neighbor point, direction vector and re-projection error, so as to obtain a comparison result of each neighbor feature point, direction vector and re-projection error corresponding to the feature point matched with each current frame image;
Screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points of which the corresponding space points in the current frame image do not belong to a moving object;
and obtaining the current pose of the mobile equipment according to the reprojection error corresponding to the target feature point in the current frame image.
2. The method of claim 1, wherein obtaining the current pose of the mobile device according to the reprojection error corresponding to the target feature point in the current frame image comprises:
obtaining a weight value corresponding to the target feature point according to the depth value of the space point corresponding to the target feature point;
obtaining a pose transformation matrix corresponding to the target feature point when the reprojection error of the target feature point is minimum according to the weight value corresponding to the target feature point, wherein the reprojection error is obtained according to the three-dimensional coordinate value corresponding to the target feature point in the historical image and the two-dimensional coordinate value of the target feature point in the current frame image;
and obtaining the current pose of the mobile equipment according to the pose transformation matrix.
3. The method according to claim 1, wherein obtaining the first pixel model of each matched feature point in the current frame image includes:
respectively taking each matched feature point in the current frame image as a center, obtaining a plurality of neighbor feature points of the matched feature point in the current frame image;
wherein the depth value of the matched feature point in the current frame image differs from the depth values of its corresponding neighbor feature points, the neighbor feature points of the matched feature point are feature points whose depth values are greater than a target depth and whose distances to the matched feature point satisfy a distance sorting rule, and the target depth is related to the average depth of the neighboring pixels of the matched feature point in the current frame image;
and establishing the first pixel model of the matched feature point in the current frame image at least according to the plurality of neighbor feature points of the matched feature point, so that the first pixel model corresponds to the plurality of neighbor feature points of the matched feature point in the current frame image and further has a direction vector of the first pixel model and a re-projection error of the matched feature point in the current frame image.
4. The method according to claim 3, wherein the direction vector of the first pixel model is obtained based on the three-dimensional coordinates of the spatial center point of the neighbor feature points in the first pixel model and the three-dimensional coordinates of the spatial point corresponding to the matched feature point in the current frame image, and the re-projection error of the matched feature point in the current frame image is obtained based on the historical image.
5. The method according to claim 3 or 4, wherein comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result includes:
obtaining the number of neighbor feature points that match between the first pixel model and the corresponding second pixel model;
obtaining the modulus of the difference between the direction vector in the first pixel model and the direction vector in the corresponding second pixel model;
judging whether the re-projection error of the matched feature point in the current frame image in the first pixel model is less than or equal to the mean value of the re-projection errors of the matched feature point in the historical image in the second pixel model, so as to obtain a judgment result; the mean value is the mean of the re-projection errors accumulated for the matched feature point over the historical images in the sliding window;
and obtaining the model comparison result according to the number, the modulus and the judgment result.
6. The method as recited in claim 5, further comprising:
obtaining an update request for a pixel model of a feature point in the historical image;
obtaining a newly added image in the sliding window;
extracting feature points from the newly added image to obtain the feature points in the newly added image;
matching the feature points in the newly added image with the feature points in the historical images in the sliding window, to obtain matching feature points in the newly added image that match feature points in the historical images and non-matching feature points in the newly added image that do not match any feature point in the historical images;
updating, according to the matching feature points, the pixel models of the feature points in the historical images that match the matching feature points;
and obtaining the pixel models of the non-matching feature points, so that when the newly added image is used as a historical image in the sliding window, the pixel models of the non-matching feature points serve as the pixel models of the new feature points in that historical image.
7. The method according to claim 1, wherein after screening out the target feature points in the current frame image according to the model comparison result, before obtaining the current pose of the mobile device according to the reprojection errors corresponding to the target feature points in the current frame image, the method further comprises:
and screening, from the target feature points in the current frame image, the target feature points that satisfy the epipolar constraint with the feature points in the historical image.
8. The method according to claim 1, wherein obtaining the matched feature points in the current frame image comprises:
extracting feature points from the current frame image;
and matching the feature points extracted from the current frame image with the feature points extracted from the historical image, to obtain the feature points in the current frame image that match feature points in the historical image.
9. A pose acquisition device, characterized by comprising:
the image acquisition unit is used for acquiring a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
the feature point processing unit is used for obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the historical images contained in a sliding window corresponding to the current frame image; the sliding window contains a plurality of frames of historical images, and the historical images are key frame images preceding the current frame image;
the model building unit is used for respectively obtaining a first pixel model of each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image, the first pixel model further has a direction vector of the first pixel model and a re-projection error of the matched feature point in the current frame image, and the first pixel model is established, based on the matched feature point in the current frame image, from the neighbor feature points remaining after redundant neighbor feature points having the same depth value among the plurality of neighbor feature points are removed;
the model comparison unit is used for respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the historical image that matches the matched feature point in the current frame image, the second pixel model corresponds to a plurality of neighbor feature points of that matched feature point in the historical image, and the second pixel model further has a direction vector of the second pixel model and a mean value of the re-projection errors of the matched feature point in the historical image; the model comparison result is obtained by respectively comparing the neighbor feature points, the direction vector and the re-projection error in the first pixel model of each matched feature point in the current frame image with the neighbor feature points, the direction vector and the mean re-projection error in the corresponding second pixel model;
the feature point screening unit is used for screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding spatial points do not belong to a moving object;
and the pose obtaining unit is used for obtaining the current pose of the mobile device according to the re-projection errors corresponding to the target feature points in the current frame image.
10. A mobile device, comprising:
a memory for storing an application program and data generated by the operation of the application program;
a processor for executing the application program to realize:
obtaining a current frame image, wherein the current frame image is an image acquired by an image acquisition device on the mobile device;
obtaining the matched feature points in the current frame image, wherein the matched feature points in the current frame image are feature points that have matching feature points in the historical images contained in a sliding window corresponding to the current frame image; the sliding window contains a plurality of frames of historical images, and the historical images are key frame images preceding the current frame image;
respectively obtaining a first pixel model of each matched feature point in the current frame image, wherein the first pixel model corresponds to a plurality of neighbor feature points of the matched feature point in the current frame image, the first pixel model further has a direction vector of the first pixel model and a re-projection error of the matched feature point in the current frame image, and the first pixel model is established, based on the matched feature point in the current frame image, from the neighbor feature points remaining after redundant neighbor feature points having the same depth value among the plurality of neighbor feature points are removed;
respectively comparing the first pixel model of each matched feature point in the current frame image with the corresponding second pixel model to obtain a model comparison result, wherein the model comparison result represents whether the spatial point corresponding to the matched feature point in the current frame image belongs to a moving object; the second pixel model is the pixel model of the feature point in the historical image that matches the matched feature point in the current frame image, the second pixel model corresponds to a plurality of neighbor feature points of that matched feature point in the historical image, and the second pixel model further has a direction vector of the second pixel model and a mean value of the re-projection errors of the matched feature point in the historical image; the model comparison result is obtained by respectively comparing the neighbor feature points, the direction vector and the re-projection error in the first pixel model of each matched feature point in the current frame image with the neighbor feature points, the direction vector and the mean re-projection error in the corresponding second pixel model;
screening out target feature points in the current frame image according to the model comparison result, wherein the target feature points are feature points whose corresponding spatial points do not belong to a moving object; and obtaining the current pose of the mobile device according to the re-projection errors corresponding to the target feature points in the current frame image.
CN202110082125.3A 2021-01-21 2021-01-21 Pose acquisition method and device and mobile equipment Active CN112785705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110082125.3A CN112785705B (en) 2021-01-21 2021-01-21 Pose acquisition method and device and mobile equipment

Publications (2)

Publication Number Publication Date
CN112785705A (en) 2021-05-11
CN112785705B (en) 2024-02-09

Family

ID=75758237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110082125.3A Active CN112785705B (en) 2021-01-21 2021-01-21 Pose acquisition method and device and mobile equipment

Country Status (1)

Country Link
CN (1) CN112785705B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222940B (en) * 2021-05-17 2022-07-12 哈尔滨工业大学 Method for automatically grabbing workpiece by robot based on RGB-D image and CAD model
CN113361365B (en) * 2021-05-27 2023-06-23 浙江商汤科技开发有限公司 Positioning method, positioning device, positioning equipment and storage medium
CN113847907B (en) * 2021-09-29 2024-09-13 深圳市慧鲤科技有限公司 Positioning method and device, equipment and storage medium
CN113762289A (en) * 2021-09-30 2021-12-07 广州理工学院 Image matching system based on ORB algorithm and matching method thereof
CN114310908B (en) * 2022-01-25 2023-10-24 深圳市优必选科技股份有限公司 Robot control method, robot control device and robot
CN114998433A (en) * 2022-05-31 2022-09-02 Oppo广东移动通信有限公司 Pose calculation method and device, storage medium and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111780764A (en) * 2020-06-30 2020-10-16 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
WO2020259248A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Depth information-based pose determination method and device, medium, and electronic apparatus

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10706582B2 (en) * 2012-09-17 2020-07-07 Nec Corporation Real-time monocular structure from motion

Non-Patent Citations (1)

Title
艾青林, 余杰, 胡克用, 陈琦. Implementation of robot SLAM based on the ORB key-frame matching algorithm. 机电工程 (Journal of Mechanical & Electrical Engineering), 2016, (05), full text. *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant