CN115393427A - Method and device for determining position and posture of camera, computer equipment and storage medium


Info

Publication number
CN115393427A
Authority
CN
China
Prior art keywords
camera
pose
matching point
feature points
information
Prior art date
Legal status
Pending
Application number
CN202110571419.2A
Other languages
Chinese (zh)
Inventor
陆潇
周奇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110571419.2A
Publication of CN115393427A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30244 Camera pose


Abstract

The embodiments of the present application disclose a method and an apparatus for determining a camera pose, a computer device and a storage medium. The method can acquire at least two images captured by a camera and rotation information of the camera; extract feature points of the at least two images; perform feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair consists of two-dimensional feature points that are matched between the two images; convert the corresponding two-dimensional feature points in a target matching point pair into three-dimensional feature points; and perform pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera. The scheme can improve the efficiency of determining the camera pose.

Description

Method and device for determining position and posture of camera, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a camera pose, a computer device, and a storage medium.
Background
A pose, i.e. a position and an attitude, refers to the position of a rigid body in space and the attitude of the rigid body itself; a camera pose is accordingly the position of the camera in space and the orientation of the camera. The camera pose can be regarded as the transformation of the camera from an original position to the current position, which specifically comprises a translation transformation and a rotation transformation.
In the research and practice of the related technology, the inventors of the present application have found that current methods for solving the camera pose rely on complex model structures with a large number of parameters to be solved, so that determining the camera pose is difficult and cumbersome; the method for determining the camera pose therefore needs to be improved.
Disclosure of Invention
The embodiment of the application provides a method and a device for determining the pose of a camera, computer equipment and a storage medium, and can improve the determination efficiency of the pose of the camera.
The embodiment of the application provides a method for determining a camera pose, which comprises the following steps:
acquiring at least two images acquired by a camera and rotation information of the camera;
extracting feature points of at least two images;
performing feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points which are matched between the two images;
converting corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points;
and performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
Correspondingly, the embodiment of the present application further provides a device for determining the pose of a camera, including:
the acquisition unit is used for acquiring at least two images acquired by a camera and rotation information of the camera;
the extraction unit is used for extracting the characteristic points of at least two images;
the matching unit is used for matching the characteristic points of the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional characteristic points which are matched with each other between the two images;
the conversion unit is used for converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points;
and the operation unit is used for performing pose operation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
In one embodiment, the conversion unit includes:
a calculation subunit, configured to calculate a homography matrix between the two images;
a point pair determining subunit, configured to determine, based on the homography matrix, a target matching point pair from the at least one matching point pair;
and the converting subunit is used for converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points.
In an embodiment, the two images include a first time sequence image and a second time sequence image, and the first time sequence image and the second time sequence image corresponding to the first time sequence image satisfy a preset time sequence association relationship; the conversion subunit is configured to:
selecting two-dimensional feature points of the first time sequence image from the target matching point pair to determine target two-dimensional feature points to be converted; determining corresponding first position and orientation information when the camera collects the first time sequence image; and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the first attitude information.
In an embodiment, the conversion subunit is configured to:
determining a plane attribute corresponding to the camera when the camera collects the first time sequence image; determining depth-of-field information corresponding to the first time sequence image acquired by the camera based on the plane attribute and the first attitude information; and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the depth of field information and the first pose information.
In an embodiment, the conversion subunit is configured to:
if the camera is in an initial state when acquiring the first time sequence image, determining first attitude information based on the initial state of the camera; and if the camera is in a non-initial state when acquiring the first time sequence image, determining the time sequence associated image of the first time sequence image based on the time sequence association relation, and determining first position and orientation information based on position and orientation information corresponding to the time sequence associated image.
In an embodiment, the calculation subunit is configured to:
selecting a preset number of matching point pairs from the at least one matching point pair; and calculating a homography matrix between the two images based on the mapping relation between the two images and the selected matching point pair.
In an embodiment, the point pair determining subunit is configured to:
performing point pair screening on the at least one matching point pair according to the homography matrix to obtain a screened matching point pair; and selecting a target matching point pair from the screened matching point pairs.
In an embodiment, the two images include a first time sequence image and a second time sequence image, and the first time sequence image and the second time sequence image corresponding to the first time sequence image satisfy a preset time sequence association relationship; the arithmetic unit includes:
a selecting subunit, configured to select, from the converted pair of matching points, a pair of matching points required for pose calculation;
the operation subunit is used for performing pose operation on the camera according to the rotation information and the selected matching point pair so as to determine corresponding translation information when the camera acquires the second time sequence image;
and the information determining subunit is used for determining second position and orientation information corresponding to the second time sequence image acquired by the camera based on the rotation information and the translation information.
In an embodiment, the pose information of the camera comprises translation information of the camera; the selecting subunit is configured to:
determining the quantity information of the matching point pairs required by the pose operation according to the freedom degree attribute of the translation information; and selecting corresponding matching point pairs from the converted matching point pairs based on the quantity information.
Accordingly, the present application further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for determining a camera pose as shown in the present application.
Accordingly, embodiments of the present application further provide a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for determining a camera pose as shown in the embodiments of the present application when executing the computer program.
The method and the device can acquire at least two images acquired by a camera and rotation information of the camera; extracting feature points of at least two images; performing feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are matched feature points between the two images; converting corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points; and performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
The scheme can acquire the rotation information of the camera and apply it to the solving of the camera pose, so that the number of parameters to be solved can be reduced; secondly, the camera pose solution model can be converted from a nonlinear model into a linear model. Therefore, the camera pose can be solved in a loosely coupled visual manner, the difficulty of solving the camera pose can be reduced, and a simpler solution method is provided. In addition, when the scheme is applied in practice, the application effect can be greatly improved. For example, when the scheme is applied to augmented reality, a virtual object can be rendered on the screen in real time after the camera pose is obtained, so as to achieve the augmented-reality effect. Moreover, because the scheme makes the estimation of the camera pose more stable and accurate, the rendered virtual object can be firmly fixed on the mobile phone screen without sliding with the screen or drifting away from its correct position, which greatly improves the presentation effect of augmented reality.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a scene schematic diagram of a method for determining a camera pose provided by an embodiment of the present application;
fig. 2 is a flowchart of a method for determining a camera pose provided by an embodiment of the present application;
fig. 3 is another flowchart of a method for determining a camera pose provided by an embodiment of the present application;
fig. 4 is another flowchart of a method for determining a camera pose provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a camera pose determination apparatus provided in an embodiment of the present application;
fig. 6 is another schematic structural diagram of a device for determining a camera pose provided by an embodiment of the present application;
fig. 7 is another structural schematic diagram of a camera pose determination apparatus provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a method and a device for determining the position and the posture of a camera, computer equipment and a storage medium. Specifically, the embodiment of the application provides a camera pose determination device suitable for a computer device. The computer device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, a notebook computer, and the like. The server may be a single server or a server cluster composed of a plurality of servers.
The embodiment of the application takes a computer device as an example, and introduces a method for determining a camera pose.
Referring to fig. 1, the server 10 may acquire at least two images captured by a camera and rotation information of the camera. For example, the terminal 20 may be equipped with a camera, and thus, the terminal 20 may capture at least two images by the camera, and further, the terminal 20 may transmit the captured images and rotation information of the camera to the server 10, so that the server 10 acquires the at least two images captured by the camera and the rotation information of the camera.
Further, the server 10 may extract feature points of at least two images; performing feature point matching on at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are feature points matched between the two images; converting corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points; and performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
Alternatively, the server 10 may send the calculated pose information of the camera to the terminal 20, so that the terminal 20 may perform further operations based on the camera pose information. For example, after the terminal 20 obtains the pose of the camera, a virtual object may be rendered on the screen in real time, so as to achieve the effect of Augmented Reality (AR), in which the virtual object appears as if it really existed in the scene.
The following are detailed descriptions. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The method for determining the position and the posture of the camera provided by the embodiment of the application can be executed by a terminal or a server, and can also be executed by the terminal and the server together; in the embodiment of the present application, the method for determining a camera pose is executed by a server as an example, specifically, executed by a device for determining a camera pose integrated in the server, as shown in fig. 2, a specific process of the method for determining a camera pose may be as follows:
101. at least two images acquired by the camera and rotation information of the camera are acquired.
The camera may be an element configured on the terminal, for example a front camera, a rear camera, or a pop-up camera equipped on the terminal. Image acquisition can be performed by the camera in various ways, for example by taking photos; for another example, by recording video; for another example, by establishing a video call; and so on.
The rotation information of the camera is information representing the rotation transformation undergone by the camera in moving from its original position to its current position. For example, the rotation information of the camera may be presented in the form of a matrix.
It is noted that a pose, i.e. a position and an attitude, refers to the position of a rigid body in space and the attitude of the rigid body itself, and a camera pose is accordingly the position of the camera in space and the orientation of the camera. Since the camera pose can be regarded as the transformation of the camera from an original position to the current position, which specifically includes a translation transformation and a rotation transformation, the camera pose information can correspondingly include translation information and rotation information.
The server may acquire the at least two images captured by the camera in various ways. For example, the images may be captured by a camera equipped on the terminal, and the terminal may then send the captured images to the server, so that the server obtains the at least two images captured by the camera.
It should be noted that, in practical applications, the method for determining the pose of the camera according to the present application may be executed by the terminal, so that the terminal may acquire at least two images through the camera and acquire the images acquired by the camera.
For example, the terminal may acquire the rotation information of the camera and transmit the rotation information to the server, so that the server acquires the rotation information of the camera. The terminal may acquire the rotation information of the camera in various ways, for example, the rotation information may be acquired by a sensor, such as a gyroscope sensor; for another example, the rotation information of the camera may be obtained by an Inertial Measurement Unit (IMU); and so on.
The IMU is a device for measuring the three-axis attitude angle (or angular velocity) and acceleration of an object. Generally, an IMU includes three single-axis accelerometers and three single-axis gyroscopes, the accelerometers detect acceleration signals of an object in three independent axes of a carrier coordinate system, and the gyroscopes detect angular velocity signals of the carrier relative to a navigation coordinate system, and measure angular velocity and acceleration of the object in three-dimensional space, and then solve the attitude of the object. Therefore, if the terminal is configured with the IMU, the terminal may acquire data of the gyroscope through the IMU and acquire rotation information from the gyroscope.
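For illustration only, the following minimal Python sketch shows one way per-frame rotation could be accumulated from gyroscope angular-velocity samples; the sample format, sampling interval and function names are assumptions made for the sketch, not part of the described method.

```python
import numpy as np
import cv2

def integrate_gyro(gyro_samples, R0=np.eye(3)):
    """Accumulate gyroscope angular-velocity samples into a rotation matrix.

    gyro_samples: iterable of (wx, wy, wz, dt) tuples, with angular velocity
    in rad/s and the sample interval dt in seconds (an assumed format).
    """
    R = R0.copy()
    for wx, wy, wz, dt in gyro_samples:
        # Small rotation for this sample, converted to a matrix via Rodrigues.
        w = np.array([wx, wy, wz], dtype=np.float64).reshape(3, 1) * dt
        dR, _ = cv2.Rodrigues(w)
        R = R @ dR
    return R

# e.g. rotation of the camera between the capture times of F1 and F2,
# assuming samples_f1_to_f2 holds the gyroscope readings in that interval:
# R_1_2 = integrate_gyro(samples_f1_to_f2)
```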
In one embodiment, the terminal may be equipped with both a camera and an IMU, and thus, the terminal may capture at least two images by the camera and acquire rotation information corresponding to each image captured by the camera through the IMU. Further, the terminal may transmit the captured images and the acquired rotation information to the server, so that the server may acquire at least two images captured by the camera and the rotation information of the camera.
As an example, the terminal may be equipped with both a camera and an IMU, and the terminal may capture the following n images by the camera: F1, F2, ..., Fn, where n ≥ 2, F1 represents the first frame of image collected by the camera, F2 represents the 2nd frame of image collected by the camera, and so on, and Fn represents the nth frame of image collected by the camera.
Taking F1 as an example, since the F1 frame can be set to be at the origin of the coordinate system when the system is just started, the corresponding pose of the camera when acquiring F1 can be acquired. Since the pose information of the camera may include rotation information and translation information, referring to equation (1), the pose may be represented as a 4 × 4 matrix T:
$$T = \begin{bmatrix} R & t \\ \mathbf{0}^{\top} & 1 \end{bmatrix} \tag{1}$$
where R is a 3x3 matrix representing the current rotation of the camera and t is a 3x1 matrix representing the current translation of the camera.
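As a small illustrative sketch (variable names are assumptions), the 4 x 4 pose matrix T of equation (1) can be assembled from R and t as follows:

```python
import numpy as np

def make_pose(R, t):
    """Assemble the 4x4 pose matrix T of equation (1) from a 3x3 rotation R
    and a translation vector t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    return T

# F1 is taken as the origin of the coordinate system, so its pose is the identity:
T1 = make_pose(np.eye(3), np.zeros(3))
```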
Taking F2 as an example, the terminal may obtain, through the IMU, rotation information corresponding to the time when the camera acquires F2. Similarly, for F3, F4.. Fn, the terminal may acquire, via the IMU, corresponding rotation information when the camera is capturing images.
102. And extracting the characteristic points of at least two images.
The feature points are representative points in the image, and these points will remain unchanged when the image changes, such as rotation and scaling of the image. Corners and edges in an image are relatively more "distinctive" because they are more discernable in different images. Therefore, an intuitive way to extract features is to identify corners between images and determine the correspondence.
As an example, a feature point may be composed of two parts, a key point (key-point) and a descriptor. The key point refers to the position information of the feature point on the image, and some key points also carry other information such as direction and size. A descriptor is usually a vector that describes, in some artificially designed way, the information of the pixels around the feature point. An important design principle of descriptors is that features with similar appearance should have similar descriptors, so that if the descriptors of two feature points are close in the vector space, the two feature points can be considered to be the same feature point.
In the present application, the extracted feature points may include, but are not limited to, at least one of the following: FAST feature points, harris feature points, GFTT feature points, ORB feature points, SIFT feature points, and the like.
FAST refers to Features from Accelerated Segment Test; it is a corner detection method which can be used for extracting feature points and completing tracking and mapping of objects.
Harris refers to the Harris corner detection method, which uses the gray-level difference between adjacent pixels to judge whether a point is a corner, an edge, or part of a smooth area. The principle of Harris corner detection is to use a moving window to calculate gray-level change values in the image; the key steps include converting the image to gray scale, calculating the difference image, Gaussian smoothing, calculating local extrema, and confirming the corners.
GFTT is an abbreviation of Good Features To Track, which is modified on the basis of Harris corner detection and achieves good results. The algorithm can extract a large number of feature points, so GFTT can be used when a large number of feature points are needed.
The ORB features consist of key points and descriptors. Its key point is called "Oriented FAST", which is a modified FAST corner, and its descriptor is BRIEF. The BRIEF describes the detected feature points, is a binary coded descriptor, abandons the traditional method of describing the feature points by using a regional gray histogram, greatly accelerates the speed of establishing the feature descriptors, and greatly reduces the time for feature matching.
SIFT, i.e. the scale-invariant feature transform, is a description method used in the field of image processing. SIFT features are local features of an image; they remain invariant to rotation, scale change and brightness change of the image, and also maintain a certain degree of stability under viewing-angle change, affine transformation and noise.
In the present application, there are various ways to extract feature points of an image, and the method may be specifically implemented according to the selected feature points; for example, a FAST method may be used to extract feature points of an image. As an example, feature points on the two images F1 and F2 may be extracted using the FAST method, and the feature point set constituted by the feature points extracted on F1 may be denoted as P = {p1, p2, ..., pn}, and the feature point set constituted by the feature points extracted on F2 may be denoted as Q = {q1, q2, ..., qn}.
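For illustration, a minimal OpenCV sketch of this FAST extraction step might look as follows; the file names and threshold value are assumptions:

```python
import cv2

# Load the two frames F1 and F2 (file names are illustrative assumptions).
img1 = cv2.imread("F1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("F2.png", cv2.IMREAD_GRAYSCALE)

# FAST corner detector; the threshold controls how many corners are kept.
fast = cv2.FastFeatureDetector_create(threshold=20)
P = fast.detect(img1, None)   # feature point set P = {p1, p2, ...} of F1
Q = fast.detect(img2, None)   # feature point set Q = {q1, q2, ...} of F2
```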
103. And performing feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are feature points matched between the two images.
Wherein the feature point matching is used to match feature points between two images. Since the feature points are representative points in the image and will remain unchanged when the image changes, such as rotation and scaling of the image, the feature points representing the same feature between two images can be matched through feature point matching.
Specifically, in the present application, after feature point matching is performed on feature points between two images, the feature points matched between the two images can be found, and the two feature points constitute a matching point pair.
It should be noted that the feature points extracted from the images in the present application are two-dimensional feature points, for example, two-dimensional pixel points, and therefore each matching point pair obtained by feature point matching includes two-dimensional feature points, where the two-dimensional feature points are feature points matched between two images.
In the present application, the feature point matching method includes, but is not limited to, at least one of the following: ORB feature matching, SIFT feature matching, brisk feature matching, surf feature matching, LK optical flow matching, and the like.
As an example, ORB feature matching may be implemented by brute-force matching: specifically, the distance between the descriptor of a feature point of image 1 and the descriptors of all feature points of image 2 may be calculated, the calculated distances may then be sorted, and the closest feature point in image 2 may be selected as the matching point. The descriptor distance represents the degree of similarity between two feature points, and different distance norms can be used in practice: for floating-point descriptors the Euclidean distance is used, while for binary descriptors (such as BRIEF) the Hamming distance is often used. The Hamming distance is determined by comparing whether each bit of the two vectors is the same, adding 1 to the distance for each differing bit. The higher the vector similarity, the smaller the corresponding Hamming distance.
Alternatively, when the number of feature points is large, the computation amount of the brute-force matching method becomes large. In this case, ORB feature matching can be implemented by the Fast Library for Approximate Nearest Neighbors (FLANN) algorithm, which is more suitable for cases with an extremely large number of matching points.
As an example, for SIFT feature matching, after the SIFT feature vectors of the two images are generated, the Euclidean distance between the key-point feature vectors may be adopted as the similarity measure for key points in the two images. For example, a key point of image 1 may be taken, and the two key points of image 2 with the nearest distances may be found by traversal. Of these two key points, if the nearest distance divided by the second-nearest distance is less than a certain threshold, the pair is determined to be a matching point pair.
Among them, BRISK is an improvement of BRIEF descriptor, and compared with BRIEF feature, it has rotation invariance, scale invariance and robustness to noise. By way of example, BRISK feature matching may similarly be achieved by computing hamming distances between feature points.
SURF, i.e. Speeded Up Robust Features, is a robust local feature point detection and description algorithm. SURF feature matching can be determined by calculating the Euclidean distance between two feature points; the shorter the Euclidean distance, the better the matching degree of the two feature points. Compared with SIFT feature matching, SURF feature matching also adds a judgment based on the trace of the Hessian matrix: if the traces of the two feature points have the same sign, the two feature points have contrast changes in the same direction; if the signs differ, the contrast change directions of the two feature points are opposite, and the pair is directly excluded even if the Euclidean distance is 0.
The LK optical flow matching, namely Lucas-Kanade optical flow algorithm, is an optical flow estimation algorithm of two-frame difference, and is a method for finding out the corresponding relation between the previous frame and the current frame by using the change of pixels in an image sequence on a time domain and the correlation between adjacent frames so as to calculate the motion information of an object between the adjacent frames.
In the present application, at least one matching point pair may be obtained by performing feature point matching between two images, where each matching point pair includes two-dimensional feature points, and the two-dimensional feature points are feature points matched between the two images. In an embodiment, an ORB method may be used to match feature points between the images F1 and F2. Specifically, the set of matching point pairs between F1 and F2 may be denoted as C = {p1-q1, p2-q2, ..., pm-qm}, where m ≤ n, because it cannot be guaranteed that every feature point extracted on F1 can find a corresponding point on the F2 image; for each element in the set, the feature point before the separator "-" is a feature point of the image F1, the feature point after the separator "-" is a feature point of the image F2, and a matching point pair is formed by these two feature points.
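For illustration, a minimal OpenCV sketch of ORB feature matching with brute-force Hamming matching, producing such a 2d-2d point-pair set C, might look as follows (file names are assumptions):

```python
import cv2

img1 = cv2.imread("F1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("F2.png", cv2.IMREAD_GRAYSCALE)

# ORB keypoints and binary (BRIEF-style) descriptors for both frames.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared with the Hamming distance.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)

# C = {p1-q1, ..., pm-qm}: pixel coordinates of the matched points.
C = [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
```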
104. And converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points.
The camera pose may be calculated in various manners; for example, it may be calculated by a Perspective-n-Point (PnP) algorithm. Specifically, PnP can compute the 6-degree-of-freedom camera pose from n pairs of corresponding 3d spatial points and 2d image points, so PnP is essentially solving a nonlinear equation containing 6 unknowns.
The degree of freedom refers to the number of variables whose values are not limited when a certain statistic is calculated. As an example, an object may have six degrees of freedom in space, i.e., freedom of movement in the directions of three orthogonal axes x, y, and z and freedom of rotation about these three axes. Since the camera pose may include rotation information and translation information of the camera, where the rotation information and the translation information have 3 degrees of freedom, respectively, a 6-degree-of-freedom pose of the camera needs to be calculated when calculating the camera pose.
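For later contrast with the linear solution of this application, the following is a minimal sketch of the conventional PnP route using OpenCV, assuming the 3d-2d correspondences and the intrinsic matrix K are already available (variable and function names are assumptions):

```python
import numpy as np
import cv2

def solve_pose_pnp(pts_3d, pts_2d, K):
    """Conventional PnP: recover the 6-degree-of-freedom pose from n pairs of
    3d points (n x 3) and their 2d projections (n x 2), given intrinsics K."""
    ok, rvec, tvec = cv2.solvePnP(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64),
        K, distCoeffs=None, flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return ok, R, tvec
```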
In an embodiment, the camera pose may be calculated through PnP, and since the matching point pairs obtained in step 103 are 2d-2d point pairs, a target matching point pair may be selected from at least one matching point pair, and corresponding 2d feature points in the target matching point pair are converted into 3d feature points, so as to obtain a 3d-2d point pair required for solving the camera pose through PnP.
For example, since the homography matrix can map a point (3-dimensional homogeneous vector) on one projective plane to another projective plane, the homography matrix between two images can be calculated, and the conversion of the corresponding two-dimensional feature point in the target matching point pair to the three-dimensional feature point is realized based on the homography matrix, specifically, the step "converting the corresponding two-dimensional feature point in the target matching point pair to the three-dimensional feature point" may include:
calculating a homography matrix between the two images;
determining a target matching point pair from at least one matching point pair based on the homography matrix;
and converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points.
Where a homography matrix is a concept in projective geometry that maps points (3-dimensional homogeneous vectors) on one projective plane onto another projective plane, see equation (2), a homography matrix is a 3x3 matrix, noted:
$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix} \tag{2}$$
there are various ways to calculate the homography matrix between two images, and as an example, it can be assumed that the coordinates of a 3-dimensional homogenous vector are p = [ u, v,1], and then the mapping point q = [ u ', v',1] can be directly obtained by the homography matrix H, see equation (3), as follows:
$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \tag{3}$$
combining formula (2) with formula (3) to obtain formula (4):
$$u' = \frac{h_{11}u + h_{12}v + h_{13}}{h_{31}u + h_{32}v + 1}, \qquad v' = \frac{h_{21}u + h_{22}v + h_{23}}{h_{31}u + h_{32}v + 1} \tag{4}$$
it can be seen that the homography matrix contains in total 8 unknowns to be solved, h11 to h32. Next, when calculating the homography matrix between the two images, the matching point pairs required for calculating the homography matrix may be selected from the at least one matching point pair obtained in step 103, and the matrix may be solved by combining the above formulas. Specifically, the step of "calculating the homography matrix between the two images" may include:
selecting a preset number of matching point pairs from at least one matching point pair;
and calculating a homography matrix between the two images based on the mapping relation between the two images and the selected matching point pairs.
Since one matching point pair can provide two constraints, at least 4 matching point pairs are required to compute the 8 unknowns of the homography H. In the present application, the number of matching point pairs selected from the at least one matching point pair may be set as required; for example, 4 matching point pairs may be selected from the at least one matching point pair obtained in step 103 to ensure the calculation speed of the homography matrix; as another example, more than 4 matching point pairs may be selected from the at least one matching point pair obtained in step 103, so that the homography matrix may be calculated and verified with a sufficient number of matching point pairs.
The matching point pairs can be selected in various ways, for example, randomly; for another example, a region with a higher attention degree may be defined in the image, and a preset number of matching point pairs may be selected from the region; and so on.
Since the homography matrix represents the mapping between two planes, after a preset number of matching point pairs are selected from at least one matching point pair, the homography matrix between the two images can be calculated by referring to equation (4) based on the mapping relationship between the two images and the selected matching point pairs.
In an embodiment, for the images F1 and F2, the plane corresponding to image F1 may be mapped onto the plane corresponding to image F2 by calculating the homography matrix between them. Specifically, through step 103 the matching point pair set between F1 and F2 is C = {p1-q1, p2-q2, ..., pm-qm}; when m ≥ 4, a preset number of matching point pairs may be selected from C, for example 4 pairs may be randomly selected from C, so that 8 constraints are obtained, and a linear equation system may be established with reference to equation (4) to solve the 8 unknowns of H, thereby obtaining the homography matrix between image F1 and image F2.
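For illustration, a minimal numpy sketch of this linear solution of the 8 unknowns of H, stacking the two constraints of equation (4) per selected pair, might look as follows (the function name is an assumption, and the selected pairs are assumed to be correct matches):

```python
import numpy as np

def homography_from_pairs(pairs):
    """Solve h11..h32 of equation (2) from 4 or more point pairs by stacking
    the two linear constraints of equation (4) per pair (least squares if
    more than 4 pairs are given)."""
    A, b = [], []
    for (u, v), (u2, v2) in pairs:
        # u' * (h31*u + h32*v + 1) = h11*u + h12*v + h13
        A.append([u, v, 1, 0, 0, 0, -u2 * u, -u2 * v]); b.append(u2)
        # v' * (h31*u + h32*v + 1) = h21*u + h22*v + h23
        A.append([0, 0, 0, u, v, 1, -v2 * u, -v2 * v]); b.append(v2)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

# e.g. H = homography_from_pairs(C[:4]), with C the 2d-2d pair set of step 103.
```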
After the homography matrix between the two images is obtained through calculation, the target matching point pair can be further determined from at least one matching point pair based on the homography matrix.
For the matching point pairs between the two images calculated in step 103, there is a possibility of mismatching, that is, a matching point pair determined from the feature point matching result may not consist of feature points that are actually matched between the two images. Therefore, in order to avoid interference with the camera pose calculation caused by mismatching, the matching point pairs may be screened through the homography matrix between the two images to filter out the mismatched point pairs. The step "determining the target matching point pair from at least one matching point pair based on the homography matrix" may include:
performing point pair screening on at least one matching point pair according to the homography matrix to obtain the screened matching point pair;
and selecting a target matching point pair from the screened matching point pairs.
The point pair screening refers to a process of screening matching point pairs to obtain screened matching point pairs.
In an embodiment, the matching point pairs in the matching point pair set C between the image F1 and the image F2, which are obtained by calculation in step 103, may be subjected to point pair screening through a homography matrix between the image F1 and the image F2, so as to eliminate some wrong matching point pairs in the matching pair C, and obtain the screened matching point pairs.
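For illustration, the following sketch shows one possible screening criterion: keep only the pairs whose F1 point, mapped through the homography H, lands close to its matched F2 point (the pixel threshold is an assumption):

```python
import numpy as np

def filter_pairs_by_homography(C, H, max_error=3.0):
    """Keep the pairs of C whose reprojection error under H is small."""
    kept = []
    for (u, v), (u2, v2) in C:
        q = H @ np.array([u, v, 1.0])
        q /= q[2]                                # back to pixel coordinates
        if np.hypot(q[0] - u2, q[1] - v2) < max_error:
            kept.append(((u, v), (u2, v2)))
    return kept
```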
There are various ways to select the target matching point pairs from the screened matching point pairs, for example, a preset number of matching point pairs may be randomly selected from the screened matching point pairs, and the selected matching point pairs are used as the target matching point pairs; for another example, an area with a higher attention degree may be defined for the image F1 or the image F2, and a matching point pair corresponding to the area in the screened matching point pair is taken as a target matching point pair; for another example, all the screened matching point pairs may be used as target matching point pairs; and so on.
After the target matching point pair is determined, the corresponding two-dimensional feature points in the target matching point pair can be converted into three-dimensional feature points.
In an embodiment, a first time series image and a second time series image may be determined from two images corresponding to the target matching point pair, where the first time series image and the second time series image corresponding to the target matching point pair satisfy a preset time series association relationship.
The time-series association describes an association relationship between two images in time sequence; for example, the time-series association may include a chronological order. For example, the time-series relationship between the images F1 and F2 may include the following three cases: the time corresponding to the image F1 precedes the time corresponding to the image F2, for example, the shooting time of the image F1 precedes the shooting time of the image F2; the time corresponding to the image F1 is later than the time corresponding to the image F2, for example, the shooting time of the image F1 is later than the shooting time of the image F2; or the time corresponding to the image F1 is equal to the time corresponding to the image F2, for example, the shooting time of the image F1 is the same as the shooting time of the image F2, with no chronological order between them.
In this application, the time sequence association relationship that the first time sequence image and the second time sequence image satisfy may be specifically a time corresponding to the first time sequence image that is earlier than a time corresponding to the second time sequence image.
As an example, the terminal may be equipped with a camera, and the image F1 captured by the camera may be taken as a first time-series image and the image F2 may be taken as a second time-series image, where the shooting time of the first time-series image precedes the shooting time of the second time-series image, i.e., the shooting time of F1 precedes F2. Therefore, for the target matching point pair between F1 and F2, the two-dimensional feature point corresponding to F1 can be converted into a three-dimensional feature point, so as to subsequently execute the steps required for calculating the camera pose.
Specifically, referring to the foregoing steps, a matching point pair set C between the first time series image F1 and the second time series image F2 may be obtained, and matching point pairs that are incorrectly matched in C may be eliminated based on the homography matrix, at which time 3d coordinates corresponding to the 2d feature points on F1 may be further calculated. Specifically, the step of "converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points" may include:
selecting two-dimensional feature points of the first time sequence image from the target matching point pair to determine target two-dimensional feature points to be converted;
determining corresponding first position and orientation information when a camera collects a first time sequence image;
and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the first attitude information.
The target matching point pair comprises the two-dimensional feature point corresponding to the first time sequence image and the two-dimensional feature point corresponding to the second time sequence image, so that the two-dimensional feature point corresponding to the first time sequence image can be determined as the target two-dimensional feature point to be converted.
And calculating three-dimensional feature points corresponding to the target two-dimensional feature points, and determining first position and attitude information corresponding to the first time sequence image acquired by the camera. The pose information corresponding to the camera acquiring the image may be determined based on a state of the camera acquiring the image, and specifically, the determining the first pose information corresponding to the camera acquiring the first time sequence image may include:
if the camera is in an initial state when acquiring the first time sequence image, determining first attitude information based on the initial state of the camera;
and if the camera is in a non-initial state when acquiring the first time sequence image, determining a time sequence associated image of the first time sequence image based on the time sequence associated relationship, and determining first pose information based on pose information corresponding to the time sequence associated image.
The initial state of the camera refers to a state in which the camera is located immediately after the system is started, and correspondingly, the non-initial state refers to a state other than the initial state, for example, a state after the initial state.
In this application, the first pose information corresponding to the camera acquiring the first time series image may be determined based on the state of the camera acquiring the first time series image.
In an embodiment, if the camera is in an initial state when acquiring F1, the pose information of the camera in the initial state may be acquired as the first pose information. For example, if F1 is the first frame of image taken by the camera immediately after the system is started, the F1 frame can be set to be at the origin of the coordinate system immediately after the system is started, so that the pose of the camera when acquiring F1 can be obtained, see equation (1), and the pose can be represented as a 4 × 4 matrix T:
$$T = \begin{bmatrix} R & t \\ \mathbf{0}^{\top} & 1 \end{bmatrix} \tag{1}$$
where R is a 3x3 matrix representing the current rotation of the camera and t is a 3x1 matrix representing the current translation of the camera.
In another embodiment, the images captured by the camera in time sequence may include P1, P2, and P3, where P1 is captured first, P2 second, and P3 last. If the first time sequence image is P3 and the camera is in the initial state when acquiring P1, it follows that the camera is in a non-initial state when acquiring P3. Because the time sequence association relationship describes the association of at least two images in time sequence, the time-sequence-associated image of P3 can be determined to be P2 based on the time sequence association relationship, and therefore the pose information corresponding to the camera acquiring P3 can be determined based on the pose information corresponding to the camera acquiring P2. Similarly, the pose information corresponding to the camera acquiring P2 may be determined based on the pose information corresponding to the camera acquiring P1.
Since the camera is in the initial state when acquiring the P1, the pose information corresponding to the camera acquiring the P1 is obtained by referring to the above. Further, the pose information corresponding to the camera acquiring P2 can be obtained based on the pose information corresponding to the camera acquiring P1, and the pose information corresponding to the camera acquiring P3 can be obtained based on the pose information corresponding to the camera acquiring P2. Specifically, how to obtain the pose information corresponding to the camera acquiring P2 based on the pose information corresponding to the camera acquiring P1 may refer to the following description of this embodiment.
After the first position and posture information corresponding to the first time sequence image collected by the camera is determined, the target two-dimensional feature points can be converted into corresponding three-dimensional feature points based on the first position and posture information. Specifically, the step of "converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the first pose information" may include:
determining a corresponding plane attribute when a camera collects a first time sequence image;
determining depth-of-field information corresponding to the camera when the camera acquires the first time sequence image based on the plane attribute and the first attitude information;
and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the depth of field information and the first pose information.
The plane attribute is used for describing the characteristics of a plane observed by the camera when image acquisition is carried out. For example, the planar attribute of the camera may be a horizontal plane, that is, when the camera performs image acquisition, the observed planes are all horizontal planes; for another example, the plane attribute of the camera may be a vertical plane, that is, when the camera performs image acquisition, the observed planes are all vertical planes; and so on. It is noted that the horizontal plane and the vertical plane are common planes, and other planes can be expressed by the form of plane equations.
The depth-of-field information of the camera describes information related to the depth of field (DOF) of the camera. The depth of field is the range of distances in front of and behind the subject within which the image formed by the camera lens or other imager appears acceptably sharp. Specifically, after focusing is completed, the range before and after the focal point within which a sharp image is presented is called the depth of field.
There may be various ways of determining depth information corresponding to the camera acquiring the first time sequence image based on the plane attribute and the first pose information.
In an embodiment, the method for determining the pose of the camera according to the present application may be applied to the AR, and for example, the first time series image F1 is an image acquired by the camera when the camera is in an initial state, and the second time series image F2 is an image acquired by the camera after the first time series image is acquired. As an example, the corresponding plane attribute when the camera acquires the image may be a horizontal plane, that is, all planes observed by the camera during the process of acquiring the image are horizontal planes.
The normal vector of the horizontal plane is known: it is parallel to the direction of gravity, i.e. parallel to the Z-axis of the coordinate system, so the plane normal can be written as np = [0, 0, 1]. Meanwhile, the distance between the camera and the horizontal plane can be assumed to be 1 meter when F1 is captured; most AR play modes are scale-free, so even if the actual distance is not 1 meter, the user can manually scale the size of the virtual object.
Further, let one target two-dimensional feature point of F1 be p = [u, v, 1], where u, v are the pixel coordinates of the point p on the image. Assuming that the 3d coordinate of the 2d feature point p is Xw and the depth of field of Xw is s, then according to the projective geometry principle, Xw can be solved from the pose R, t of the camera with reference to equation (5):
$$X_w = R^{-1}\left(s\,K^{-1}p - t\right) \tag{5}$$
and K is an internal reference matrix which can be directly taken from a terminal configuration file or calibrated offline.
At this time, the only unknown is the depth of field s of Xw. Since the 3d feature point Xw is on the horizontal plane, it satisfies the plane equation (6):
$$n_p \cdot X_w = 1 \tag{6}$$
where np = [0, 0, 1] on the left side of the equation is the plane normal vector, and the 1 on the right side of the equation represents the distance of 1 m from F1 to the plane. Equation (7) can be obtained by combining equation (5) and equation (6):
$$n_p^{\top} R^{-1}\left(s\,K^{-1}p - t\right) = 1 \tag{7}$$
further, by simplifying formula (7), formula (8) can be obtained:
$$s\; n_p^{\top} R^{-1} K^{-1} p = 1 + n_p^{\top} R^{-1} t \tag{8}$$
at this time, since the only unknown in the equation (8) is the depth of field s, the equation can be directly solved to obtain equation (9), that is, the depth of field information s corresponding to the camera acquiring the first time series image:
$$s = \frac{1 + n_p^{\top} R^{-1} t}{n_p^{\top} R^{-1} K^{-1} p} \tag{9}$$
after the depth-of-field information corresponding to the first time sequence image acquired by the camera is obtained, the two-dimensional feature points of the target can be further converted into corresponding three-dimensional feature points based on the depth-of-field information and the first pose information.
Specifically, in this embodiment, Xw can be solved by combining equations (1), (9) and (5), resulting in equation (10):
$$X_w = s\,K^{-1}p = \frac{K^{-1}p}{n_p^{\top} K^{-1} p} \tag{10}$$
therefore, referring to the foregoing, each target two-dimensional feature point may be converted into a three-dimensional feature point to obtain a converted matching point pair. As an example, the target matching point pair set may be C = { p1-q1, p2-q2,..,. Pm-qm }, and for all target two-dimensional feature points p1, p2,. Pm therein, their corresponding three-dimensional feature coordinates Xw1, xw2,..,. Xwm may be found, and thus a new 3d-2d matching point pair set Cw = { Xw1-q1, xw2-q2,. X, xwm-qm } may be established.
105. And performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
The pose calculation is a calculation process for calculating pose information of the camera. The pose calculation may be performed in various ways, and may be implemented by PnP, for example.
As an example, in the foregoing embodiment a 3d-2d matching point pair set Cw is obtained, and at this time the rotation R1 and the translation t1 corresponding to the camera acquiring F2 need to be solved. If the PnP algorithm is used directly, a non-linear equation with 6 unknowns needs to be solved. For each 3d-2d point pair, denoting the 2d feature point as p = [x, y, 1] and the 3d feature point as Xw, equation (11) can be established according to projective geometry:
$$s\,p = K\left(R_1 X_w + t_1\right) \tag{11}$$
where the 2d feature point p and the 3d feature point Xw are both known, and s is the depth of field, which can be set aside for the moment, while the rotation information R1 and the translation information t1 are both unknown. Equation (11) has 6 unknowns, since R1 and t1 each contain 3 unknowns, and the whole equation (11) is non-linear, since there is a multiplication between R1 and Xw.
The nonlinear 6-unknown equation is not easy to solve; in particular, R1, being a rotation matrix, is subject to a unit-orthogonality constraint. A general PnP algorithm therefore searches for a locally optimal solution iteratively by least squares, converting the solution of R1 and t1 into a nonlinear optimization problem; the computation amount is large, the algorithm converges slowly, and it is difficult to meet real-time requirements on devices with limited computing power such as mobile phones.
In the present application, the gyroscope data can be acquired through the IMU, so the rotation information, i.e. R1 in the above formula, is taken directly from the gyroscope. R1 thus becomes a known quantity that no longer needs to be solved from equation (11); the number of unknowns in equation (11) drops from 6 to 3, and the multiplication between R1 and Xw now involves only known quantities, leaving only the additive unknown t1, so the whole equation becomes linear. Therefore, the pose information corresponding to the camera acquiring F2 can be calculated quickly and efficiently. Specifically, the step of performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain the pose information of the camera may include:
selecting a matching point pair required for pose operation from the converted matching point pair;
performing pose operation on the camera according to the rotation information and the selected matching point pair so as to determine corresponding translation information when the camera acquires the second time sequence image;
and determining second position and orientation information corresponding to the second time sequence image acquired by the camera based on the rotation information and the translation information.
As an example, in the foregoing embodiment, since equation (11) can only provide two constraints, and t1 contains 3 unknowns, it is necessary to select a matching point pair required for performing a pose operation from the transformed matching point pair, and specifically, the step "selecting a matching point pair required for performing a pose operation from the transformed matching point pair" may include:
determining the quantity information of the matching point pairs required by pose operation according to the freedom degree attribute of the translation information;
and selecting corresponding matching point pairs from the converted matching point pairs based on the quantity information.
The degree of freedom refers to the number of variables whose values are not limited when a certain statistic is calculated. As an example, an object may have six degrees of freedom in space, namely a degree of freedom of movement in the direction of three orthogonal axes x, y, z and a degree of freedom of rotation about these three axes.
Similarly, the camera pose may include rotation information and translation information of the camera, and as an example, equation (1) may be referred to. Because the degree of freedom attribute of the camera translation information is 3, that is, t1 in the formula (11) contains 3 unknowns, and the formula (11) can only provide two constraints, it can be determined that the number n of matching point pairs required for pose calculation is greater than or equal to 2, that is, at least two pairs of 3d-2d matching point pairs are required to solve t1. Therefore, after the quantity information of the matching point pairs required for pose operation is determined, the corresponding matching point pairs can be selected from the converted matching point pairs based on the quantity information.
For example, exactly the required number of matching point pairs may be selected from the converted matching point pairs to reduce the amount of calculation and thus improve calculation efficiency; for another example, more than the required number of matching point pairs may be selected, so that there are enough matching point pairs to verify the calculation result and improve calculation accuracy; and so on.
Further, the pose calculation of the camera can be performed according to the rotation information and the selected matching point pair, so that the corresponding translation information when the camera collects the second time sequence image is determined.
As an example, in the foregoing embodiments, both sides of equation (11) may be cross-multiplied by the vector q. The left side of equation (11) then becomes 0, so that the scale factor s is eliminated, as shown in equation (12):
0 = q × (R1 * Xw + t1) (12)
Therefore, equation (12) can be rearranged into the form of a linear equation system, resulting in equation (13):
q × t1 = −(q × (R1 * Xw)) (13)
Further, the cross product with the vector q can be expanded and written as a matrix multiplication by the skew-symmetric matrix [q]×, so that equation (13) can be converted into equation (14):
[q]× * t1 = −[q]× * R1 * Xw (14)
since equation (14) can only provide two constraints, and
Figure BDA0003082824540000206
the total number of the matched points is 3 unknown quantities, so that two pairs of 3d-2d matched points are needed to be solved
Figure BDA0003082824540000207
As an example, assume that the two pairs of 3d-2d matching points used are q1-Xw1 and q2-Xw2, with q1 and q2 written in normalized homogeneous form as above. Then, combining the two pairs of 3d-2d matching points yields a linear equation system, see formula (15):
[[q1]×; [q2]×] * t1 = [−[q1]× * R1 * Xw1; −[q2]× * R1 * Xw2] (15)
where the semicolon denotes vertical stacking.
Recording A = [[q1]×; [q2]×] and b = [−[q1]× * R1 * Xw1; −[q2]× * R1 * Xw2], the equation system (15) can be abbreviated as equation (16):
A * t1 = b (16)
t1 can then be obtained by directly solving equation (16), i.e. the translation information corresponding to the camera acquiring the second time sequence image; since A stacks more equations than unknowns, the solution can be written in the least-squares form of equation (17):
t1 = (A^T * A)^(-1) * A^T * b (17)
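For illustration, the following is a small numpy sketch of this linear solve, assuming the 2d points are given in normalized homogeneous form (intrinsics already applied) and that the stacked system is solved in the least-squares sense, which also accommodates more than two point pairs; the names are illustrative and not taken from the patent.

import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def solve_translation(R1, pairs_3d2d):
    """Solve [q]x * t1 = -[q]x * R1 * Xw stacked over two or more 3d-2d pairs.

    R1         : (3, 3) rotation of the second frame, taken from the gyroscope
    pairs_3d2d : iterable of (q, Xw), q a normalized homogeneous 2d point, Xw a 3d point
    """
    A = np.vstack([skew(q) for q, _ in pairs_3d2d])                     # stacked [qi]x blocks
    b = np.concatenate([-skew(q) @ (R1 @ Xw) for q, Xw in pairs_3d2d])  # stacked right-hand sides
    t1, *_ = np.linalg.lstsq(A, b, rcond=None)                          # least-squares solution of A * t1 = b
    return t1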
since the camera pose includes the rotation information and the translation information, and the rotation information R1 and the translation information t1 are obtained by the solution described above, the camera pose can be determined accordingly. That is, the second pose information corresponding to the second time sequence image collected by the camera is determined through the linear PnP.
As can be seen from the above, the present embodiment can acquire at least two images acquired by a camera and rotation information of the camera; extracting feature points of at least two images; performing feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are matched feature points between the two images; converting corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points; and performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
The scheme can acquire the rotation information of the camera and apply it to the solving of the camera pose, so that the number of parameters to be solved can be reduced; secondly, the camera pose solving model can be converted from a nonlinear model into a linear model. Therefore, the camera pose can be solved in a loosely coupled visual-inertial manner, the difficulty of solving the camera pose can be reduced, and a simpler solving method is provided.
In addition, when the scheme is put into practical application, the application effect can be greatly improved. For example, when the scheme is applied to augmented reality, a virtual object can be rendered on the screen in real time after the camera pose is obtained, achieving the augmented reality effect; and because the scheme makes the estimation of the camera pose more stable and accurate, the rendered virtual object stays firmly anchored at its correct position on the mobile phone screen instead of sliding with the screen or drifting away from the correct position, so that the presentation effect of the augmented reality is greatly improved.
For the case of shooting a horizontal plane, fig. 3 shows a flowchart of the scheme. It can be seen that, after the IMU and the camera are loosely coupled together, the originally nonlinear PnP equation becomes linear (containing only linear equations in 3 unknowns), so the equation is easier to solve and the calculation result is more stable and accurate, and it is no longer necessary to use the Gauss-Newton method or the UKF to solve a 15-dimensional nonlinear state estimation equation.
The method described in the above examples is further described in detail below by way of example.
In this embodiment, a description will be given by taking an example in which the determination device of the camera pose is integrated in a server and a terminal, where the server may be a single server or a server cluster composed of a plurality of servers; the terminal can be a mobile phone, a tablet computer, a notebook computer and other equipment.
As shown in fig. 4, a method for determining a camera pose specifically includes the following steps:
201. The terminal collects at least two images through the camera and obtains rotation information of the camera.
In one embodiment, the terminal may be equipped with both a camera and an IMU. The terminal may point the camera at a horizontal surface such as the ground or a table top (or even a horizontally held palm), so that the plane viewed by the camera is a horizontal plane rather than a vertical one. Further, the terminal can acquire at least two images through the camera and obtain, through the IMU, the rotation information corresponding to the camera at the moment each image is acquired.
202. And the terminal sends the image and the rotation information to the server.
203. The server acquires at least two images acquired by the camera and rotation information of the camera.
204. The server extracts feature points of at least two images.
In this embodiment, the feature point extraction manner includes, but is not limited to, at least one of the following: FAST feature points, Harris feature points, GFTT feature points, ORB feature points, and the like. As an example, feature points on the two images may be extracted by the FAST method, and the two-dimensional feature point sets corresponding to the two images may be denoted P = {p1, p2, ..., pn} and Q = {q1, q2, ..., qn}, respectively.
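As an illustration only, a minimal OpenCV sketch of this extraction step might look as follows; the file names and the detector threshold are assumptions, not values from the patent.

import cv2

# Load the two frames in grayscale (file names are placeholders)
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# Detect FAST corners on both frames
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
kp1 = fast.detect(img1, None)   # keypoints of the 1st frame, corresponding to set P
kp2 = fast.detect(img2, None)   # keypoints of the 2nd frame, corresponding to set Q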
205. The server performs feature matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are feature points matched between the two images.
In this embodiment, the matching manner includes, but is not limited to, at least one of the following: ORB feature matching, SIFT feature matching, BRISK feature matching, SURF feature matching, LK optical flow matching, and the like. As an example, the ORB method may be used to match the feature points on the two images, and the matching pair set between the two frames can be denoted C = {p1-q1, p2-q2, ..., pm-qm}, where m ≤ n, because it cannot be guaranteed that every feature point extracted from the 1st frame image finds a corresponding point in the 2nd frame image.
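Continuing the sketch above, ORB descriptors can be computed at the FAST keypoints and matched by Hamming distance; this is only one possible realization of the matching step, and the cross-check setting is an assumption.

import cv2

# Describe the FAST keypoints with ORB descriptors and match them between the frames
orb = cv2.ORB_create()
kp1, des1 = orb.compute(img1, kp1)
kp2, des2 = orb.compute(img2, kp2)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)
# Matching point pair set C = {p1-q1, ..., pm-qm} as pixel coordinates
C = [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]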
206. And the server converts the corresponding two-dimensional feature points in the target matching point pairs into three-dimensional feature points.
In an embodiment, the two images include a first time sequence image and a second time sequence image, and the first time sequence image and the corresponding second time sequence image satisfy a preset time sequence association relationship. The server can select the two-dimensional feature points of the first time sequence image from the target matching point pairs to determine the target two-dimensional feature points to be converted; determine the first pose information corresponding to the camera acquiring the first time sequence image; and convert the target two-dimensional feature points into corresponding three-dimensional feature points based on the first pose information.
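As described for the conversion unit below (and in claim 2), the target matching point pairs can be determined by screening the matched pairs with a homography between the two frames. A minimal OpenCV sketch of that screening step, continuing from the set C above, might be as follows; the RANSAC reprojection threshold is an assumed value.

import cv2
import numpy as np

# Estimate the homography between the two frames and keep only the matching
# point pairs consistent with it; the inliers serve as the target matching point pairs
pts1 = np.float32([p for p, _ in C])
pts2 = np.float32([q for _, q in C])
H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransacReprojThreshold=3.0)
target_pairs = [pair for pair, ok in zip(C, inlier_mask.ravel()) if ok]
# Each target 2d point p can then be lifted to a 3d point Xw as in the
# plane back-projection sketch given earlier.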
207. And the server performs pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera and sends the pose information to the terminal.
In an embodiment, the two images include a first time sequence image and a second time sequence image, and the first time sequence image and the second time sequence image corresponding to the first time sequence image satisfy a preset time sequence association relationship. The server can select a matching point pair required for pose calculation from the converted matching point pair; performing pose operation on the camera according to the rotation information and the selected matching point pair to determine corresponding translation information when the camera acquires a second time sequence image; and determining second position and orientation information corresponding to the second time sequence image acquired by the camera based on the rotation information and the translation information.
208. And the terminal receives the pose information and displays the augmented reality picture based on the pose information.
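Putting steps 203 to 207 together, the following hypothetical end-to-end sketch chains the pieces shown above; match_and_screen stands for the FAST/ORB matching plus homography screening, while backproject_to_plane and solve_translation are the earlier sketches. All names, and the use of K to normalize the 2d points, are illustrative assumptions rather than the patent's notation.

import numpy as np

def estimate_second_pose(img1, img2, K, R0, R1):
    """Loosely coupled pose estimation for two frames observing a horizontal plane.

    R0, R1 : IMU rotations at the first and second frames; returns (R1, t1).
    """
    target_pairs = match_and_screen(img1, img2)        # steps 204-206: FAST/ORB matching + homography screening
    Kinv = np.linalg.inv(K)
    pairs_3d2d = []
    for p, q in target_pairs:
        Xw = backproject_to_plane(p, K, R0)            # 2d point of frame 1 -> 3d point on the plane
        q_h = Kinv @ np.array([q[0], q[1], 1.0])       # 2d point of frame 2 in normalized homogeneous form
        pairs_3d2d.append((q_h, Xw))
    t1 = solve_translation(R1, pairs_3d2d)             # step 207: linear solve for the translation
    return R1, t1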
Therefore, the rotation information of the camera can be collected and applied to the solving of the camera pose, so that the number of parameters to be solved can be reduced; secondly, the camera pose solving model can be converted from a nonlinear model into a linear model. Therefore, the camera pose can be solved in a loosely coupled visual-inertial manner, the difficulty of solving the camera pose can be reduced, and a simpler solving method is provided.
In addition, when the scheme is put into practical application, the application effect can be greatly improved. For example, when the scheme is applied to augmented reality, a virtual object can be rendered on the screen in real time after the camera pose is obtained, achieving the augmented reality effect; and because the scheme makes the estimation of the camera pose more stable and accurate, the rendered virtual object stays firmly anchored at its correct position on the mobile phone screen instead of sliding with the screen or drifting away from the correct position, so that the presentation effect of the augmented reality is greatly improved.
In order to better implement the method, correspondingly, the embodiment of the application also provides a camera pose determination device, wherein the camera pose determination device can be integrated in a server or a terminal. The server can be a single server or a server cluster consisting of a plurality of servers; the terminal can be a mobile phone, a tablet computer, a notebook computer and other equipment.
For example, as shown in fig. 5, the apparatus for determining the camera pose may include an acquisition unit 301, an extraction unit 302, a matching unit 303, a conversion unit 304, and an arithmetic unit 305, as follows:
an acquiring unit 301, configured to acquire at least two images acquired by a camera and rotation information of the camera;
an extracting unit 302, configured to extract feature points of at least two images;
a matching unit 303, configured to perform feature point matching on the at least two images to obtain at least one matching point pair, where each matching point pair includes two-dimensional feature points, and the two-dimensional feature points are feature points matched between the two images;
a conversion unit 304, configured to convert corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points;
an operation unit 305, configured to perform a pose operation on the camera according to the rotation information and the converted matching point pair, so as to obtain pose information of the camera.
In an embodiment, referring to fig. 6, the conversion unit 304 may include:
a calculating subunit 3041, which may be used to calculate a homography matrix between the two images;
a point pair determining subunit 3042, configured to determine a target matching point pair from the at least one matching point pair based on the homography matrix;
the converting subunit 3043 may be configured to convert the corresponding two-dimensional feature point in the target matching point pair into a three-dimensional feature point.
In an embodiment, the two images include a first time sequence image and a second time sequence image, and the first time sequence image and the second time sequence image corresponding to the first time sequence image satisfy a preset time sequence association relationship; the converting subunit 3043 may be configured to:
selecting two-dimensional feature points of the first time sequence image from the target matching point pair to determine target two-dimensional feature points to be converted; determining corresponding first position and orientation information when the camera collects the first time sequence image; and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the first attitude information.
In an embodiment, the converting subunit 3043 may be configured to:
determining a corresponding plane attribute when the camera collects the first time sequence image; determining depth-of-field information corresponding to the first time sequence image acquired by the camera based on the plane attribute and the first attitude information; and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the depth of field information and the first pose information.
In an embodiment, the converting subunit 3043 may be configured to:
if the camera is in an initial state when acquiring the first time sequence image, determining first attitude information based on the initial state of the camera; and if the camera is in a non-initial state when acquiring the first time sequence image, determining the time sequence associated image of the first time sequence image based on the time sequence association relation, and determining first position and orientation information based on position and orientation information corresponding to the time sequence associated image.
In an embodiment, the calculating subunit 3041 may be configured to:
selecting a preset number of matching point pairs from the at least one matching point pair; and calculating a homography matrix between the two images based on the mapping relation between the two images and the selected matching point pairs.
In an embodiment, the point pair determining subunit 3042 may be configured to:
performing point pair screening on the at least one matching point pair according to the homography matrix to obtain a screened matching point pair; and selecting a target matching point pair from the screened matching point pairs.
In an embodiment, referring to fig. 7, the two images include a first time sequence image and a second time sequence image, and the first time sequence image and the second time sequence image corresponding to the first time sequence image satisfy a preset time sequence association relationship; the operation unit 305 may include:
a selecting subunit 3051, configured to select, from the converted pair of matching points, a pair of matching points required for pose calculation;
the operation subunit 3052 is configured to perform pose operation on the camera according to the rotation information and the selected matching point pair, so as to determine corresponding translation information when the camera acquires the second time series image;
an information determining subunit 3053, configured to determine, based on the rotation information and the translation information, second pose information corresponding to when the camera acquires the second time-series image.
In an embodiment, the pose information of the camera comprises translation information of the camera; the selecting subunit 3051 may be configured to:
determining the quantity information of the matching point pairs required by the pose operation according to the freedom degree attribute of the translation information; and selecting corresponding matching point pairs from the converted matching point pairs based on the quantity information.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in the apparatus for determining a camera pose of the present embodiment, the obtaining unit 301 obtains at least two images acquired by the camera and rotation information of the camera; extracting feature points of at least two images by an extracting unit 302; performing feature point matching on the at least two images by using a matching unit 303 to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are feature points matched between the two images; converting, by the conversion unit 304, the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points; and the operation unit 305 performs pose operation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
The scheme can acquire the rotation information of the camera and apply it to the solving of the camera pose, so that the number of parameters to be solved can be reduced; secondly, the camera pose solving model can be converted from a nonlinear model into a linear model. Therefore, the camera pose can be solved in a loosely coupled visual-inertial manner, the difficulty of solving the camera pose can be reduced, and a simpler solving method is provided.
In addition, when the scheme is put into practical application, the application effect can be greatly improved. For example, when the scheme is applied to augmented reality, a virtual object can be rendered on the screen in real time after the camera pose is obtained, achieving the augmented reality effect; and because the scheme makes the estimation of the camera pose more stable and accurate, the rendered virtual object stays firmly anchored at its correct position on the mobile phone screen instead of sliding with the screen or drifting away from the correct position, so that the presentation effect of the augmented reality is greatly improved.
In addition, an embodiment of the present application further provides a computer device, where the computer device may be a server or a terminal, and as shown in fig. 8, a schematic structural diagram of the computer device according to the embodiment of the present application is shown, specifically:
the computer device may include components such as memory 401 including one or more computer-readable storage media, input unit 402, sensors 403, processor 404 including one or more processing cores, and power supply 405. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 8 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the memory 401 may be used to store software programs and modules, and the processor 404 executes various functional applications and data processing by operating the software programs and modules stored in the memory 401. The memory 401 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the computer device, and the like. Further, the memory 401 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 401 may also include a memory controller to provide the processor 404 and the input unit 402 access to the memory 401.
The input unit 402 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, input unit 402 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user (such as operations by the user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or attachment) thereon or nearby, and drive the corresponding connection device according to a predetermined program. Alternatively, the touch sensitive surface may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 404, and can receive and execute commands sent by the processor 404. In addition, touch sensitive surfaces may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. The input unit 402 may include other input devices in addition to a touch-sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The computer device may also include at least one sensor 403, such as light sensors, motion sensors, and other sensors. In particular, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel based on the intensity of ambient light, and a proximity sensor that turns off the display panel and/or backlight when the computer device is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the computer device, detailed descriptions thereof are omitted.
The processor 404 is a control center of the computer device, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the computer device and processes data by operating or executing software programs and/or modules stored in the memory 401 and calling data stored in the memory 401, thereby performing overall monitoring of the mobile phone. Alternatively, processor 404 may include one or more processing cores; preferably, the processor 404 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 404.
The computer device also includes a power supply 405 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 404 via a power management system that manages charging, discharging, and power consumption management. The power supply 405 may also include any component including one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
Although not shown, the computer device may further include a camera, a bluetooth module, etc., which will not be described herein. Specifically, in this embodiment, the processor 404 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 401 according to the following instructions, and the processor 404 runs the application program stored in the memory 401, so as to implement various functions as follows:
acquiring at least two images acquired by a camera and rotation information of the camera; extracting feature points of at least two images; performing feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are matched feature points between the two images; converting corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points; and performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, the computer device of this embodiment can acquire the rotation information of the camera and apply it to the solving of the camera pose, so that the number of parameters to be solved can be reduced; secondly, the camera pose solving model can be converted from a nonlinear model into a linear model. Therefore, the computer device of this embodiment can solve the camera pose in a loosely coupled visual-inertial manner, reduce the difficulty of solving the camera pose, and provide a simpler solving method.
In addition, when the scheme is put into practical application, the application effect can be greatly improved. For example, when the scheme is applied to augmented reality, a virtual object can be rendered on the screen in real time after the camera pose is obtained, achieving the augmented reality effect; and because the scheme makes the estimation of the camera pose more stable and accurate, the rendered virtual object stays firmly anchored at its correct position on the mobile phone screen instead of sliding with the screen or drifting away from the correct position, so that the presentation effect of the augmented reality is greatly improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, where multiple instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the methods for determining a pose of a camera provided in the embodiments of the present application. For example, the instructions may perform the steps of:
acquiring at least two images acquired by a camera and rotation information of the camera; extracting feature points of at least two images; performing feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are matched feature points between the two images; converting corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points; and performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein, the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The instructions stored in the storage medium can execute the steps in any method for determining a camera pose provided in the embodiment of the present application, so that the beneficial effects that can be achieved by any method for determining a camera pose provided in the embodiment of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
According to an aspect of the application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the methods provided in the various alternative implementations of the aspect of determining the pose of the camera described above.
The method, the apparatus, the computer device, and the storage medium for determining the pose of the camera provided in the embodiments of the present application are described in detail above, and specific examples are applied in the present application to explain the principles and embodiments of the present application, and the description of the above embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, the specific implementation manner and the application scope may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. A method for determining the pose of a camera is characterized by comprising the following steps:
acquiring at least two images acquired by a camera and rotation information of the camera;
extracting feature points of at least two images;
performing feature point matching on the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional feature points, and the two-dimensional feature points are matched feature points between the two images;
converting corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points;
and performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
2. The method of determining the camera pose according to claim 1, wherein converting the corresponding two-dimensional feature points in the pair of target matching points into three-dimensional feature points comprises:
calculating a homography matrix between the two images;
determining a target matching point pair from the at least one matching point pair based on the homography matrix;
and converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points.
3. The method for determining the pose of the camera according to claim 2, wherein the two images include a first time sequence image and a second time sequence image, and the first time sequence image corresponds to the second time sequence image and meets a preset time sequence association relation;
converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points, including:
selecting two-dimensional feature points of the first time sequence image from the target matching point pair to determine target two-dimensional feature points to be converted;
determining corresponding first position and orientation information when the camera collects the first time sequence image;
and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the first attitude information.
4. The method according to claim 3, wherein converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the first pose information comprises:
determining a corresponding plane attribute when the camera collects the first time sequence image;
determining depth-of-field information corresponding to the first time sequence image acquired by the camera based on the plane attribute and the first attitude information;
and converting the target two-dimensional feature points into corresponding three-dimensional feature points based on the depth of field information and the first pose information.
5. The method for determining the pose of the camera according to claim 3, wherein determining the first pose information corresponding to the time sequence image acquired by the camera comprises:
if the camera is in an initial state when acquiring the first time sequence image, determining first position and orientation information based on the initial state of the camera;
and if the camera is in a non-initial state when acquiring the first time sequence image, determining a time sequence associated image of the first time sequence image based on the time sequence association relation, and determining first pose information based on pose information corresponding to the time sequence associated image.
6. The method for determining the pose of a camera according to claim 2, wherein calculating the homography matrix between the two images comprises:
selecting a preset number of matching point pairs from the at least one matching point pair;
and calculating a homography matrix between the two images based on the mapping relation between the two images and the selected matching point pairs.
7. The method of determining the camera pose according to claim 2, wherein determining a target matching point pair from the at least one matching point pair based on the homography matrix comprises:
performing point pair screening on the at least one matching point pair according to the homography matrix to obtain a screened matching point pair;
and selecting a target matching point pair from the screened matching point pairs.
8. The method for determining the pose of the camera according to claim 1, wherein the two images include a first time sequence image and a second time sequence image, and the first time sequence image corresponds to the second time sequence image and satisfies a preset time sequence association relation;
performing pose calculation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera, wherein the pose calculation comprises the following steps:
selecting a matching point pair required for pose operation from the converted matching point pair;
performing pose operation on the camera according to the rotation information and the selected matching point pair so as to determine corresponding translation information when the camera acquires the second time sequence image;
and determining second position and orientation information corresponding to the second time sequence image acquired by the camera based on the rotation information and the translation information.
9. The camera pose determination method according to claim 1, wherein the pose information of the camera includes translation information of the camera;
selecting a matching point pair required for pose operation from the converted matching point pair, comprising:
determining the quantity information of the matching point pairs required by the pose operation according to the freedom degree attribute of the translation information;
and selecting corresponding matching point pairs from the converted matching point pairs based on the quantity information.
10. A camera pose determination apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring at least two images acquired by a camera and rotation information of the camera;
the extraction unit is used for extracting the characteristic points of at least two images;
the matching unit is used for matching the characteristic points of the at least two images to obtain at least one matching point pair, wherein each matching point pair comprises two-dimensional characteristic points which are matched with each other between the two images;
the conversion unit is used for converting the corresponding two-dimensional feature points in the target matching point pair into three-dimensional feature points;
and the operation unit is used for performing pose operation on the camera according to the rotation information and the converted matching point pairs to obtain pose information of the camera.
11. An electronic device comprising a memory and a processor; the memory stores an application program, and the processor is configured to execute the application program in the memory to perform the operations of the method for determining the pose of the camera according to any one of claims 1 to 9.
12. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the determination of camera pose according to any one of claims 1 to 9.
CN202110571419.2A 2021-05-25 2021-05-25 Method and device for determining position and posture of camera, computer equipment and storage medium Pending CN115393427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571419.2A CN115393427A (en) 2021-05-25 2021-05-25 Method and device for determining position and posture of camera, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110571419.2A CN115393427A (en) 2021-05-25 2021-05-25 Method and device for determining position and posture of camera, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115393427A true CN115393427A (en) 2022-11-25

Family

ID=84114429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571419.2A Pending CN115393427A (en) 2021-05-25 2021-05-25 Method and device for determining position and posture of camera, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115393427A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination