CN113724332B - Method for determining relative pose of camera, electronic device and storage medium

Info

Publication number
CN113724332B
CN113724332B (application CN202111296906.9A)
Authority
CN
China
Prior art keywords
target
camera
map
determining
panoramic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111296906.9A
Other languages
Chinese (zh)
Other versions
CN113724332A (en)
Inventor
Zhou Jie (周杰)
Hu Yang (胡洋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beike Technology Co Ltd
Original Assignee
Beike Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beike Technology Co Ltd
Priority to CN202111296906.9A
Publication of CN113724332A
Application granted
Publication of CN113724332B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 - Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10024 - Color image
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G06T2207/20076 - Probabilistic image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The embodiments of the disclosure provide a method for determining the relative pose of a camera, an electronic device, and a storage medium. The determining method includes: generating a reference grid occupancy map and a target grid occupancy map based on a reference panoramic depth map and a target panoramic depth map, and generating a facade normal vector two-dimensional histogram; determining a set of camera relative rotation angles based on the facade normal vector two-dimensional histogram; determining a set of camera relative poses based on the set of camera relative rotation angles, the reference grid occupancy map, and the target grid occupancy map; and acquiring the reference panoramic color map corresponding to the reference panoramic depth map and the target panoramic color map corresponding to the target panoramic depth map, performing visual feature matching on the two color maps based on the set of camera relative poses, and determining the camera relative pose between the first shooting point and the second shooting point from the matching results. The embodiments of the disclosure can accurately acquire the relative pose of the camera between the two views.

Description

Method for determining relative pose of camera, electronic device and storage medium
Technical Field
The present disclosure relates to three-dimensional panoramic technologies, and in particular, to a method for determining a relative pose of a camera, an electronic device, and a storage medium.
Background
Solving the relative pose between two views is an important step in three-dimensional reconstruction and the basis of fully automatic stitching of Virtual Reality (VR) shooting points.
In the related art, the pose between two views is solved by aligning three-dimensional point clouds. However, methods based on three-dimensional point cloud alignment have high computational complexity, are limited by the precision of the data source, and cope poorly with the limited cross-view consistency of depth data inferred from images.
How to accurately acquire the relative pose between two views is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the disclosure provides a method for determining a relative pose of a camera, an electronic device and a storage medium, which can accurately acquire the relative pose between two views of the camera.
In a first aspect of the embodiments of the present disclosure, a method for determining a relative pose of a camera is provided, including:
acquiring facade data of a target object in a reference panoramic depth map, and acquiring facade data of the target object in a target panoramic depth map, wherein the reference panoramic depth map is obtained by processing an image captured by a camera at a first shooting point, and the target panoramic depth map is obtained by processing an image captured by the camera at a second shooting point;
generating a reference grid occupancy map based on the facade data of the target object in the reference panoramic depth map, and generating a target grid occupancy map based on the facade data of the target object in the target panoramic depth map;
generating a facade normal vector two-dimensional histogram based on the facade data of the target object in the reference panoramic depth map and the facade data of the target object in the target panoramic depth map;
determining a set of camera relative rotation angles between the first shooting point and the second shooting point based on the facade normal vector two-dimensional histogram;
determining a set of camera relative poses between the first shooting point and the second shooting point based on the set of camera relative rotation angles, the reference grid occupancy map, and the target grid occupancy map;
and acquiring a reference panoramic color map corresponding to the reference panoramic depth map and a target panoramic color map corresponding to the target panoramic depth map, performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the set of camera relative poses, and determining the camera relative pose between the first shooting point and the second shooting point based on the visual feature matching result.
According to the above method for determining the relative pose of the camera, performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the set of camera relative poses includes:
acquiring one camera relative pose element at a time from the set of camera relative poses;
adjusting the pose of the target panoramic depth map according to the currently acquired camera relative pose element, and acquiring the pixel overlap count between the pose-adjusted target panoramic depth map and the reference panoramic depth map;
if the pixel overlap count is less than a preset pixel overlap threshold, deleting the currently acquired camera relative pose element from the set of camera relative poses;
deleting from the set of camera relative poses all camera relative pose elements whose pixel overlap count is less than the preset pixel overlap threshold, to obtain a first set;
performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the first set.
According to the above method, performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the first set includes:
determining an occlusion cost lookup table based on the reference grid occupancy map;
determining the occlusion cost corresponding to each camera relative pose element in the first set based on the occlusion cost lookup table;
deleting from the first set all camera relative pose elements whose occlusion cost is greater than a preset occlusion cost threshold, to obtain a second set;
performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the second set.
According to the above method, determining the occlusion cost lookup table based on the reference grid occupancy map includes:
determining the occlusion area of a target coordinate in the reference grid occupancy map based on the occlusion range of the target coordinate in the horizontal direction and its occlusion range in the vertical direction;
taking the ratio of the occlusion area of the target coordinate to the total area of the reference grid occupancy map as the occlusion cost of the target coordinate;
acquiring the occlusion costs corresponding to all designated coordinates in the reference grid occupancy map;
and generating the occlusion cost lookup table based on the occlusion costs corresponding to all the designated coordinates in the reference grid occupancy map.
According to the method for determining the relative pose of the camera, the performing the visual feature matching on the reference panoramic color image and the target panoramic color image based on the set of the relative poses of the camera, and determining the relative pose of the camera of the first shooting point and the second shooting point based on the result of the visual feature matching includes:
respectively calculating the feature matching degree between the reference panoramic color image and the target panoramic color image based on each camera relative pose element in the second set to obtain a plurality of feature matching degrees;
and determining the camera relative poses of the first shooting point and the second shooting point based on the camera relative pose element with the highest feature matching degree.
According to the above method, acquiring the facade data of the target object in the reference panoramic depth map includes:
projecting the reference panoramic depth map to obtain a three-dimensional point cloud in the camera coordinate system;
performing plane fitting based on the three-dimensional point cloud to obtain a plane set;
filtering out, from the plane set, the planes whose normal vector makes an angle with the gravity direction smaller than a preset angle threshold;
and obtaining the facade data of the target object in the reference panoramic depth map based on the planes remaining after the filtering.
According to the above method, generating a facade normal vector two-dimensional histogram based on the facade data of the target object in the reference panoramic depth map and the facade data of the target object in the target panoramic depth map includes:
projecting, along the gravity direction, the facade data of the target object in the reference panoramic depth map and the facade data of the target object in the target panoramic depth map, to obtain the facade normal vector two-dimensional projections of the target object in both depth maps;
and after converting the facade normal vector two-dimensional projections into plane angles, generating the facade normal vector two-dimensional histogram at a preset histogram granularity.
According to the above method, determining the set of camera relative rotation angles between the first shooting point and the second shooting point based on the facade normal vector two-dimensional histogram includes:
acquiring the target two-dimensional histogram unit that corresponds, in the facade normal vector two-dimensional histogram, to the facade normal vector two-dimensional projection of the target object in the target panoramic depth map;
periodically extending the target two-dimensional histogram unit;
acquiring a plurality of peak values of the periodically extended target two-dimensional histogram unit;
determining a plurality of relative rotation angles corresponding to the plurality of peak values;
generating the set of camera relative rotation angles based on the plurality of relative rotation angles.
In a second aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a memory for storing a computer program;
a processor configured to execute the computer program stored in the memory, and when the computer program is executed, the method for determining the relative pose of the camera according to the first aspect is implemented.
In a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the method for determining the relative pose of the camera according to the first aspect is implemented.
With the method for determining the relative pose of the camera, the electronic device, and the storage medium of the embodiments of the disclosure, a reference panoramic depth map and a target panoramic depth map are first obtained by processing images captured by the camera; facade data of the target object are extracted from each depth map; a reference grid occupancy map and a target grid occupancy map are generated from the extracted facade data, together with a facade normal vector two-dimensional histogram; the set of camera relative rotation angles between the two panoramic shooting points is then determined from the histogram, and the set of camera relative poses is determined from the rotation angle set, the reference grid occupancy map, and the target grid occupancy map; finally, visual feature matching is performed on the reference and target panoramic color maps over the computed set of camera relative poses, so that the relative pose of the camera between the two views is accurately obtained. In the embodiments of the disclosure, the estimation of the relative rotation constrains the one-degree-of-freedom rotation to a limited number of search intervals, which improves computational efficiency. In addition, computing locally optimal alignment solutions reduces the chance that, under a single measurement, the globally optimal grid solution is not the true solution.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 is a schematic flow chart of a method for determining a relative pose of a camera according to an embodiment of the present disclosure;
FIG. 2 is a grid occupancy map in one example of the present disclosure;
FIG. 3 is a facade normal vector two-dimensional histogram of a box-model room in one example of the present disclosure;
FIG. 4 is a schematic diagram of a point cloud to be aligned in one example of the present disclosure;
FIG. 5 is a schematic view of an obstacle illuminated by horizontal and vertical rays in one example of the present disclosure;
FIG. 6 is a schematic diagram of determining an occlusion cost area in one example of the present disclosure;
fig. 7 is a schematic structural diagram of an embodiment of an electronic device according to the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and covers three cases; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
The inventors of the present disclosure observed that, for point-cloud-based alignment methods (whether based on point cloud features or on geometry), the gravity direction of point clouds produced by site scanning is generally known. With the gravity direction known, the six-degree-of-freedom pose estimation problem simplifies to a constrained problem of one-degree-of-freedom rotation plus three-degree-of-freedom translation. If the translation is further restricted to a two-dimensional plane, it reduces from three degrees of freedom to two, so the whole pose estimation becomes a problem of one-degree-of-freedom rotation plus two-degree-of-freedom translation. The specific technical scheme of the present disclosure is proposed on this basis.
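As an illustrative aside (not part of the patent text), a gravity-aligned 3-DoF pose (θ, tx, ty) can be lifted back to a full rigid transform as in the following minimal sketch; the gravity-along-Y convention and all names here are assumptions:

```python
import numpy as np

def pose_2d_to_matrix(theta: float, tx: float, ty: float) -> np.ndarray:
    """Lift a gravity-aligned 3-DoF pose (theta, tx, ty) to a 4x4 transform.

    Assumed convention: gravity along the Y axis, so the rotation is a pure
    yaw about Y and the translation lies in the X-Z ground plane.
    """
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, 0.0, s],
                          [0.0, 1.0, 0.0],
                          [-s, 0.0, c]])  # rotation about the gravity axis
    T[0, 3] = tx  # translation in the ground plane
    T[2, 3] = ty
    return T
```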
Fig. 1 is a schematic flow chart of a method for determining a relative pose of a camera according to an embodiment of the present disclosure. As shown in fig. 1, a method for determining a relative pose of a camera according to an embodiment of the present disclosure includes:
s1: acquiring the facade data of the target object in the reference panoramic depth map, and acquiring the facade data of the target object in the target panoramic depth map. The reference panoramic depth map is obtained by processing an image captured by the camera at the first shooting point, and the target panoramic depth map is obtained by processing an image captured by the camera at the second shooting point. The facade data of the target object are the data of the target object on the facades (vertical surfaces) in the corresponding panoramic depth map (reference or target).
Specifically, the reference panoramic depth map is generated from an image captured by the camera at the first shooting point, and the target panoramic depth map from an image captured at the second shooting point. The camera may be a panoramic camera, in which case the reference panorama and the target panorama are captured directly and then converted into the reference and target panoramic depth maps, respectively. Alternatively, a monocular camera may be used: several images are captured at each of the first and second shooting points, the images of the first shooting point are stitched into the reference panorama and those of the second shooting point into the target panorama, and the two panoramas are then converted into the reference and target panoramic depth maps, respectively.
After the reference and target panoramic depth maps are obtained, the point cloud information of each is extracted, from which the facade data of the target object in the reference panoramic depth map and in the target panoramic depth map are generated. For example, when both depth maps are captured in an apartment, the target objects include the walls and the other objects in the apartment (such as furniture and household items).
In the embodiments of the disclosure, the facade data are acquired to ensure the consistency of the subsequent normal vector computation: the three-dimensional normal vector of a facade agrees with its projected two-dimensional normal vector. If a point p lies on a facade whose normal vector is n = (nx, ny, nz), then ny = 0, and the downward planar projection yields the two-dimensional normal vector n' = (nx, nz); consistency is thus maintained and the ambiguity caused by dimension reduction is avoided.
S2: and generating a reference grid occupation map based on the vertical face data of the target object in the reference panoramic depth map, and generating a target grid occupation map based on the vertical face data of the target object in the target panoramic depth map. The grid occupies each grid in the map, and the probability that the grid has the obstacle is represented by the occupation probability.
In the embodiment of the disclosure, a reference grid occupation map and a target grid occupation map can be generated by using an occupation grid probability map construction method.
Fig. 2 is a grid occupancy map in one example of the present disclosure. As shown in fig. 2, the black part in the graph is the position of the input point cloud data, the algorithm constructs the probability that the whole path passes through the integer coordinate according to the line segment from the origin O to the position, the initial probability of the whole graph is 0.5, the probability in the path (not including the end point) is less than 0.5, the probability is represented by white, and the probability of the end point position is greater than 0.5, and the probability is represented by black.
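A rough sketch of such an occupancy map construction is given below (an illustration under assumptions, not the patent's implementation: the log-odds update rule, grid size, resolution, and evidence weights are all hypothetical):

```python
import numpy as np

def build_occupancy_map(points_xy, origin=(0.0, 0.0), size=256, res=0.05,
                        l_free=-0.4, l_occ=0.85):
    """Minimal occupancy-grid sketch: cells start at p = 0.5 (log-odds 0);
    cells along each origin->point ray gather free evidence (p < 0.5) and
    the endpoint cell gathers occupied evidence (p > 0.5)."""
    logodds = np.zeros((size, size))
    o = np.asarray(origin) / res + size // 2          # origin in grid coords

    def raster_line(a, b):                            # integer cells from a to b
        n = int(np.max(np.abs(b - a))) + 1
        ts = np.linspace(0.0, 1.0, n)
        cells = (a[None] + ts[:, None] * (b - a)[None]).astype(int)
        return np.unique(cells, axis=0)

    for p in points_xy:
        g = np.asarray(p) / res + size // 2
        for cx, cy in raster_line(o, g):
            if 0 <= cx < size and 0 <= cy < size:
                logodds[cy, cx] += l_free             # path cells: free evidence
        ex, ey = g.astype(int)
        if 0 <= ex < size and 0 <= ey < size:
            logodds[ey, ex] += l_occ - l_free         # endpoint: net occupied
    return 1.0 / (1.0 + np.exp(-logodds))             # back to probabilities
```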
S3: and generating a vertical surface normal vector two-dimensional histogram based on the vertical surface data of the target object in the reference panoramic depth map and the vertical surface data of the target object in the target panoramic depth map.
In the embodiment of the disclosure, a vertical surface normal vector two-dimensional histogram including information of a target object in a reference panoramic depth map and vertical surface data of the target object in a target panoramic depth map can be constructed in a point cloud projection manner.
Fig. 3 is a vertical normal vector two-dimensional histogram of a box-model room in one example of the present disclosure. As shown in fig. 3, in the horizontal angle of 0-360, every 3 degrees is divided into an interval, and according to the number of point clouds in the interval of point cloud normal vector statistics, an obvious peak value characteristic will exist indoors, as shown in the diagram, histograms 1 and 2 respectively represent normal vector histograms of two different shooting points statistics, and under the assumption that adjacent wall surfaces are perpendicular to each other, the peak value appears once every 90 degrees.
S4: based on the vertical face normal vector two-dimensional histogram, a set of camera relative rotation angles between the first shot point and the second shot point is determined.
Specifically, after the normal vectors of the point clouds of the first shooting point and the second shooting point are rotated, the normal vectors should be consistent with the normal vectors of the paired shooting points at the corresponding points, assuming that the second shooting point is Ci and the first shooting point paired with the second shooting point is Cj, assuming that two-dimensional normal vectors { n' } can be clustered into a limited number of classes, the disclosed embodiment limits the number of classes to be at most 8, that is, the camera relative rotation angle set includes at most 8 relative rotation angles.
Note that Ci corresponds to a two-dimensional normal vector histogram portion Hi, Cj corresponds to a two-dimensional normal vector histogram portion Hj, and the rotation R2 on the two-dimensional plane is a rotation from Ci to Cj, so that the difference between Hi and Hj after the rotation through the angle corresponding to R2 should be locally extremely small, and a plurality of relative rotation angles are finally determined by calculating the difference between the histograms Hi and Hj after the rotation angle. A set of camera relative rotation angles is generated based on the determined plurality of relative rotation angles.
S5: and determining a camera relative pose set between the first shooting point and the second shooting point based on the camera relative rotation angle set, the reference grid occupation map and the target grid occupation map.
Specifically, a translation search window W is set by taking a camera coordinate system where Cj is located as a world coordinate system, W is uniformly divided into k × k small windows { Ws }, and a proper k needs to be selected here, so that the size of Ws is preferably within an interval [0.5, 1.0], thereby preventing the increase of operation amount caused by too fine division and simultaneously avoiding that the division is too coarse so that a true solution cannot be found.
And taking a small window Ws out of the { Ws }, taking a rotation angle theta out of the { theta }, and searching a two-dimensional optimal pose T (theta, tx, ty) for the window Ws and the rotation theta by a branch-and-bound method. When Ws is taken through the window in { Ws } and theta is taken through the value in { theta }, k multiplied by r local optimal solutions are obtained and are marked as a camera relative pose set { T }. The optimal pose in a window is searched in the branch-and-bound method, firstly, an occupied grid map generated by reference point cloud is input, the map is rotated according to a rotating window to obtain a series of rotated maps, then, the maps are subjected to layer-by-layer maximization according to a translation window, a set position of 5 layers is assumed to be set, 5 layers of maps are generated totally, the problem of translation amount calculation is converted into a quadtree search problem, and a global optimal solution of the pose is obtained.
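The following sketch illustrates a branch-and-bound translation search of this flavor (hypothetical code, not the patent's exact algorithm: the score definition, the five-layer depth, and the data layout are assumptions). Layer h of the precomputed stack stores block maxima, so a single lookup upper-bounds the score of every translation inside a quadtree node:

```python
import heapq
import numpy as np

def precompute_bounds(score_map, depth=5):
    """D[h][y, x] = max of score_map over the 2^h x 2^h block starting at
    (y, x); summing D[h] lookups upper-bounds the score of every translation
    refinement inside a quadtree node (the layer-by-layer maximization)."""
    D = [score_map.astype(float)]
    for h in range(1, depth + 1):
        s = 1 << (h - 1)
        p = D[-1]
        pad = np.pad(p, ((0, s), (0, s)), constant_values=-np.inf)
        D.append(np.maximum(np.maximum(pad[:-s, :-s], pad[s:, :-s]),
                            np.maximum(pad[:-s, s:], pad[s:, s:])))
    return D

def branch_and_bound(points, D, window):
    """Best-first search for the integer translation (tx, ty) in
    'window' = (x0, y0, 2^depth) maximizing the sum of map scores over the
    already-rotated target points. Assumes the window keeps all shifted
    points inside the map, so out-of-range lookups never hide score."""
    H, W = D[0].shape

    def score(tx, ty, h):
        total = 0.0
        for px, py in points:
            x, y = px + tx, py + ty
            if 0 <= y < H and 0 <= x < W:
                total += D[h][y, x]
        return total

    x0, y0, size = window
    depth = size.bit_length() - 1          # size must be a power of two
    heap = [(-score(x0, y0, depth), x0, y0, depth)]
    best, best_t = -np.inf, (x0, y0)
    while heap:
        neg_ub, tx, ty, h = heapq.heappop(heap)
        if -neg_ub <= best:
            break                          # bound: nothing left can improve
        if h == 0:
            best, best_t = -neg_ub, (tx, ty)   # exact leaf score
            continue
        s = 1 << (h - 1)                   # branch into four sub-windows
        for dx in (0, s):
            for dy in (0, s):
                heapq.heappush(heap, (-score(tx + dx, ty + dy, h - 1),
                                      tx + dx, ty + dy, h - 1))
    return best_t, best
```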
S6: and acquiring a reference panoramic color image corresponding to the reference panoramic depth image, acquiring a target panoramic color image corresponding to the target panoramic depth image, performing visual feature matching on the reference panoramic color image and the target panoramic color image based on the camera relative pose set, and determining the camera relative poses of the first shooting point and the second shooting point based on the visual feature matching.
In the embodiment of the present disclosure, if it is determined that all the camera relative pose elements in the camera relative pose set are true solutions, that is, there is no wrong relative pose due to occlusion or image overlap, visual feature matching between the reference panoramic color image and the target panoramic color image may be performed on all the camera relative pose elements in the camera relative pose set, and the best matching result is used as the final camera relative pose of the first shooting point and the second shooting point.
In this embodiment, the reference and target panoramic depth maps obtained by processing the camera images are acquired first; the facade data of the target object are extracted from each; the reference and target grid occupancy maps are generated from the extracted facade data, together with the facade normal vector two-dimensional histogram; the set of camera relative rotation angles between the two panoramic shooting points is then determined from the histogram, and the set of camera relative poses from the rotation angle set and the two occupancy maps; finally, visual feature matching is performed on the reference and target panoramic color maps over the computed pose set, so that the relative pose of the camera between the two views is accurately obtained. The estimation of the relative rotation constrains the one-degree-of-freedom rotation to a limited number of search intervals, improving computational efficiency, and computing locally optimal alignment solutions reduces the chance that the globally optimal grid solution under a single measurement is not the true solution.
In one embodiment of the present disclosure, in step S1, acquiring the facade data of the target object in the reference panoramic depth map includes:
s1-1: and projecting the reference panoramic depth map to obtain a three-dimensional point cloud under a camera coordinate system. Wherein the gravity direction of the reference panoramic depth map is known, and the data of the reference panoramic depth map is the data which is aligned according to the gravity direction. And if the reference panoramic depth map is the depth map without semantic information, projecting the reference panoramic depth map into a three-dimensional point cloud of a camera coordinate system. If the reference panoramic depth map is a depth map with semantic information, mask information of the wall surface is directly obtained through semantic segmentation of the image, and point clouds belonging to the wall surface are filtered out.
S1-2: and performing plane fitting based on the three-dimensional point cloud to obtain a plane set. Based on the three-dimensional point cloud, performing plane fitting through a Random sample consensus (RANSAC) algorithm to obtain a plane set { P }.
S1-3: and in the filtering plane set, the included angle between the normal vector and the gravity direction is smaller than the plane with the preset angle threshold value. Assuming that the gravity direction is g, the included angle alpha between the normal vector and g in the filtering { P } is smaller than a given preset angle threshold value alphaminOf the plane of (a). Wherein, the size of alpha represents the vertical degree of the normal vector of the vertical surface and the gravity direction, and if the alpha is equal to 90 degrees, the vertical degree represents that the wall surface is exactly vertical to the ground.
S1-4: and obtaining the vertical surface data of the target object in the reference panoramic depth map based on the residual planes after the plane set filtering.
In the same manner as steps S1-1 to S1-4, the facade data of the target object in the target panoramic depth map can be acquired.
In this embodiment, the facade data of the target object in a depth map can be acquired quickly and accurately through point cloud projection, plane fitting, and threshold-based filtering.
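As an illustration of steps S1-1 to S1-4, a simplified sketch follows; the equirectangular conventions, RANSAC parameters, and the 80° threshold are hypothetical, and a real implementation would fit planes more robustly:

```python
import numpy as np

def depth_pano_to_points(depth):
    """Project an equirectangular depth panorama to a 3-D point cloud in the
    camera frame (Y assumed to be the gravity axis; conventions assumed)."""
    h, w = depth.shape
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi      # azimuth
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi      # elevation
    lon, lat = np.meshgrid(lon, lat)
    d = depth.ravel()
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1).reshape(-1, 3)
    return dirs[d > 0] * d[d > 0, None]

def fit_vertical_planes(pts, g=np.array([0.0, -1.0, 0.0]), alpha_min=80.0,
                        iters=200, tol=0.02, min_inliers=500, rng=None):
    """Toy RANSAC: repeatedly fit planes and keep only those whose normal is
    nearly perpendicular to gravity (angle above alpha_min degrees), i.e.
    the facades."""
    rng = rng or np.random.default_rng(0)
    planes, remaining = [], pts.copy()
    while len(remaining) > min_inliers:
        best_mask, best_n = None, None
        for _ in range(iters):
            p = remaining[rng.choice(len(remaining), 3, replace=False)]
            n = np.cross(p[1] - p[0], p[2] - p[0])
            if np.linalg.norm(n) < 1e-9:
                continue
            n = n / np.linalg.norm(n)
            mask = np.abs((remaining - p[0]) @ n) < tol
            if best_mask is None or mask.sum() > best_mask.sum():
                best_mask, best_n = mask, n
        if best_mask is None or best_mask.sum() < min_inliers:
            break
        alpha = np.degrees(np.arccos(np.clip(np.abs(best_n @ g), 0.0, 1.0)))
        if alpha > alpha_min:              # normal ~ perpendicular to gravity
            planes.append((best_n, remaining[best_mask]))
        remaining = remaining[~best_mask]
    return planes
```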
In one embodiment of the present disclosure, step S3 includes:
s3-1: and performing gravity direction projection on the elevation data of the target object in the reference panoramic depth map and the elevation data of the target object in the target panoramic depth map to obtain the elevation normal vector two-dimensional projection of the target object in the reference panoramic depth map and the target object in the target panoramic depth map.
S3-2: and after the vertical face normal vector two-dimensional projection is converted into a plane angle, generating a vertical face normal vector two-dimensional histogram based on the preset histogram granularity.
Specifically, assuming that a point cloud set of vertical face data generated by a first shooting point and a second shooting point is { Ci }, a vertical face set in Ci is { P }, setting a normal vector n of a point cloud normal vector P in { P }, determining a horizontal plane XZ perpendicular to the gravity direction through the gravity direction g, projecting all points in Ci to XZ, and projecting the normal vector to XZ to obtain a two-dimensional projection { n' } of a vertical face point cloud normal vector, for wider adaptability, the embodiment of the present disclosure does not directly fit the principal directions corresponding to these assumptions, but uses a normal vector histogram to describe the vertical face orientation information of the graph, and can describe that the vertical face range is extended to a vertical curved surface in the gravity direction, the generation manner of the vertical direction is to convert the two-dimensional normal vector into a plane angle of [0, 2 Π ], and construct a normal vector histogram H with a certain precision as a histogram granularity, for example, if the accuracy is 3 °, the total number of H intervals is 360/3 = 120.
In the embodiment, the normal vector histogram is used to describe the vertical face orientation information, and the vertical face range that can be described is expanded to a vertical curved surface in the gravity direction, so that the relative rotation angle of the camera can be conveniently determined in the subsequent steps.
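A minimal sketch of the histogram construction (names are assumed; the 3° granularity follows the example above):

```python
import numpy as np

def facade_normal_histogram(normals_2d, bin_deg=3.0):
    """Convert each projected 2-D normal (nx, nz) to a plane angle in
    [0, 2*pi) and count per bin, e.g. 3 deg -> 120 bins."""
    angles = np.arctan2(normals_2d[:, 1], normals_2d[:, 0]) % (2 * np.pi)
    n_bins = int(round(360.0 / bin_deg))
    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, 2 * np.pi))
    return hist
```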
In one embodiment of the present disclosure, step S4 includes:
s4-1: and acquiring a target two-dimensional histogram unit corresponding to the vertical normal vector two-dimensional projection of the target object in the reference panoramic depth map in the vertical normal vector two-dimensional histogram.
S4-2: and carrying out periodic continuation on the target two-dimensional histogram unit.
Specifically, Hj is subjected to period extension, and the interval from Hj to [ n, 2n ] can be extended during actual calculation. And Hj is a target two-dimensional histogram unit corresponding to the vertical face normal vector two-dimensional projection of the target object in the reference panoramic depth map in the vertical face normal vector two-dimensional histogram.
S4-3: and acquiring a plurality of peak values of the target two-dimensional histogram unit after the period prolongation.
S4-4: based on the plurality of peak values, a plurality of relative rotation angles corresponding to the plurality of peak values are determined.
Specifically, the integer θ is taken over the values of I, where I consists of all integers in [0, n − 1], and the θ satisfying the following is solved:

$$\theta^{*}=\underset{\theta\in I}{\arg\min}\;\sum_{x}\left|H_{j}(x-\theta)-H_{i}(x)\right|$$

where θ represents a camera relative rotation angle, associated with a peak in the target two-dimensional histogram unit; H_j(x − θ) denotes the portion of the facade normal vector two-dimensional histogram at abscissa (x − θ) that corresponds to the facade normal vector of the target object in the target panoramic depth map; and H_i(x) denotes the portion at abscissa x that corresponds to the facade normal vector of the target object in the reference panoramic depth map.
S4-5: a set of camera relative rotation angles is generated based on the plurality of relative rotation angles.
Specifically, θ is removed from I, step S4-4 is repeated to solve for the next θ, and each solved θ is added in turn to the set {θ}; the process stops when a newly solved θ is within distance 2 of a value already in {θ}, or when no values remain in I.
The first 8 elements of {θ} are retained (all of them if there are fewer than 8). The filtered set {θ} is the initial search range of the rotation, and the number of its elements is denoted r.
Referring to fig. 3 and 4, fig. 4 shows the initial state of the point clouds formed at the two shooting points. The two point clouds differ by a certain rotation angle, meaning that walls that should be parallel meet at an angle when the two point clouds are stitched together, and the histograms exhibit a phase shift. Referring to fig. 3, the phase shift appears as the abscissa difference between peaks, so the rotation angle between the two point clouds can be determined simply by finding that abscissa difference. Since the abscissa difference is determined from peaks, but not every peak should participate in the calculation, the most plausible peaks must be selected. Referring to fig. 3, normal vector histogram 1 has several peaks, and the dashed line marks its maximum peak; the corresponding angle is selected from the abscissa, the coordinates immediately to the left and right of the peak are excluded to suppress non-maxima, and the selection is iterated over the remaining angles, keeping at most the first 8.
In this embodiment, the maximum number of elements in the set of relative rotation angles is chosen according to the typical number of wall surfaces in a room, and the range of the camera relative rotation angle can then be determined quickly from the phase differences of the peaks in the facade normal vector two-dimensional histogram, which improves the efficiency of the system.
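The peak search can be sketched as follows (simplified: instead of iteratively re-solving the arg-min as in steps S4-4 and S4-5, this version scores all circular shifts once and applies non-maximum suppression with the ±2-bin exclusion; names and the returned unit are assumptions):

```python
import numpy as np

def candidate_rotations(Hi, Hj, max_candidates=8, nms_radius=2, bin_deg=3.0):
    """Score every circular shift theta by the histogram difference
    D(theta) = sum_x |Hj(x - theta) - Hi(x)| (np.roll plays the role of the
    periodic extension), then pick up to 8 local minima with suppression."""
    Hi, Hj = np.asarray(Hi, dtype=float), np.asarray(Hj, dtype=float)
    n = len(Hi)
    D = np.array([np.abs(np.roll(Hj, t) - Hi).sum() for t in range(n)])
    picked = []
    for t in np.argsort(D):               # smallest difference first
        if all(min(abs(t - p), n - abs(t - p)) > nms_radius for p in picked):
            picked.append(int(t))
        if len(picked) == max_candidates:
            break
    return [t * bin_deg for t in picked]  # bin shifts -> angles in degrees
```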
In one embodiment of the present disclosure, in step S5, performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the set of camera relative poses includes:
s5-1: one camera relative pose element at a time is acquired from the set of camera relative poses.
S5-2: and acquiring the pixel overlapping quantity of the pose-adjusted target panoramic depth map and the reference panoramic depth map based on the pose-adjusted target panoramic depth map according to the currently acquired relative pose elements of the camera.
S5-3: and if the number of pixel overlaps is less than a preset pixel overlap threshold value, deleting the currently acquired camera relative pose element from the camera relative pose set. The preset pixel overlap threshold is set to detect whether the reference panoramic depth map and the target panoramic depth map are based on images acquired and processed by the same object, for example, when the reference panoramic depth map is a panoramic depth map for a house type a and the target panoramic depth map is a panoramic depth map for a house type B, due to different acquisition targets, even if the camera pose is adjusted, the overlap of the image pixels after adjustment is very low. If the image acquisition targets of the reference panoramic depth map and the target panoramic depth map are all house type A, after the camera pose is accurately adjusted, the pixel overlapping degree is usually higher than a preset pixel overlapping degree threshold value.
S5-4: deleting camera relative pose elements in the camera relative pose set, wherein the overlapping quantity of all pixels is smaller than a preset pixel overlapping degree threshold value, so as to obtain a first set;
s5-5: visual feature matching is performed on the reference panoramic color map and the target panoramic color map based on the first set.
In this embodiment, by setting a preset pixel overlap threshold and comparing the overlap of the two images after the camera pose adjustment, obviously wrong camera relative poses can be removed.
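A sketch of the overlap test (illustrative: the 'project' callback, depth tolerance, and names are assumptions; a real implementation would vectorize this):

```python
import numpy as np

def overlap_count(depth_ref, points_tgt, T, project, depth_tol=0.1):
    """Count target pixels that land on a consistent reference pixel after
    applying candidate pose T. 'project' maps a 3-D point in the reference
    frame to (row, col, depth) on the reference panorama."""
    pts = (T[:3, :3] @ points_tgt.T).T + T[:3, 3]     # move target points
    count = 0
    for p in pts:
        r, c, d = project(p)
        r, c = int(r), int(c)
        if 0 <= r < depth_ref.shape[0] and 0 <= c < depth_ref.shape[1]:
            if abs(depth_ref[r, c] - d) < depth_tol:  # depths agree: overlap
                count += 1
    return count

# keep only the poses that clear the preset threshold tau (the first set):
# first_set = [T for T in poses if overlap_count(depth_ref, pts, T, project) >= tau]
```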
In one embodiment of the present disclosure, step S5-5 includes:
s5-5-1: determining an occlusion cost look-up table based on the reference grid occupancy map. Wherein the occlusion cost lookup table gives the occlusion cost when different positions are occluded in the reference grid occupation map. The shielding cost of a certain point in the map occupied by the reference grid is the ratio of the area of an affected area of the point to the irradiation light to the total area of the map occupied by the reference grid when an obstacle exists at the point.
S5-5-2: and determining the occlusion cost corresponding to each camera relative pose element in the first set based on the occlusion cost lookup table.
S5-5-3: and deleting all camera relative pose elements with the shielding cost larger than a preset shielding cost threshold value in the first set to obtain a second set. And if the shielding cost is greater than a preset shielding cost threshold value, determining that the currently selected camera relative pose element is wrong, and deleting the wrong camera relative pose element.
S5-5-4: and performing visual feature matching on the reference panoramic color image and the target panoramic color image based on the second set.
In this embodiment, for each camera relative pose element in the first set, a corresponding occlusion cost is calculated, and by using the occlusion cost calculation result, an obviously erroneous result can be deleted, so that the accuracy of the finally determined camera relative pose can be improved.
In one embodiment of the present disclosure, S5-5-1 includes:
s5-5-1-1: and determining the shielding area of the target coordinate in the map occupied by the reference grid based on the shielding range of the target coordinate in the horizontal direction and the shielding range of the target coordinate in the vertical direction.
FIG. 5 is a schematic view of an obstacle illuminated by horizontal and vertical rays in one example of the disclosure. When there is a pose offset between the reference panoramic depth map and the target panoramic depth map as in fig. 5, blocked rays cannot reach their original positions along their original paths: the horizontally cast rays in fig. 5(a) and the vertically cast rays in fig. 5(b) cannot pass through the obstacle. For a point on the obstacle in figs. 5(a) and 5(b), for example the point indicated by the arrow, the occlusion range in the figure can be determined from the horizontally and vertically cast rays.
S5-5-1-2: and taking the ratio of the shielding area of the target coordinate between the map occupied by the reference grid and the total area of the map occupied by the reference grid as the shielding cost of the target coordinate.
S5-5-1-3: and acquiring corresponding occlusion costs of all designated coordinates in the map occupied by the reference grid.
S5-5-1-4: and generating an occlusion cost lookup table based on the corresponding occlusion costs of all the specified coordinates in the reference grid occupied map.
More specifically, step S5-5-1 includes:
for the reference occupancy grid map M, a matrix Occ of the same size as M is generated and all coordinates are assigned a value of 0. Where Occ is a pre-computed map for occlusion costs.
M is traversed by abscissa: for each abscissa x, the set {Lx} of continuous unoccupied paths of x is computed, where each Lx is an interval [y_min, y_max] such that every coordinate in {x} × [y_min, y_max] satisfies p < 0.5 in M. Lx is computed as follows: traverse the ordinate y in order; if the probability p of coordinate (x, y) in M is below 0.5 and y_min is not yet set, let y_min = y; once y_min is set, continue traversing until the probability p of (x, y) in M is at least 0.5, at which point let y_max = y − 1, reset y_min to the unset state, and add the interval Lx = [y_min, y_max] to the set {Lx}; repeat until all ordinates have been traversed.
M is likewise traversed by ordinate: for each ordinate y, the set {Ly} of continuous unoccupied paths of y is computed, where each Ly is an interval [x_min, x_max] such that every coordinate in [x_min, x_max] × {y} satisfies p < 0.5 in M. Ly is computed in the same way by traversing the abscissa x: set x_min when p < 0.5 and x_min is unset; once p ≥ 0.5 is reached, let x_max = x − 1, reset x_min, and add the interval Ly = [x_min, x_max] to {Ly}, until all abscissas have been traversed.
All coordinates (x, y) in M are then traversed. If the probability p of (x, y) in M is below 0.5, the unique interval Lx = [y_min, y_max] containing y is selected from {Lx} for this x, and the unique interval Ly = [x_min, x_max] containing x is selected from {Ly} for this y; the occlusion cost Occ at pixel coordinate (x, y) is computed as:

$$Occ(x,y)=\frac{\min\bigl\{(x-x_{min})(y-y_{min}),\;(x-x_{min})(y_{max}-y),\;(x_{max}-x)(y-y_{min}),\;(x_{max}-x)(y_{max}-y)\bigr\}}{(x_{max}-x_{min})(y_{max}-y_{min})}$$
After the computation is completed, the occlusion cost is written to the corresponding coordinate of Occ.
In this embodiment, the unoccupied positions in the reference grid occupancy map are the positions through which light can propagate; light may come from any direction, and the x and y directions are taken as representatives. Lx and Ly correspond to the line segments in fig. 5, along which light travels in a straight line to a key point of the map; the union of all intervals such as [y_min, y_max] is exactly the unoccupied part of the map.
If occlusion occurs, so that a ray cannot pass through to its original position, the travel distance in both the x and y directions is shorter than along the original path.
FIG. 6 is a schematic diagram of determining the occlusion cost area in one example of the present disclosure. As shown in fig. 6 for the coordinates O0, O1, O2 and O3, the lengths of the two free paths through a coordinate in the x and y directions span a rectangle; the occlusion area is the area of the smallest rectangle formed by the coordinate and the bounding rectangle (the hatched grid region), and the occlusion cost is the ratio of the occlusion area to the total area. When a coordinate lies near the boundary of the rectangle, the occlusion area is small (e.g. coordinates O0 and O1); the closer the coordinate is to a corner of the rectangle, the closer the cost is to 0 (e.g. coordinate O0), and the closer it is to the geometric center of the rectangle, the closer the cost is to 0.25 (e.g. coordinate O3).
In this embodiment, the occlusion cost of a coordinate is determined from the ratio of the occlusion area at that coordinate to the total area, and an occlusion cost lookup table is then generated, allowing subsequent steps to retrieve the required occlusion cost quickly.
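Putting steps S5-5-1-1 to S5-5-1-4 together, a direct (unoptimized) sketch of the lookup-table construction might read as follows; names are assumed, and the span search here recomputes Lx and Ly per cell instead of precomputing the {Lx} and {Ly} sets:

```python
import numpy as np

def occlusion_cost_table(M):
    """For each free cell of occupancy map M (probabilities), find the
    maximal free spans through it in x and y, then take the smallest corner
    rectangle over the spanned area: ~0 near a corner, ~0.25 at the center."""
    H, W = M.shape
    occ = np.zeros((H, W))
    free = M < 0.5
    for y in range(H):
        for x in range(W):
            if not free[y, x]:
                continue
            ymin = y                           # vertical free span [ymin, ymax]
            while ymin > 0 and free[ymin - 1, x]:
                ymin -= 1
            ymax = y
            while ymax < H - 1 and free[ymax + 1, x]:
                ymax += 1
            xmin = x                           # horizontal free span [xmin, xmax]
            while xmin > 0 and free[y, xmin - 1]:
                xmin -= 1
            xmax = x
            while xmax < W - 1 and free[y, xmax + 1]:
                xmax += 1
            area = max((xmax - xmin) * (ymax - ymin), 1)
            corners = [(x - xmin) * (y - ymin), (x - xmin) * (ymax - y),
                       (xmax - x) * (y - ymin), (xmax - x) * (ymax - y)]
            occ[y, x] = min(corners) / area    # smallest corner rectangle
    return occ
```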
In one embodiment of the present disclosure, in step S6, performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the set of camera relative poses, and determining the camera relative pose between the first shooting point and the second shooting point based on the visual feature matching result, includes:
respectively calculating the visual feature matching degree between the reference panoramic color map and the target panoramic color map based on each camera relative pose element in the second set, to obtain a plurality of visual feature matching degrees; and determining the camera relative pose between the first shooting point and the second shooting point based on the camera relative pose element with the highest visual feature matching degree.
Specifically, a camera relative pose element Tji is selected from the second set, the visual matching score s of Tji is computed, and the epipolar constraint corresponding to Tji is computed. Let {Fi} be the visual features extracted from the panoramic color map corresponding to the target panoramic depth map Ci, and {Fj} the visual features extracted from the panoramic color map corresponding to the reference panoramic depth map Cj. Matching points of Ci are searched in Cj: if the matching point of feature point Fi in Cj is Fj, the descriptor distance dij is computed, and if dij is below a given threshold d_max, s is updated as s = s + (s_max − dij), where s_max is a given score value greater than d_max; for ORB feature descriptors, s_max is typically taken as 256 and d_max as 128. The scores of all camera relative pose elements in the second set are computed by this rule, and the element with the largest score is selected as the optimal solution, i.e. the camera relative pose between the first shooting point and the second shooting point.
In this embodiment, after the wrong camera relative pose elements have been deleted by the occlusion processing and the overlap processing, the optimal camera relative pose is finally screened out by computing the visual feature matching score for each remaining camera relative pose element.
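A sketch of the scoring rule using OpenCV's ORB (illustrative: the feature count and cross-check matching are assumptions, and the per-pose epipolar filtering of matches described above is omitted here):

```python
import cv2

def visual_match_score(img_i, img_j, d_max=128, s_max=256):
    """Accumulate (s_max - distance) over ORB matches whose descriptor
    distance is below d_max. ORB Hamming distances lie in [0, 256],
    matching s_max = 256 and d_max = 128."""
    orb = cv2.ORB_create(nfeatures=2000)
    _, des_i = orb.detectAndCompute(img_i, None)
    _, des_j = orb.detectAndCompute(img_j, None)
    if des_i is None or des_j is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_i, des_j)
    return sum(s_max - m.distance for m in matches if m.distance < d_max)
```

In the full method, each candidate pose would first filter the matches through its epipolar constraint before accumulating the score; the pose with the largest resulting score is then taken as the optimal solution.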
In addition, the embodiment of the present disclosure further provides an electronic device including:
a memory for storing a computer program;
a processor, configured to execute the computer program stored in the memory, and when the computer program is executed, implement the method for determining the relative pose of the camera according to any of the above embodiments of the present disclosure.
Fig. 7 is a schematic structural diagram of an embodiment of an electronic device according to the present disclosure. Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 7. The electronic device may be either or both of the first device and the second device, or a stand-alone device separate from them, which stand-alone device may communicate with the first device and the second device to receive the acquired input signals therefrom.
As shown in fig. 7, the electronic device includes one or more processors and memory.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by a processor to implement the camera relative pose determination methods of the various embodiments of the present disclosure described above and/or other desired functions.
In one example, the electronic device may further include: an input device and an output device, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device may also include, for example, a keyboard, a mouse, and the like.
The output device may output various information including the determined distance information, direction information, and the like to the outside. The output devices may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 7, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device may include any other suitable components, depending on the particular application.
In addition to the above methods and apparatuses, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method of determining the relative pose of a camera according to various embodiments of the present disclosure described in the above section of this specification.
The computer program product may be written with program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method for determining the relative pose of a camera according to various embodiments of the present disclosure described in the above section of the present specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present disclosure are merely examples rather than limitations, and should not be considered essential to the various embodiments of the present disclosure. Furthermore, the specific details disclosed above are for the purposes of illustration and description only and are not intended to be limiting, since the disclosure is not limited to those specific details.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A method for determining relative pose of a camera, comprising:
acquiring facade data of a target object in a reference panoramic depth map, and acquiring facade data of the target object in a target panoramic depth map, wherein the reference panoramic depth map is obtained by processing an image shot by a camera at a first shooting point, and the target panoramic depth map is obtained by processing an image shot by the camera at a second shooting point;
generating a reference grid occupancy map based on the facade data of the target object in the reference panoramic depth map, and generating a target grid occupancy map based on the facade data of the target object in the target panoramic depth map;
generating a facade normal vector two-dimensional histogram based on the facade data of the target object in the reference panoramic depth map and the facade data of the target object in the target panoramic depth map;
determining a set of camera relative rotation angles between the first shooting point and the second shooting point based on the facade normal vector two-dimensional histogram;
determining a set of camera relative poses between the first shooting point and the second shooting point based on the set of camera relative rotation angles, the reference grid occupancy map, and the target grid occupancy map;
and acquiring a reference panoramic color map corresponding to the reference panoramic depth map, acquiring a target panoramic color map corresponding to the target panoramic depth map, performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the set of camera relative poses, and determining the camera relative pose of the first shooting point and the second shooting point based on a visual feature matching result.
2. The method for determining the relative pose of a camera according to claim 1, wherein performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the set of camera relative poses comprises:
acquiring one camera relative pose element at a time from the set of camera relative poses;
adjusting the pose of the target panoramic depth map according to the currently acquired camera relative pose element, and acquiring the number of overlapping pixels between the pose-adjusted target panoramic depth map and the reference panoramic depth map;
if the number of overlapping pixels is less than a preset pixel overlap threshold, deleting the currently acquired camera relative pose element from the set of camera relative poses;
after all camera relative pose elements whose number of overlapping pixels is less than the preset pixel overlap threshold have been deleted from the set of camera relative poses, taking the remaining elements as a first set;
performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the first set.
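A minimal sketch of the overlap filter in claim 2, for illustration only: it assumes the grid occupancy maps are boolean NumPy arrays in a shared top-view frame with the world origin at cell (0, 0), that each candidate pose element is a rotation/translation pair, and that counting target points landing in occupied reference cells is an acceptable proxy for the claimed number of overlapping pixels. None of these representations come from the patent itself.

```python
import numpy as np

def pixel_overlap(ref_grid, tgt_points, pose, resolution=0.05):
    """Proxy for the claimed overlap: target points landing in occupied cells.

    ref_grid: boolean 2D occupancy array for the reference shooting point,
    with `resolution` metres per cell (assumed conventions). tgt_points:
    (N, 3) points from the target panoramic depth map. pose: (R, t).
    """
    R, t = pose
    moved = tgt_points @ R.T + t                    # pose-adjust target points
    rows = (moved[:, 1] / resolution).astype(int)
    cols = (moved[:, 0] / resolution).astype(int)
    h, w = ref_grid.shape
    ok = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    return int(np.count_nonzero(ref_grid[rows[ok], cols[ok]]))

def first_set(candidates, ref_grid, tgt_points, overlap_threshold):
    # Keep only pose elements whose overlap reaches the preset threshold.
    return [p for p in candidates
            if pixel_overlap(ref_grid, tgt_points, p) >= overlap_threshold]
```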
3. The method for determining the relative pose of a camera according to claim 2, wherein performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the first set comprises:
determining an occlusion cost lookup table based on the reference grid occupancy map;
determining an occlusion cost corresponding to each camera relative pose element in the first set based on the occlusion cost lookup table;
deleting, from the first set, all camera relative pose elements whose occlusion cost is greater than a preset occlusion cost threshold, to obtain a second set;
performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the second set.
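Continuing the sketch for claim 3: once the occlusion cost lookup table exists, the occlusion filter reduces to a table lookup per candidate. The (pose, grid coordinate) pairing below is an assumed representation of a first-set element, not the patent's own data structure.

```python
def second_set(first_set_elements, occlusion_lut, cost_threshold):
    """Filter the first set down to the second set via the occlusion LUT.

    first_set_elements: list of (pose, (row, col)) pairs, where (row, col)
    is the grid coordinate the candidate pose places the target camera at.
    occlusion_lut: 2D cost array built as sketched under claim 4 below.
    """
    return [(pose, rc) for pose, rc in first_set_elements
            if occlusion_lut[rc] <= cost_threshold]
```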
4. The method for determining the relative pose of a camera according to claim 3, wherein determining the occlusion cost lookup table based on the reference grid occupancy map comprises:
determining an occlusion area of a target coordinate in the reference grid occupancy map based on an occlusion range of the target coordinate in the horizontal direction and an occlusion range of the target coordinate in the vertical direction;
taking the ratio of the occlusion area of the target coordinate to the total area of the reference grid occupancy map as the occlusion cost of the target coordinate;
acquiring the corresponding occlusion costs of all designated coordinates in the reference grid occupancy map;
and generating the occlusion cost lookup table based on the corresponding occlusion costs of all the designated coordinates in the reference grid occupancy map.
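A deliberately simplified sketch of the lookup-table construction in claim 4: it takes the occlusion area of a coordinate to be the occupied cells inside a window around it, with the window half-sizes standing in for the claimed horizontal and vertical occlusion ranges. The patent's own occlusion geometry may differ; this only illustrates the area-ratio cost and the per-coordinate table.

```python
import numpy as np

def occlusion_cost_lut(ref_grid, h_range=10, v_range=10):
    """Build a lookup table of occlusion cost per grid coordinate.

    ref_grid: boolean reference grid occupancy map. The cost of coordinate
    (r, c) is the occupied area inside a (2*v_range+1) x (2*h_range+1)
    window around it, divided by the total map area (assumed reading).
    """
    h, w = ref_grid.shape
    total_area = float(h * w)
    occ = ref_grid.astype(np.float64)
    lut = np.zeros((h, w), dtype=np.float64)
    for r in range(h):
        r0, r1 = max(0, r - v_range), min(h, r + v_range + 1)
        for c in range(w):
            c0, c1 = max(0, c - h_range), min(w, c + h_range + 1)
            lut[r, c] = occ[r0:r1, c0:c1].sum() / total_area
    return lut
```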
5. The method for determining the relative pose of a camera according to claim 3 or 4, wherein performing visual feature matching on the reference panoramic color map and the target panoramic color map based on the set of camera relative poses, and determining the camera relative pose of the first shooting point and the second shooting point based on the visual feature matching result, comprises:
calculating, based on each camera relative pose element in the second set, the visual feature matching degree between the reference panoramic color map and the target panoramic color map, to obtain a plurality of visual feature matching degrees;
and determining the camera relative pose of the first shooting point and the second shooting point based on the camera relative pose element with the highest visual feature matching degree.
6. The method for determining the relative pose of a camera according to claim 1, wherein acquiring the facade data of the target object in the reference panoramic depth map comprises:
projecting the reference panoramic depth map to obtain a three-dimensional point cloud under a camera coordinate system;
performing plane fitting based on the three-dimensional point cloud to obtain a plane set;
filtering out, from the plane set, planes whose normal vectors form an included angle with the gravity direction smaller than a preset angle threshold;
and obtaining the facade data of the target object in the reference panoramic depth map based on the planes remaining in the plane set after the filtering.
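A minimal sketch of the facade-extraction filter in claim 6, assuming the plane set is a list of (unit normal, inlier points) pairs produced by an upstream plane-fitting step; the 30-degree value for the preset angle threshold is an assumption.

```python
import numpy as np

def facade_planes(planes, gravity=np.array([0.0, 0.0, -1.0]),
                  angle_threshold_deg=30.0):
    """Drop near-horizontal planes (floors/ceilings), keep facades.

    planes: list of (unit_normal, inlier_points) pairs from plane fitting.
    A small angle between a plane's normal and the gravity direction means
    the plane itself is horizontal, so such planes are filtered out.
    """
    cos_thr = np.cos(np.deg2rad(angle_threshold_deg))
    # angle(normal, gravity) < threshold  <=>  |normal . gravity| > cos(threshold)
    return [(n, pts) for n, pts in planes if abs(np.dot(n, gravity)) <= cos_thr]
```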
7. The method for determining the relative pose of a camera according to claim 1, wherein generating the facade normal vector two-dimensional histogram based on the facade data of the target object in the reference panoramic depth map and the facade data of the target object in the target panoramic depth map comprises:
projecting the facade data of the target object in the reference panoramic depth map and the facade data of the target object in the target panoramic depth map along the gravity direction to obtain two-dimensional projections of the facade normal vectors of the target object in the two maps;
and converting the two-dimensional projections of the facade normal vectors into plane angles, and then generating the facade normal vector two-dimensional histogram based on a preset histogram granularity.
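One plausible reading of claim 7, sketched below for illustration: project each facade normal onto the horizontal plane, convert it to a plane angle, and bin the angles per map, stacking the reference and target histograms into a two-row array as the "two-dimensional histogram". The z-up convention and the bin width (standing in for the preset histogram granularity) are assumptions.

```python
import numpy as np

def facade_normal_histogram(ref_normals, tgt_normals, bin_deg=1.0):
    """Bin gravity-projected facade normals by plane angle, one row per map.

    ref_normals / tgt_normals: (N, 3) arrays of facade normal vectors with
    z as the gravity axis (assumed). Projecting along gravity keeps the
    x/y components; each projected vector becomes a plane angle.
    """
    bins = np.arange(-180.0, 180.0 + bin_deg, bin_deg)
    rows = []
    for normals in (ref_normals, tgt_normals):
        angles = np.degrees(np.arctan2(normals[:, 1], normals[:, 0]))
        hist, _ = np.histogram(angles, bins=bins)
        rows.append(hist)
    return np.stack(rows)  # shape (2, num_bins)
```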
8. The method for determining the relative pose of a camera according to claim 7, wherein determining the set of camera relative rotation angles between the first shooting point and the second shooting point based on the facade normal vector two-dimensional histogram comprises:
acquiring, in the facade normal vector two-dimensional histogram, a target two-dimensional histogram unit corresponding to the two-dimensional projection of the facade normal vectors of the target object in the reference panoramic depth map;
performing periodic extension on the target two-dimensional histogram unit;
acquiring a plurality of peak values of the target two-dimensional histogram unit after the periodic extension;
determining a plurality of relative rotation angles corresponding to the plurality of peak values;
and generating the set of camera relative rotation angles based on the plurality of relative rotation angles.
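A sketch of claim 8 under an assumed equivalence, stated plainly: instead of literally peak-picking a periodically extended histogram unit, it scores every cyclic shift of the target histogram against the reference one (circular cross-correlation, which bakes the periodic extension into np.roll) and maps the top-scoring shifts back to candidate relative rotation angles. The number of peaks kept is an assumed cut-off.

```python
import numpy as np

def candidate_rotation_angles(hist2d, bin_deg=1.0, num_peaks=4):
    """Recover candidate camera relative rotation angles from the histograms.

    hist2d: the (2, num_bins) array from the claim-7 sketch. Each cyclic
    shift k of the target row corresponds to rotating the target facades
    by k * bin_deg degrees; correlation peaks mark shifts that align the
    two facade direction distributions.
    """
    ref_row = hist2d[0].astype(float)
    tgt_row = hist2d[1].astype(float)
    n = ref_row.size
    # Correlation score for every cyclic shift (the periodic extension).
    scores = np.array([np.dot(ref_row, np.roll(tgt_row, k)) for k in range(n)])
    top_shifts = np.argsort(scores)[::-1][:num_peaks]
    # Wrap each shift back into a rotation angle in [-180, 180) degrees.
    return [((k * bin_deg + 180.0) % 360.0) - 180.0 for k in top_shifts]
```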
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing the method for determining the relative pose of the camera according to any one of claims 1 to 8.
10. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the method for determining the relative pose of a camera according to any one of claims 1 to 8.
CN202111296906.9A 2021-11-04 2021-11-04 Method for determining relative pose of camera, electronic device and storage medium Active CN113724332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111296906.9A CN113724332B (en) 2021-11-04 2021-11-04 Method for determining relative pose of camera, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111296906.9A CN113724332B (en) 2021-11-04 2021-11-04 Method for determining relative pose of camera, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113724332A CN113724332A (en) 2021-11-30
CN113724332B true CN113724332B (en) 2022-01-18

Family

ID=78686686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111296906.9A Active CN113724332B (en) 2021-11-04 2021-11-04 Method for determining relative pose of camera, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113724332B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114004874B (en) * 2021-12-30 2022-03-25 贝壳技术有限公司 Acquisition method and device of occupied grid map


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10482681B2 (en) * 2016-02-09 2019-11-19 Intel Corporation Recognition-based object segmentation of a 3-dimensional image
GB2559157A (en) * 2017-01-27 2018-08-01 Ucl Business Plc Apparatus, method and system for alignment of 3D datasets
CN110533726B (en) * 2019-08-28 2021-05-04 哈尔滨工业大学 Laser radar scene three-dimensional attitude point normal vector estimation correction method
CN112270709B (en) * 2020-11-12 2024-05-14 Oppo广东移动通信有限公司 Map construction method and device, computer readable storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544492A (en) * 2013-08-06 2014-01-29 Tcl集团股份有限公司 Method and device for identifying targets on basis of geometric features of three-dimensional curved surfaces of depth images
CN104298971A (en) * 2014-09-28 2015-01-21 北京理工大学 Method for identifying objects in 3D point cloud data
CN105915817A (en) * 2015-02-24 2016-08-31 诺基亚技术有限公司 Device with an adaptive camera array
CN108053367A (en) * 2017-12-08 2018-05-18 北京信息科技大学 A kind of 3D point cloud splicing and fusion method based on RGB-D characteristic matchings
CN110570474A (en) * 2019-09-16 2019-12-13 北京华捷艾米科技有限公司 Pose estimation method and system of depth camera

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RGB-D Object SLAM Using Quadrics for Indoor Environments; Ziwei Liao et al.; Sensors; 20200909; pp. 1-34 *
An Indoor Real-time Relocalization Method for Robots Based on Point Cloud Maps; Ma Yuelong et al.; Journal of System Simulation; 20171231; vol. 29; pp. 15-23, 29 *

Also Published As

Publication number Publication date
CN113724332A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
US20210166426A1 (en) Mapping object instances using video data
CN107077744B (en) Method and system for three-dimensional model generation using edges
CN110276829B (en) Three-dimensional representation by multi-scale voxel hash processing
WO2017092251A1 (en) Method for detecting collision between cylindrical collider and convex body in real-time virtual scene, and terminal and storage medium
CN112712584B (en) Space modeling method, device and equipment
CN111563950B (en) Texture mapping strategy determination method, device and computer readable storage medium
US9665978B2 (en) Consistent tessellation via topology-aware surface tracking
CN112184603B (en) Point cloud fusion method and device, electronic equipment and computer storage medium
CN111639147B (en) Map compression method, system and computer readable storage medium
CN113724332B (en) Method for determining relative pose of camera, electronic device and storage medium
CN113989376B (en) Method and device for acquiring indoor depth information and readable storage medium
JP2024508024A (en) Image data processing method and device
US20220375164A1 (en) Method and apparatus for three dimensional reconstruction, electronic device and storage medium
CN113592706B (en) Method and device for adjusting homography matrix parameters
CN115512044A (en) Visual perception method and device, readable storage medium and electronic equipment
WO2023056879A1 (en) Model processing method and apparatus, device, and medium
CN115512046B (en) Panorama display method and device for points outside model, equipment and medium
CN114004874B (en) Acquisition method and device of occupied grid map
CN115439634A (en) Interactive presentation method of point cloud data and storage medium
US20240202940A1 (en) Feature Detection for Image-Based Augmented Reality
CN112037336B (en) Adjacent point segmentation method and device
CN111627061B (en) Pose detection method and device, electronic equipment and storage medium
CN117635792A (en) Rendering method and device, electronic equipment and storage medium
CN116630833A (en) Object detection method, device, electronic equipment and storage medium
CN117788710A (en) Three-dimensional space fusion method and device based on fish eye pattern, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant