GB2560301A - Methods and apparatuses for determining positions of multi-directional image capture apparatuses - Google Patents

Methods and apparatuses for determining positions of multi-directional image capture apparatuses

Info

Publication number
GB2560301A
GB2560301A (Application GB1702680.8A)
Authority
GB
United Kingdom
Prior art keywords
images
generate
image capture
virtual cameras
stereo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1702680.8A
Other versions
GB201702680D0 (en)
Inventor
You Yu
Wang Tinghuai
Fan Lixin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1702680.8A
Publication of GB201702680D0
Priority to PCT/FI2018/050095 (published as WO2018150086A2)
Publication of GB2560301A

Classifications

    • H04N 13/243: Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N 13/275: Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • G06T 3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
    • H04N 13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/20: Image signal generators
    • H04N 13/232: Image signal generators using stereoscopic image cameras using a single 2D image sensor using fly-eye lenses, e.g. arrangements of circular lenses
    • H04N 13/282: Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • H04N 13/30: Image reproducers
    • H04N 2013/0074: Stereoscopic image analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)
  • Stereoscopic And Panoramic Photography (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Processing (AREA)

Abstract

A method comprises processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses. Each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses. Image re-projection is performed on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera. The plurality of second images is processed to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, a position is determined of each of the plurality of multi-directional image capture apparatuses.

Description

(54) Title of the Invention: Methods and apparatuses for determining positions of multi-directional image capture apparatuses
Abstract Title: Determining Positions of Multi-Directional Image Capture Apparatus
(57) A method comprises processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses. Each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses. Image re-projection is performed on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera. The plurality of second images is processed to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, a position is determined of each of the plurality of multi-directional image capture apparatuses.
[Representative figure: Figure 4. Drawing sheets 1/5 to 5/5 show Figures 1, 2A, 3A, 3B, 4 (operations S4.1 to S4.7), 5 and 6.]
Methods and Apparatuses for Determining Positions of Multi-Directional Image Capture Apparatuses
Technical Field
The present specification relates to methods and apparatuses for determining positions of multi-directional image capture apparatuses.
Background
Camera pose registration is an important technique used to determine positions and orientations of image capture apparatuses such as cameras. The recent advent of commercial multi-directional image capture apparatuses, such as 360° camera systems, brings new challenges with regard to the performance of camera pose registration in a reliable, accurate and efficient manner.
Summary
According to a first aspect, this specification describes a method comprising processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
The first images may be fisheye images.
Processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images may comprise de-warping the first images and stitching the de-warped images to generate the panoramic images.
The second images may be rectilinear images.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
The panoramic images of each stereo-pair may be offset from each other by a baseline distance.
The baseline distance to be used may be a predetermined fixed distance.
The baseline distance to be used may be determined by: minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and the cost function may be a weighted average of: re-projection error from the structure from motion algorithm and variance of calculated baseline distances between stereo-pairs of virtual cameras.
The method may further comprise determining a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the first aspect may further comprise: based on the generated orientations of the virtual cameras, determining an orientation of each of the plurality of multi-directional image capture apparatuses.
According to a second aspect, this specification describes apparatus configured to perform any method described with reference to the first aspect.
According to a third aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method described with reference to the first aspect.
According to a fourth aspect, this specification describes apparatus comprising at least one processor, and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: process a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, perform image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, process the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.
The first images may be fisheye images.
Processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images may comprise de-warping the first images and stitching the de-warped images to generate the panoramic images.
The second images may be rectilinear images.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
The panoramic images of each stereo-pair may be offset from each other by a baseline distance.
The baseline distance to be used may be a predetermined fixed distance.
The baseline distance to be used may be determined by: minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.

The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and the cost function may be a weighted average of: re-projection error from the structure from motion algorithm and variance of calculated baseline distances between stereo-pairs of virtual cameras.
The computer program code, when executed by the at least one processor, may cause the apparatus to determine a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, may cause the apparatus to: determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
According to a fifth aspect, this specification describes a computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causes performance of: processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.

The computer-readable code stored on the medium of the fifth aspect may further cause performance of any of the operations described with reference to the method of the first aspect.
According to a sixth aspect, this specification describes apparatus comprising means for processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses, means for performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
The apparatus of the sixth aspect may further comprise means for causing performance of any of the operations described with reference to the method of the first aspect.
Brief Description of the Drawings
For a more complete understanding of the methods, apparatuses and computer-readable instructions described herein, reference is now made to the following description taken in connection with the accompanying drawings, in which:
Figure 1 illustrates an example of multiple multi-directional image capture apparatuses in an environment;
Figures 2A and 2B illustrate examples of ways in which images captured by a multi-directional image capture apparatus are processed;
Figures 3A and 3B illustrate the determination of the position and orientation of a multi-directional image capture apparatus relative to a reference coordinate system;
Figure 4 is a flowchart illustrating examples of various operations which may be performed by an image processing apparatus based on a plurality of images captured by a plurality of multi-directional image capture apparatuses;
Figure 5 is a schematic diagram of an example configuration of an image processing apparatus configured to perform various operations including those described with reference to Figure 4;
Figure 6 illustrates an example of a computer-readable storage medium with computer-readable instructions stored thereon.
Detailed Description
In the description and drawings, like reference numerals may refer to like elements throughout.
Figure 1 illustrates a plurality of multi-directional image capture apparatuses 10 located within an environment. The multi-directional image capture apparatuses 10 may, in general, be any apparatus capable of capturing images of a scene 13 from multiple different perspectives simultaneously. For example, multi-directional image capture apparatus 10 may be a 360° camera system (also known as an omnidirectional camera system or a spherical camera system). However, it will be appreciated that multi-directional image capture apparatus 10 does not necessarily have to have full angular coverage of its surroundings and may only cover a smaller field of view.
The term “image” used herein may refer generally to visual content. This may be visual content captured by, or derived from visual content captured by, multi-directional image capture apparatus 10. For example, an image may be a photograph or a single frame of a video.
As illustrated in Figure 1, each multi-directional image capture apparatus 10 may comprise a plurality of cameras 11. The term “camera” used herein may refer to a subpart of a multi-directional image capture apparatus 10 which performs the capturing of images. As illustrated, each of the plurality of cameras 11 of multi-directional image capture apparatus 10 may be facing a different direction to each of the other cameras 11 of the multi-directional image capture apparatus 10. As such, each camera 11 of a multi-directional image capture apparatus 10 may have a different field of view, thus allowing the multi-directional image capture apparatus 10 to capture images of a scene 13 from different perspectives simultaneously.
Similarly, as illustrated in Figure 1, each multi-directional image capture apparatus 10 may be at a different location to each of the other multi-directional image capture apparatuses 10. Thus, each of the plurality of multi-directional image capture apparatuses 10 may capture images of the environment (via their cameras 11) from different perspectives simultaneously.
In the example scenario illustrated in Figure 1, a plurality of multi-directional image capture apparatuses 10 are arranged to capture images of a particular scene 13 within the environment. In such circumstances, it may be desirable to perform camera pose registration in order to determine the position and orientation of each of the multi-directional image capture apparatuses 10. In particular, it may be desirable to determine these positions and orientations relative to a particular reference coordinate system. This allows the overall arrangement of the multi-directional image capture apparatuses 10 relative to each other to be determined, which may be useful for a number of functions. For example, such information may be used for any of the following: performing 3D reconstruction of the captured environment, performing 3D registration of the multi-directional image capture apparatuses 10 with respect to other sensors such as LiDAR (Light Detection and Ranging) or infrared (IR) depth sensors, audio positioning of audio sources, playback of object-based audio with respect to multi-directional image capture apparatus 10 location, and presenting multi-directional image capture apparatus positions as ‘hotspots’ to which a viewer can switch during virtual reality (VR) viewing.
One way of determining the positions of multi-directional image capture apparatuses 10 is to use Global Positioning System (GPS) localization. However, GPS only provides position information and does not provide orientation information. In addition, position information obtained by GPS may not be very accurate and may be susceptible to changes in the quality of the satellite connection. One way of determining orientation information is to obtain the orientation information from magnetometers and accelerometers installed in the multi-directional image capture apparatuses 10. However, such instruments may be susceptible to local disturbance (e.g. magnetometers may be disturbed by a local magnetic field), so the accuracy of orientation information obtained in this way is not necessarily very high.
Another way of performing camera pose registration is to use a computer vision method. For example, position and orientation information can be obtained by performing structure from motion (SfM) analysis on images captured by a multi-directional image capture apparatus 10. Broadly speaking, SfM works by determining point correspondences between images (also known as feature matching) and calculating location and orientation based on the determined point correspondences.

However, when multi-directional image capture apparatuses 10 are used to capture a scene which lacks distinct features/textures (e.g. a corridor), determination of point correspondences between captured images may be unreliable due to the lack of distinct features/textures in the limited field of view of the images. In addition, since multi-directional image capture apparatuses 10 typically capture fish-eye images, it may not be possible to address this by capturing fish-eye images with increased field of view, as this will lead to increased distortion of the images which may negatively impact point correspondence determination.
A computer vision method for performing camera pose registration which may address some or all of the challenges mentioned above will now be described.
Figure 2A illustrates one of the plurality of multi-directional image capture apparatuses 10 of Figure 1. Each of the cameras 11 of the multi-directional image capture apparatus 10 may capture a respective first image 21. Each first image 21 may be an image of a scene within the field of view 20 of its respective camera 11. In some examples, the lens of the camera 11 may be a fish-eye lens and so the first image 21 may be a fish-eye image (in which the camera field of view is enlarged). However, the method described herein may be applicable for use with lenses and resulting images of other types. More specifically, the camera pose registration method described herein may also be applicable to images captured by a camera with a hyperbolic mirror in which the camera optical centre coincides with the focus of the hyperbola, and images captured by a camera with a parabolic mirror and an orthographic lens in which all reflected rays are parallel to the mirror axis and the orthographic lens is used to provide a focused image.
The first images 21 may be processed to generate a stereo-pair of panoramic images 22. Each panoramic image 22 of the stereo-pair may correspond to a different view of a scene captured by the first images 21 from which the stereo-pair is generated. For example, one panoramic image 22 of the stereo-pair may represent a left-eye panoramic image and the other one of the stereo-pair may represent a right-eye panoramic image. As such, the stereo-pair of panoramic images 22 may be offset from each other by a baseline distance B. By generating panoramic images 22 as an initial step, the effective field of view may be increased, which may allow the methods described herein to better deal with scenes which lack distinct textures (e.g. corridors).
The generated panoramas may be referred to as spherical (or part-spherical) panoramas in the sense that they may include image data from a sphere (or part of a sphere) around the multi-directional image capture apparatus 10.
If the first images 21 are fish-eye images, processing the first images to generate the panoramic images may comprise de-warping the first images 21 and then stitching the de-warped images. De-warping the first images 21 may comprise re-projecting each of the first images to convert the first images 21 from a fish-eye projection to a spherical projection. Fish-eye to spherical re-projections are generally known in the art and will not be described here in detail. Stitching the de-warped images may, in general, be performed using any suitable image stitching technique. Many image stitching techniques are known in the art and will not be described here in detail. Generally, image stitching involves connecting portions of images together based on point correspondences between images (which may involve feature matching).
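For illustration, the fish-eye to spherical (equirectangular) de-warping may be sketched as follows. This is a minimal sketch assuming an equidistant fish-eye projection with the optical axis at the image centre and nearest-neighbour sampling; the function name and parameters are illustrative only and do not form part of the described method.

```python
import numpy as np

def fisheye_to_equirect(fisheye, fov_deg=180.0, out_w=1024, out_h=512):
    """Re-project an equidistant fish-eye image onto an equirectangular
    (spherical) grid. Assumes the optical axis is at the image centre."""
    h, w = fisheye.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    max_r = min(cx, cy)                 # fish-eye circle radius in pixels
    fov = np.radians(fov_deg)

    # Spherical angles (longitude/latitude) of every output pixel.
    lon = (np.arange(out_w) / out_w - 0.5) * 2 * np.pi
    lat = (0.5 - np.arange(out_h) / out_h) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit ray for each output pixel; z is the optical axis.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Equidistant model: radial pixel distance proportional to the angle
    # between the ray and the optical axis.
    theta = np.arccos(np.clip(z, -1.0, 1.0))
    r = theta / (fov / 2.0) * max_r
    phi = np.arctan2(y, x)
    u = np.clip(cx + r * np.cos(phi), 0, w - 1).astype(int)
    v = np.clip(cy + r * np.sin(phi), 0, h - 1).astype(int)

    out = fisheye[v, u]
    out[theta > fov / 2.0] = 0          # outside the fish-eye field of view
    return out
```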
Following the generation of the stereo-pair of panoramic images 22, the stereo-pair may be processed to generate one or more second images 23. More specifically, image re-projection may be performed on each of the panoramic images 22 to generate one or more re-projected second images 23. For example, if the panoramic image 22 is not rectilinear (e.g. if it is curvilinear), it may be re-projected to generate one or more second images 23 which are rectilinear images. As illustrated in Figure 2A, a corresponding set of second images 23 may be generated for each panoramic image 22 of the stereo-pair. The type of re-projection may be dependent on the algorithm used to analyse the second images 23. For instance, as is explained below, structure from motion algorithms, which are typically used to analyse rectilinear images, may be used, in which case the re-projection may be selected so as to generate rectilinear images. However, it will be appreciated that, in general, the re-projection may generate any type of second image 23, as long as the image type is compatible with the algorithm used to analyse the re-projected images 23.
Each re-projected second image 23 may be associated with a respective virtual camera. A virtual camera is an imaginary camera which does not physically exist, but which corresponds to a camera which would have captured the re-projected second image 23 with which it is associated. A virtual camera may be defined by virtual camera parameters which represent the configuration of the virtual camera required in order to have captured the second image 23. As such, for the purposes of the methods and operations described herein, a virtual camera can be treated as a real physical camera. For example, each virtual camera has, among other virtual camera parameters, a position and orientation which can be determined.
As illustrated by Figure 2B, the processing of each panoramic image 22 may be performed by resampling the panoramic image 22 based on a horizontal array of overlapping sub-portions 22-1 of the panoramic image 22. The sub-portions 22-1 may be chosen to be evenly spaced so that adjacent sub-portions 22-1 are separated by the same distance (as illustrated by Figure 2B). As such, the viewing directions of adjacent sub-portions 22-1 may differ by the same angular distance. A corresponding re-projected second image 23 may be generated for each sub-portion 22-1. This may be performed by casting rays following the pinhole camera model (which represents a first order approximation of the mapping from the spherical (3D) panorama to the 2D second images) based on a given field of view (e.g. 120 degrees) of each sub-portion 22-1 from a single viewpoint to the panoramic image 22. As such, each re-projected second image 23 may correspond to a respective virtual pinhole camera. The virtual pinhole cameras associated with second images 23 generated from one panoramic image 22 may all have the same position, but different orientations (as illustrated by Figure 3A).
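The resampling of an equirectangular panorama into a rectilinear view for one virtual pinhole camera may be sketched as follows. The helper name, the yaw-only spacing of sub-portions and the default 120 degree field of view are assumptions for illustration:

```python
import numpy as np

def panorama_to_pinhole(pano, yaw_deg, fov_deg=120.0, size=512):
    """Resample one rectilinear sub-view (a 'virtual pinhole camera')
    from an equirectangular panorama, looking along the given yaw."""
    ph, pw = pano.shape[:2]
    f = (size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)  # pinhole focal length

    # Rays through every pixel of the virtual image plane (z forward).
    u = np.arange(size) - size / 2.0
    v = np.arange(size) - size / 2.0
    u, v = np.meshgrid(u, v)
    x, y, z = u, v, np.full_like(u, f)

    # Rotate the rays about the vertical axis by the view's yaw angle.
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)

    # Convert rays to longitude/latitude, then to panorama pixel coords.
    lon = np.arctan2(xr, zr)
    lat = np.arctan2(y, np.hypot(xr, zr))
    px = np.clip(((lon / np.pi + 1) / 2) * (pw - 1), 0, pw - 1).astype(int)
    py = np.clip((lat / np.pi + 0.5) * (ph - 1), 0, ph - 1).astype(int)
    return pano[py, px]

# E.g. six evenly spaced virtual cameras per panorama (yaws 0, 60, ..., 300):
# views = [panorama_to_pinhole(pano, yaw) for yaw in range(0, 360, 60)]
```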
Each second image 23 generated from one of the stereo-pair of panoramic images 22 may form a stereo-pair with a second image 23 from the other one of the stereo-pair of panoramic images 22. As such, each stereo-pair of second images 23 may correspond to a stereo-pair of virtual cameras. Each stereo-pair of virtual cameras may be offset from each other by the baseline distance as described above.
It will be appreciated that, in general, any number of second images 23 may be generated. Generally speaking, generating more second images 23 may lead to less distortion in each of the second images 23, but may also increase computational complexity. The precise number of second images 23 may be chosen based on the scene/environment being captured by the multi-directional image capture apparatus 10.
The methods described with reference to Figures 2A and 2B may be performed for each of a plurality of multi-directional image capture apparatuses 10 which are capturing the same general environment, e.g. the plurality of multi-directional image capture apparatuses 10 as illustrated in Figure 1. In this way, all of the first images 21 captured by a plurality of multi-directional image capture apparatuses 10 of a particular scene may be processed as described above.
It will be appreciated that the first images 21 may correspond to images of a scene at a particular moment in time. For example, if the multi-directional image capture apparatuses 10 are capturing video images, a first image 21 may correspond to a single video frame of a single camera 11, and all of the first images 21 may be video frames that are captured at the same moment in time.
Figures 3A and 3B illustrate the process of determining the position and orientation of a multi-directional image capture apparatus 10. In Figures 3A and 3B, each arrow 31, 32 represents the position and orientation of a particular element in a reference coordinate system 30. The base of the arrow represents the position and the direction of the arrow represents the orientation. More specifically, each arrow 31 in Figure 3A represents the position and orientation of a virtual camera associated with a respective second image 23, and the arrow 32 in Figure 3B represents the position and orientation of the multi-directional image capture apparatus 10.
After generating the second images 23, the second images 23 may be processed to generate respective positions of the virtual cameras associated with the second images 23. The output of the processing for one multi-directional image capture apparatus 10 is illustrated by Figure 3A. The processing may include generating the positions of a set of virtual cameras for each panoramic image 22 of the stereo-pair of panoramic images.
As illustrated by Figure 3A, one set of arrows 33A may correspond to virtual cameras of one of the stereo-pair of panoramic images 22, and the other set of arrows 33B may correspond to virtual cameras of the other one of the stereo-pair of panoramic images. The generated positions may be relative to the reference coordinate system 30. The processing of the second images may also generate respective orientations of the virtual cameras relative to the reference coordinate system 30. As mentioned above and illustrated by Figure 3A, all of the virtual cameras of each set of virtual cameras, which correspond to the same panoramic image 22, may have the same position but different orientations.
It will be appreciated that, in order to perform the processing for a plurality of multi-directional image capture apparatuses 10, it may be necessary for the multi-directional image capture apparatuses 10 to have at least partially overlapping fields of view with each other (for example, in order to allow point correspondence determination as described below).
The above described processing may be performed by using a structure from motion (SfM) algorithm to determine the position and orientation of each of the virtual cameras. The SfM algorithm may operate by determining point correspondences between various ones of the second images 23 and determining the positions and orientations of the virtual cameras based on the determined point correspondences. For example, the determined point correspondences may impose certain geometric constraints on the positions and orientations of the virtual cameras, which can be used to solve a set of quadratic equations to determine the positions and orientations of the virtual cameras relative to the reference coordinate system 30. More specifically, in some examples, the SfM process may involve any one of or any combination of the following operations: extracting image features, matching image features, estimating camera position, reconstructing 3D points, and performing bundle adjustment.
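The specification does not mandate any particular SfM implementation. The following two-view sketch uses OpenCV feature matching and essential-matrix decomposition to illustrate how point correspondences constrain the relative pose of two virtual cameras; the function and its inputs are illustrative assumptions:

```python
import cv2
import numpy as np

def relative_pose(img_a, img_b, K):
    """Two-view sketch of the SfM core: match features between two second
    images and recover the relative pose of their virtual cameras.
    K is the 3x3 intrinsic matrix of the virtual pinhole cameras."""
    orb = cv2.ORB_create(4000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Point correspondence determination (feature matching).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])

    # Geometric constraints imposed by the correspondences: essential
    # matrix with RANSAC, decomposed into rotation and translation.
    E, mask = cv2.findEssentialMat(pts_a, pts_b, K,
                                   method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=mask)
    return R, t  # pose of camera b relative to camera a (unit-scale t)
```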
Once the positions of the virtual cameras have been determined, the position of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined based on the determined positions of the virtual cameras. Similarly, once the orientations of the virtual cameras have been determined, the orientation of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined based on the determined orientations of the virtual cameras. The position of the multi-directional image capture apparatus 10 may be determined by averaging the positions of the two sets 33A, 33B of virtual cameras illustrated by Figure 3A. For example, as illustrated, all of the virtual cameras of one set 33A may have the same position as each other and all of the virtual cameras of the other set 33B may also have the same position as each other. As such, the position of the multi-directional image capture apparatus 10 may be determined to be the average of the two respective positions of the two sets 33A, 33B of virtual cameras.
Similarly, the orientation of the multi-directional image capture apparatus 10 may be determined by averaging the orientation of the virtual cameras. In more detail, the orientation of the multi-directional image capture apparatus 10 may be determined in the following way.
The orientation of each virtual camera may be represented by a rotation matrix $R_i$. The orientation of the multi-directional image capture apparatus 10 may be represented by a rotation matrix $R_{dev}$. The orientation of each virtual camera relative to the multi-directional image capture apparatus 10 may be known, and may be represented by a rotation matrix $R_i^{dev}$. Thus, the rotation matrices $R_i$ of the virtual cameras may be used to obtain a rotation matrix for the multi-directional image capture apparatus 10 according to:

$$R_{dev} = R_i \cdot \left(R_i^{dev}\right)^{-1}$$

Put another way, the rotation matrix of a multi-directional image capture apparatus ($R_{dev}$) can be determined by multiplying the rotation matrix of a virtual camera ($R_i$) onto the inverse of the matrix representing the orientation of the virtual camera relative to the orientation of the multi-directional image capture apparatus ($\left(R_i^{dev}\right)^{-1}$).
For example, if there are twelve virtual cameras (six from each panoramic image 22 of the stereo-pair of panoramic images) corresponding to the multi-directional image capture apparatus 10 (as illustrated in Figure 3A) then twelve rotation matrices are obtained for the orientation of the multi-directional image capture apparatus 10. Each of these rotation matrices may then be converted into corresponding Euler angles to obtain a set of Euler angles for the multi-directional image capture apparatus 10. The set of Euler angles may then be averaged and converted into a final rotation matrix representing the orientation of the multi-directional image capture apparatus 10.
The set of Euler angles may then be averaged according to:

$$\bar{\theta} = \arctan\left(\frac{\sum_{i=0}^{8}\sin\theta_i}{\sum_{i=0}^{8}\cos\theta_i}\right)$$

where $\bar{\theta}$ represents the averaged Euler angles for a multi-directional image capture apparatus 10 and $\theta_i$ represents the set of Euler angles. Put another way, the averaged Euler angles are determined by calculating the sum of the sines of the set of Euler angles divided by the sum of the cosines of the set of Euler angles, and taking the arctangent of the ratio. $\bar{\theta}$ may then be converted back into a rotation matrix representing the final determined orientation of the multi-directional image capture apparatus 10.
It will be appreciated that the above formula is for the specific example in which there are nine virtual cameras - the maximum value of i may vary according to the number of virtual cameras generated. For example, if there are twelve virtual cameras as illustrated in Figure 3A, then i may take values from zero to eleven.
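A sketch of this orientation-averaging step is given below, assuming the rotation matrices are available as numpy arrays and a Z-Y-X Euler convention (the specification does not fix a particular convention):

```python
import numpy as np

def average_device_rotation(R_virtual, R_virtual_to_dev):
    """Recover one device rotation per virtual camera via
    R_dev = R_i @ inv(R_i_dev), convert each to Euler angles, average
    with the arctan(sum of sines / sum of cosines) formula, and rebuild
    a rotation matrix (Z-Y-X convention assumed here)."""
    def to_euler(R):  # yaw (z), pitch (y), roll (x)
        return np.array([np.arctan2(R[1, 0], R[0, 0]),
                         np.arcsin(-np.clip(R[2, 0], -1, 1)),
                         np.arctan2(R[2, 1], R[2, 2])])

    def from_euler(z, y, x):
        cz, sz = np.cos(z), np.sin(z)
        cy, sy = np.cos(y), np.sin(y)
        cx, sx = np.cos(x), np.sin(x)
        Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
        return Rz @ Ry @ Rx

    eulers = np.array([to_euler(Ri @ np.linalg.inv(Rid))
                       for Ri, Rid in zip(R_virtual, R_virtual_to_dev)])
    # Circular mean per angle: arctangent of summed sines over summed cosines.
    mean = np.arctan2(np.sin(eulers).sum(axis=0), np.cos(eulers).sum(axis=0))
    return from_euler(*mean)
```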
In some examples, unit quaternions may be used instead of Euler angles for the above-mentioned process. The use of unit quaternions to represent orientation is a known mathematical technique and will not be described in detail here. Briefly, quaternions $q_1, q_2, \ldots, q_N$ corresponding to the virtual camera rotation matrices may be determined. Then, the quaternions may be transformed, as necessary, to ensure that they are all on the same side of the 4D hypersphere. Specifically, one representative quaternion $q_M$ is selected and the signs of any quaternions $q_i$ where the product of $q_M$ and $q_i$ is less than zero may be inverted. Then, all quaternions $q_i$ (as 4D vectors) may be summed into an average quaternion $q_A$, and $q_A$ may be normalised into a unit quaternion $\hat{q}_A$. The unit quaternion $\hat{q}_A$ may represent the averaged orientation of the camera and may be converted back to other orientation representations as desired. Using unit quaternions to represent orientation may be more numerically stable than Euler angles.
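The quaternion-based averaging may be sketched as follows, with quaternions supplied as an (N, 4) array; taking the first quaternion as the representative $q_M$ is an assumption for illustration:

```python
import numpy as np

def average_quaternions(quats):
    """Align all unit quaternions to the same hemisphere of the 4D
    hypersphere, sum them as 4D vectors, and normalise the result back
    to a unit quaternion."""
    q = np.asarray(quats, dtype=float)        # shape (N, 4)
    ref = q[0]                                # representative quaternion q_M
    # Invert the sign of any quaternion whose dot product with q_M is negative.
    signs = np.where(q @ ref < 0.0, -1.0, 1.0)
    q_sum = (q * signs[:, None]).sum(axis=0)  # average quaternion q_A
    return q_sum / np.linalg.norm(q_sum)      # unit quaternion
```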
It will be appreciated that the generated positions of the virtual cameras (e.g. from the SfM algorithm) may be in units of pixels. Therefore, in order to enable scale conversions between pixels and a real world distance (e.g. metres), a pixel to real world distance conversion factor may be determined. This may be performed by determining the baseline distance B of a stereo-pair of virtual cameras in both pixels and in a real world distance. The baseline distance in pixels may be determined from the determined positions of the virtual cameras in the reference coordinate system 30. The baseline distance in a real world distance (e.g. metres) may be known already from being set initially during the generation of the panoramic images 22. The pixel to real world distance conversion factor may then be simply calculated by taking the ratio of the two distances. This may be further refined by calculating the conversion factor based on each of the stereo-pairs of virtual cameras, determining outliers and inliers (as described in more detail below), and averaging the inliers to obtain a final pixel to real world distance conversion factor. The pixel to real world distance conversion factor may be denoted $S_{pixel2meter}$ in the present specification.
The inlier and outlier determination may be performed according to:

$$d_i = \left|S_i - \mathrm{Median}(S)\right|, \quad \forall S_i \in S$$
$$\tilde{d} = \mathrm{Median}(\{d_0, \ldots, d_N\})$$
$$\mathrm{inliers} = \{S_i \in S \mid d_i / \tilde{d} < m\}$$

where $S$ is the set of pixel to real world distance ratios of all stereo-pairs of virtual cameras, $d_i$ is a measure of the difference between a pixel to real world distance ratio and the median of all pixel to real world distance ratios, $\tilde{d}$ is the median absolute deviation (MAD), and $m$ is a threshold value below which a determined pixel to real world distance ratio is considered an inlier (for example, $m$ may be set to be 2). The MAD may be used as it may be a robust and consistent estimator of inlier errors, which follow a Gaussian distribution.
It will therefore be understood from the above expressions that a pixel to real world distance ratio may be determined to be an inlier if the difference between its value and the median value divided by the median absolute deviation is less than a threshold value. That is to say, for a pixel to real world distance ratio to be considered an inlier, the difference between its value and the median value must be less than a threshold number of times larger than the median absolute deviation.
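The scale recovery with MAD-based inlier selection may be sketched as follows, where `baselines_px` (the per-stereo-pair baseline distances measured in pixels) and `baseline_m` (the known real world baseline) are illustrative names:

```python
import numpy as np

def scale_factor_from_baselines(baselines_px, baseline_m, m=2.0):
    """One pixel-to-metre ratio per stereo-pair of virtual cameras,
    MAD-based inlier selection, and the mean of the inliers as the
    final conversion factor S_pixel2meter."""
    ratios = np.asarray(baselines_px, dtype=float) / baseline_m  # pixels per metre
    d = np.abs(ratios - np.median(ratios))
    mad = np.median(d)
    inliers = ratios[d / mad < m] if mad > 0 else ratios
    return inliers.mean()
```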
Once final positions for a plurality of multi-directional image capture apparatuses 10 have been determined, the relative positions of the plurality of multi-directional image capture apparatuses may be determined according to:
$$t_{ij} = \frac{t_j^{dev} - t_i^{dev}}{S_{pixel2meter}}$$

In the above equation, $t_{ij}$ represents the relative position of one of the plurality of multi-directional image capture apparatuses (apparatus $j$) relative to another one of the plurality of multi-directional image capture apparatuses (apparatus $i$). $t_j^{dev}$ is the position of apparatus $j$ and $t_i^{dev}$ is the position of apparatus $i$. $S_{pixel2meter}$ is the pixel to real world distance conversion factor.
As will be understood from the above expression, a vector representing the relative position of one of the plurality of multi-directional image capture apparatuses relative to another one of the plurality of multi-directional image capture apparatuses may be determined by taking the difference between their positions. This may be divided by the pixel to real world distance conversion factor depending on the scale desired.
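Applied to the determined positions, the relative position computation reduces to a few lines; the argument names are illustrative:

```python
import numpy as np

def relative_position(t_dev_j, t_dev_i, s_pixel2meter):
    """Relative position of apparatus j with respect to apparatus i,
    converted from pixel units to metres using the scale factor."""
    return (np.asarray(t_dev_j) - np.asarray(t_dev_i)) / s_pixel2meter
```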
As such, the positions of all of the multi-directional image capture apparatuses 10 relative to one another may be determined in the reference coordinate system 30.
The baseline distance B described above may be chosen in two different ways. One way is to set a predetermined fixed baseline distance (e.g. based on the average human interpupillary distance) to be used to generate stereo-pairs of panoramic images. This fixed baseline distance may then be used to generate all of the stereo-pairs of panoramic images.
An alternative way is to treat B as a variable within a range (e.g. a range constrained by the dimensions of the multi-directional image capture apparatus) and to evaluate a cost function for each value of B within the range. For example, this may be performed by minimising a cost function which indicates an error associated with the use of each of a plurality of baseline distances, and determining that the baseline distance associated with the lowest error is to be used.
The cost function may be defined as the weighted average of the re-projection error from the structure from motion algorithm and the variance of calculated baseline distances between stereo-pairs of virtual cameras. An example of a cost function which may be used is $E(B) = w_0 \times R(B) + w_1 \times V(B)$, where $E(B)$ represents the total cost, $R(B)$ represents the re-projection error returned by the SfM algorithm by aligning the generated second images from the stereo-pairs displaced by value $B$, $V(B)$ represents the variance of calculated baseline distances, and $w_0$ and $w_1$ are constant weighting parameters for $R(B)$ and $V(B)$ respectively.
As such, the above process may involve generating stereo-pairs of panoramic images for each value of B, generating re-projected second images from the stereo-pairs, and inputting the second images for each value of B into a structure from motion algorithm, as described above. It will be appreciated that the re-projection error from the structure from motion algorithm may be representative of a global registration quality and the variance of calculated baseline distances may be representative of the local registration uncertainty.
It will be appreciated that, by evaluating a cost function as described above, the baseline distance with the lowest cost (and therefore lowest error) may be found, and this may be used as the baseline distance used to determine the position/orientation of the multi-directional image capture apparatus 10.
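The baseline search may be sketched as the following loop, in which `render_stereo_pairs`, `run_sfm` and `baseline_variance` are hypothetical stand-ins for the panorama generation, structure from motion and baseline measurement steps described above:

```python
def choose_baseline(candidates, w0=1.0, w1=1.0):
    """Evaluate the cost E(B) = w0*R(B) + w1*V(B) for each candidate
    baseline and keep the minimiser. The three helpers called below are
    hypothetical stand-ins for steps described in the specification."""
    best_b, best_cost = None, float("inf")
    for b in candidates:
        second_images = render_stereo_pairs(b)          # stereo panoramas -> re-projected views
        reproj_error, cameras = run_sfm(second_images)  # global registration quality R(B)
        variance = baseline_variance(cameras, b)        # local uncertainty V(B)
        cost = w0 * reproj_error + w1 * variance
        if cost < best_cost:
            best_b, best_cost = b, cost
    return best_b
```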
Figure 4 is a flowchart showing examples of operations as described herein.
At operation 4.1, a plurality of first images 21 which are captured by a plurality of multi-directional image capture apparatuses 10 may be received. For example, image data corresponding to the first images 21 may be received at image processing apparatus 50 (see Figure 5).
At operation 4.2, the first images 21 may be processed to generate a plurality of stereo-pairs of panoramic images 22.
At operation 4.3, the stereo-pairs of panoramic images 22 may be re-projected to generate re-projected second images 23.
At operation 4.4, the second images 23 from operation 4.3 may be processed to obtain positions and orientations of virtual cameras. For example, the second images 23 may be processed using a structure from motion algorithm.
At operation 4.5, a pixel to real world distance conversion factor may be determined based on the positions of the virtual cameras determined at operation 4.4 and a baseline distance between stereo-pairs of panoramic images 22.
At operation 4.6, positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the virtual cameras determined at operation 4.4.
At operation 4.7, positions of the plurality of multi-directional image capture apparatuses 10 relative to each other may be determined based on the positions of the plurality of multi-directional image capture apparatuses 10 determined at operation 4.6.

It will be appreciated that, as described herein, the position of a virtual camera may be the position of the centre of a virtual lens of the virtual camera. The position of the multi-directional image capture apparatus 10 may be the centre of the multi-directional image capture apparatus (e.g. if a multi-directional image capture apparatus is spherically shaped, its position may be defined as the geometric centre of the sphere).
Figure 5 is a schematic block diagram of an example configuration of image processing (or more simply, computing) apparatus 50, which may be configured to perform any of or any combination of the operations described herein. The computing apparatus 50 may comprise memory 51, processing circuitry 52, an input 53, and an output 54.
The processing circuitry 52 may be of any suitable composition and may include one or more processors 52A of any suitable type or suitable combination of types. For example, the processing circuitry 52 may be a programmable processor that interprets computer program instructions and processes data. The processing circuitry 52 may include plural programmable processors. Alternatively, the processing circuitry 52 may be, for example, programmable hardware with embedded firmware. The processing circuitry 52 may be termed processing means. The processing circuitry 52 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs). In some instances, processing circuitry 52 may be referred to as computing apparatus.
The processing circuitry 52 described with reference to Figure 5 may be coupled to the memory 51 (or one or more storage devices) and may be operable to read/write data to/from the memory. The memory 51 may store thereon computer-readable instructions 512A which, when executed by the processing circuitry 52, may cause any one of or any combination of the operations described herein to be performed. The memory 51 may comprise a single memory unit or a plurality of memory units upon which the computer-readable instructions (or code) 512A is stored. For example, the memory 51 may comprise both volatile memory 511 and non-volatile memory 512. For example, the computer-readable instructions 512A may be stored in the non-volatile memory 512 and may be executed by the processing circuitry 52 using the volatile memory 511 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, and SDRAM etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc. The memories 51 in general may be referred to as non-transitory computer readable memory media.
The input 53 may be configured to receive image data representing the first images 21 described herein. The image data may be received, for instance, from the multi-directional image capture apparatuses 10 themselves or may be received from a storage device. The output 54 may be configured to output any of or any combination of the camera pose registration information described herein. As discussed above, the camera pose registration information output by the computing apparatus 50 may be used for various functions as described above with reference to Figure 1.
Figure 6 illustrates an example of a computer-readable medium 60 with computer-readable instructions (code) stored thereon. The computer-readable instructions (code), when executed by a processor, may cause any one of or any combination of the operations described above to be performed.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor or firmware such as the programmable content of a hardware device, whether instructions for a processor or configuration settings for a fixed function device, gate array, programmable logic device, etc.
As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions) and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagram of Figure 4 is an example only and that various operations depicted therein may be omitted, reordered and/or combined. For example, it will be appreciated that operation S4.5 as illustrated in Figure 4 may be omitted.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims (22)

Claims

1. A method comprising:
processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
    processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
2. The method of claim 1, wherein the first images are fisheye images.
3. The method of claim 2, wherein processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images comprises:
    de-warping the first images; and stitching the de-warped images to generate the panoramic images.
4. The method of any one of the preceding claims, wherein the second images are rectilinear images.
5. The method of any one of the preceding claims, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
6. The method of any one of the preceding claims, wherein the panoramic images of each stereo-pair are offset from each other by a baseline distance.
7. The method of claim 6, wherein the baseline distance to be used is a predetermined fixed distance.
8. The method of claim 6, wherein the baseline distance to be used is determined by:
    minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances; and determining that the baseline distance associated with the lowest error is to be used.
9. The method of claim 8, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and wherein the cost function is a weighted average of:
re-projection error from the structure from motion algorithm; and variance of calculated baseline distances between stereo-pairs of virtual cameras.
10. The method of any one of claims 6 to 9, further comprising:
determining a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
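For illustration of claim 10: structure from motion recovers positions only up to an unknown scale, so dividing the known physical baseline by the mean recovered distance between paired virtual cameras yields the conversion factor. The array arguments below are hypothetical per-pair camera positions in the reconstruction's own units.

```python
# Sketch: metric scale factor from the known stereo baseline.
import numpy as np

def metric_scale(left_positions, right_positions, baseline_m):
    # Recovered baseline of each stereo-pair, in reconstruction units.
    recovered = np.linalg.norm(np.asarray(left_positions)
                               - np.asarray(right_positions), axis=1)
    return baseline_m / recovered.mean()   # metres per SfM unit
```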
11. The method of any one of the preceding claims, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the method further comprises:
based on the generated orientations of the virtual cameras, determining an orientation of each of the plurality of multi-directional image capture apparatuses.
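By way of illustration only, under the assumption that each virtual camera's orientation is the apparatus orientation composed with the known yaw that virtual view was re-projected with: undoing each yaw expresses all views in the apparatus frame, and their mean rotation is one plausible aggregate.

```python
# Sketch: apparatus orientation as the mean of yaw-compensated
# virtual-camera orientations (quaternion/chordal mean via SciPy).
import numpy as np
from scipy.spatial.transform import Rotation

def apparatus_orientation(virtual_rotations, yaws):
    # virtual_rotations: (N, 3, 3) rotation matrices of one apparatus'
    # virtual cameras; yaws: yaw angle (radians) of each virtual view.
    views = Rotation.from_matrix(np.asarray(virtual_rotations))
    undo_yaw = Rotation.from_euler("y", -np.asarray(yaws))
    # Express every view in the apparatus frame, then average.
    return (views * undo_yaw).mean().as_matrix()
```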
12. Apparatus configured to perform a method according to any one of claims 1 to 11.
13. Computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method according to any one of claims 1 to 11.
14. Apparatus comprising:
at least one processor; and
at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to:
process a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
perform image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
process the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.
    15. The apparatus of claim 14, wherein the first images are fisheye images.
16. The apparatus of claim 15, wherein processing the plurality of first images to generate the plurality of stereo-pairs of panoramic images comprises:
de-warping the first images; and
stitching the de-warped images to generate the panoramic images.
17. The apparatus of any one of claims 14 to 16, wherein the second images are rectilinear images.
18. The apparatus of any one of claims 14 to 17, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
19. The apparatus of any one of claims 14 to 18, wherein the panoramic images of each stereo-pair are offset from each other by a baseline distance.
20. The apparatus of claim 19, wherein the baseline distance to be used is a predetermined fixed distance.
21. The apparatus of claim 19, wherein the baseline distance to be used is determined by:
minimising a cost function which indicates an error associated with use of each of a plurality of baseline distances; and
determining that the baseline distance associated with the lowest error is to be used.
22. The apparatus of claim 21, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras and wherein the cost function is a weighted average of:
re-projection error from the structure from motion algorithm; and
variance of calculated baseline distances between stereo-pairs of virtual cameras.
23. The apparatus of any one of claims 19 to 22, wherein the computer program code, when executed by the at least one processor, causes the apparatus to:
determine a pixel to real world distance conversion factor based on the determined positions of the virtual cameras and the baseline distance used.
24. The apparatus of any one of claims 14 to 23, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, causes the apparatus to:
determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
25. A computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of:
processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
26. Apparatus comprising:
means for processing a plurality of first images to generate a plurality of stereo-pairs of panoramic images, wherein each first image is captured by a respective camera of a respective one of a plurality of multi-directional image capture apparatuses, and wherein each stereo-pair of panoramic images is generated from first images captured by cameras of a respective one of the plurality of multi-directional image capture apparatuses;
means for performing image re-projection on each panoramic image of the plurality of stereo-pairs of panoramic images, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
GB1702680.8A 2017-02-20 2017-02-20 Methods and apparatuses for determining positions of multi-directional image capture apparatuses Withdrawn GB2560301A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1702680.8A GB2560301A (en) 2017-02-20 2017-02-20 Methods and apparatuses for determining positions of multi-directional image capture apparatuses
PCT/FI2018/050095 WO2018150086A2 (en) 2017-02-20 2018-02-12 Methods and apparatuses for determining positions of multi-directional image capture apparatuses

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1702680.8A GB2560301A (en) 2017-02-20 2017-02-20 Methods and apparatuses for determining positions of multi-directional image capture apparatuses

Publications (2)

Publication Number Publication Date
GB201702680D0 GB201702680D0 (en) 2017-04-05
GB2560301A true GB2560301A (en) 2018-09-12

Family

ID=58486857

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1702680.8A Withdrawn GB2560301A (en) 2017-02-20 2017-02-20 Methods and apparatuses for determining positions of multi-directional image capture apparatuses

Country Status (2)

Country Link
GB (1) GB2560301A (en)
WO (1) WO2018150086A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206050A (en) * 2021-11-30 2023-06-02 中兴通讯股份有限公司 Three-dimensional reconstruction method, electronic device, and computer-readable storage medium
CN114760458B (en) * 2022-04-28 2023-02-24 中南大学 Method for synchronizing tracks of virtual camera and real camera of high-reality augmented reality studio

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000039995A2 (en) * 1998-09-17 2000-07-06 Yissum Research Development Company System and method for generating and displaying panoramic images and movies
US20160269717A1 (en) * 2015-03-12 2016-09-15 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and recording medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5481337B2 (en) * 2010-09-24 2014-04-23 株式会社東芝 Image processing device
JP6126820B2 (en) * 2012-11-09 2017-05-10 任天堂株式会社 Image generation method, image display method, image generation program, image generation system, and image display apparatus
US9892493B2 (en) * 2014-04-21 2018-02-13 Texas Instruments Incorporated Method, apparatus and system for performing geometric calibration for surround view camera solution
US11205305B2 (en) * 2014-09-22 2021-12-21 Samsung Electronics Company, Ltd. Presentation of three-dimensional video

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000039995A2 (en) * 1998-09-17 2000-07-06 Yissum Research Development Company System and method for generating and displaying panoramic images and movies
US20160269717A1 (en) * 2015-03-12 2016-09-15 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and recording medium

Also Published As

Publication number Publication date
WO2018150086A2 (en) 2018-08-23
GB201702680D0 (en) 2017-04-05
WO2018150086A3 (en) 2018-10-25

Similar Documents

Publication Publication Date Title
US20190012804A1 (en) Methods and apparatuses for panoramic image processing
US10334168B2 (en) Threshold determination in a RANSAC algorithm
US11216979B2 (en) Dual model for fisheye lens distortion and an algorithm for calibrating model parameters
WO2021139176A1 (en) Pedestrian trajectory tracking method and apparatus based on binocular camera calibration, computer device, and storage medium
US10565803B2 (en) Methods and apparatuses for determining positions of multi-directional image capture apparatuses
WO2019164498A1 (en) Methods, devices and computer program products for global bundle adjustment of 3d images
JP2007192832A (en) Calibrating method of fish eye camera
GB2567245A (en) Methods and apparatuses for depth rectification processing
US20090021614A1 (en) Position relationships associated with image capturing devices
Brückner et al. Intrinsic and extrinsic active self-calibration of multi-camera systems
US8019180B2 (en) Constructing arbitrary-plane and multi-arbitrary-plane mosaic composite images from a multi-imager
Lin et al. A low-cost portable polycamera for stereoscopic 360 imaging
WO2018150086A2 (en) Methods and apparatuses for determining positions of multi-directional image capture apparatuses
CN107644394B (en) 3D image processing method and device
Brückner et al. Active self-calibration of multi-camera systems
JP2016114445A (en) Three-dimensional position calculation device, program for the same, and cg composition apparatus
Bergmann et al. Gravity alignment for single panorama depth inference
Ha et al. Embedded panoramic mosaic system using auto-shot interface
WO2018100230A1 (en) Method and apparatuses for determining positions of multi-directional image capture apparatuses
JP2005275789A (en) Three-dimensional structure extraction method
Bartczak et al. Extraction of 3D freeform surfaces as visual landmarks for real-time tracking
De Villiers Real-time photogrammetric stitching of high resolution video on COTS hardware
JP3452188B2 (en) Tracking method of feature points in 2D video
Yuan et al. A novel method for geometric correction of multi-cameras in panoramic video system
Kim et al. Environment modelling using spherical stereo imaging

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)