GB2557212A - Methods and apparatuses for determining positions of multi-directional image capture apparatuses - Google Patents

Methods and apparatuses for determining positions of multi-directional image capture apparatuses

Info

Publication number
GB2557212A
GB2557212A (application GB1620312.7A)
Authority
GB
United Kingdom
Prior art keywords
images
cameras
image capture
positions
directional image
Prior art date
Legal status
Withdrawn
Application number
GB1620312.7A
Other versions
GB201620312D0 (en)
Inventor
Wang Tinghuai
You Yu
Fan Lixin
Tapio Roimela Kimmo
Current Assignee
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1620312.7A priority Critical patent/GB2557212A/en
Publication of GB201620312D0 publication Critical patent/GB201620312D0/en
Priority to PCT/FI2017/050749 priority patent/WO2018100230A1/en
Publication of GB2557212A publication Critical patent/GB2557212A/en

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 7/00: Image analysis
                    • G06T 7/50: Depth or shape recovery
                        • G06T 7/55: Depth or shape recovery from multiple images
                            • G06T 7/579: Depth or shape recovery from multiple images from motion
                    • G06T 7/70: Determining position or orientation of objects or cameras
                        • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
                            • G06T 7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
                    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
                        • G06T 7/85: Stereo camera calibration
                • G06T 2207/00: Indexing scheme for image analysis or image enhancement
                    • G06T 2207/10: Image acquisition modality
                        • G06T 2207/10016: Video; Image sequence
                        • G06T 2207/10048: Infrared image
                    • G06T 2207/30: Subject of image; Context of image processing
                        • G06T 2207/30244: Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A method that comprises performing image re-projection on each of a plurality of first images 20, wherein each first image is captured by a camera 11 of a respective one of a plurality of multidirectional image capture apparatuses 10, thereby to generate a plurality of re-projected second images 22 which are each associated with a respective virtual camera; processing the plurality of second images to generate respective positions of the virtual cameras (31, figure 3A) associated with the second images; and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses (33, figure 3C). Apparatus to perform the method comprises at least one processor and at least one memory including computer program code. An algorithm is used to process the second re-projected images to determine respective positions of the virtual cameras associated with the second images. The processing may be performed by using a structure from motion (SFM) algorithm to determine the position and orientation of each of the virtual cameras.

Description

(54) Title of the Invention: Methods and apparatuses for determining positions of multi-directional image capture apparatuses
Abstract Title: Determining Positions of Multi-Directional Image Capture Apparatuses (57) A method that comprises performing image re-projection on each of a plurality of first images 20, wherein each first image is captured by a camera 11 of a respective one of a plurality of multidirectional image capture apparatuses 10, thereby to generate a plurality of re-projected second images 22 which are each associated with a respective virtual camera; processing the plurality of second images to generate respective positions of the virtual cameras (31, figure 3A) associated with the second images; and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multidirectional image capture apparatuses (33, figure 3C). Apparatus to perform the method comprises at least one processor and at least one memory including computer program code. An algorithm is used to process the second re-projected images to determine respective positions of the virtual cameras associated with the second images. The processing may be performed by using a structure from motion (SFM) algorithm to determine the position and orientation of each of the virtual cameras.
[Drawings, six sheets (not reproduced here): Figure 1; Figure 2; Figures 3A to 3C; Figure 4; Figure 5 (a flowchart with operations S5.1 to S5.8); Figure 6; Figure 7.]
Methods and Apparatuses for Determining Positions of Multi-Directional Image Capture Apparatuses
Technical Field
The present specification relates to methods and apparatuses for determining positions of multi-directional image capture apparatuses.
Background
Camera pose registration is an important technique used to determine positions and orientations of image capture apparatuses such as cameras. The recent advent of commercial multi-directional image capture apparatuses, such as 360° camera systems, brings new challenges with regard to the performance of camera pose registration in a reliable, accurate and efficient manner.
Summary
According to a first aspect, this specification describes a method comprising performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
A plurality of second images may be generated from each first image.
Each of the second images may have a different viewing direction compared to each of the other second images.
The first images may be fisheye images.
The second images may be rectilinear images.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
The determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras, and determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.
The determination of a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining outliers and inliers in the generated positions of the virtual cameras, and determining the positions of each of the cameras based only on the inliers.
The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the method may further comprise determining an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
The determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras may comprise determining an orientation of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras, and determining the orientation of each of the plurality of multi-directional image capture apparatuses based on the determined orientations of the cameras.
The position of each of the plurality of multi-directional image capture apparatuses may be determined based on both the generated positions and the generated orientations of the virtual cameras.
The method may further comprise determining a pixel to real world distance conversion factor based on the determined positions of the cameras.
The method may further comprise determining an up-vector of each of the multi-directional image capture apparatuses based on the determined positions of the cameras.
The up-vector may be determined by determining two respective vectors between the position of one of the cameras and the positions of two other cameras, and determining the cross product of the two vectors.
According to a second aspect, this specification describes apparatus configured to perform any method described with reference to the first aspect.
According to a third aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method described with reference to the first aspect.
According to a fourth aspect, this specification describes apparatus comprising at least one processor, and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to: perform image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, process the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.
A plurality of second images may be generated from each first image.
Each of the second images may have a different viewing direction compared to each of the other second images.
The first images may be fisheye images.
The second images may be rectilinear images.
The processing of the plurality of second images to generate respective positions of the virtual cameras may comprise processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
The determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras, and determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.
The determination of a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras may comprise determining outliers and inliers in the generated positions of the virtual cameras, and determining the positions of each of the cameras based only on the inliers.
The processing of the plurality of second images may generate respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, may cause the apparatus to determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
The determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras may comprise determining an orientation of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras, and determining the orientation of each of the plurality of multi-directional image capture apparatuses based on the determined orientations of the cameras.
The position of each of the plurality of multi-directional image capture apparatuses may be determined based on both the generated positions and the generated orientations of the virtual cameras.
The computer program code, when executed by the at least one processor, may cause the apparatus to determine a pixel to real world distance conversion factor based on the determined positions of the cameras.
The computer program code, when executed by the at least one processor, may cause the apparatus to determine an up-vector of each of the multi-directional image capture apparatuses based on the determined positions of the cameras.
The up-vector may be determined by determining two respective vectors between the position of one of the cameras and the positions of two other cameras, and determining the cross product of the two vectors.
According to a fifth aspect, this specification describes a computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by at least one processor, causes performance of performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
The computer-readable code stored on the medium of the fifth aspect may further cause performance of any of the operations described with reference to the method of the first aspect.
According to a sixth aspect, this specification describes apparatus comprising means for performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera, means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images, and means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
The apparatus of the sixth aspect may further comprise means for causing performance of any of the operations described with reference to the method of the first aspect.
Brief Description of the Drawings
For a more complete understanding of the methods, apparatuses and computer-readable instructions described herein, reference is now made to the following descriptions taken in connection with the accompanying drawings, in which:
Figure 1 illustrates an example of multiple multi-directional image capture apparatuses in an environment;
Figure 2 illustrates an example of processing of an image captured by a multi-directional image capture apparatus to generate re-projected images;
Figures 3A to 3C illustrate the determination of the position and orientation of a multi-directional image capture apparatus relative to a reference coordinate system;
Figure 4 illustrates an example of the determination of an up-vector of a multi-directional image capture apparatus;
Figure 5 is a flowchart illustrating examples of various operations described herein;
Figure 6 is a schematic diagram of an example configuration of computing apparatus configured to perform various operations described herein; and
Figure 7 illustrates an example of a computer-readable storage medium with computer readable instructions stored thereon.
Detailed Description
In the description and drawings, like reference numerals may refer to like elements throughout.
Figure 1 illustrates a plurality of multi-directional image capture apparatuses 10 located within an environment. The multi-directional image capture apparatuses 10 may, in general, be any apparatus capable of capturing images of the scene 13 from multiple different perspectives simultaneously. For example, multi-directional image capture apparatus 10 may be a 360° camera system (also known as an omnidirectional camera system or a spherical camera system). However, it will be appreciated that multi-directional image capture apparatus 10 does not necessarily have to have full angular coverage of its surroundings and may only cover a smaller field of view.
The term “image” used herein refers generally to visual content captured by multi-directional image capture apparatus 10. For example, an image may be a photograph or a single frame of a video.
As illustrated in Figure 1, each multi-directional image capture apparatus 10 may comprise a plurality of cameras 11. The term “camera” used herein may refer to a subpart of a multi-directional image capture apparatus 10 which performs the capturing of images. As illustrated, each of the plurality of cameras 11 of multi-directional image capture apparatus 10 may be facing a different direction to each of the other cameras 11 of the multi-directional image capture apparatus 10. As such, each camera 11 of a multi-directional image capture apparatus 10 may have a different field of view, thus allowing the multi-directional image capture apparatus 10 to capture images of a scene 13 from different perspectives simultaneously.
Similarly, as illustrated in Figure 1, each multi-directional image capture apparatus 10 may be at a different location to each of the other multi-directional image capture apparatuses 10. Thus, each of the plurality of multi-directional image capture apparatuses 10 may capture images of the environment (via their cameras 11) from different perspectives simultaneously.
In the example scenario illustrated in Figure 1, a plurality of multi-directional image capture apparatuses 10 are arranged to capture images of a particular scene 13 within the environment. In such circumstances, it may be desirable to perform camera pose registration in order to determine the position and orientation of each of the multi-directional image capture apparatuses 10. In particular, it may be desirable to determine these positions and orientations relative to a particular reference coordinate system. This allows the overall arrangement of the multi-directional image capture apparatuses 10 relative to each other to be determined, which may be useful for a number of functions. For example, such information may be used for any of:
performing 3D reconstruction of the captured environment, 3D registration of multi-directional image capture apparatuses 10 with respect to other sensors such as LiDAR (Light Detection and Ranging) or infrared (IR) depth sensors, audio positioning of audio sources, playback of object-based audio with respect to multi-directional image capture apparatus 10 location, and presenting multi-directional image capture apparatuses’ positions as ‘hotspots’ to which a viewer can switch during virtual reality (VR) viewing.
One way of determining the positions of multi-directional image capture apparatuses 10 is to use Global Positioning System (GPS) localization. However, GPS only provides position information and does not provide orientation information. One way of determining orientation information is to obtain the orientation information from magnetometers and accelerometers installed in the multi-directional image capture apparatuses 10. However, such instruments may be susceptible to local disturbance (e.g. magnetometers may be disturbed by a local magnetic field), so the accuracy of orientation information obtained in this way is not necessarily very high.
Another way of performing camera pose registration is to use a computer vision method. For example, position and orientation information can be obtained by performing structure from motion (SfM) analysis on images captured by a multi-directional image capture apparatus 10. Broadly speaking, SfM works by determining point correspondences between images (also known as feature matching) and calculating location and orientation based on the determined point correspondences. However, when used on images captured by multiple multi-directional image capture apparatuses 10, SfM analysis may be unreliable due to unreliable determination of point correspondences between images.
A computer vision method for performing camera pose registration which may address some or all of the challenges mentioned above will now be described.
Figure 2 illustrates one of the plurality of multi-directional image capture apparatuses 10 of Figure 1. A camera 11 of the multi-directional image capture apparatus 10 may capture a first image 21. The first image 21 may be an image of a scene within the field of view 20 of the camera 11. In some examples, the lens of the camera 11 may be a fisheye lens and so the first image 21 may be a fish-eye image (in which the camera field of view is enlarged). However, the method described herein may be applicable for use with lenses and resulting images of other types. More specifically, the camera pose registration method described herein may also be applicable to images captured by a camera with a hyperbolic mirror in which the camera optical centre coincides with the focus of the hyperbola and images captured by a camera with a parabolic mirror and an orthographic lens in which all reflected rays are parallel to the mirror axis and the orthographic lens is used to provide a focused image.
The first image 21 may be processed to generate one or more second images 22. More specifically, image re-projection may be performed on the first image 21 to generate one or more re-projected second images 22. For example, if the first image 21 is not a rectilinear image (e.g. a fish-eye image), it may be re-projected to generate one or more second images 22 which are rectilinear images (as illustrated by Figure 2). The type of re-projection may be dependent on the algorithm used to analyse the second images. For instance, as is explained below, a structure from motion algorithm, which is typically used to analyse rectilinear images, may be used, in which case the re-projection may be selected so as to generate rectilinear images. However, it will be appreciated that, in general, the re-projection may generate any type of second image, as long as the image type is compatible with the algorithm used to analyse the re-projected images.
Each re-projected second image 22 may be associated with a respective virtual camera.
A virtual camera is an imaginary camera which does not physically exist, but which corresponds to a camera which would have captured the re-projected second image 22 with which it is associated. A virtual camera is defined by virtual camera parameters which represent the configuration of the virtual camera required in order to have captured the second image 22. As such, for the purposes of the methods and operations described herein, a virtual camera can be treated as a real physical camera. For example, each virtual camera has, among other virtual camera parameters, a position and orientation which can be determined.
When a plurality of re-projected second images 22 are generated (e.g. Figure 2 illustrates nine re-projected second images 22 being generated), each re-projected second image 22 may have a different viewing direction compared to each of the other second images 22. In other words, the virtual camera of each second image 22 may have a different orientation compared to each of the other virtual cameras. Similarly, the orientation of each of the virtual cameras may also be different to the orientation of the real camera 11 which captured the first image 21. Furthermore, each virtual camera may have a smaller field of view than the real camera 11 as a result of the re-projection. The virtual cameras may have overlapping fields of view with each other.
The orientations of the virtual cameras may be pre-set. In other words, the re-projection of the first image 21 may generate second images 22 with associated virtual cameras which each have a certain pre-set orientation relative to the orientation of the
real camera 11. For example, the orientation of each virtual camera may be pre-set such that it has certain yaw, pitch and roll angles relative to the real camera 11.
It will be appreciated that, in general, any number of second images 22 may be generated. Generally speaking, generating more second images 22 leads to less distortion in each of the second images 22, but may also increase computational complexity. The precise number of second images may be chosen based on the scene/environment being captured by the multi-directional image capture apparatus 10.
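A minimal sketch of this re-projection step is given below. It assumes an equidistant fisheye model (r = f·θ) for the first image and a pinhole model for the re-projected second image; the function name, output size and focal lengths are illustrative only, and a real implementation would use the calibrated lens model of the multi-directional image capture apparatus 10.

```python
import numpy as np
import cv2  # used here only for the final pixel remapping

def render_virtual_view(fisheye_img, f_fish, cx, cy, R_virt, out_size=512, f_virt=300.0):
    # Pixel grid of the rectilinear second image (virtual pinhole camera).
    u, v = np.meshgrid(np.arange(out_size), np.arange(out_size))
    rays = np.stack([(u - out_size / 2) / f_virt,
                     (v - out_size / 2) / f_virt,
                     np.ones((out_size, out_size))], axis=-1)

    # Rotate the rays from the virtual camera frame into the real camera frame
    # using the pre-set orientation R_virt of the virtual camera.
    rays = rays @ R_virt.T
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Project the rays with the (assumed) equidistant fisheye model: r = f_fish * theta.
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))   # angle from the optical axis
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    r = f_fish * theta
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)

    # Sample the first image to obtain the re-projected second image.
    return cv2.remap(fisheye_img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```

Nine second images per first image, as in Figure 2, would then be obtained by calling this with nine pre-set rotations R_virt, for example a 3×3 grid of yaw/pitch offsets.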
The re-projection process described with reference to Figure 2 may be performed for a plurality of first images 21 respectively captured by a plurality of cameras 11 of the multi-directional image capture apparatus 10. Furthermore, the same process may be performed for each of a plurality of multi-directional image capture apparatuses 10 which are capturing the same general environment, e.g. the plurality of multi-directional image capture apparatuses 10 as illustrated in Figure 1. In this way, all of the first images 21 captured by a plurality of multi-directional image capture apparatuses 10 of a particular scene may be processed as described above.
It will be appreciated that the first images 21 may correspond to images of a scene at a particular moment in time. For example, if the plurality of multi-directional image capture apparatuses 10 are capturing video images, a first image 21 may correspond to a single video frame of a single camera 11, and all of the first images 21 may be video frames that are captured at the same moment in time.
Figures 3A to 3C illustrate the process of determining the positions and orientations of a multi-directional image capture apparatus 10. In Figures 3A to 3C, each arrow 31,32, 33 represents the position and orientation of a particular element in a reference coordinate system 30. The base of the arrow represents the position and the direction of the arrow represents the orientation. More specifically, each arrow 31 in Figure 3A represents the position and orientation of a virtual camera associated with a respective second image, each arrow 32 in Figure 3B represents the position and orientation of a real camera 11 (determined based on the positions and orientations of the re-projected second images 22 derived from the first image 21 captured by the real camera), and the arrow 33 in Figure 3C represents the position and orientation of the multi-directional image capture apparatus 10.
After generating the one or more second images, the one or more second images are processed to generate respective positions of the virtual cameras associated with the second images, the generated positions being relative to the reference coordinate system 30. The processing of the one or more second images may also generate respective orientations of the virtual cameras relative to the reference coordinate system 30. The processing may involve processing a plurality of the second images generated from first images captured by a plurality of different multi-directional image capture apparatuses 10.
It will be appreciated that, in order to perform the processing for a plurality of multi-directional image capture apparatuses 10, it may be necessary for the multi-directional image capture apparatuses 10 to have at least partially overlapping fields of view with each other (for example, in order to allow point correspondence determination as described below).
The output of the processing for one multi-directional image capture apparatus is illustrated by Figure 3A. As shown, each cluster 34 of arrows 31 in Figure 3A represents the virtual cameras corresponding to a single first image of a single real camera.
The above described processing may be performed by using a structure from motion (SfM) algorithm to determine the position and orientation of each of the virtual cameras. The SfM algorithm may operate by determining point correspondences between various ones of the second images and determining the positions and orientations of the virtual cameras based on the determined point correspondences.
For example, the determined point correspondences may impose certain geometric constraints on the positions and orientations of the virtual cameras, which can be used to solve a set of quadratic equations to determine the positions and orientations of the virtual cameras relative to the reference coordinate system 30. More specifically, in some examples, the SfM process may involve any one of or any combination of the following operations: extracting image features, matching image features, estimating camera position, reconstructing 3D points, and performing bundle adjustment.
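As an illustration of the point-correspondence step, the sketch below matches features between two of the re-projected second images and recovers the relative pose of their virtual cameras using OpenCV. In practice the full set of second images would be fed to an incremental SfM pipeline (for example COLMAP or OpenSfM); the intrinsic matrix K of the virtual cameras, known from the re-projection, is assumed here, and the function name is illustrative.

```python
import numpy as np
import cv2

def two_view_pose(img_a, img_b, K):
    # Detect and describe features in both second images.
    orb = cv2.ORB_create(4000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Determine point correspondences (feature matching).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])

    # The correspondences constrain the relative geometry: estimate the essential
    # matrix with RANSAC and decompose it into a rotation R and a translation
    # direction t between the two virtual cameras.
    E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K)
    return R, t
```

The translation is recovered only up to scale, which is one reason the pixel to real world distance conversion factor described later is useful for placing the result on a metric scale.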
Once the positions of the virtual cameras have been determined, the position of each of the real cameras 11 relative to the reference coordinate system 30 may be determined based on the determined positions of the virtual cameras. Similarly, once the
orientations of the virtual cameras have been determined, the orientation of each of the real cameras 11 relative to the reference coordinate system 30 may be determined based on the determined orientations of the virtual cameras. For example, the position of each real camera may be determined by averaging the positions of the virtual cameras corresponding to the real camera. Similarly, the orientation of each real camera may be determined by averaging the orientation of the virtual cameras corresponding to the real camera. In other words, referring to Figures 3A to 3C, each cluster of arrows 34 in Figure 3A may be averaged to obtain a corresponding arrow 32 in Figure 3B.
The above described determination of the positions of each of the real cameras 11 may further involve determining outliers and inliers in the generated positions of the virtual cameras and determining the positions of each of the real cameras 11 based only on the inliers. For example, the above mentioned averaging may involve only averaging the inlier positions. This may improve the accuracy of the determined positions of the real cameras 11.
The inlier and outlier determination may be performed according to:
di = ||ci - median(Cvirtual)||
d0 = median(d1, ..., dN)
ci is an inlier if di / d0 < m
where Cvirtual is the set of the positions of the virtual cameras, di is a measure of the difference between the position ci of a virtual camera and the median position of all of the virtual cameras, d0 is the median absolute deviation (MAD), and m is a threshold value on the ratio di/d0 below which a determined virtual camera position is considered an inlier (for example, m may be set to be 2).
It will therefore be understood from the above expressions that a virtual camera may be determined to be an inlier if the difference between its position and the median position of all of the virtual cameras divided by the median absolute deviation is less than a threshold value. That is to say, for a virtual camera to be considered an inlier, the difference between its position and the median position of all of the virtual cameras
must be less than a threshold number of times larger than the median absolute deviation.
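The following sketch implements the inlier test just described and averages the surviving virtual camera positions to obtain a real camera position. The function name and the use of the arithmetic mean over inliers are illustrative choices.

```python
import numpy as np

def real_camera_position(virtual_positions, m=2.0):
    C = np.asarray(virtual_positions)                      # positions of the virtual cameras, shape (N, 3)
    d = np.linalg.norm(C - np.median(C, axis=0), axis=1)   # di: distance to the median position
    d0 = np.median(d)                                      # d0: median absolute deviation (MAD)
    inliers = C[d / d0 < m]                                # keep virtual cameras with di / d0 < m
    return inliers.mean(axis=0)                            # real camera position from inliers only
```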
The orientation of each real camera may be determined in the following way. The orientation of each virtual camera may be represented by a rotation matrix Rv. Similarly, the orientation of each real camera 11 relative to the reference coordinate system 30 may be represented by a rotation matrix Ri. The orientation of each virtual camera relative to its corresponding real camera 11 may be known as this may be pre-set (as described above with reference to Figure 2), and may be represented by rotation matrix Rvi. Thus, the rotation matrix of each virtual camera may be used to obtain a rotation matrix for the real camera 11 according to:
Ri = Rv · Rvi⁻¹
Put another way, the rotation matrix of a real camera (Ri) may be determined by multiplying the rotation matrix of a virtual camera (Rv) onto the inverse of the rotation matrix representing the orientation of the virtual camera relative to the orientation of the real camera (Rvi⁻¹).
For example, if there are nine virtual cameras corresponding to each real camera (as illustrated in Figure 3) then nine rotation matrices are obtained for the orientation of each real camera 11. Each of these rotation matrices may then be converted into corresponding Euler angles to obtain a set of Euler angles for each real camera 11. The set of Euler angles may then be averaged according to:
θavg = arctan((Σi sin θi) / (Σi cos θi)), i = 1, ..., 9

where θavg represents the averaged Euler angles for a real camera and θi represents the set of Euler angles. Put another way, the averaged Euler angles are determined by calculating the sum of the sines of the set of Euler angles divided by the sum of the cosines of the set of Euler angles, and taking the arctangent of the ratio. θavg may then be converted back into a rotation matrix representing the final determined orientation of real camera 11.
It will be appreciated that the above formula is for the specific example in which there are nine virtual cameras per real camera 11 - the maximum value of i may vary according to the number of virtual cameras generated.
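A sketch of this orientation step is shown below: each virtual camera rotation Rv is mapped to a candidate real-camera rotation Ri = Rv·Rvi⁻¹, the candidates are converted to Euler angles, and each angle is combined with the circular mean above. The 'zyx' Euler convention and the use of SciPy for the conversions are assumptions of the sketch, not requirements of the method.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def real_camera_orientation(Rv_list, Rvi_list):
    # Candidate rotations Ri = Rv * Rvi^-1, one per virtual camera
    # (Rvi is orthonormal, so its inverse is its transpose).
    candidates = [Rv @ Rvi.T for Rv, Rvi in zip(Rv_list, Rvi_list)]

    # Convert each candidate to Euler angles and take the circular mean of each angle.
    eulers = np.array([Rotation.from_matrix(R).as_euler('zyx') for R in candidates])
    averaged = np.arctan2(np.sin(eulers).sum(axis=0), np.cos(eulers).sum(axis=0))

    # Convert the averaged Euler angles back into the final rotation matrix.
    return Rotation.from_euler('zyx', averaged).as_matrix()
```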
In some examples, unit quaternions may be used instead of Euler angles for the above-mentioned process. The use of unit quaternions to represent orientation is a known mathematical technique and will not be described in detail here. Briefly, quaternions q1, q2, ... qN corresponding to the virtual camera rotation matrices may be determined. Then, the quaternions may be transformed, as necessary, to ensure that they are all on the same side of the 4D hypersphere. Specifically, one representative quaternion qM is selected and the signs of any quaternions qi where the product of qM and qi is less than zero may be inverted. Then, all quaternions qi (as 4D vectors) may be summed into an average quaternion qA, and qA may be normalised into a unit quaternion. The unit quaternion may represent the averaged orientation of the camera and may be converted back to other orientation representations as desired. Using unit quaternions to represent orientation may be more numerically stable than Euler angles.
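The quaternion alternative can be sketched as follows, with the first quaternion taken as the representative qM; quaternions are assumed to be stored as plain 4-vectors.

```python
import numpy as np

def average_unit_quaternion(quats):
    q = np.asarray(quats, dtype=float)            # one quaternion (4-vector) per candidate orientation
    q_ref = q[0]                                  # representative quaternion qM
    signs = np.where(q @ q_ref < 0.0, -1.0, 1.0)  # flip quaternions on the far side of the hypersphere
    q_sum = (q * signs[:, None]).sum(axis=0)      # sum the sign-aligned quaternions as 4D vectors
    return q_sum / np.linalg.norm(q_sum)          # normalise to a unit quaternion
```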
Once the orientation of each real camera 11 of a multi-directional image capture apparatus 10 is known, the orientation of the multi-directional image capture apparatus 10 relative to the reference coordinate system 30 may be determined in the following way. The orientation of the multi-directional image capture apparatus 10 may be represented by rotation matrix Rdev. The orientation of each real camera 11 relative to its corresponding multi-directional image capture apparatus 10 may be known, and may be represented by rotation matrix Ridev. Thus, the rotation matrices Ri of the real cameras 11 may be used to obtain a rotation matrix for the multi-directional image capture apparatus 10 according to:
Rdev = Ri · Ridev⁻¹
Put another way, the rotation matrix of a multi-directional image capture apparatus (Rdev) can be determined by multiplying the rotation matrix of a real camera (Ri) onto the inverse of the matrix representing the orientation of the real camera relative to the orientation of the multi-directional image capture apparatus (Ridev⁻¹).
For example, if there are six real cameras 11 corresponding to the multi-directional image capture apparatus 10 (as illustrated in Figure 3) then six rotation matrices are obtained for the orientation of the multi-directional image capture apparatus 10. Each of these rotation matrices may then be converted into corresponding Euler angles to obtain a set of Euler angles for the multi-directional image capture apparatus 10. The set of Euler angles may then be averaged and converted into a final rotation matrix representing the orientation of the multi-directional image capture apparatus 10. This may be done using the same process as described above, with corresponding equations. Similarly, as above, unit quaternions may be used instead of Euler angles.
The position of the multi-directional image capture apparatus 10 may be determined in the following way. The position of each real camera 11 relative to its corresponding multi-directional image capture apparatus 10 may be known, and may be represented by vector videv. However, videv is relative to a local coordinate system of the multi-directional image capture apparatus. To obtain the position of each real camera 11 relative to its corresponding multi-directional image capture apparatus 10 (relative to the reference coordinate system 30), videv may be rotated according to:
vwidev = Rdev · videv
where Rdev is the final rotation matrix of the multi-directional image capture apparatus 10 as determined above, and vwidev is a vector representing the position of each real camera 11 relative to the multi-directional image capture apparatus 10, relative to the reference coordinate system 30. As such, the position of a real camera 11 relative to its corresponding multi-directional image capture apparatus (relative to the reference coordinate system 30) may be determined by multiplying the final rotation matrix of the multi-directional image capture apparatus 10 onto the position of the real camera relative to the multi-directional image capture apparatus in the local coordinate system of the multi-directional image capture apparatus.
Therefore, the position of the multi-directional image capture apparatus 10 may be determined according to:

Cdev = {Ci - vwidev}

where Ci represents the position vector of each of the real cameras 11 as determined above, vwidev represents the position of each real camera 11 relative to the multi-directional image capture apparatus as determined above, and Cdev is a set of position vectors of the multi-directional image capture apparatus 10 in the reference coordinate system. Put another way, a position of the multi-directional image capture apparatus 10 may be determined by taking the difference between the position vector of a real camera 11 and the position vector of the real camera relative to the multi-directional image capture apparatus.
The same inlier and outlier determination and averaging process as described above may then be applied to Cdev to obtain a final position for the multi-directional image capture apparatus 10, except with the set of positions of the multi-directional image capture apparatus 10 used in place of the determined positions of the virtual cameras.
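The position computation can be sketched as below: each real camera's known local offset videv is rotated into the reference coordinate system with the device rotation Rdev, subtracted from the camera position Ci to give one candidate device position, and the candidates are then filtered and averaged with the same MAD test as before. All names are illustrative.

```python
import numpy as np

def device_position(camera_positions, local_offsets, R_dev, m=2.0):
    Ci = np.asarray(camera_positions)      # real camera positions in the reference coordinate system
    v_local = np.asarray(local_offsets)    # camera positions in the device's local coordinate system
    v_world = v_local @ R_dev.T            # vwidev = Rdev * videv for every camera at once
    candidates = Ci - v_world              # one device-position estimate per real camera

    # Same inlier test as for the virtual camera positions, then average the inliers.
    d = np.linalg.norm(candidates - np.median(candidates, axis=0), axis=1)
    inliers = candidates[d / np.median(d) < m]
    return inliers.mean(axis=0)
```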
In examples in which only one second image 22 is generated for each first image 21, and thus only one virtual camera’s position and/or orientation is determined, the position of the real camera 11 may simply be determined to be the position of the one virtual camera, and the orientation of the real camera 11 may simply be determined to be the orientation of the one virtual camera.
Once the positions of the real cameras 11 in the reference coordinate system 30 have been determined, a pixel to real world distance conversion factor may be determined. This may be performed by determining the distance between a pair of real cameras 11 on a multi-directional image capture apparatus 10 in both pixels and in a real world distance (e.g. metres). The pixel distance may be determined from the determined positions of the real cameras 11 in the reference coordinate system. The real world distance may be known already from known physical parameters of the multi-directional image capture apparatus 10. The pixel to real world distance conversion factor may then be simply calculated by taking the ratio of the two distances. This may be further refined by calculating the factor based on multiple different pairs of real cameras 11 of the multi-directional image capture apparatus 10, determining outliers and inliers (for example, in the same way as described above), and averaging the inliers to obtain a final pixel to real world distance conversion factor. The pixel to real world distance conversion factor may be denoted Spixel2meter in the present specification.
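A sketch of this computation over all camera pairs of one apparatus is given below. It treats the factor as reconstruction ("pixel") units per metre, so that dividing reconstructed distances by it yields metres, and it uses a median over the pairs as a simple stand-in for the inlier averaging described above; the function and argument names are illustrative.

```python
import numpy as np
from itertools import combinations

def pixel_to_metre_factor(estimated_positions, physical_positions):
    factors = []
    for i, j in combinations(range(len(estimated_positions)), 2):
        d_pix = np.linalg.norm(np.subtract(estimated_positions[i], estimated_positions[j]))
        d_metre = np.linalg.norm(np.subtract(physical_positions[i], physical_positions[j]))
        factors.append(d_pix / d_metre)    # candidate conversion factor from this camera pair
    return float(np.median(factors))       # robust combination of the per-pair candidates
```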
In addition, an up-vector of each of the multi-directional image capture apparatuses 10 may also be determined based on the determined positions of the real cameras 11. As illustrated in Figure 4, this may be performed by determining two vectors V1 and V2 between the position of one of the real cameras 11 and the positions of two other real cameras 11. As such, the up-vector may be determined based on the positions of a group of three real cameras 11. The up-vector may be determined by determining the cross product of V1 and V2 in accordance with the right hand rule. As illustrated in Figure 4,
V3 is the result of the cross product of V1 and V2 and represents the direction of the up-vector. V3 may be normalised to obtain a unit vector representing the up-vector.
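The up-vector computation for one group of three real cameras can be sketched as:

```python
import numpy as np

def up_vector(c0, c1, c2):
    v1 = np.asarray(c1) - np.asarray(c0)   # vector V1 from camera c0 to camera c1
    v2 = np.asarray(c2) - np.asarray(c0)   # vector V2 from camera c0 to camera c2
    v3 = np.cross(v1, v2)                  # V3: cross product, right-hand rule
    return v3 / np.linalg.norm(v3)         # normalised up-vector
```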
It will be appreciated that the up-vector of a multi-directional image capture apparatus 10 may be defined based on a group of real cameras 11 of the multi-directional image capture apparatus 10 which are, in normal use, intended to be in a plane that is perpendicular to gravity. As such, the up-vector may be another representation of the orientation of the multi-directional image capture apparatus 10. Further, if it is assumed that the multi-directional image capture apparatus 10 is placed in an orientation in which the plane of the cameras in the group is actually perpendicular to gravity, the up-vector may correspond to the real world up-vector (the vector opposite in direction to the local gravity vector). The up-vector may provide further information which can be used in 3D reconstruction of the captured environment. In some instances, the reference coordinate system discussed herein may not correspond exactly with the real world (for instance, the “up” direction in the reference coordinate system may not correspond with the “up” direction in the real world). As such, assuming that the multi-directional image capture apparatuses were/are being used in a level orientation, the calculated up-vector may allow a 3D reconstruction of the captured environment to be aligned with the real world (e.g. by ensuring that the up-vector is pointing in an up direction in the 3D reconstruction).
As above, a set of up-vectors may be determined for each multi-directional image capture apparatus 10 based on determining V1, V2 and V3 for a plurality of different groups of three cameras. Then outliers and inliers may be determined (in the same way as above, except with the set of determined up-vectors used in place of the determined positions of the virtual cameras) and a final up-vector may be determined based only on the inliers (e.g. by averaging the inliers). Once the up-vector is determined, it may be rotated to align with a known local gravity vector (which is, for instance, determined using an accelerometer forming part of, or otherwise co-located with, the multi-directional image capture apparatus 10) to determine the real world up-vector in the reference coordinate system 30 (if it is not already aligned).
Once final positions for a plurality of multi-directional image capture apparatuses 10 have been determined, the relative positions of the plurality of multi-directional image capture apparatuses may be determined according to:
crel = (cjdev - cidev) / Spixel2meter
In the above equation, crel represents the relative position of one of the plurality of multi-directional image capture apparatuses (apparatus j) relative to another one of the plurality of multi-directional image capture apparatuses (apparatus i). cjdev is the position of apparatus j and cidev is the position of apparatus i. Spixel2meter is the pixel to real world distance conversion factor.
As will be understood from the above expression, a vector representing the relative position of one of the plurality of multi-directional image capture apparatuses relative to another one of the plurality of multi-directional image capture apparatuses may be determined by taking the difference between their positions. This may be divided by the pixel-to-real world distance conversion factor depending on the scale desired.
As such, the positions of all of the multi-directional image capture apparatuses 10 relative to one another may be determined in the reference coordinate system 30.
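Finally, the relative position of apparatus j with respect to apparatus i follows directly from the two device positions and the conversion factor; a one-line sketch with illustrative names:

```python
import numpy as np

def relative_position(c_dev_j, c_dev_i, s_pixel2metre=1.0):
    # Difference of the device positions, optionally rescaled to metres.
    return (np.asarray(c_dev_j) - np.asarray(c_dev_i)) / s_pixel2metre
```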
Figure 5 is a flowchart showing examples of operations as described herein.
At operation 5.1, a plurality of first images 21 which are captured by a plurality of multi-directional image capture apparatuses 10 may be received. For example, image data corresponding to the first images 21 may be received at computing apparatus 60 (see Figure 6).
At operation 5.2, image re-projection may be performed on each of the first images 21 to obtain one or more re-projected second images 22 corresponding to respective virtual cameras.
At operation 5.3, the second images 22 may be processed to obtain positions and orientations of the virtual cameras. For example, the second images 22 may be processed using a structure from motion algorithm.
At operation 5.4, positions and orientations of real cameras 11 may be determined based on the positions and orientations of the virtual cameras determined at operation 5.3.

At operation 5.5, a pixel-to-real world distance conversion factor may be determined based on the positions of the real cameras 11 determined at operation 5.4.
At operation 5.6, an up-vector of each multi-directional image capture apparatus 10 may be determined based on the positions of the real cameras 11 determined at operation 5.4.
At operation 5.7, positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the real cameras 11 determined at operation 5.4.
At operation 5.8, positions of the plurality of multi-directional image capture apparatuses 10 relative to each other may be determined based on the positions of the plurality of multi-directional image capture apparatuses 10 determined at operation 5.7.

It will be appreciated that the position of a real camera 11 as described herein may be the position of the centre of a lens of the real camera 11. Similarly, the position of a virtual camera may be the position of the centre of a virtual lens of the virtual camera. The position of the multi-directional image capture apparatus 10 may be the centre of the multi-directional image capture apparatus (e.g. if a multi-directional image capture apparatus is spherically shaped, its position may be defined as the geometric centre of the sphere).
Figure 6 is a schematic block diagram of an example configuration of computing apparatus 60, which may be configured to perform any of or any combination of the operations described herein. The computing apparatus 60 may comprise memory 61, processing circuitry 62, an input 63, and an output 64.
The processing circuitry 62 may be of any suitable composition and may include one or more processors 62A of any suitable type or suitable combination of types. For example, the processing circuitry 62 may be a programmable processor that interprets computer program instructions and processes data. The processing circuitry 62 may include plural programmable processors. Alternatively, the processing circuitry 62 may be, for example, programmable hardware with embedded firmware. The processing circuitry 62 may be termed processing means. The processing circuitry 62 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs). In some instances, processing circuitry 62 may be referred to as computing apparatus.
The processing circuitry 62 described with reference to Figure 6 is coupled to the memory 61 (or one or more storage devices) and is operable to read/write data to/from the memory. The memory 61 may store thereon computer readable instructions 612A which, when executed by the processing circuitry 62, may cause any one of or any combination of the operations described herein to be performed. The memory 61 may comprise a single memory unit or a plurality of memory units upon which the computer-readable instructions (or code) 612A is stored. For example, the memory 61 may comprise both volatile memory 611 and non-volatile memory 612. For example, the computer readable instructions 612A may be stored in the non-volatile memory 612 and may be executed by the processing circuitry 62 using the volatile memory 611 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, and SDRAM etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc. The memories 61 in general may be referred to as non-transitory computer readable memory media.
The input 63 may be configured to receive image data representing the first images 21 described herein. The image data may be received, for instance, from the multi-directional image capture apparatuses 10 themselves or may be received from a storage device. The output may be configured to output any of or any combination of the
camera pose registration information described herein. As discussed above, the camera pose registration information output by the computing apparatus 60 may be used for various functions as described above with reference to Figure 1.
Figure 7 illustrates an example of a computer-readable medium 70 with computer-readable instructions (code) stored thereon. The computer-readable instructions (code), when executed by a processor, may cause any one of or any combination of the operations described above to be performed.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor or firmware such as the programmable content of a hardware device, whether instructions for a processor or configured or configuration settings for a fixed function device, gate array, programmable logic device, etc.
As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and
memory(ies) that work together to cause an apparatus, such as a server, to perform various functions) and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagram of Figure 5 is an example only and that various operations depicted therein may be omitted, reordered and/or combined. For example, it will be appreciated that operations S5.5 and S5.6 as illustrated in Figure 5 may be omitted.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims (28)

Claims
    1. A method comprising:
performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
2. The method of claim 1, wherein a plurality of second images are generated from each first image.
3. The method of claim 1 or claim 2, wherein each of the second images has a different viewing direction compared to each of the other second images.
4. The method of any one of the preceding claims, wherein the first images are fisheye images.
5. The method of any one of the preceding claims, wherein the second images are rectilinear images.
6. The method of any one of the preceding claims, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
7. The method of any one of the preceding claims, wherein the determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:
determining a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras; and
determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.
8. The method of claim 7, wherein the determination of a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:
determining outliers and inliers in the generated positions of the virtual cameras; and
determining the positions of each of the cameras based only on the inliers.
9. The method of any one of the preceding claims, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the method further comprises:
based on the generated orientations of the virtual cameras, determining an orientation of each of the plurality of multi-directional image capture apparatuses.
10. The method of claim 9, wherein the determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras comprises:
determining an orientation of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras; and
determining the orientation of each of the plurality of multi-directional image capture apparatuses based on the determined orientations of the cameras.
11. The method of claim 9 or claim 10, wherein the position of each of the plurality of multi-directional image capture apparatuses is determined based on both the generated positions and the generated orientations of the virtual cameras.
12. The method of any one of claims 7 to 11, further comprising:
determining a pixel to real world distance conversion factor based on the determined positions of the cameras.
13. The method of any one of claims 7 to 12, further comprising:
determining an up-vector of each of the multi-directional image capture apparatuses based on the determined positions of the cameras.
14. The method of claim 13, wherein the up-vector is determined by:
determining two respective vectors between the position of one of the cameras and the positions of two other cameras; and
determining the cross product of the two vectors.
15. Apparatus configured to perform a method according to any one of claims 1 to 14.

16. Computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method according to any one of claims 1 to 14.

17. Apparatus comprising:
at least one processor; and
at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to:
perform image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
process the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determine a position of each of the plurality of multi-directional image capture apparatuses.
18. The apparatus of claim 17, wherein a plurality of second images are generated from each first image.
19. The apparatus of claim 17, wherein each of the second images has a different viewing direction compared to each of the other second images.
20. The apparatus of claim 17, wherein the first images are fisheye images.
21. The apparatus of claim 17, wherein the second images are rectilinear images.
22. The apparatus of claim 17, wherein the processing of the plurality of second images to generate respective positions of the virtual cameras comprises processing the second images using a structure from motion algorithm to generate the positions of the virtual cameras.
23. The apparatus of claim 17, wherein the determination of a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:
determining a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras; and
determining the positions of each of the plurality of multi-directional image capture apparatuses based on the determined positions of the cameras.
24. The apparatus of claim 23, wherein the determination of a position of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras comprises:
determining outliers and inliers in the generated positions of the virtual cameras; and
determining the positions of each of the cameras based only on the inliers.
25. The apparatus of claim 17, wherein the processing of the plurality of second images generates respective orientations of the virtual cameras, and the computer program code, when executed by the at least one processor, causes the apparatus to:
determine an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras.
26. The apparatus of claim 25, wherein the determination of an orientation of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras comprises:
determining an orientation of each of the cameras of each of the plurality of multi-directional image capture apparatuses based on the generated orientations of the virtual cameras; and
determining the orientation of each of the plurality of multi-directional image capture apparatuses based on the determined orientations of the cameras.
27. The apparatus of claim 25, wherein the position of each of the plurality of multi-directional image capture apparatuses is determined based on both the generated positions and the generated orientations of the virtual cameras.
28. The apparatus of claim 23, wherein the computer program code, when executed by the at least one processor, causes the apparatus to:
determine a pixel to real world distance conversion factor based on the determined positions of the cameras.
29. The apparatus of claim 23, wherein the computer program code, when executed by the at least one processor, causes the apparatus to:
determine an up-vector of each of the multi-directional image capture apparatuses based on the determined positions of the cameras.
30. The apparatus of claim 29, wherein the up-vector is determined by:
determining two respective vectors between the position of one of the cameras and the positions of two other cameras; and
determining the cross product of the two vectors.
31. A computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causes performance of:
performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
based on the generated positions of the virtual cameras, determining a position of each of the plurality of multi-directional image capture apparatuses.
32. Apparatus comprising:
means for performing image re-projection on each of a plurality of first images, wherein each first image is captured by a camera of a respective one of a plurality of multi-directional image capture apparatuses, thereby to generate a plurality of re-projected second images which are each associated with a respective virtual camera;
means for processing the plurality of second images to generate respective positions of the virtual cameras associated with the second images; and
means for determining a position of each of the plurality of multi-directional image capture apparatuses based on the generated positions of the virtual cameras.
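As an informal illustration of the re-projection recited in claims 1, 4 and 5 (fisheye first images re-projected into rectilinear second images, each associated with a virtual camera), the sketch below builds one rectilinear view from a fisheye frame. It is not the implementation described in the specification: it assumes an equidistant fisheye model (image radius = focal length x angle from the optical axis), a square output view, and hypothetical parameters fisheye_focal, view_rotation and out_fov_deg, and it relies only on NumPy and OpenCV's remap.

import numpy as np
import cv2


def fisheye_to_rectilinear(fisheye_img, fisheye_focal, view_rotation,
                           out_size=512, out_fov_deg=90.0):
    """Re-project a fisheye image (equidistant model assumed) into a
    rectilinear view seen by a virtual pinhole camera rotated by
    view_rotation (3x3) relative to the physical fisheye camera."""
    h_src, w_src = fisheye_img.shape[:2]
    cx_src, cy_src = w_src / 2.0, h_src / 2.0

    # Pinhole focal length of the virtual camera from the desired field of view.
    f_out = (out_size / 2.0) / np.tan(np.radians(out_fov_deg) / 2.0)

    # Unit ray through every output pixel, in the virtual camera frame.
    xs, ys = np.meshgrid(np.arange(out_size), np.arange(out_size))
    rays = np.stack([(xs - out_size / 2.0) / f_out,
                     (ys - out_size / 2.0) / f_out,
                     np.ones_like(xs, dtype=np.float64)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate the rays into the fisheye camera frame.
    rays = rays @ view_rotation.T

    # Equidistant fisheye projection: radius = focal * angle from the optical axis.
    theta = np.arccos(np.clip(rays[..., 2], -1.0, 1.0))
    phi = np.arctan2(rays[..., 1], rays[..., 0])
    r = fisheye_focal * theta
    map_x = (cx_src + r * np.cos(phi)).astype(np.float32)
    map_y = (cy_src + r * np.sin(phi)).astype(np.float32)

    # Sample the fisheye image at the computed source positions.
    return cv2.remap(fisheye_img, map_x, map_y, cv2.INTER_LINEAR)

Generating a plurality of second images from each first image, as in claim 2, would then amount to calling this function repeatedly with different view_rotation matrices, one per virtual camera, and the resulting rectilinear views could be passed to an off-the-shelf structure-from-motion pipeline as in claim 6.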
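Claims 7, 8 and 12 describe fusing the virtual-camera positions produced by structure from motion into per-camera positions using only inliers, deriving each apparatus position from its camera positions, and computing a pixel-to-real-world conversion factor. The NumPy sketch below shows one way this could look; the median-distance inlier test, the centroid fusion and the helper names are assumptions made for illustration, not the method defined in the claims.

import numpy as np


def camera_position_from_virtual(virtual_positions, inlier_tol=3.0):
    """Fuse the SfM positions of all virtual cameras derived from one
    physical camera into a single camera position, using inliers only."""
    pts = np.asarray(virtual_positions, dtype=np.float64)
    median = np.median(pts, axis=0)
    dist = np.linalg.norm(pts - median, axis=1)
    scale = np.median(dist) + 1e-9
    inliers = pts[dist <= inlier_tol * scale]  # simple median-distance outlier test
    return inliers.mean(axis=0)


def apparatus_position(camera_positions):
    """Position of a multi-directional image capture apparatus taken as the
    centroid of its determined camera positions."""
    return np.mean(np.asarray(camera_positions, dtype=np.float64), axis=0)


def pixel_to_metric_scale(cam_pos_a, cam_pos_b, known_baseline_m):
    """Conversion factor from arbitrary SfM units to metres, given the
    physically known distance between two cameras of the same apparatus."""
    sfm_dist = np.linalg.norm(np.asarray(cam_pos_a) - np.asarray(cam_pos_b))
    return known_baseline_m / sfm_dist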
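The up-vector computation of claims 13, 14, 29 and 30 (two vectors from the position of one camera to the positions of two other cameras, followed by their cross product) translates almost directly into code; normalising the result is an added assumption.

import numpy as np


def apparatus_up_vector(p0, p1, p2):
    """Up-vector of an apparatus from the positions of three of its cameras."""
    v1 = np.asarray(p1, dtype=np.float64) - np.asarray(p0, dtype=np.float64)
    v2 = np.asarray(p2, dtype=np.float64) - np.asarray(p0, dtype=np.float64)
    up = np.cross(v1, v2)
    return up / np.linalg.norm(up)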
GB1620312.7A 2016-11-30 2016-11-30 Methods and apparatuses for determining positions of multi-directional image capture apparatuses Withdrawn GB2557212A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1620312.7A GB2557212A (en) 2016-11-30 2016-11-30 Methods and apparatuses for determining positions of multi-directional image capture apparatuses
PCT/FI2017/050749 WO2018100230A1 (en) 2016-11-30 2017-10-31 Method and apparatuses for determining positions of multi-directional image capture apparatuses

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1620312.7A GB2557212A (en) 2016-11-30 2016-11-30 Methods and apparatuses for determining positions of multi-directional image capture apparatuses

Publications (2)

Publication Number Publication Date
GB201620312D0 GB201620312D0 (en) 2017-01-11
GB2557212A true GB2557212A (en) 2018-06-20

Family

ID=58073525

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1620312.7A Withdrawn GB2557212A (en) 2016-11-30 2016-11-30 Methods and apparatuses for determining positions of multi-directional image capture apparatuses

Country Status (2)

Country Link
GB (1) GB2557212A (en)
WO (1) WO2018100230A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760458B (en) * 2022-04-28 2023-02-24 中南大学 Method for synchronizing tracks of virtual camera and real camera of high-reality augmented reality studio
CN114782556B (en) * 2022-06-20 2022-09-09 季华实验室 Camera and laser radar registration method and system and storage medium
CN118209087B (en) * 2024-05-16 2024-07-23 晓智未来(成都)科技有限公司 Space point and plane positioning calibration method based on photogrammetry

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040247174A1 (en) * 2000-01-20 2004-12-09 Canon Kabushiki Kaisha Image processing apparatus
US20150154806A1 (en) * 2013-03-13 2015-06-04 Google Inc. Aligning Digital 3D Models Using Synthetic Images
US20160182903A1 (en) * 2014-12-19 2016-06-23 Disney Enterprises, Inc. Camera calibration
GB2533788A (en) * 2014-12-30 2016-07-06 Nokia Technologies Oy Method for determining the position of a portable device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG191198A1 (en) * 2010-12-16 2013-07-31 Massachusetts Inst Technology Imaging system for immersive surveillance
WO2013151883A1 (en) * 2012-04-02 2013-10-10 Intel Corporation Systems, methods, and computer program products for runtime adjustment of image warping parameters in a multi-camera system
US9892493B2 (en) * 2014-04-21 2018-02-13 Texas Instruments Incorporated Method, apparatus and system for performing geometric calibration for surround view camera solution

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040247174A1 (en) * 2000-01-20 2004-12-09 Canon Kabushiki Kaisha Image processing apparatus
US20150154806A1 (en) * 2013-03-13 2015-06-04 Google Inc. Aligning Digital 3D Models Using Synthetic Images
US20160182903A1 (en) * 2014-12-19 2016-06-23 Disney Enterprises, Inc. Camera calibration
GB2533788A (en) * 2014-12-30 2016-07-06 Nokia Technologies Oy Method for determining the position of a portable device

Also Published As

Publication number Publication date
GB201620312D0 (en) 2017-01-11
WO2018100230A1 (en) 2018-06-07

Similar Documents

Publication Publication Date Title
US20190012804A1 (en) Methods and apparatuses for panoramic image processing
US10977831B2 (en) Camera calibration method and apparatus based on deep learning
Fitzgibbon et al. Multibody structure and motion: 3-d reconstruction of independently moving objects
WO2021139176A1 (en) Pedestrian trajectory tracking method and apparatus based on binocular camera calibration, computer device, and storage medium
US10565803B2 (en) Methods and apparatuses for determining positions of multi-directional image capture apparatuses
US11216979B2 (en) Dual model for fisheye lens distortion and an algorithm for calibrating model parameters
EP3028252A1 (en) Rolling sequential bundle adjustment
GB2567245A (en) Methods and apparatuses for depth rectification processing
CN111862301A (en) Image processing method, image processing apparatus, object modeling method, object modeling apparatus, image processing apparatus, object modeling apparatus, and medium
CN111161398B (en) Image generation method, device, equipment and storage medium
US11380049B2 (en) Finite aperture omni-directional stereo light transport
WO2018100230A1 (en) Method and apparatuses for determining positions of multi-directional image capture apparatuses
US20220405968A1 (en) Method, apparatus and system for image processing
Guan et al. Minimal solutions for the rotational alignment of IMU-camera systems using homography constraints
Mei et al. Fast central catadioptric line extraction, estimation, tracking and structure from motion
CN108444452A (en) The detection method and device of the three-dimensional attitude of target longitude and latitude and filming apparatus
WO2018150086A2 (en) Methods and apparatuses for determining positions of multi-directional image capture apparatuses
Bergmann et al. Gravity alignment for single panorama depth inference
JP2016114445A (en) Three-dimensional position calculation device, program for the same, and cg composition apparatus
Huang et al. 3D Browsing of Wide‐Angle Fisheye Images Under View‐Dependent Perspective Correction
JP2005275789A (en) Three-dimensional structure extraction method
Bartczak et al. Extraction of 3D freeform surfaces as visual landmarks for real-time tracking
JP3452188B2 (en) Tracking method of feature points in 2D video
JP2005063012A (en) Full azimuth camera motion and method and device for restoring three-dimensional information and program and recording medium with the same recorded
Herbon et al. Adaptive planar and rotational image stitching for mobile devices

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)