GB2572795A - Camera registration

Info

Publication number
GB2572795A
Authority
GB
United Kingdom
Prior art keywords
camera
definition data
scene definition
scene
registration
Prior art date
Legal status
Withdrawn
Application number
GB1805978.2A
Other versions
GB201805978D0 (en)
Inventor
You Yu
Current Assignee
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy
Priority to GB1805978.2A
Publication of GB201805978D0
Publication of GB2572795A
Legal status: Withdrawn

Classifications

    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33: Determination of transform parameters for the alignment of images using feature-based methods
    • G06T 7/38: Registration of image sequences
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/30244: Camera pose
    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G08B 13/19628: Surveillance camera constructional details; optical details of wide angled cameras and camera groups, e.g. omni-directional cameras, fish eye, single units having multiple cameras achieving a wide angle view

Abstract

A method and apparatus are provided comprising: receiving scene definition data 22 (by a scene parser 12, FIG. 2); converting the scene definition data into camera registration parameters 24; and providing the camera registration parameters to a camera position and/or orientation registration module 26.

Description

Camera Registration
Field
The present specification relates to camera registration. In particular, the specification relates to the provision of camera registration parameters, for example for use in camera pose registration for multi-camera registration.
Background
Camera pose registration is a technique used to determine positions and/or orientations of image capture apparatuses such as cameras. The recent advent of systems involving multiple cameras and applications such as 360 degree camera systems brings new challenges with regard to the performance of camera pose registration.
Summary
In a first aspect, this specification provides an apparatus comprising: means for receiving scene definition data; means for converting the scene definition data into camera registration parameters; and means for providing the camera registration parameters to a camera position and/or orientation registration module.
The camera position and/or orientation registration module may be used to register the position and/or orientation (sometimes referred to herein as the camera pose) of one or more cameras (such as the camera(s) being used to capture a scene). The position and/or orientation of cameras may be used to determine camera locations within a scene, such that, for example, the camera outputs can be used to regenerate a 3D scene.
The means for providing the camera registration parameters may provide camera registration parameters to a plurality of camera position and/or orientation registration modules, wherein each camera position and/or orientation registration module is used to register the position and/or orientation of one or more cameras. Furthermore, a plurality of cameras may be organised into groups, with each group being associated with a different one of the plurality of camera position and/or orientation registration modules. Thus, the one or more cameras registered by each camera position and/or orientation module may be the cameras from the respective group.
The means for converting the scene definition data into camera registration parameters may convert scene definition data in the form of an environment type into feature extractor type information.
The means for converting the scene definition data into camera registration parameters may convert scene definition data in the form of illumination information into pre-processing requirements.
The means for converting the scene definition data into camera registration parameters may convert scene definition data in the form of key objects into mask image information.
The means for converting the scene definition data into camera registration parameters may convert multiple scene definition data elements into one or more camera registration parameters depending on a priority order.
The means for converting the scene definition data into camera registration parameters may comprise a parser.
The means for converting the scene definition data into camera registration parameters may comprise a look-up-table (which look-up-table may, for example, be application specific). The look-up table may be a conversion look-up table.
The means for converting the scene definition data into camera registration parameters and/or the camera position and/or orientation registration module may be cloud-based.
The means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
In a second aspect, this specification describes a method comprising: receiving scene definition data; converting the scene definition data into camera registration parameters; and providing the camera registration parameters to a camera position and/or orientation registration module. The camera position and/or orientation registration module may be used to register the position and/or orientation (sometimes referred to herein as the camera pose) of one or more cameras (such as the camera(s) being used to capture a scene). The position and/or orientation of cameras may be used to determine camera locations within a scene, such that, for example, the camera outputs can be used to regenerate a 3D scene. The means for converting the scene definition data into camera registration parameters may take many different forms, such as one or more of the forms as described with reference to the first aspect.
In a third aspect, this specification describes an apparatus configured to perform any method as described with reference to the second aspect.
In a fourth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the second aspect.
In a fifth aspect, this specification describes a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receive scene definition data; convert the scene definition data into camera registration parameters; and provide the camera registration parameters to a camera position and/or orientation registration module.
In a sixth aspect, this specification describes a non-transitory computer readable medium comprising program instructions stored thereon for performing at least the following: receiving scene definition data; converting the scene definition data into camera registration parameters; and providing the camera registration parameters to a camera position and/or orientation registration module.
In a seventh aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: receive scene definition data; convert the scene definition data into camera registration parameters; and provide the camera registration parameters to a camera position and/or orientation registration module.
Brief description of the drawings
Example embodiments will now be described, by way of example only, with reference to the following schematic drawings, in which:
FIG. 1 shows a scene being captured by multiple cameras;
FIG. 2 is a block diagram of a system in accordance with an example embodiment;
FIG. 3 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 4 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 5 is a block diagram of a system in accordance with an example embodiment;
FIG. 6 is a block diagram showing an arrangement of cameras in accordance with an example embodiment;
FIG. 7 is a block diagram of a system in accordance with an example embodiment; and FIGS. 8a and 8b show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to embodiments.
Detailed description
In the description and drawings, like reference numerals may refer to like elements throughout.
FIG. 1 shows a scene, indicated generally by the reference numeral 1, including a house 2. The scene also includes a first camera 4, a second camera 6 and a third camera 8 being used to take one or more images of the house 2. Each of the cameras has a different field of view, thus allowing images of the house 2 to be captured from different perspectives simultaneously.
The term “image” used herein refers generally to visual content captured by cameras (such as the cameras 4, 6 and 8). For example, an image may be a photograph or a single frame of a video. The term “camera” is used herein to refer to any suitable image capture device (e.g. for generating still and/or moving images).
In the example scenario illustrated in FIG. 1, the plurality of cameras 4, 6 and 8 are arranged to capture images of the house 2. In such circumstances, it may be desirable to perform camera pose registration in order to determine the position and/or orientation of each of the cameras. In particular, it may be desirable to determine these positions and orientations relative to a particular reference coordinate system. This allows the overall arrangement of the cameras 4, 6 and 8 relative to each other to be
determined, which may be useful for a number of functions. For example, such information may be used for one or more of: performing 3D reconstruction of the captured environment, 3D registration of cameras with respect to other sensors, audio positioning of audio sources, and playback of object-based audio with respect to camera locations. Other uses of such information will be apparent to those skilled in the art.
One way of determining the positions of multi-directional cameras, such as the cameras 4, 6 and 8, is to use hardware sensor approaches like Global Positioning System (GPS) localization. However, GPS only provides position information and does not provide orientation information. One way of determining orientation information is to obtain the orientation information from magnetometers and accelerometers installed in the cameras. However, such instruments may be susceptible to local disturbance (e.g. magnetometers may be disturbed by a local magnetic field), so the accuracy of orientation information obtained in this way is not necessarily very high. In general, sensor-based approaches typically do not meet the expected accuracy requirement, and extra hardware sensors, like magnetometers, suffer from the same limitation.
Another way of performing camera pose registration is to use a computer vision method. For example, position and orientation information can be obtained by performing structure from motion (SfM) analysis on images captured by a camera.
Broadly speaking, SfM works by determining point correspondences between images (also known as feature matching) and calculating location and orientation based on the determined point correspondences. However, when used on images captured by multiple cameras, SfM analysis may be unreliable due to unreliable determination of point correspondences between images.
FIG. 2 is a block diagram of a system, indicated generally by the reference numeral 10, in accordance with an example embodiment. The system 10 comprises a scene parser 12, a camera pose module 14 and a conversion look-up-table (LUT) 16. The look-up table may define different ways of converting scene properties to parameters optimal to the camera pose module 14.
The scene parser 12 receives scene definition information. The scene definition information is a data structure that may describe the environment and requirements in which images are being captured, such as the environment of the scene 1 described above. With the help of classification technologies, a generic schema can be defined to
characterise some constraints of the environment. Such information can be incorporated into a general description (for example, having a data structure format such as JavaScript Object Notation (JSON), YAML, Extensible Markup Language (XML) etc.).
As described in detail below, the schema provided at the input to the scene parser 12 can be parsed and converted into a more technical configuration used by the camera pose module 14. The scene parser 12 may make use of the LUT 16 or some other mapping for converting the scene definition information into camera registration parameters for use in the camera pose module 14. The LUT 16 (or other mapping module) may be application specific.
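By way of illustration only, a scene definition of this kind could be encoded in JSON along the following lines; the field names and values below are hypothetical and are not taken from any published schema. A minimal Python sketch that reads such a description:

```python
import json

# Hypothetical scene definition; field names and values are illustrative only
# and are not taken from any published schema.
SCENE_DEFINITION_JSON = """
{
  "environment_type": "outdoor",
  "illumination": "high",
  "quality_of_service": "low",
  "key_objects": ["goal", "ground"],
  "camera_groups": [
    {"id": "group-1", "cameras": ["camera-85", "camera-86"]},
    {"id": "group-2", "cameras": ["camera-86", "camera-87", "camera-88"]}
  ]
}
"""

scene_definition = json.loads(SCENE_DEFINITION_JSON)
print(scene_definition["environment_type"])  # -> "outdoor"
```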
The camera pose module 14 may be used to localize the position and/or orientation of one or more cameras (such as the cameras 4, 6 and 8 used to capture the scene 2). The position and/or orientation of cameras (the so-called camera pose) may be used to determine camera locations within a scene (for example, using principles of structure from motion (SfM) analysis on images captured by a camera, as described above).
These data may be used, for example, to enable the camera outputs to be used to regenerate a 3D scene (such as a 3D version of the scene 2).
FIG. 3 is a flow chart showing an algorithm, indicated generally by the reference numeral 20, in accordance with an example embodiment. The algorithm 20 starts at operation 22, where scene definition information is received by the scene parser 12.
At operation 24, the scene parser 12 converts the scene definition data into camera registration parameters. At operation 26, the camera registration parameters generated in operation 24 are output, for example to the camera pose module 14.
The conversion of the scene definition information into camera registration parameters in the operation 24 described above can assist with reducing instances of unsuccessful camera pose estimation caused, for example, by image feature extractors being operated in conditions that are not optimal for that feature extractor.
The operation 24 by which the scene definition data is converted into camera registration parameters may take many forms. Some embodiments are described below
by way of example. Not all of the examples below need be provided in any particular embodiment. Moreover, other examples will be apparent to those skilled in the art.
The scene definition data may be provided in the form of an environment type and may be converted into camera registration parameters in the form of feature extractor type information. Example scene environment types may include, for example, indoor (or small/close) environments and outdoor (or large/far) environments. The scene environment defined by an environment type constraint within a scene definition may affect the selection of a feature extractor (for example, for use in structure from motion (SfM) analysis). Example feature extractors include Scale-invariant feature transform (SIFT) and AKAZE feature extractors, which might be selected for indoor and outdoor environments respectively.
Other feature extractor systems could be used. Image feature extractors or descriptors may, for example, be divided into two general categories: high-performance (floating point-based) feature extractors, e.g. SIFT, SURF etc., and highly efficient (binary format) feature extractors, e.g. AKAZE, ORB, BRIEF, BRISK etc. A given environment type constraint may be translated into one category and, within that category, one or multiple descriptors and their parameters can be generated. In this case, multiple features and associated feature descriptors may be the output of the feature extraction step and piped together to the feature matching step to find the best matches, if quality is the main concern. Conventional descriptors may also be replaced by machine-learned (e.g. convolutional neural network, CNN) descriptors. The learned approaches may use sample images that produce the best descriptors. Such a learned approach may train on image data sets labelled under the same environment constraints and generate neural network models for the best predictions under those constraints.
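As a hedged sketch only, the mapping from an environment-type constraint to a feature extractor could be realised with OpenCV's SIFT and AKAZE implementations as follows; the category boundaries are illustrative rather than prescribed by this specification.

```python
import cv2

def select_extractor(environment_type: str):
    """Map an environment-type constraint to a feature extractor.

    Illustrative mapping only: indoor (small/close) scenes use a floating
    point-based descriptor (SIFT); outdoor (large/far) scenes use a binary
    descriptor (AKAZE).
    """
    if environment_type in ("indoor", "small", "close"):
        return cv2.SIFT_create()   # high-performance, floating point-based
    return cv2.AKAZE_create()      # highly efficient, binary format

extractor = select_extractor("outdoor")
# keypoints, descriptors = extractor.detectAndCompute(grey_image, None)
```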
Alternatively, or in addition, the environment type may have an impact on handling the field of view of the captured images in the case of some extra wide-angle lens models (such as fisheye or omnidirectional lens types). For example, to gain reliable and consistent output, the camera pose module 14 can first rectify and re-project input images to a canonical image type with a pre-configured field-of-view value (e.g. 90 degrees) by applying a homogeneous internal virtual camera model. This processing step can happen particularly in camera models with an extra-large field-of-view.
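A sketch of such a rectification step is shown below; it assumes that calibrated fisheye intrinsics K and distortion coefficients D are already available (an assumption not made explicit above) and re-projects an input image onto a virtual pinhole camera with a 90 degree field of view using OpenCV's fisheye module.

```python
import cv2
import numpy as np

def rectify_to_canonical(image, K, D, fov_deg=90.0):
    """Re-project a calibrated fisheye image onto a virtual pinhole camera
    with the requested horizontal field of view. K (3x3 intrinsic matrix) and
    D (4x1 fisheye distortion coefficients) are assumed to be known."""
    h, w = image.shape[:2]
    # Focal length of the virtual camera that yields the requested FOV.
    f = (w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    P = np.array([[f, 0.0, w / 2.0],
                  [0.0, f, h / 2.0],
                  [0.0, 0.0, 1.0]])
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), P, (w, h), cv2.CV_16SC2)
    return cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR)
```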
The scene definition data may be provided in the form of illumination information and may be converted into camera registration parameters in the form of pre-processing requirements. A pre-processing step may be useful under deteriorated lighting conditions in order, for example, to improve the consistency of scene appearance over a range of illumination conditions. Illumination information provided in the scene definition data may be converted into pre-processing requirements based on a mapping stored within the LUT 16. Clearly, that mapping could be application-dependent and could readily be modified, if required.
The pre-processing described above may be applied to input frames and may take the form, for example, of a simple uniform scaling of mean greyscale intensities. Other example pre-processing includes advanced illuminant-invariant colour mapping techniques. The camera registration parameters output by the scene parser 12 may be dependent on the illumination information provided in the scene definition data as well as knowledge of available colour mapping techniques at the camera pose module 14, for example at a pre-processing step.
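A minimal sketch of the simple pre-processing mentioned above, scaling each frame so that its mean greyscale intensity matches a fixed target; the target value is an illustrative assumption.

```python
import cv2
import numpy as np

def normalise_mean_intensity(frame_bgr, target_mean=128.0):
    """Uniformly scale greyscale intensities so that the frame mean matches
    target_mean; a simple pre-processing step for deteriorated lighting."""
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    scale = target_mean / max(float(grey.mean()), 1e-6)
    return np.clip(grey * scale, 0, 255).astype(np.uint8)
```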
The scene definition data may be provided in the form of key objects or a region of interest and may be converted into camera registration parameters in the form of mask image information. The key objects scene definition data may define one or more semantic object masks and bounding boxes or regions of found objects in images. Example values of such key objects definition data include object labels, for example from known image classifications such as COCO (Common Objects in Context), ImageNet etc. Such object labels may be useful, for example, for speeding up computation by excluding some less important regions or improving quality in important regions.
The masking definition can also be generated by excluding some objects. For example, in a sports scene, the definition can exclude all players as they may be moving fast and may not be captured properly by all cameras.
On receipt of scene definition data in the form of key objects information (such as labels), the scene parser 12 may generate a mask image per input frame. The mask may be an image having the same size (width and height) as the input frame. Black areas may be provided in the mask corresponding to less important areas of the image, for which descriptors may not be computed.
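A sketch of how such a mask might be built and used follows; the bounding boxes of excluded regions are assumed to come from an object detector or from the key objects labels, which is outside this snippet.

```python
import numpy as np

def build_feature_mask(frame_shape, exclude_boxes):
    """Build a per-frame mask with the same width and height as the frame.

    exclude_boxes -- (x, y, w, h) boxes around less important regions (e.g.
    fast moving players); those areas are set to black (0) so that no
    descriptors are computed there.
    """
    height, width = frame_shape[:2]
    mask = np.full((height, width), 255, dtype=np.uint8)  # white: keep
    for (x, y, w, h) in exclude_boxes:
        mask[y:y + h, x:x + w] = 0                         # black: exclude
    return mask

# Usage with any OpenCV feature extractor:
# keypoints, descriptors = extractor.detectAndCompute(grey, build_feature_mask(grey.shape, boxes))
```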
The scene definition data may be provided in the form of a quality of service indication and may be converted into camera registration parameters in the form of threshold values. The quality of service indication may indicate the quality required from the extractor and may also affect the total speed of a pipeline. Example values of quality of service include: ultra, high, medium and low. The quality of service indication may be translated into system threshold values, such as the number of features and matches and the nearest neighbour distance ratio.
To calculate the correspondence between two given images, feature extraction and matching are two steps that may be implemented by the camera pose module 14. For instance, taking SIFT feature extraction as an example, the values of the quality of service can be converted to extraction-specific parameters. Many feature extraction techniques take the original image and generate progressively blurred images (e.g. a Gaussian blur approach), also referred to as a multi-scaling approach. In the SIFT approach, while generating the blurred images, the algorithm may also progressively resize the original image to half size (so-called “octaves”). The number of octaves and scales (blur levels) may have a substantial impact on efficiency and robustness. For instance, the default may be 4 octaves and 5 blur levels; the actual values can vary, depending on the values of the quality of service. Like the SIFT extraction approach, other extraction methods have their own specific threshold values. Such values may be empirical and therefore a table of parameter sets may be provided so that results are as close to optimal as possible for different requirements.
The matching described above may build a correspondence for each image pixel among the detected features by finding the most similar one in the other images. The similar ones may be selected based on nearest-neighbour distance (e.g. Euclidean or Hamming distance) matching, or on being within a threshold distance of each other.
Nearest-neighbour matching results in highly noisy correspondences. An approach to determine good correspondences is the nearest neighbour distance ratio, where the ratio determines the distinctiveness of features by comparing the distances of their two nearest neighbours. The threshold usually ranges from 0.6 to 0.8.
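The following sketch ties a quality-of-service value to SIFT parameters and to a nearest neighbour distance ratio threshold; the table values are placeholders chosen for illustration, not values prescribed by this specification.

```python
import cv2

# Hypothetical quality-of-service table; all numbers are placeholders.
QOS_TABLE = {
    "high": {"n_octave_layers": 5, "n_features": 0,    "nn_ratio": 0.6},
    "low":  {"n_octave_layers": 3, "n_features": 2000, "nn_ratio": 0.8},
}

def match_features(grey_a, grey_b, qos="high"):
    """Extract SIFT features from two greyscale images and keep only matches
    that pass the nearest neighbour distance ratio test."""
    params = QOS_TABLE[qos]
    sift = cv2.SIFT_create(nfeatures=params["n_features"],
                           nOctaveLayers=params["n_octave_layers"])
    _, descriptors_a = sift.detectAndCompute(grey_a, None)
    _, descriptors_b = sift.detectAndCompute(grey_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(descriptors_a, descriptors_b, k=2):
        # Keep a match only if it is clearly better than its runner-up.
        if len(pair) == 2 and pair[0].distance < params["nn_ratio"] * pair[1].distance:
            good.append(pair[0])
    return good
```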
A number of scene definition data formats have been described above. In any particular embodiment, a plurality of scene definition data inputs could be provided. It may be necessary to convert multiple scene definition data elements into one or more camera registration parameters depending on a priority order (for example, some scene definition data elements may have a higher priority than other scene definition data elements). This may be required, for example, in the event of contradictory requirements. The order may be important, as the scene parser 12 may process the constraints in the order in which they are read. This is not essential to all embodiments; for example, the parser implementation may read all constraints as a whole and generate optimal output parameters.
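One possible, purely illustrative way to realise such a priority order is to apply the per-element conversions from lowest to highest priority, so that higher-priority elements overwrite conflicting parameters; the element names and the look-up-table structure below are assumptions.

```python
# Hypothetical element names; the priority order and the LUT structure are
# assumptions for illustration only.
PRIORITY = ["environment_type", "quality_of_service", "illumination", "key_objects"]

def convert_with_priority(scene_definition, conversion_lut):
    """Convert scene definition elements into camera registration parameters,
    letting higher-priority elements overwrite conflicting values.

    conversion_lut -- dict mapping an element name to a function that returns
    a dict of camera registration parameters for that element's value.
    """
    parameters = {}
    for element in reversed(PRIORITY):            # lowest priority first
        if element in scene_definition and element in conversion_lut:
            parameters.update(conversion_lut[element](scene_definition[element]))
    return parameters
```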
The scene definition may also define one or more logical groups of cameras. Such logical groups can be defined by:
1. same camera types; and/or
2. camera physical locations.
Such logical groups may lead to the concurrent processing pipeline defined in FIG. 5.
FIG. 4 is a flow chart showing an algorithm, indicated generally by the reference numeral 30, in accordance with an example embodiment.
The algorithm 30 starts at operation 32 and then moves to operation 34 where a scene definition, received in an instance of the operation 22 described above, is read. The scene definition includes scene definition data in the form of an environment type.
At operation 36, the value of the environment type input is determined. If the value is “outdoor” (or a large/far environment), then the algorithm 30 moves to operation 38. If the value is “indoor” (or a small/close environment), then the algorithm 30 moves to operation 40.
At operation 38, the camera registration parameter is set to indicate that the AKAZE descriptor should be used by the camera pose module 14 and the algorithm moves to operation 42.
At operation 40, the camera registration parameter is set to indicate that the SIFT descriptor should be used by the camera pose module 14 and the algorithm moves to operation 42.
At operation 42 the scene definition is read again. The scene definition includes scene definition data in the form of a quality type. At operation 44, the value of the quality type input is determined. If the value is “high” or “slow”, then the algorithm 30 moves to operation 46. If the value is “low” or “fast”, then the algorithm 30 moves to operation 48.
At operation 46, the camera registration parameter is set to indicate that the relevant threshold values of the high/slow mode of operation are to be used by the camera pose module 14 and the algorithm moves to operation 50.
At operation 48, the camera registration parameter is set to indicate that the relevant threshold values of the low/fast mode of operation are to be used by the camera pose module 14 and the algorithm moves to operation 50.
At operation 50, the camera registration parameters set as described above are provided to the camera pose module 14 (thereby implementing operation 26 of the algorithm 20 described above). The algorithm 30 then terminates at operation 32.
Thus, for example, if the scene definition data indicates that the scene has an outdoor environment type with a high or slow quality requirement, then the operation 50 outputs camera registration parameters indicating that the AKAZE feature descriptor should be used with threshold values of the high/slow mode of operation. Clearly, the mapping of such parameters can easily be changed if, for example, a new feature description module is made available (for example by changing parameters of the relevant look-up table).
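The decision flow of FIG. 4 could be sketched as follows; the dictionary keys and the threshold values attached to the high/slow and low/fast modes are placeholders, not values taken from this specification.

```python
def algorithm_30(scene_definition):
    """Sketch of the flow of FIG. 4: pick a descriptor from the environment
    type (operations 36-40), then threshold values from the quality type
    (operations 44-48), and return them together (operation 50)."""
    if scene_definition["environment_type"] in ("outdoor", "large", "far"):
        descriptor = "AKAZE"                                  # operation 38
    else:
        descriptor = "SIFT"                                   # operation 40
    if scene_definition["quality"] in ("high", "slow"):
        thresholds = {"nn_ratio": 0.6, "min_matches": 60}     # operation 46 (placeholder values)
    else:
        thresholds = {"nn_ratio": 0.8, "min_matches": 20}     # operation 48 (placeholder values)
    return {"descriptor": descriptor, **thresholds}
```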
Consider, for example, a scenario in which a sports event is being filmed by multiple cameras. Scene definition data may be provided to indicate that the scene is an outdoor scene, with high illumination conditions, in which a low quality/fast quality of service should be applied (due to the fast-moving nature of the sporting event). One or more stationary objects of the sports event (e.g. one or more goals, the ground/stadium, or all regions excluding the moving players) may be provided as key objects, since the positions of those objects can be precisely defined.
On the basis of the scene definition data indicated above, a scene parser can generate suitable camera registration parameters for use in generating camera position and/or
orientation information. Thus, for example, the AKAZE feature extractor may be used, with illumination pre-processing based on high illumination conditions and threshold values based on a low quality/fast mode of operation.
FIG. 5 is a block diagram of a system, indicated generally by the reference numeral 60, in accordance with an example embodiment. The system 60 includes a scene parser 62 and a look-up-table (LUT) 64 that are similar to (and may be identical to) the scene parser 12 and LUT 16 described above. The system 60 also includes a camera pose module 66. In common with the system 10 described above, the system 60 receives scene definition information at an input from the scene parser 62 and provides camera pose data at an output of the camera pose module 66.
The camera pose module 66 includes a workflow scheduler 68, a first camera registration workflow module 70, a second camera registration workflow module 72 and a camera pose aggregator module 74. The system 60 can therefore use multiple camera registration modules to convert multiple scene definitions into multiple camera registration parameters simultaneously. The camera pose aggregator module 74 aggregates the camera registration parameters received from the first and second camera registration workflow modules. The aggregator module 74 may, for example, implement a prioritisation of camera registration parameters, if necessary.
FIG. 6 is a block diagram, indicated generally by the reference numeral 80, showing an arrangement of cameras in accordance with an example embodiment. The block diagram 80 includes a first camera 85, a second camera 86, a third camera 87 and a fourth camera 88. As shown in FIG. 6, the cameras are organised into two groups. The first camera 85 and the second camera 86 form a first group of cameras 82. The second camera 86, third camera 87 and fourth camera 88 form a second group of cameras 84.
In the system 60, the first camera registration workflow module 70 may process data from the cameras of the first group of cameras 82 and the second camera registration workflow module 72 may process data from the cameras of the second group of cameras 84.
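A hedged sketch of the scheduler/aggregator arrangement of FIG. 5 follows, with one workflow per camera group run concurrently; the pose estimation itself is stubbed out, since its internals are not specified here, and the merge policy of the aggregator is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def estimate_pose(camera_id, registration_parameters):
    """Stub standing in for per-camera pose estimation (e.g. SfM); the real
    computation is not part of this sketch."""
    return {"camera": camera_id, "position": None, "orientation": None}

def register_group(group_cameras, registration_parameters):
    """One camera registration workflow module (70 or 72): estimate a pose for
    every camera in its group."""
    return {cam: estimate_pose(cam, registration_parameters) for cam in group_cameras}

def run_workflows(camera_groups, registration_parameters):
    """Workflow scheduler plus aggregator: run one workflow per group
    concurrently and merge the per-group results into a single pose set."""
    with ThreadPoolExecutor(max_workers=max(len(camera_groups), 1)) as pool:
        futures = [pool.submit(register_group, group, registration_parameters)
                   for group in camera_groups]
        results = [future.result() for future in futures]
    aggregated = {}
    for group_result in results:   # cameras in several groups keep the last result
        aggregated.update(group_result)
    return aggregated

# poses = run_workflows([["camera-85", "camera-86"],
#                        ["camera-86", "camera-87", "camera-88"]], {})
```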
Clearly, although two camera registration workflow modules are provided in the system 60, any number of parallel camera registration workflow modules could be provided in alternative embodiments.
At least some of the modules described above may be implemented online (e.g. cloud-based). For example, the scene parsers 12, 62 and/or the camera pose modules 14, 66 may be implemented online. It may be advantageous to implement the scene parsers
12, 62 and/or the camera pose modules 14, 66 in the cloud (or online) in the event that advanced computer vision modules are used. This is not, however, essential in all embodiments.
It may be beneficial to provide a scene definition data structure that includes parameters that enable cloud-based camera registration or localization for selecting camera registration parameters. The scene definition may, for example, be provided by an image capture apparatus to a cloud-based scene parser over a communication network.
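A minimal sketch of sending a scene definition to a cloud-hosted scene parser over HTTP is given below; the endpoint URL and the request/response format are assumptions made for illustration only.

```python
import json
import urllib.request

def submit_scene_definition(scene_definition, url="https://example.invalid/scene-parser"):
    """POST a scene definition to a (hypothetical) cloud-hosted scene parser
    and return the camera registration parameters it sends back."""
    request = urllib.request.Request(
        url,
        data=json.dumps(scene_definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```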
For completeness, FIG. 7 is a schematic diagram of components of one or more of the modules described previously (e.g. the scene parser 12 or 62 and/or the camera pose module 14 or 66), which hereafter are referred to generically as processing systems 300. A processing system 300 may have a processor 302, a memory 304 coupled to the processor and comprised of a RAM 314 and ROM 312, and, optionally, user inputs 310 and a display 318. The processing system 300 may comprise one or more network interfaces 308 for connection to a network, e.g. a modem, which may be wired or wireless.
The processor 302 is connected to each of the other components in order to control operation thereof.
The memory 304 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 20 or 30.
The processor 302 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors. A processor may comprise processor circuitry.
The processing system 300 may be a standalone computer, a server, a console, or a network thereof.
In some embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device in order to utilize the software application stored there.
FIG. 8a and FIG. 8b show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which when run by a computer may perform methods according to embodiments described above. The removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code. The memory 366 may be accessed by a computer system via a connector 367. The CD 368 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as the programmable content of a hardware device, whether as instructions for a processor, or as configuration settings for a fixed function device, gate array, programmable logic device, etc.
As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of FIGS. 3 and 4 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.
It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and, during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims
1. An apparatus comprising:
means for receiving scene definition data;
means for converting the scene definition data into camera registration parameters; and means for providing the camera registration parameters to a camera position and/or orientation registration module.
2. An apparatus as claimed in claim 1, wherein the camera position and/or orientation registration module is used to register the position and/or orientation of one or more cameras.
3. An apparatus as claimed in claim 1 or claim 2, wherein the means for providing
the camera registration parameters provides camera registration parameters to a plurality of camera position and/or orientation registration modules, wherein each camera position and/or orientation registration module is used to register the position and/or orientation of one or more cameras.
4. An apparatus as claimed in claim 3, wherein a plurality of cameras are organised into groups, each group associated with a different one of the plurality of camera position and/or orientation registration modules.
5. An apparatus as claimed in any one of the preceding claims, wherein the means
for converting the scene definition data into camera registration parameters converts scene definition data in the form of an environment type into feature extractor type information.
6. An apparatus as claimed in any one of the preceding claims, wherein the means
for converting the scene definition data into camera registration parameters converts scene definition data in the form of illumination information into pre-processing requirements.
7. An apparatus as claimed in any one of the preceding claims, wherein the means
for converting the scene definition data into camera registration parameters converts scene definition data in the form of key objects into mask image information.
8. An apparatus as claimed in any one of the preceding claims, wherein the means for converting the scene definition data into camera registration parameters converts multiple scene definition data elements into one or more camera registration
parameters depending on a priority order.
9. An apparatus as claimed in any one of the preceding claims, wherein the means for converting the scene definition data into camera registration parameters comprises a parser.
10. An apparatus as claimed in any one of the preceding claims, wherein the means for converting the scene definition data into camera registration parameters comprises a look-up-table.
11. An apparatus as claimed in claim 10, wherein the look-up-table is application specific.
12. An apparatus as claimed in any one of the preceding claims, wherein: the means for converting the scene definition data into camera registration parameters and/or the
camera position and/or orientation registration module is/are cloud-based.
13. An apparatus as claimed in any one of the preceding claims, wherein the means comprise:
at least one processor; and
at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
14. A method comprising:
receiving scene definition data;
converting the scene definition data into camera registration parameters; and providing the camera registration parameters to a camera position and/or orientation registration module.
15. A computer readable medium comprising program instructions for causing an apparatus to perform at least the following:
receive scene definition data;
convert the scene definition data into camera registration parameters; and provide the camera registration parameters to a camera position and/or orientation registration module.
GB1805978.2A 2018-04-11 2018-04-11 Camera registration Withdrawn GB2572795A (en)

Priority Applications (1)

Application Number: GB1805978.2A (published as GB2572795A (en))
Priority Date: 2018-04-11
Filing Date: 2018-04-11
Title: Camera registration

Publications (2)

Publication Number: GB201805978D0 (en), published 2018-05-23
Publication Number: GB2572795A (en), published 2019-10-16

Family

ID=62202810

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)