GB2572795A - Camera registration

Info

Publication number
GB2572795A
Authority
GB
United Kingdom
Prior art keywords
camera
definition data
scene definition
scene
registration
Prior art date
Legal status
Withdrawn
Application number
GB1805978.2A
Other versions
GB201805978D0 (en)
Inventor
You Yu
Current Assignee
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy
Priority to GB1805978.2A
Publication of GB201805978D0
Publication of GB2572795A
Legal status: Withdrawn

Classifications

    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33: Determination of transform parameters for the alignment of images using feature-based methods
    • G06T 7/38: Registration of image sequences
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/30244: Camera pose
    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G08B 13/19628: Surveillance camera constructional details; optical details of wide angled cameras and camera groups, e.g. omni-directional cameras, fish eye, single units having multiple cameras achieving a wide angle view

Abstract

A method and apparatus are provided comprising: receiving scene definition data 22 (by a scene parser 12, FIG. 2); converting the scene definition data into camera registration parameters 24; and providing the camera registration parameters to a camera position and/or orientation registration module 26.

Description

Camera Registration
Field
The present specification relates to camera registration. In particular, the specification relates to the provision of camera registration parameters, for example for use in camera pose registration for multi-camera registration.
Background
Camera pose registration is a technique used to determine positions and/or orientations of image capture apparatuses such as cameras. The recent advent of systems involving multiple cameras and applications such as 360 degree camera systems brings new challenges with regard to the performance of camera pose registration.
Summary
In a first aspect, this specification provides an apparatus comprising: means for receiving scene definition data; means for converting the scene definition data into camera registration parameters; and means for providing the camera registration parameters to a camera position and/or orientation registration module.
The camera position and/or orientation registration module may be used to register the position and/or orientation (sometimes referred to herein as the camera pose) of one or more cameras (such as the camera(s) being used to capture a scene). The position and/or orientation of cameras may be used to determine camera locations within a scene, such that, for example, the camera outputs can be used to regenerate a 3D scene.
The means for providing the camera registration parameters may provide camera registration parameters to a plurality of camera position and/or orientation registration modules, wherein each camera position and/or orientation registration module is used to register the position and/or orientation of one or more cameras. Furthermore, a plurality of cameras may be organised into groups, with each group being associated with a different one of the plurality of camera position and/or orientation registration modules. Thus, the one or more cameras registered by each camera position and/or orientation module may be the cameras from the respective group.
The means for converting the scene definition data into camera registration parameters may convert scene definition data in the form of an environment type into feature extractor type information.
The means for converting the scene definition data into camera registration parameters may convert scene definition data in the form of illumination information into pre-processing requirements.
The means for converting the scene definition data into camera registration parameters may convert scene definition data in the form of key objects into mask image information.
The means for converting the scene definition data into camera registration parameters may convert multiple scene definition data elements into one or more camera registration parameters depending on a priority order.
The means for converting the scene definition data into camera registration parameters may comprise a parser.
The means for converting the scene definition data into camera registration parameters may comprise a look-up-table (which look-up-table may, for example, be application specific). The look-up table may be a conversion look-up table.
The means for converting the scene definition data into camera registration parameters and/or the camera position and/or orientation registration module may be cloud-based.
The means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
In a second aspect, this specification describes a method comprising: receiving scene definition data; converting the scene definition data into camera registration parameters; and providing the camera registration parameters to a camera position and/or orientation registration module. The camera position and/or orientation registration module may be used to register the position and/or orientation (sometimes referred to herein as the camera pose) of one or more cameras (such as the camera(s) being used to capture a scene). The position and/or orientation of cameras may be used to determine camera locations within a scene, such that, for example, the camera outputs can be used to regenerate a 3D scene. The means for converting the scene definition data into camera registration parameters may take many different forms, such as one or more of the forms as described with reference to the first aspect.
In a third aspect, this specification describes an apparatus configured to perform any method as described with reference to the second aspect.
In a fourth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the second aspect.
In a fifth aspect, this specification describes a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receive scene definition data; convert the scene definition data into camera registration parameters; and provide the camera registration parameters to a camera position and/or orientation registration module.
In a sixth aspect, this specification describes a non-transitory computer readable medium comprising program instructions stored thereon for performing at least the following: receiving scene definition data; converting the scene definition data into camera registration parameters; and providing the camera registration parameters to a camera position and/or orientation registration module.
In a seventh aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: receive scene definition data; convert the scene definition data into camera registration parameters; and provide the camera registration parameters to a camera position and/or orientation registration module.
Brief description of the drawings
Example embodiments will now be described, by way of example only, with reference to the following schematic drawings, in which:
FIG. 1 shows a scene being captured by multiple cameras;
FIG. 2 is a block diagram of a system in accordance with an example embodiment;
FIG. 3 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 4 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 5 is a block diagram of a system in accordance with an example embodiment;
FIG. 6 is a block diagram showing an arrangement of cameras in accordance with an example embodiment;
FIG. 7 is a block diagram of a system in accordance with an example embodiment; and FIGS. 8a and 8b show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to embodiments.
Detailed description
In the description and drawings, like reference numerals may refer to like elements throughout.
FIG. 1 shows a scene, indicated generally by the reference numeral 1, including a house 2. The scene also includes a first camera 4, a second camera 6 and a third camera 8 being used to take one or more images of the house 2. Each of the cameras has a different field of view, thus allowing images of the house 2 to be captured from different perspectives simultaneously.
The term “image” used herein refers generally to visual content captured by cameras (such as the cameras 4, 6 and 8). For example, an image may be a photograph or a single frame of a video. The term “camera” is used herein to refer to any suitable image capture device (e.g. for generating still and/or moving images).
In the example scenario illustrated in FIG. 1, the plurality of cameras 4, 6 and 8 are arranged to capture images of the house 2. In such circumstances, it may be desirable to perform camera pose registration in order to determine the position and/or orientation of each of the cameras. In particular, it may be desirable to determine these positions and orientations relative to a particular reference coordinate system. This allows the overall arrangement of the cameras 4, 6 and 8 relative to each other to be
determined, which may be useful for a number of functions. For example, such information may be used for one or more of: performing 3D reconstruction of the captured environment, 3D registration of cameras with respect to other sensors, audio positioning of audio sources, and playback of object-based audio with respect to camera locations. Other uses of such information will be apparent to those skilled in the art.
One way of determining the positions of multi-directional cameras, such as the cameras 4, 6 and 8, is to use hardware sensor approaches like Global Positioning System (GPS) localization. However, GPS only provides position information and does not provide orientation information. One way of determining orientation information is to obtain the orientation information from magnetometers and accelerometers installed in the cameras. However, such instruments may be susceptible to local disturbance (e.g. magnetometers may be disturbed by a local magnetic field), so the accuracy of orientation information obtained in this way is not necessarily very high. In general, sensor-based approaches typically do not meet the expected accuracy requirement, and extra hardware sensors, like magnetometers, suffer from the same limitation.
Another way of performing camera pose registration is to use a computer vision method. For example, position and orientation information can be obtained by performing structure from motion (SfM) analysis on images captured by a camera.
Broadly speaking, SfM works by determining point correspondences between images (also known as feature matching) and calculating location and orientation based on the determined point correspondences. However, when used on images captured by multiple cameras, SfM analysis may be unreliable due to unreliable determination of point correspondences between images.
FIG. 2 is a block diagram of a system, indicated generally by the reference numeral 10, in accordance with an example embodiment. The system 10 comprises a scene parser 12, a camera pose module 14 and a conversion look-up-table (LUT) 16. The look-up table may define different ways of converting scene properties to parameters optimal to the camera pose module 14.
The scene parser 12 receives scene definition information. The scene definition information is a data structure that may describe the environment and requirements in which images are being captured, such as the environment of the scene 1 described above. With the help of classification technologies, a generic schema can be defined to
characterise some constraints of the environment. Such information can be incorporated into a general description (for example, having a data structure format such as JavaScript Object Notation (JSON), YAML, Extensible Markup Language (XML) etc.).
As described in detail below, the schema provided at the input to the scene parser 12 can be parsed and converted into a more technical configuration used by the camera pose module 14. The scene parser 12 may make use of the LUT 16 or some other mapping for converting the scene definition information into camera registration parameters for use in the camera pose module 14. The LUT 16 (or other mapping module) may be application specific.
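By way of illustration only, a scene definition of this kind could be encoded in JSON along the following lines; the field names and values below are hypothetical and are not taken from any published schema. A minimal Python sketch that reads such a description:

```python
import json

# Hypothetical scene definition; field names and values are illustrative only
# and are not taken from any published schema.
SCENE_DEFINITION_JSON = """
{
  "environment_type": "outdoor",
  "illumination": "high",
  "quality_of_service": "low",
  "key_objects": ["goal", "ground"],
  "camera_groups": [
    {"id": "group-1", "cameras": ["camera-85", "camera-86"]},
    {"id": "group-2", "cameras": ["camera-86", "camera-87", "camera-88"]}
  ]
}
"""

scene_definition = json.loads(SCENE_DEFINITION_JSON)
print(scene_definition["environment_type"])  # -> "outdoor"
```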
The camera pose module 14 may be used to localize the position and/or orientation of one or more cameras (such as the cameras 4, 6 and 8 used to capture the scene 2). The position and/or orientation of cameras (the so-called camera pose) may be used to determine camera locations within a scene (for example, using principles of structure from motion (SfM) analysis on images captured by a camera, as described above).
These data may be used, for example, to enable the camera outputs to be used to regenerate a 3D scene (such as a 3D version of the scene 2).
FIG. 3 is a flow chart showing an algorithm, indicated generally by the reference numeral 20, in accordance with an example embodiment. The algorithm 20 starts at operation 22, where scene definition information is received by the scene parser 12.
At operation 24, the scene parser 12 converts the scene definition data into camera registration parameters. At operation 26, the camera registration parameters generated in operation 24 are output, for example to the camera pose module 14.
The conversion of the scene definition information into camera registration parameters in the operation 24 described above can assist with reducing instances of unsuccessful camera pose estimation caused, for example, by image feature extractors being operated in conditions that are not optimal for that feature extractor.
The operation 24 by which the scene definition data is converted into camera registration parameters may take many forms. Some embodiments are described below
by way of example. Not all of the examples below need be provided in any particular embodiment. Moreover, other examples will be apparent to those skilled in the art.
The scene definition data may be provided in the form of an environment type and may be converted into camera registration parameters in the form of feature extractor type information. Example scene environment types may include, for example, indoor (or small/close) environments and outdoor (or large/far) environments. The scene environment defined by an environment type constraint within a scene definition may affect the selection of a feature extractor (for example, for use in structure from motion (SfM) analysis). Example feature extractors include Scale-invariant feature transform (SIFT) and AKAZE feature extractors, which might be selected for indoor and outdoor environments respectively.
Other feature extractor systems could be used. Image feature extractors or descriptors may, for example, be divided into two general categories: high-performance (floating point-based) feature extractors, e.g. SIFT, SURF etc., and highly efficient (binary format) feature extractors, e.g. AKAZE, ORB, BRIEF, BRISK etc. A given environment type constraint may be translated into one category and, within that category, one or multiple descriptors and their parameters can be generated. In this case, multiple features and associated feature descriptors may be the output of the feature extraction step and piped together to the feature matching step to find the best matches, if quality is the main concern. Conventional descriptors may also be replaced by machine-learned (e.g. convolutional neural network, CNN) descriptors. The learned approaches may use sample images that produce the best descriptors. Such a learned approach may train on image data sets labelled under the same environment constraints and generate neural network models for the best predictions under those constraints.
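As a hedged sketch only, the mapping from an environment-type constraint to a feature extractor could be realised with OpenCV's SIFT and AKAZE implementations as follows; the category boundaries are illustrative rather than prescribed by this specification.

```python
import cv2

def select_extractor(environment_type: str):
    """Map an environment-type constraint to a feature extractor.

    Illustrative mapping only: indoor (small/close) scenes use a floating
    point-based descriptor (SIFT); outdoor (large/far) scenes use a binary
    descriptor (AKAZE).
    """
    if environment_type in ("indoor", "small", "close"):
        return cv2.SIFT_create()   # high-performance, floating point-based
    return cv2.AKAZE_create()      # highly efficient, binary format

extractor = select_extractor("outdoor")
# keypoints, descriptors = extractor.detectAndCompute(grey_image, None)
```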
Alternatively, or in addition, the environment type may have an impact on handling the field of view of the captured images in the case of some extra wide-angle lens models (such as fisheye or omnidirectional lens types). For example, to gain reliable and consistent output, the camera pose module 14 can first rectify and re-project input images to a canonical image type with a pre-configured field-of-view value (e.g. 90 degrees) by applying a homogeneous internal virtual camera model. This processing step can happen particularly in camera models with an extra-large field-of-view.
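A sketch of such a rectification step is shown below; it assumes that calibrated fisheye intrinsics K and distortion coefficients D are already available (an assumption not made explicit above) and re-projects an input image onto a virtual pinhole camera with a 90 degree field of view using OpenCV's fisheye module.

```python
import cv2
import numpy as np

def rectify_to_canonical(image, K, D, fov_deg=90.0):
    """Re-project a calibrated fisheye image onto a virtual pinhole camera
    with the requested horizontal field of view. K (3x3 intrinsic matrix) and
    D (4x1 fisheye distortion coefficients) are assumed to be known."""
    h, w = image.shape[:2]
    # Focal length of the virtual camera that yields the requested FOV.
    f = (w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    P = np.array([[f, 0.0, w / 2.0],
                  [0.0, f, h / 2.0],
                  [0.0, 0.0, 1.0]])
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), P, (w, h), cv2.CV_16SC2)
    return cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR)
```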
The scene definition data may be provided in the form of illumination information and may be converted into camera registration parameters in the form of pre-processing requirements. A pre-processing step may be useful under deteriorated lighting conditions in order, for example, to improve the consistency of scene appearance over a range of illumination conditions. Illumination information provided in the scene definition data may be converted into pre-processing requirements based on a mapping stored within the LUT 16. Clearly, that mapping could be application-dependent and could readily be modified, if required.
The pre-processing described above may be applied to input frames and may take the form, for example, of a simple uniform scaling of mean greyscale intensities. Other example pre-processing includes advanced illuminant-invariant colour mapping techniques. The camera registration parameters output by the scene parser 12 may be dependent on the illumination information provided in the scene definition data as well as knowledge of available colour mapping techniques at the camera pose module 14, for example at a pre-processing step.
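A minimal sketch of the simple pre-processing mentioned above, scaling each frame so that its mean greyscale intensity matches a fixed target; the target value is an illustrative assumption.

```python
import cv2
import numpy as np

def normalise_mean_intensity(frame_bgr, target_mean=128.0):
    """Uniformly scale greyscale intensities so that the frame mean matches
    target_mean; a simple pre-processing step for deteriorated lighting."""
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    scale = target_mean / max(float(grey.mean()), 1e-6)
    return np.clip(grey * scale, 0, 255).astype(np.uint8)
```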
The scene definition data may be provided in the form of key objects or a region of interest and may be converted into camera registration parameters in the form of mask image information. The key objects scene definition data may define one or more semantic object masks and bounding boxes or regions of found objects in images. Example values of such key objects definition data include object labels, for example from known image classifications such as COCO (Common Objects in Context), ImageNet etc. Such object labels may be useful, for example, for speeding up computation by excluding some less important regions or improving quality in important regions.
The masking definition can also be generated by excluding some objects. For example, in a sports scene, the definition can exclude all players as they may be moving fast and may not be captured properly by all cameras.
On receipt of scene definition data in the form of key objects information (such as labels), the scene parser 12 may generate a mask image per input frame. The mask may be an image having the same size (width and height) as the input frame. Black areas may be provided in the mask corresponding to less important areas of the image, for which descriptors may not be computed.
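A sketch of how such a mask might be built and used follows; the bounding boxes of excluded regions are assumed to come from an object detector or from the key objects labels, which is outside this snippet.

```python
import numpy as np

def build_feature_mask(frame_shape, exclude_boxes):
    """Build a per-frame mask with the same width and height as the frame.

    exclude_boxes -- (x, y, w, h) boxes around less important regions (e.g.
    fast moving players); those areas are set to black (0) so that no
    descriptors are computed there.
    """
    height, width = frame_shape[:2]
    mask = np.full((height, width), 255, dtype=np.uint8)  # white: keep
    for (x, y, w, h) in exclude_boxes:
        mask[y:y + h, x:x + w] = 0                         # black: exclude
    return mask

# Usage with any OpenCV feature extractor:
# keypoints, descriptors = extractor.detectAndCompute(grey, build_feature_mask(grey.shape, boxes))
```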
The scene definition data may be provided in the form of a quality of service indication and may be converted into camera registration parameters in the form of threshold values. The quality of service indication may indicate the quality required from the extractor and may also affect the total speed of a pipeline. Example values of quality of service include: ultra, high, medium and low. The quality of service indication may be translated into system threshold values, such as the number of features and matches and the nearest neighbour distance ratio.
To calculate the correspondence between two given images, feature extraction and matching are two steps that may be implemented by the camera pose module 14. For instance, taking SIFT feature extraction as an example, the values of the quality of service can be converted to extraction-specific parameters. Many feature extraction techniques take the original image and generate progressively blurred images (e.g. a Gaussian blur approach), also referred to as a multi-scaling approach. In the SIFT approach, while generating the blurred images, the algorithm may also progressively resize the original image to half size (so-called “octaves”). The number of octaves and scales (blur levels) may have a substantial impact on efficiency and robustness. For instance, the default may be 4 octaves and 5 blur levels; the actual values can vary, depending on the values of the quality of service. Like the SIFT extraction approach, other extraction methods have their own specific threshold values. Such values may be empirical and therefore a table of parameter sets may be provided so that results are as close to optimal as possible for different requirements.
The matching described above may build a correspondence for each image pixel among the detected features by finding the most similar one in the other images. The similar ones may be selected based on nearest-neighbour distance (e.g. Euclidean or Hamming distance) matching, or on being within a threshold distance of each other.
Nearest-neighbour matching results in highly noisy correspondences. An approach to determine good correspondences is the nearest neighbour distance ratio, where the ratio determines the distinctiveness of features by comparing the distances of their two nearest neighbours. The threshold usually ranges from 0.6 to 0.8.
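The following sketch ties a quality-of-service value to SIFT parameters and to a nearest neighbour distance ratio threshold; the table values are placeholders chosen for illustration, not values prescribed by this specification.

```python
import cv2

# Hypothetical quality-of-service table; all numbers are placeholders.
QOS_TABLE = {
    "high": {"n_octave_layers": 5, "n_features": 0,    "nn_ratio": 0.6},
    "low":  {"n_octave_layers": 3, "n_features": 2000, "nn_ratio": 0.8},
}

def match_features(grey_a, grey_b, qos="high"):
    """Extract SIFT features from two greyscale images and keep only matches
    that pass the nearest neighbour distance ratio test."""
    params = QOS_TABLE[qos]
    sift = cv2.SIFT_create(nfeatures=params["n_features"],
                           nOctaveLayers=params["n_octave_layers"])
    _, descriptors_a = sift.detectAndCompute(grey_a, None)
    _, descriptors_b = sift.detectAndCompute(grey_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(descriptors_a, descriptors_b, k=2):
        # Keep a match only if it is clearly better than its runner-up.
        if len(pair) == 2 and pair[0].distance < params["nn_ratio"] * pair[1].distance:
            good.append(pair[0])
    return good
```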
A number of scene definition data formats have been described above. In any particular embodiment, a plurality of scene definition data inputs could be provided. It may be necessary to convert multiple scene definition data elements into one or more camera registration parameters depending on a priority order (for example, some scene definition data elements may have a higher priority than other scene definition data elements). This may be required, for example, in the event of contradictory requirements. The order may be important, as the scene parser 12 may process the constraints in the order in which they are read. This is not essential to all embodiments; for example, the parser implementation may read all constraints as a whole and generate optimal output parameters.
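One possible, purely illustrative way to realise such a priority order is to apply the per-element conversions from lowest to highest priority, so that higher-priority elements overwrite conflicting parameters; the element names and the look-up-table structure below are assumptions.

```python
# Hypothetical element names; the priority order and the LUT structure are
# assumptions for illustration only.
PRIORITY = ["environment_type", "quality_of_service", "illumination", "key_objects"]

def convert_with_priority(scene_definition, conversion_lut):
    """Convert scene definition elements into camera registration parameters,
    letting higher-priority elements overwrite conflicting values.

    conversion_lut -- dict mapping an element name to a function that returns
    a dict of camera registration parameters for that element's value.
    """
    parameters = {}
    for element in reversed(PRIORITY):            # lowest priority first
        if element in scene_definition and element in conversion_lut:
            parameters.update(conversion_lut[element](scene_definition[element]))
    return parameters
```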
The scene definition may also define one or more logical groups of cameras. Such logical groups can be defined by:
1. same camera types; and/or
2. camera physical locations.
Such logical groups may lead to the concurrent processing pipeline defined in FIG. 5.
FIG. 4 is a flow chart showing an algorithm, indicated generally by the reference numeral 30, in accordance with an example embodiment.
The algorithm 30 starts at operation 32 and then moves to operation 34 where a scene definition, received in an instance of the operation 22 described above, is read. The scene definition includes scene definition data in the form of an environment type.
At operation 36, the value of the environment type input is determined. If the value is “outdoor” (or a large/far environment), then the algorithm 30 moves to operation 38. If the value is “indoor” (or a small/close environment), then the algorithm 30 moves to operation 40.
At operation 38, the camera registration parameter is set to indicate that the AKAZE descriptor should be used by the camera pose module 14 and the algorithm moves to operation 42.
At operation 40, the camera registration parameter is set to indicate that the SIFT descriptor should be used by the camera pose module 14 and the algorithm moves to operation 42.
At operation 42 the scene definition is read again. The scene definition includes scene definition data in the form of a quality type. At operation 44, the value of the quality type input is determined. If the value is “high” or “slow”, then the algorithm 30 moves to operation 46. If the value is “low” or “fast”, then the algorithm 30 moves to operation 48.
At operation 46, the camera registration parameter is set to indicate that the relevant threshold values of the high/slow mode of operation are to be used by the camera pose module 14 and the algorithm moves to operation 50.
At operation 48, the camera registration parameter is set to indicate that the relevant threshold values of the low/fast mode of operation are to be used by the camera pose module 14 and the algorithm moves to operation 50.
At operation 50, the camera registration parameters set as described above are provided to the camera pose module 14 (thereby implementing operation 26 of the algorithm 20 described above). The algorithm 30 then terminates at operation 32.
Thus, for example, if the scene definition data indicates that the scene has an outdoor environment type with a high or slow quality requirement, then the operation 50 outputs camera registration parameters indicating that the AKAZE feature descriptor should be used with threshold values of the high/slow mode of operation. Clearly, the mapping of such parameters can easily be changed if, for example, a new feature description module is made available (for example by changing parameters of the relevant look-up table).
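The decision flow of FIG. 4 could be sketched as follows; the dictionary keys and the threshold values attached to the high/slow and low/fast modes are placeholders, not values taken from this specification.

```python
def algorithm_30(scene_definition):
    """Sketch of the flow of FIG. 4: pick a descriptor from the environment
    type (operations 36-40), then threshold values from the quality type
    (operations 44-48), and return them together (operation 50)."""
    if scene_definition["environment_type"] in ("outdoor", "large", "far"):
        descriptor = "AKAZE"                                  # operation 38
    else:
        descriptor = "SIFT"                                   # operation 40
    if scene_definition["quality"] in ("high", "slow"):
        thresholds = {"nn_ratio": 0.6, "min_matches": 60}     # operation 46 (placeholder values)
    else:
        thresholds = {"nn_ratio": 0.8, "min_matches": 20}     # operation 48 (placeholder values)
    return {"descriptor": descriptor, **thresholds}
```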
Consider, for example, a scenario in which a sports event is being filmed by multiple cameras. Scene definition data may be provided to indicate that the scene is an outdoor scene, with high illumination conditions, in which a low quality/fast quality of service should be applied (due to the fast-moving nature of the sporting event). One or more stationary objects of the sports event (e.g. one or more goals, the ground/stadium, or all regions excluding the moving players) may be provided as key objects, since the positions of those objects can be precisely defined.
On the basis of the scene definition data indicated above, a scene parser can generate suitable camera registration parameters for use in generating camera position and/or
orientation information. Thus, for example, the AKAZE feature extractor may be used, with illumination pre-processing based on high illumination conditions and threshold values based on a low quality/fast mode of operation.
FIG. 5 is a block diagram of a system, indicated generally by the reference numeral 60, in accordance with an example embodiment. The system 60 includes a scene parser 62 and a look-up-table (LUT) 64 that are similar to (and may be identical to) the scene parser 12 and LUT 16 described above. The system 60 also includes a camera pose module 66. In common with the system 10 described above, the system 60 receives scene definition information at an input from the scene parser 62 and provides camera pose data at an output of the camera pose module 66.
The camera pose module 66 includes a workflow scheduler 68, a first camera registration workflow module 70, a second camera registration workflow module 72 and a camera pose aggregator module 74. The system 60 can therefore use multiple camera registration modules to convert multiple scene definitions into multiple camera registration parameters simultaneously. The camera pose aggregator module 74 aggregates the camera registration parameters received from the first and second camera registration workflow modules. The aggregator module 74 may, for example, implement a prioritisation of camera registration parameters, if necessary.
FIG. 6 is a block diagram, indicated generally by the reference numeral 80, showing an arrangement of cameras in accordance with an example embodiment. The block diagram 80 includes a first camera 85, a second camera 86, a third camera 87 and a fourth camera 88. As shown in FIG. 6, the cameras are organised into two groups. The first camera 85 and the second camera 86 form a first group of cameras 82. The second camera 86, third camera 87 and fourth camera 88 form a second group of cameras 84.
In the system 60, the first camera registration workflow module 70 may process data from the cameras of the first group of cameras 82 and the second camera registration workflow module 72 may process data from the cameras of the second group of cameras 84.
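A hedged sketch of the scheduler/aggregator arrangement of FIG. 5 follows, with one workflow per camera group run concurrently; the pose estimation itself is stubbed out, since its internals are not specified here, and the merge policy of the aggregator is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def estimate_pose(camera_id, registration_parameters):
    """Stub standing in for per-camera pose estimation (e.g. SfM); the real
    computation is not part of this sketch."""
    return {"camera": camera_id, "position": None, "orientation": None}

def register_group(group_cameras, registration_parameters):
    """One camera registration workflow module (70 or 72): estimate a pose for
    every camera in its group."""
    return {cam: estimate_pose(cam, registration_parameters) for cam in group_cameras}

def run_workflows(camera_groups, registration_parameters):
    """Workflow scheduler plus aggregator: run one workflow per group
    concurrently and merge the per-group results into a single pose set."""
    with ThreadPoolExecutor(max_workers=max(len(camera_groups), 1)) as pool:
        futures = [pool.submit(register_group, group, registration_parameters)
                   for group in camera_groups]
        results = [future.result() for future in futures]
    aggregated = {}
    for group_result in results:   # cameras in several groups keep the last result
        aggregated.update(group_result)
    return aggregated

# poses = run_workflows([["camera-85", "camera-86"],
#                        ["camera-86", "camera-87", "camera-88"]], {})
```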
Clearly, although two camera registration workflow modules are provided in the system 60, any number of parallel camera registration workflow modules could be provided in alternative embodiments.
At least some of the modules described above may be implemented online (e.g. cloud-based). For example, the scene parsers 12, 62 and/or the camera pose modules 14, 66 may be implemented online. It may be advantageous to implement the scene parsers
12, 62 and/or the camera pose modules 14, 66 in the cloud (or online) in the event that advanced computer vision modules are used. This is not, however, essential in all embodiments.
It may be beneficial to provide a scene definition data structure that includes parameters that enable cloud-based camera registration or localization for selecting camera registration parameters. The scene definition may, for example, be provided by an image capture apparatus to a cloud-based scene parser over a communication network.
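A minimal sketch of sending a scene definition to a cloud-hosted scene parser over HTTP is given below; the endpoint URL and the request/response format are assumptions made for illustration only.

```python
import json
import urllib.request

def submit_scene_definition(scene_definition, url="https://example.invalid/scene-parser"):
    """POST a scene definition to a (hypothetical) cloud-hosted scene parser
    and return the camera registration parameters it sends back."""
    request = urllib.request.Request(
        url,
        data=json.dumps(scene_definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```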
For completeness, FIG. 7 is a schematic diagram of components of one or more of the modules described previously (e.g. the scene parser 12 or 62 and/or the camera pose module 14 or 66), which hereafter are referred to generically as processing systems 300. A processing system 300 may have a processor 302, a memory 304 coupled to the processor and comprised of a RAM 314 and ROM 312, and, optionally, user inputs 310 and a display 318. The processing system 300 may comprise one or more network interfaces 308 for connection to a network, e.g. a modem, which may be wired or wireless.
The processor 302 is connected to each of the other components in order to control operation thereof.
The memory 304 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 20 or 30.
The processor 302 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors. A processor may comprise processor circuitry.
The processing system 300 may be a standalone computer, a server, a console, or a network thereof.
In some embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device in order to utilize the software application stored there.
FIG. 8a and FIG. 8b show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which when run by a computer may perform methods according to embodiments described above. The removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code. The memory 366 may be accessed by a computer system via a connector 367. The CD 368 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as the programmable content of a hardware device, whether as instructions for a processor, or as configuration settings for a fixed function device, gate array, programmable logic device, etc.
As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of FIGS. 3 and 4 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.
It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and, during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims
1. An apparatus comprising:
means for receiving scene definition data;
means for converting the scene definition data into camera registration parameters; and means for providing the camera registration parameters to a camera position and/or orientation registration module.
2. An apparatus as claimed in claim 1, wherein the camera position and/or orientation registration module is used to register the position and/or orientation of one or more cameras.
3. An apparatus as claimed in claim 1 or claim 2, wherein the means for providing
the camera registration parameters provides camera registration parameters to a plurality of camera position and/or orientation registration modules, wherein each camera position and/or orientation registration module is used to register the position and/or orientation of one or more cameras.
4. An apparatus as claimed in claim 3, wherein a plurality of cameras are organised into groups, each group associated with a different one of the plurality of camera position and/or orientation registration modules.
5. An apparatus as claimed in any one of the preceding claims, wherein the means
for converting the scene definition data into camera registration parameters converts scene definition data in the form of an environment type into feature extractor type information.
6. An apparatus as claimed in any one of the preceding claims, wherein the means
for converting the scene definition data into camera registration parameters converts scene definition data in the form of illumination information into pre-processing requirements.
7. An apparatus as claimed in any one of the preceding claims, wherein the means
for converting the scene definition data into camera registration parameters converts scene definition data in the form of key objects into mask image information.
8. An apparatus as claimed in any one of the preceding claims, wherein the means for converting the scene definition data into camera registration parameters converts multiple scene definition data elements into one or more camera registration
parameters depending on a priority order.
9. An apparatus as claimed in any one of the preceding claims, wherein the means for converting the scene definition data into camera registration parameters comprises a parser.
10. An apparatus as claimed in any one of the preceding claims, wherein the means for converting the scene definition data into camera registration parameters comprises a look-up-table.
11. An apparatus as claimed in claim 10, wherein the look-up-table is application specific.
12. An apparatus as claimed in any one of the preceding claims, wherein: the means for converting the scene definition data into camera registration parameters and/or the
camera position and/or orientation registration module is/are cloud-based.
13. An apparatus as claimed in any one of the preceding claims, wherein the means comprise:
at least one processor; and
at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
14. A method comprising:
receiving scene definition data;
converting the scene definition data into camera registration parameters; and providing the camera registration parameters to a camera position and/or orientation registration module.
15. A computer readable medium comprising program instructions for causing an apparatus to perform at least the following:
receive scene definition data;
convert the scene definition data into camera registration parameters; and provide the camera registration parameters to a camera position and/or orientation registration module.
GB1805978.2A 2018-04-11 2018-04-11 Camera registration Withdrawn GB2572795A (en)

Priority Applications (1)

Application Number: GB1805978.2A (published as GB2572795A (en))
Priority Date: 2018-04-11
Filing Date: 2018-04-11
Title: Camera registration

Publications (2)

Publication Number: GB201805978D0 (en), published 2018-05-23
Publication Number: GB2572795A (en), published 2019-10-16

Family

ID=62202810

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)