AU2017265068A1 - Setup of multiple cameras - Google Patents

Setup of multiple cameras

Info

Publication number
AU2017265068A1
Authority
AU
Australia
Prior art keywords
camera
scene
image
cameras
view
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2017265068A
Inventor
James Austin Besley
Cameron Murray Edwards
Steven Richard Irrgang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2017265068A priority Critical patent/AU2017265068A1/en
Publication of AU2017265068A1 publication Critical patent/AU2017265068A1/en


Abstract

A method of setting up a field of view of a plurality of cameras capturing a scene. A first image of the scene is captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras. A set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene is received, the first camera being configured according to a predetermined configuration. A desired view for the second camera in the plurality of cameras is determined based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene. The field of view of the second camera is set up, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, where the second image of the scene overlaps with the first image.

[Fig. 1: Start → Receive images → Transform image → Align images → Determine corrective camera transform → Adjust camera → End]

Description

SETUP OF MULTIPLE CAMERAS
TECHNICAL FIELD
The present invention relates generally to the field of image processing and, in particular, to image alignment for the purpose of automatic camera setup for a network of calibrated cameras.
The present invention also relates to a method and apparatus for setting up a field of view of a plurality of cameras capturing a scene, and to a computer program product including a computer readable medium having recorded thereon a computer program for setting up a field of view of a plurality of cameras capturing a scene.
BACKGROUND
As image processing technology continues to improve, increasingly complex applications become possible. A number of these applications require large numbers of cameras to be set up in particular positions.
One example application area is sporting events, and other live events such as music performances or ceremonies. Technology is expanding the methods by which such events are viewed and recorded. Free Viewpoint Video (FVV) is an example, in which a user may interactively generate views of an event from a continuum of viewing angles. If there are insufficient cameras, then some details of the event may be missed due to occlusions from different angles, and the absence of these details then becomes visible when the viewpoint is changed. Multiple views of each point in the scene are desirable in order to triangulate the depth reliably. Additional cameras may also be required to capture important details like players' faces in high resolution.
Another application area is three-dimensional (3D) scanning and motion capture. Photogrammetry is an increasingly competitive method of generating 3D models of real world objects. Some systems use a fixed structure to hold the cameras in known positions, and move the object of interest inside the structure. However, for some types of objects, such as large or difficult to move objects, it may be more practical to set up a camera system around the object.
In each of the above applications, ensuring good coverage of the objects of interest by the camera system requires planning. The positions and fields of view of the cameras may be simulated in advance in order to ensure adequate performance during the event or capture.
Desirable camera positions may also be determined manually, automatically, or semiautomatically based on the simulations. Before the event or capture, it is then desirable to set up the cameras in the pre-determined positions and fields of view. For a large number of cameras, the setup can be time consuming. In some cases, time may also be constrained by other factors relating to the event or access to the object or venue in which images are to be captured.
SUMMARY
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to one aspect of the present disclosure, there is provided a method of setting up the field of view of a second camera, using the image from a first, already set up camera and depth information in a scene to generate an expected view for the second camera, and image alignment to align the view of the second camera to the expected view.
According to another aspect of the present disclosure, there is provided a method of setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the method comprising:
receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;
determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.
13921986v 1 (P280090_Speci_As Filed)
2017265068 22 Nov 2017
According to another aspect of the present disclosure, there is provided an apparatus for setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the apparatus comprising:
means for receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;
means for determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and means for setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.
According to still another aspect of the present disclosure, there is provided a system for setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the system comprising:
a memory for storing data and a computer program;
a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;
determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second
camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.
According to still another aspect of the present disclosure, there is provided a non-transitory computer readable medium having a computer program stored on the medium for setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the program comprising:
code for receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;
code for determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and code for setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.
Other aspects are also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Fig. 1 is a schematic flow diagram showing a method of setting up a camera, as executed in the method of Fig. 2;
Fig. 2 is a schematic flow diagram showing a method of setting up a field of view of a plurality of cameras, as executed in the method of Fig. 3;
Fig. 3 is a schematic flow diagram showing a method of setting up and capturing an event;
Fig. 4 is a schematic flow diagram showing a method of constructing a 3D model of an object;
Fig. 5 is a view of a model of a stadium and some modelled camera positions and fields of view;
Fig. 6 shows a view from a camera showing possible framings to capture regions of interest;
Fig. 7A shows an image from a first camera;
Fig. 7B shows an image from a second camera;
Fig. 8A shows a warp map which maps the image of Fig. 7A to the image of Fig. 7B using a ground plane assumption;
Fig. 8B shows the result of applying the warp map of Fig. 8A to the image of Fig. 7B; and
Figs. 9A and 9B form a schematic block diagram of a general purpose computer system upon which arrangements described can be practiced.
DETAILED DESCRIPTION INCLUDING BEST MODE
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Figs. 9A and 9B depict a general-purpose computer system 900, upon which the various arrangements described can be practiced.
As seen in Fig. 9A, the computer system 900 includes: a computer module 901; input devices such as a keyboard 902, a mouse pointer device 903, a scanner 926, a camera 927, and a microphone 980; and output devices including a printer 915, a display device 914 and loudspeakers 917. An external Modulator-Demodulator (Modem) transceiver device 916 may be used by the computer module 901 for communicating to and from a communications network 920 via a connection 921. The communications network 920 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 921 is a telephone line, the modem 916 may be a traditional “dial-up” modem. Alternatively, where the connection 921 is a high capacity (e.g., cable) connection, the modem 916 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 920.
The computer module 901 typically includes at least one processor unit 905, and a memory unit 906. For example, the memory unit 906 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 901 also includes a number of input/output (I/O) interfaces including: an audio-video interface 907 that couples to the video display 914, loudspeakers 917 and microphone 980; an I/O interface 913 that couples to the keyboard 902, mouse 903, scanner 926, camera 927 and optionally a joystick or other human interface device (not illustrated); and an interface 908 for the external modem 916 and printer 915. In some implementations, the modem 916 may be incorporated within the computer module 901, for example within the interface 908. The computer module 901 also has a local network interface 911, which permits coupling of the computer system 900 via a connection 923 to a local-area communications network 922, known as a Local Area Network (LAN). As illustrated in Fig. 9A, the local communications network 922 may also couple to the wide network 920 via a connection 924, which would typically include a so-called “firewall” device or device of similar functionality. The local network interface 911 may comprise an
Ethernet circuit card, a Bluetooth® wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 911.
The I/O interfaces 908 and 913 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 909 are provided and typically include a hard disk drive (HDD) 910. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 912 is
typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 900.
The components 905 to 913 of the computer module 901 typically communicate via an interconnected bus 904 and in a manner that results in a conventional mode of operation of the computer system 900 known to those in the relevant art. For example, the processor 905 is coupled to the system bus 904 using a connection 918. Likewise, the memory 906 and optical disk drive 912 are coupled to the system bus 904 by connections 919. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.
Methods described below may be implemented using the computer system 900 wherein the processes of Figs. 1 to 8B, to be described, may be implemented as one or more software application programs 933 executable within the computer system 900. In particular, the steps of the described methods are effected by instructions 931 (see Fig. 9B) in the software 933 that are carried out within the computer system 900. The software instructions 931 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software 933 is typically stored in the HDD 910 or the memory 906. The software is loaded into the computer system 900 from the computer readable medium, and then executed by the computer system 900. Thus, for example, the software 933 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 925 that is read by the optical disk drive 912. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 900 preferably effects an advantageous apparatus for implementing the described methods.
In some instances, the application programs 933 may be supplied to the user encoded on one or more CD-ROMs 925 and read via the corresponding drive 912, or alternatively may be read by the user from the networks 920 or 922. Still further, the software can also be loaded into the computer system 900 from other computer readable media. Computer readable storage media
refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 900 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 901. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 901 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 933 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 914. Through manipulation of typically the keyboard 902 and the mouse 903, a user of the computer system 900 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 917 and user voice commands input via the microphone 980.
Fig. 9B is a detailed schematic block diagram of the processor 905 and a “memory” 934. The memory 934 represents a logical aggregation of all the memory modules (including the HDD 909 and semiconductor memory 906) that can be accessed by the computer module 901 in Fig. 9A.
When the computer module 901 is initially powered up, a power-on self-test (POST) program 950 executes. The POST program 950 is typically stored in a ROM 949 of the semiconductor memory 906 of Fig. 9A. A hardware device such as the ROM 949 storing software is sometimes referred to as firmware. The POST program 950 examines hardware within the computer module 901 to ensure proper functioning and typically checks the processor 905, the memory 934 (909, 906), and a basic input-output systems software (BIOS) module 951, also typically stored in the ROM 949, for correct operation. Once the POST program 950 has run successfully, the BIOS 951 activates the hard disk drive 910 of Fig. 9A. Activation of the hard disk drive 910 causes a bootstrap loader program 952 that is resident on
the hard disk drive 910 to execute via the processor 905. This loads an operating system 953 into the RAM memory 906, upon which the operating system 953 commences operation. The operating system 953 is a system level application, executable by the processor 905, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 953 manages the memory 934 (909, 906) to ensure that each process or application running on the computer module 901 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 900 of Fig. 9A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 934 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 900 and how such is used.
As shown in Fig. 9B, the processor 905 includes a number of functional modules including a control unit 939, an arithmetic logic unit (ALU) 940, and a local or internal memory 948, sometimes called a cache memory. The cache memory 948 typically includes a number of storage registers 944 - 946 in a register section. One or more internal busses 941 functionally interconnect these functional modules. The processor 905 typically also has one or more interfaces 942 for communicating with external devices via the system bus 904, using a connection 918. The memory 934 is coupled to the bus 904 using a connection 919.
The application program 933 includes a sequence of instructions 931 that may include conditional branch and loop instructions. The program 933 may also include data 932 which is used in execution of the program 933. The instructions 931 and the data 932 are stored in memory locations 928, 929, 930 and 935, 936, 937, respectively. Depending upon the relative size of the instructions 931 and the memory locations 928-930, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 930. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 928 and 929.
In general, the processor 905 is given a set of instructions which are executed therein. The processor 905 waits for a subsequent input, to which the processor 905 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 902, 903, data received
from an external source across one of the networks 920, 922, data retrieved from one of the storage devices 906, 909 or data retrieved from a storage medium 925 inserted into the corresponding reader 912, all depicted in Fig. 9A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 934.
The described methods use input variables 954, which are stored in the memory 934 in corresponding memory locations 955, 956, 957. The described methods produce output variables 961, which are stored in the memory 934 in corresponding memory locations 962, 963, 964. Intermediate variables 958 may be stored in memory locations 959, 960, 966 and 967.
Referring to the processor 905 of Fig. 9B, the registers 944, 945, 946, the arithmetic logic unit (ALU) 940, and the control unit 939 work together to perform sequences of micro-operations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 933. Each fetch, decode, and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 931 from a memory location 928, 929, 930;
a decode operation in which the control unit 939 determines which instruction has been fetched; and an execute operation in which the control unit 939 and/or the ALU 940 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 939 stores or writes a value to a memory location 932.
Each step or sub-process in the processes of Figs. 1, 2, 3 and 4 may be associated with one or more segments of the program 933 and is performed by the register section 944, 945, 946, the ALU 940, and the control unit 939 in the processor 905 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 933.
The described methods may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
Fig. 3 shows a method 300 of setting up and capturing a sporting event at a stadium. One or more steps of the method 300 may be implemented as one or more software code modules of the software application program 933 resident in the hard disk drive and being controlled in its execution by the processor 905.
In creating step 310, a model of the stadium is created. The model may be a virtual model comprising a file storing a mesh of points. The model may be created manually, or based on architectural models of a venue, or based on a physical scan of the stadium if access is given and time permits. Step 310 may be implemented under execution of the processor 905.
In simulating step 320, a simulation of the event to be captured is created, under execution of the processor 905. For a sporting event the simulation may consist of modelled players, in a static position or even animated.
In determining step 330, based on the simulated event created at step 320, desired camera configurations may be determined, under execution of the processor 905, such that there is sufficient coverage and overlap between views to reconstruct important details of the event.
The desired camera configuration defines characteristics of the camera, such as the position and field of view of the camera. If processing, such as free viewpoint video for example, is intended, then the free viewpoint video may be tested on the simulated camera positions. Fig. 5 shows an example of a stadium 532 along with desired camera configurations (i.e., camera positions and camera fields of view) 501-528, in two dimensions. The cameras 501-528 have been placed to focus on one of two key regions 530 and 531. Fig. 6 shows a view from the location of one camera 526. Box 620 shows the desired field of view of the camera 526, designed so as to cover the key region of interest 530. The modelled camera settings are selected to achieve the desired field of view 620. Box 610 shows an alternative field of view from where the camera 526 is set up to view region 531.
Having determined suitable camera positions and fields of view, at a time before the start of the event, in setting step 340, the cameras are set up according to the positions and fields of view.
A method 200 of setting up a field of view of a plurality of cameras, as executed at step 340, will be described in more detail below with reference to Fig. 2.
Having set up the cameras at step 340, at capturing step 350, the sporting event is captured using the cameras. Other steps following step 350 will depend on the specific intended application.
The method 200 of setting up the field of view of the plurality of cameras, according to a desired camera configuration (i.e., camera position and field of view), will now be described with reference to Fig. 2. As described, the desired camera configuration defines characteristics of the camera, such as the position and field of view of the camera. One or more steps of the method 200 may be implemented as one or more software code modules of the software application program 933 resident in the hard disk drive and being controlled in its execution by the processor 905. The method 200 will be described with reference to the example stadium 532 of Fig. 5.
In positioning step 210, the cameras 501-528 are physically placed in corresponding positions according to the desired camera configuration determined at step 330. Step 210 is performed manually. The following steps assume that the cameras 501-528 are in the desired position, with only the field of view required to be set up. Errors in the positions of the cameras 501-528 may be handled during the alignment step 120 described below. However, accounting for possible errors in camera position makes alignment more difficult as it increases the range of possible transforms that need to be accounted for. Once the cameras are positioned correctly at step 210, the term “set up” refers to field-of-view adjustment, where field-of-view comprises one or more of pan, tilt, roll and zoom.
While placing the cameras 501-528, it is preferable for the field of view of the cameras to be close enough to the desired field of view determined at step 330, such that there is some overlap between the current view and the desired view for the cameras. Image alignment between the captured view and the estimated desired view will be described below with reference to step 120. Image alignment may not work if there is no content in common between two views.
In setting step 220, at least one of the cameras 501-528 is set up correctly using a suitable method other than method 100 (see Fig. 1). The camera may be manually adjusted to the required settings such as pan, tilt and zoom. Manual setup requires human effort and may take
significant time to complete. Alternatively, the camera may be set up at step 220 based on automatically aligning the camera image to a model of the stadium 532.
Automatically aligning the camera image to a model of the stadium may have disadvantages. Automatic methods may not be robust, and may fail for certain cameras (e.g., due to reliance on features in the image for image alignment), thus requiring another approach to be used to set up those cameras. In this case, a means of identifying at least one camera which is correctly set up is required, such as an alignment or other quality measure, or by confirming consistency between a pair of potentially correct cameras.
In decision step 230, a test is made for whether there are cameras which have not been set up. If there are more cameras to set up then processing moves to setting step 240. Otherwise, processing continues to step 250.
In setting step 240, an additional camera is set up based on the known camera positions, desired camera configuration, scene three-dimensional (3D) structure information, and at least one already set up camera. A method 100 of setting up a camera, as executed at step 240, will be described in detail below with reference to Fig. 1. The already set up camera is configured according to a predetermined configuration and may be the original first camera set up at step 220 using another method, or the already set up camera may be one of the cameras set up previously using method 100. The order in which the cameras are set up, and which already set up camera each new camera is set up based on, may impact the performance of the setup, and may be selected to improve the performance; one possible ordering is sketched below. An already set up first camera having a view overlapping with the second camera by a sufficient amount may be used as the basis for setting up the camera at step 240. It is also desirable for the cameras to have a similar viewing direction if possible, so that occlusions are similar and also so that lighting directions are similar. Cameras at opposite ends of the stadium 532, for example, looking at the same area within the stadium 532 will have opposite viewing directions and may have different lighting conditions. It may be necessary to compare some cameras to the original correctly set up camera in order to avoid compounding errors.
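One simple way to choose such an order, sketched below as an illustrative assumption rather than the patent's own procedure, is a breadth-first traversal of the camera overlap graph starting from the correctly set up seed camera, so that every camera is aligned against a nearby, already set up reference.

```python
from collections import deque

def setup_order(overlap, seed):
    """Breadth-first order for setting up cameras: each camera is aligned
    against an already set up neighbour with which it shares sufficient
    overlap. 'overlap' maps camera id -> iterable of overlapping camera ids."""
    order, visited = [], {seed}
    queue = deque([seed])
    while queue:
        ref = queue.popleft()
        for cam in overlap[ref]:
            if cam not in visited:
                visited.add(cam)
                order.append((cam, ref))   # set up 'cam' based on 'ref'
                queue.append(cam)
    return order                           # [(camera, reference camera), ...]
```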
Once the additional camera is set up and step 240 is complete, processing then returns to the decision step 230.
In calibrating step 250, calibration is performed on the cameras 501-528. It is assumed that although the cameras are set up according to the desired camera configuration, there is a limit to
the precision of the setup due to limited precision of the pan-tilt-zoom stages, and practical issues which may arise while placing the cameras.
For many applications, more precise information about the location, field of view, settings, and distortion of the cameras is required. Where more precise information is required, camera calibration may be performed to provide such precise information. Suitable methods of camera calibration may be based on matching identifiable features between the images provided by different cameras. The output of calibration is intrinsic and extrinsic parameters of each camera according to the camera calibration model. The term “intrinsic” parameters refers to the parameters which relate to the camera (including the lens) itself, such as focal length, principal point, and any distortion parameters. “Extrinsic” parameters are the parameters that relate to the position and orientation of the camera. Having performed the calibration, the method 200 is then complete.
A method 100 of setting up a second camera, as executed at step 240, will be described in detail below with reference to Fig. 1. In the method 100, the field of view (e.g., one or more of pan, tilt, zoom and roll) is adjusted based on known camera positions of the first and second camera, a desired camera configuration, scene three-dimensional (3D) structure information, and at least one already set up first camera. The position as well as the field of view of the first camera is correct as per the desired configuration.
One or more steps of the method 100 may be implemented as one or more software code modules of the software application program 933 resident in the hard disk drive and being controlled in its execution by the processor 905. In receiving step 105, an image from each of the first and second cameras is received, under execution of the processor 905. The images may be captured at the same time, which may be important if the scene viewed by the cameras is dynamic. Further, simultaneously captured images may be easier to analyse to obtain alignment information.
In transforming step 110, the image from the first (i.e. already set up) camera is transformed according to the known positions of the cameras, the desired field of view of the second camera, and the scene three-dimensional (3D) structure information. The transformed image resulting from step 110 represents the expected view from the second camera, based on the image of the first camera.
Using a pinhole camera model, the view from the first camera may be represented by the camera matrix $C_1$ according to Equations (1) and (2) below:

$$C_1 = K_1 \cdot (R_1 \mid c_1) \tag{1}$$

$$K_1 = \begin{pmatrix} f_{x1} & 0 & c_{x1} \\ 0 & f_{y1} & c_{y1} \\ 0 & 0 & 1 \end{pmatrix} \tag{2}$$

where $C_1$ is the product of $K_1$ and $(R_1 \mid c_1)$, giving a 3x4 matrix.

$K_1$ represents the intrinsic parameters of the first camera. $K_1$ includes the focal length $f_1$ (measured in pixels), which may have different values in the x and y directions as $f_{x1}$ and $f_{y1}$. $K_1$ also includes the location of the optical axis in the image, $(c_{x1}, c_{y1})$, which is typically the centre of the image.

$c_1$ is a translation representing the position of the first camera in camera co-ordinates, which may be calculated as $c_1 = -R_1 \cdot t_1$, where $t_1$ is the position of the camera in three-dimensional (3D) world co-ordinates.

$R_1$ is a rotation matrix representing the orientation of the first camera, and $(R_1 \mid c_1)$ is a 3x4 matrix comprising $R_1$ on the left and $c_1$ on the right, representing the extrinsic parameters of the first camera. The matrix $C_1$ transforms the three-dimensional (3D) point $w = (w_x, w_y, w_z, 1)^T$ by left multiplication:

$$C_1 \cdot w = m = (m_z m_x,\; m_z m_y,\; m_z)^T$$

where $m_x$ and $m_y$ are the x- and y-coordinates of the image pixel on which the point $w$ appears. $m_x$ and $m_y$ are recovered by dividing out the z-coordinate $m_z$.

The camera matrix $C_2$ for the second camera may be represented similarly as $K_2 \cdot (R_2 \mid c_2)$.
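As an illustration only, the projection model of Equations (1) and (2) may be written in a few lines of NumPy. This is a minimal sketch of the linear pinhole model described above; the function names and the numeric intrinsic and pose values are placeholder assumptions, not values from the patent.

```python
import numpy as np

def camera_matrix(fx, fy, cx, cy, R, t):
    """Build the 3x4 camera matrix C = K.(R | c), where c = -R.t and
    t is the camera position in world co-ordinates (Equations (1)-(2))."""
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    c = -R @ t                                   # translation in camera co-ordinates
    return K @ np.hstack([R, c.reshape(3, 1)])   # 3x3 . 3x4 -> 3x4

def project(C, w):
    """Project a homogeneous world point w = (wx, wy, wz, 1) to the pixel
    (mx, my) by dividing out the z-coordinate mz."""
    m = C @ w
    return m[:2] / m[2]

# Placeholder example: a camera 10 m above the origin, looking straight down.
R = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])
C1 = camera_matrix(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0,
                   R=R, t=np.array([0.0, 0.0, 10.0]))
print(project(C1, np.array([1.0, 2.0, 0.0, 1.0])))  # a point on the ground plane
```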
Described below is a solution for a simple linear pinhole camera model. However, alternative camera models may be used, including models with skew or non-linear distortions.
Calculating the transform from the first camera view to the second camera view depends on the nature of the three-dimensional (3D) structure information, as the three-dimensional (3D) structure information provides the missing depth information for the images. As described below, the three-dimensional (3D) structure information may be a ground plane assumption. Alternatively, the three-dimensional (3D) structure information may be a planar assumption.
For some applications, such as capturing a sporting event played on a predominantly flat sporting field, an assumption which may be made is that the majority of the content in the image is at the ground level. The ground level may be represented as a plane in the 3D world co-ordinate space. In other applications, the content may be known to lie on a different plane. Any plane, ground or otherwise, may be represented by a (four-valued) vector π in three-dimensional homogeneous co-ordinates, such that a point X is on the plane if and only if π.X = 0.
For example, the plane Z=0 may be represented by the vector (0, 0, 1, 0).
Given camera matrices $C_1 = K_1 \cdot (R_1 \mid t_1)$ and $C_2 = K_2 \cdot (R_2 \mid t_2)$ for the first camera and second camera respectively, and a ground plane assumption represented by the vector $\pi$, a homography $H$ mapping the first camera view to the second camera view may be calculated in accordance with Equation (3), as follows:

$$H = K_2 \left( R_2 R_1^{-1} - \left( t_2 + R_1^{-1} t_1 \right) \left( (R_1 \mid t_1)\, \pi^T \right)^T \right) K_1^{-1} \tag{3}$$

The homography mapping matrix $H$ is the product of three matrices. The first matrix is the intrinsic matrix of the second camera, $K_2$. The second matrix is a sum of two matrices: the combined rotation $R_2 R_1^{-1}$, and the outer product of the combined translation $(t_2 + R_1^{-1} t_1)$ with the plane vector transformed to camera co-ordinates, $(R_1 \mid t_1)\, \pi^T$. The third matrix is the inverse of the intrinsic matrix of the first camera, $K_1^{-1}$.
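By way of a hedged illustration, the sketch below computes a plane-induced homography between two calibrated views. It uses the standard formulation (intrinsics of the second camera, times a rotation minus a translation-plane outer product, times the inverse intrinsics of the first camera), which has the same three-matrix structure as Equation (3); the sign and frame conventions of the patent's own notation may differ, and the plane is given here as n.X + d = 0 in world co-ordinates.

```python
import numpy as np

def plane_homography(K1, R1, t1, K2, R2, t2, n, d):
    """Homography mapping pixels of camera 1 to camera 2 for world points on
    the plane n.X + d = 0, given cameras Ci = Ki.(Ri | ti) with ti expressed
    in camera co-ordinates."""
    R = R2 @ R1.T                      # combined rotation, camera 1 -> camera 2
    t = t2 - R @ t1                    # combined translation
    n1 = R1 @ n                        # plane normal in camera 1's frame
    d1 = d - n1 @ t1                   # plane offset in camera 1's frame
    H = K2 @ (R - np.outer(t, n1) / d1) @ np.linalg.inv(K1)
    return H / H[2, 2]                 # normalise so H[2, 2] = 1

# For a ground plane Z = 0, use n = (0, 0, 1) and d = 0, matching the
# example plane representation (0, 0, 1, 0) given above.
```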
An alternative form of three-dimensional (3D) structure information may be in the form of depth information for the first camera image. One source of depth information may be direct measurement, such as a depth sensor aligned to the first camera. An alternative source of depth information may be a 3D model of the area viewed by the camera. Since the first camera is already correctly set up, if a 3D model of the area is known then the expected depth of each
pixel may be measured based on the three-dimensional (3D) model and the known position of the first camera.
Given depth information, each pixel in the image captured by the first camera may be expressed as a three-dimensional (3D) point in camera co-ordinates $X = (x, y, 1, 1/d)^T$, where x and y are the pixel co-ordinates and d is the depth value. The 3D point may be converted to a point in world co-ordinates by first extending the camera matrix $C_1$ with an additional row (0, 0, 0, 1) to make the 4x4 extended camera matrix $\tilde{C}_1$, finding the inverse $\tilde{C}_1^{-1}$, and transforming the 3D point into world co-ordinates by calculating $\tilde{C}_1^{-1} X$. The three-dimensional (3D) point may then be mapped to a point on the second camera by calculating $C_2 \tilde{C}_1^{-1} X$ using the camera matrix $C_2$ of the second camera.
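A minimal sketch of the depth-based mapping just described is shown below, using the extended 4x4 camera matrix construction from the text; the helper name, and the assumption that d is the camera-frame depth (the $m_z$ value), are illustrative.

```python
import numpy as np

def remap_pixel_with_depth(C1, C2, x, y, d):
    """Map pixel (x, y) of camera 1, with depth d, to a pixel of camera 2.
    X = (x, y, 1, 1/d) is the point in camera 1's co-ordinates; the inverse
    of the extended matrix takes it to (homogeneous) world co-ordinates."""
    C1_ext = np.vstack([C1, [0.0, 0.0, 0.0, 1.0]])   # 4x4 extended matrix
    X = np.array([x, y, 1.0, 1.0 / d])
    w = np.linalg.inv(C1_ext) @ X                    # world point (homogeneous)
    m = C2 @ w                                       # project into camera 2
    return m[:2] / m[2]
```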
Using one of the above described methods, an image representing the expected view from the second camera is produced. The image representing the expected view from the second camera is formed from the image captured by the first camera. However, the image representing the expected view from the second camera should be aligned with, and match, the view that the second camera would see if the second camera were correctly set up. An example image 801 representing the expected view from the second camera is shown in Fig. 8B. The image 801 shows the image 701 from a first camera, as shown in Fig. 7A, transformed to align with the image 702 from a second camera, as shown in Fig. 7B. Alternatively, a warp map representing the correspondence between the pixels in the images 701 and 702 from the two cameras may be used in place of explicitly generating the transformed image.
In aligning step 120, image alignment is performed between the image from the second camera and the transformed image from the first camera. Image alignment attempts to find a mapping between two images such that the features in the images are aligned.
One class of alignment method finds a general warp, which maps each pixel in one image to a corresponding pixel in the second image. Such a warp may be represented as a warp map, which is a map having the same size as the first image and containing an offset value representing the offset to the corresponding pixel in the other image. For example, Fig. 7A shows the image 701 from a first camera, and Fig. 7B shows the image 702 from a second camera, while Fig. 8A shows a warp map 800 mapping the first camera image to the second camera image, each arrow representing the direction of the offset with a length proportional to the distance between the corresponding pixels in the two images. Fig. 8B shows the result of applying the warp map 800 of Fig. 8A to transform the image 702 of Fig. 7B from the second camera into the viewpoint of the first camera. Note that a warp map is generally used to
perform the inverse of the transform that the warp map visibly represents. A warp map is used because generating a transformed image is most effectively done using a warp which maps each pixel in the output image to a source image location from which to derive an interpolated colour for the pixel.
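The following sketch applies such a warp map with OpenCV. It assumes the warp is stored as per-pixel offsets, as drawn in Fig. 8A, and converts them to the absolute source co-ordinates that cv2.remap expects; the function names are illustrative.

```python
import cv2
import numpy as np

def apply_warp_map(src, offset_x, offset_y):
    """Warp 'src' into the target view. offset_x/offset_y give, for each
    pixel of the target image, the offset to the corresponding source pixel;
    cv2.remap wants absolute source co-ordinates, so a grid is added."""
    h, w = offset_x.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x = xs + offset_x.astype(np.float32)
    map_y = ys + offset_y.astype(np.float32)
    return cv2.remap(src, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT, borderValue=0)
```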
The image 801 of Fig. 8B also shows the effect of the ground plane assumption used to generate the warp 800 in Fig. 8A. It can be seen that ground areas such as the field markings are transformed correctly, while the goal posts are distorted and incorrectly placed.
In general, and as can also be seen in the image 801 of Fig. 8B, the transformed image from the one camera may not cover all of the field of view of the other camera. There may be areas outside the field of view of the first camera, or, in the case of three-dimensional (3D) structure information that is a depth map, there may be areas which are occluded in the field of view of the first camera. As the location of such occluded areas in the transformed image is known (and is independent of the unknown field of view of the second camera), a mask may be determined to represent the missing regions. The determined mask may be used to indicate that the alignment should ignore the missing areas in the transformed image.
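One way to determine such a mask, sketched below under the same warp-map representation as above, is to flag every target pixel whose source location falls outside the first camera's image; pixels known to be occluded (from the depth map) could be cleared from the mask in the same way.

```python
import numpy as np

def coverage_mask(map_x, map_y, src_h, src_w):
    """True where the warped source location lies inside the first camera's
    image; False marks the missing regions to be ignored during alignment."""
    return ((map_x >= 0) & (map_x <= src_w - 1) &
            (map_y >= 0) & (map_y <= src_h - 1))
```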
An alternative class of alignment methods that may be used takes advantage of the assumption that what is required to align the images should correspond directly to an adjustment for the second camera, since the second camera is assumed to be in the correct position but with incorrect settings.
Typical camera settings consist of adjusting pan, tilt and zoom of the camera. In some cases the roll of the camera may be adjustable as well. In cases where the roll of the camera is adjustable, RST (Rotation + Scale + Translation) alignment provides a close approximation of the required transformation. Pan and tilt do not exactly correspond to a translation, but depending on the desired accuracy the approximation may be sufficient. Alternatively an alignment method which optimises over the desired space of transforms, for example pan, tilt and zoom, may be used.
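A sketch of RST alignment from matched feature points is shown below. It relies on OpenCV's partial 2D affine estimator, which is constrained to rotation, uniform scale and translation; obtaining the point matches (for example, with a feature detector and descriptor matcher) is assumed to have happened already.

```python
import cv2
import numpy as np

def estimate_rst(pts_second, pts_expected):
    """Estimate the RST transform mapping points in the second camera's image
    onto the expected view. Returns a 2x3 matrix of the form
    [[s.cos(a), -s.sin(a), tx], [s.sin(a), s.cos(a), ty]]."""
    M, inliers = cv2.estimateAffinePartial2D(
        np.asarray(pts_second, dtype=np.float32).reshape(-1, 1, 2),
        np.asarray(pts_expected, dtype=np.float32).reshape(-1, 1, 2),
        method=cv2.RANSAC, ransacReprojThreshold=3.0)
    return M, inliers
```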
In some cases, good alignment may not be possible without accounting for other factors such as lens distortion, incorrect camera placement, and errors relating to the 3D structure information of the scene. In cases where good alignment is not possible, general image alignment methods, which provide a general warp between the two images, may be used.
If, in step 110, a warp is provided rather than a transformed image, alignment may be performed directly between the images from the first and second cameras. The alignment may be performed directly using the warp as an initial estimate. Alternatively, the alignment may be performed directly by exploring the space of warps which consist of the warp composed with a simpler space of transforms such as RST (Rotation, Scale, Translation), or with the space of transforms generated by a combination of Pan, Tilt, Roll (optionally) and Zoom according to the camera model and intrinsic parameters, referred to as “PTRZ” below.
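As an illustration of optimising over the PTRZ space, the sketch below parameterises the homography induced by a pure rotation and zoom of the second camera about its optical centre, and fits the four parameters to matched points with a least-squares solver. The Euler-angle convention, the parameter names, and the use of SciPy are assumptions, not details from the patent.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def ptrz_homography(pan, tilt, roll, zoom, K):
    """Homography induced by rotating the camera by (pan, tilt, roll) radians
    and scaling its focal length by 'zoom' (a camera rotating and zooming
    about its optical centre)."""
    Kz = K.copy()
    Kz[0, 0] *= zoom
    Kz[1, 1] *= zoom
    R = Rotation.from_euler('yxz', [pan, tilt, roll]).as_matrix()
    return Kz @ R @ np.linalg.inv(K)

def fit_ptrz(pts_from, pts_to, K):
    """Fit (pan, tilt, roll, zoom) so the induced homography maps pts_from
    (second camera image, Nx2) onto pts_to (expected view, Nx2)."""
    pts_h = np.hstack([pts_from, np.ones((len(pts_from), 1))])

    def residuals(p):
        H = ptrz_homography(p[0], p[1], p[2], p[3], K)
        q = pts_h @ H.T
        return (q[:, :2] / q[:, 2:3] - pts_to).ravel()

    return least_squares(residuals, x0=[0.0, 0.0, 0.0, 1.0]).x
```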
In determining step 130, a corrective transform is determined to correct the settings of the second camera. The specific calculation performed at step 130 depends on the form of the alignment results. In the simplest case, if the alignment is expressed in the same space as the available camera transforms, such as PTRZ, then the corrective transform is the same as the alignment results (or possibly the inverse, depending on the direction of the alignment mapping). If the alignment is expressed as a general warp, then correction parameters need to be fit to the warp. Methods such as least squares fitting may be used to find the transform which best fits the determined warp.
If the 3D structure information used a ground plane assumption, care should be taken to find an adjustment which is accurate for the areas of the image for which the ground plane assumption holds, and unaffected by areas where the assumption is incorrect. Robust fitting methods such as RANSAC may be used to find transform parameters which tightly fit the warp for the ground points while ignoring the warp determined for points off the ground.
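A sketch of such a robust fit is given below: correspondences are sampled from the dense warp on a sparse grid and a homography is fitted with RANSAC, so that pixels violating the ground plane assumption (such as the goal posts in Fig. 8B) fall out as outliers. The grid spacing and threshold are arbitrary placeholder values.

```python
import cv2
import numpy as np

def fit_corrective_homography(map_x, map_y, step=16, reproj_thresh=2.0):
    """Fit a homography to a dense warp map. RANSAC keeps the consensus set
    of ground-plane pixels and ignores points where the warp departs from
    the fitted transform."""
    h, w = map_x.shape
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    src = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    dst = np.stack([map_x[ys, xs].ravel(), map_y[ys, xs].ravel()],
                   axis=1).astype(np.float32)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    return H, inlier_mask
```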
Finally, in adjusting step 140, the corrective transform determined in step 130 is applied to the second camera in order to rotate and zoom the second camera to the desired field of view. Having applied the corrective transform to the second camera, the second camera should now be correctly set up, and the view from the second camera should match the transformed image from step 110.
Fig. 4 shows a method 400 of constructing a 3D model of an object for use in a 3D scanning application. One or more steps of the method 400 may be implemented as one or more software code modules of the software application program 933 resident in the hard disk drive and being controlled in its execution by the processor 905.
In determining step 410, the approximate shape of the object to capture is determined under execution of the processor 905. Step 410 may be implemented using similar methods to the
capture itself but with less detail than the capture. If the object has a known or simple shape then the shape of the object may be used in step 410.
In planning step 420, capture planning is performed under execution of the processor 905. For objects with a complex shape, it can be difficult to ensure that sufficient levels of detail are captured on all parts of the object. It is also difficult to ensure that every part of the object is visible in at least one captured image. Whether parts of the object are missing may not be obvious during capture time, and may only be noticed after constructing a 3D textured model. For objects at inconvenient locations or with limited access, it can be problematic to have to return to recapture parts of the object that were missed. Capture planning means deciding upon the specific locations and orientations of all images captured in advance. Using a model based on the approximate shape of the object determined during step 410, the capture may be simulated based on the planned camera positions to determine whether the planned camera setup is sufficient.
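A capture-planning check of the kind described can be sketched as follows: sample points on the approximate object model and count, for each, the planned cameras whose image bounds contain its projection. Occlusion testing is omitted, and all names and thresholds are illustrative assumptions; points seen by too few cameras indicate gaps in the plan.

```python
import numpy as np

def coverage_counts(points, cameras, width, height):
    """For each sampled 3D point (homogeneous, shape Nx4), count the planned
    cameras (3x4 matrices) whose field of view contains its projection."""
    counts = np.zeros(len(points), dtype=int)
    for C in cameras:
        m = points @ C.T                     # project all points at once (Nx3)
        z = m[:, 2]
        with np.errstate(divide='ignore', invalid='ignore'):
            x, y = m[:, 0] / z, m[:, 1] / z
        visible = (z > 0) & (x >= 0) & (x < width) & (y >= 0) & (y < height)
        counts += visible                    # booleans add as 0/1
    return counts
```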
Once at the capture site, in setup step 430 the cameras are set up according to the plan determined in step 420, in accordance with the method 200.
In capturing step 440, the capture is performed using the cameras 501-528. For a static object, not all cameras need to be in place and capturing simultaneously. In the case of a static object, capture is performed using the cameras as already set up, and if there are more cameras to set up then the method 400 returns to step 430. The capture may even be performed with just a single camera. In the case where the capture is performed with a single camera, it should be understood that in the description of methods 200 and 100, the “first camera” and “second camera” may be the same physical camera, but in a first and second position at different times.
In constructing step 450, a 3D model of the object is constructed using the collection of images of the object and known 3D model construction methods.
The arrangements described are applicable to the computer and data processing industries and particularly for image processing.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.

Claims (11)

1. A method of setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the method comprising:

receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;

determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and

setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.
2. The method according to claim 1, wherein the three-dimensional structure information is a ground plane assumption.
3. The method according to claim 1, wherein the three-dimensional structure information is a planar assumption.
4. The method according to claim 1, wherein the three-dimensional structure information is based on a previous 3D scan of the location.
5. The method according to claim 1, wherein the three-dimensional structure information is from an aligned depth sensor.
6. The method according to claim 1, wherein the field of view of the second camera is set using a pan-tilt-zoom stage.
7. The method according to claim 1, further comprising aligning the first camera according to a model of the scene.
8. The method according to claim 1, further comprising calibrating the first camera.
9. An apparatus for setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the apparatus comprising:

means for receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;

means for determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and

means for setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.
10. A system for setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the system comprising:

a memory for storing data and a computer program;

a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:

receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;

determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and

setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.
11. A non-transitory computer readable medium having a computer program stored on the medium for setting up a field of view of a plurality of cameras capturing a scene, a first image of the scene being captured by a first camera in the plurality of cameras overlapping with a second image of the scene being captured by a second camera in the plurality of cameras, the program comprising:

code for receiving a set of first camera characteristics of the first camera in the plurality of cameras and a three-dimensional structure information for the scene, the first camera being configured according to a predetermined configuration;

code for determining a desired view for the second camera in the plurality of cameras based on the received set of first camera characteristics of the first camera, a desired set of second camera characteristics of the second camera, the first image of the scene captured by the first camera and the three-dimensional structure information for the scene; and

code for setting up the field of view of the second camera, the field of view of the second camera being selected based on an alignment of the determined desired view of the second camera with the second image of the scene captured by the second camera, wherein the second image of the scene overlaps with the first image.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2017265068A AU2017265068A1 (en) 2017-11-22 2017-11-22 Setup of multiple cameras

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2017265068A AU2017265068A1 (en) 2017-11-22 2017-11-22 Setup of multiple cameras

Publications (1)

Publication Number Publication Date
AU2017265068A1 true AU2017265068A1 (en) 2019-06-06

Family

ID=66663654

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2017265068A Abandoned AU2017265068A1 (en) 2017-11-22 2017-11-22 Setup of multiple cameras

Country Status (1)

Country Link
AU (1) AU2017265068A1 (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102417A (en) * 2020-09-15 2020-12-18 北京百度网讯科技有限公司 Method and device for determining world coordinates and external reference calibration method for vehicle-road cooperative roadside camera
CN112102417B (en) * 2020-09-15 2024-04-19 阿波罗智联(北京)科技有限公司 Method and device for determining world coordinates

Similar Documents

Publication Publication Date Title
WO2021115071A1 (en) Three-dimensional reconstruction method and apparatus for monocular endoscope image, and terminal device
US10401716B2 (en) Calibration of projection systems
JP5739409B2 (en) Method for determining the relative position of a first image device and a second image device and these devices
US10726580B2 (en) Method and device for calibration
AU2017225023A1 (en) System and method for determining a camera pose
US9578295B1 (en) Calibration feature masking in overlap regions to improve mark detectability
US11282232B2 (en) Camera calibration using depth data
US10663291B2 (en) Method and system for reproducing visual content
CN107527336B (en) Lens relative position calibration method and device
CN110009567A (en) For fish-eye image split-joint method and device
Pulli et al. Mobile panoramic imaging system
Rameau et al. MC-Calib: A generic and robust calibration toolbox for multi-camera systems
Jiang et al. An accurate and flexible technique for camera calibration
CN113516719B (en) Camera calibration method, system and storage medium based on multiple homography matrixes
CN110111364A (en) Method for testing motion, device, electronic equipment and storage medium
AU2017265068A1 (en) Setup of multiple cameras
Köser et al. Differential spatial resection-pose estimation using a single local image feature
CN110838147B (en) Camera module detection method and device
Ju et al. Panoramic image generation with lens distortions
Peng et al. Single view metrology along orthogonal directions
US11900635B2 (en) Organic camera-pose mapping
CN110796596A (en) Image splicing method, imaging device and panoramic imaging system
KR102603819B1 (en) Golf ball landing type detection method, system and storage medium
Yu et al. Plane-based calibration of cameras with zoom variation
US11823410B1 (en) Video match moving system and method

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application