AU2010202963A1 - Frame rate up-sampling for multi-view video coding using distributing video coding principles - Google Patents

Frame rate up-sampling for multi-view video coding using distributing video coding principles

Info

Publication number
AU2010202963A1
Authority
AU
Australia
Prior art keywords
frames
camera
temporal
frame rate
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2010202963A
Inventor
Ka Ming Leung
Zhonghua Ma
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2010202963A priority Critical patent/AU2010202963A1/en
Publication of AU2010202963A1 publication Critical patent/AU2010202963A1/en
Abandoned legal-status Critical Current

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

FRAME RATE UP-SAMPLING FOR MULTI-VIEW VIDEO CODING USING DISTRIBUTED VIDEO CODING PRINCIPLES. Disclosed is a method of frame rate up-sampling and decoding using multiple cameras based on distributed source coding techniques. A first set of frames (Fn, Fn+1, Fn+2, ...) is received from a first camera (221) at a first frame rate, and at least one second set of frames (I11, I12, I21, I22, ...) and error correction bits (WZ11, WZ12, WZ21, WZ22, ...) are received from a corresponding at least one second camera (222a, 222b) at a second frame rate higher than the first frame rate. The method determines the degree of temporal correlation between frames of said second set of frames using said error correction bits, and the degree of correlation between said first and said second sets of frames using said error correction bits. The method then performs frame rate up-sampling for said first camera based on said degrees of correlation determined temporally and spatially between said first and second sets of frames.

Description

S&F Ref: 947504 AUSTRALIA PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT. Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan. Actual Inventor(s): Zhonghua Ma, Ka Ming Leung. Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177). Invention Title: Frame rate up-sampling for multi-view video coding using distributing video coding principles. The following statement is a full description of this invention, including the best method of performing it known to me/us:

FRAME RATE UP-SAMPLING FOR MULTI-VIEW VIDEO CODING USING DISTRIBUTED VIDEO CODING PRINCIPLES

TECHNICAL FIELD The present invention relates generally to image and video decoding, and particularly to up-sampling image and video using distributed video coding principles.

BACKGROUND Various products, such as digital (still) cameras and digital video cameras, are used to capture images and videos. These products contain an image sensing device, such as a charge coupled device (CCD), which is used to capture light energy focussed on the image sensing device that is indicative of a scene. The captured light energy is then processed to form a digital image and is encoded according to an image or video coding standard such as JPEG, JPEG2000, Motion JPEG, Motion JPEG2000, MPEG1, MPEG2, MPEG4 and H.264.

All the formats listed above are compression formats, although the manner in which compression is performed varies. While these formats offer high quality and improve the number of images that can be stored on a given medium, they typically suffer from long encoding runtimes. For conventional formats, such as JPEG, JPEG2000, Motion JPEG, Motion JPEG2000, MPEG1, MPEG2, MPEG4 and H.264, the encoding process is typically five to ten times more complex than the corresponding decoding process. This imposes strict frame-rate limitations (e.g., 30 fps for typical HD video cameras) upon the encoding devices and has a significant impact on the quality of the displayed videos due to fast and non-linear motions.

The prior art shown in Fig. 1 is a typical method for frame-rate up-sampling based on motion compensation temporal interpolation (MCTI). In this example, a stream of frames [Fn, Fn+1, Fn+2, Fn+3, ...] is generated by a conventional camera (camcorder) and intermediate frames are interpolated linearly from the adjacent frames, in this example frames Fn and Fn+1. This method assumes linear motions and thus performs sub-optimally for fast and non-linear motions.

To perform video up-sampling effectively for video coding systems with more than one camera, "distributed video coding (DVC)", sometimes referred to as "distributed source coding (DSC)", which is based on the well-known Wyner-Ziv coding paradigm, may be used. In a DVC scheme, the complexity is shifted from the encoder to the decoder. This permits fast, low-power encoding and does not require inter-camera communications. This also allows decoders (e.g. display devices) to generate high-quality up-sampled videos by evaluating fast, non-linear motions in videos.
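By way of illustration, the linear MCTI scheme described above can be sketched as follows. This is a minimal sketch assuming greyscale numpy frames, an exhaustive block-matching search, and illustrative block and search sizes; none of these choices is prescribed by Fig. 1.

```python
import numpy as np

def mcti_midframe(f0, f1, block=8, search=8):
    """Sketch of motion-compensated temporal interpolation (MCTI).

    One motion vector per block of f1 is estimated against f0 by an
    exhaustive SAD search; the interpolated frame blends each block of f1
    with its matched block in f0. Linear motion is assumed, which is why
    fast or non-linear motion degrades the result.
    """
    h, w = f0.shape
    mid = np.zeros((h, w), dtype=np.float64)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            bh, bw = min(block, h - by), min(block, w - bx)
            cur = f1[by:by + bh, bx:bx + bw].astype(np.int64)
            best_sad, mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + bh > h or x + bw > w:
                        continue
                    ref = f0[y:y + bh, x:x + bw].astype(np.int64)
                    sad = int(np.abs(cur - ref).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, mv = sad, (dy, dx)
            dy, dx = mv
            matched = f0[by + dy:by + dy + bh, bx + dx:bx + dx + bw]
            # Blend the matched block of f0 with the co-located block of f1.
            # True MCTI would place the block at the half-vector position;
            # keeping it co-located is a simplification for this sketch.
            mid[by:by + bh, bx:bx + bw] = 0.5 * matched + 0.5 * cur
    return np.rint(mid).astype(f0.dtype)
```

As the background notes, any interpolator of this kind degrades when the motion between Fn and Fn+1 is fast or non-linear, which is the limitation the multi-camera DVC approach below addresses.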
SUMMARY According to one aspect of the present disclosure there is provided a method of frame rate up-sampling using multiple cameras based on distributed source coding techniques, said method comprising the steps of: 15 receiving a first set of frames from a first camera at a first frame rate; receiving at least one second set of frames and error correction bits generated from a corresponding at least one second camera at a second frame rate higher than the first frame rate; determining the degree of temporal correlation between frames of said second set 20 of frames using said error correction bits; determining the degree of correlation between said first and said second sets of frames using said error correction bits; and performing frame rate up-sampling for said first camera based on said degrees of correlation determined temporally and spatially between said first and second sets of 25 frames. 2829491_1 947504_speci_lodge -3 The degree of temporal correlation may be determined using temporal interpolation or using temporal-view hybrid interpolation. Preferably, the determining the degree of temporal correlation further comprises determining an interpolation mode based on a temporal reliability measure and selecting 5 one of temporal interpolation and temporal inter-view hybrid interpolation to determine the degree of temporal correlation. In this implementation, the temporal inter-view hybrid interpolation may comprise: performing temporal interpolation based on adjacent key frames; performing inter-view interpolation based on a view-point difference between the 10 set of frames of said first camera and at least one of the set of frames of said second camera; determining weights for each of the temporal interpolation and the inter-view interpolation; and averaging the temporal interpolation and the inter-view interpolation using the 15 determined weights to obtain the temporal inter-view hybrid interpolation. Desirably, the weights are determined using reliability measures of a non-key frame, said reliability measure being generated based on the error correction bits. Further, weights for a first intermediate frame located between frames of the first set of frames may be determined from the reliability measure generated for a non-key frame of 20 a first one of the second cameras, said non-key frame for the first one of the second set of cameras, said reliability measure being generated based on said error correction bits. Also each block of the first intermediate frame may be associated to two reliability values generated by Wyner-Ziv decoding of at least said error correction bits. Preferably, one of the reliability values may be determined from a temporal prediction using 25 adjacent key frames from the at least one second set of frames, and the other reliability 2829491_1 947504_specilodge -4 value is determined based on inter-view prediction from a key frame of the at least one second set of frames of a second one of the second cameras. 
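The weighted fusion of temporal and inter-view predictions summarised above can be illustrated with the following sketch. The function and parameter names, the per-block reliability maps (assumed here to have already been derived from the Wyner-Ziv error-correction decoding), and the fall-back to a plain average when both reliabilities fall below the threshold are assumptions made for the example rather than features required by the method as summarised.

```python
import numpy as np

def hybrid_interpolate(p_temporal, p_interview, r_temporal, r_interview,
                       block=8, threshold=0.0):
    """Per-block weighted fusion of a temporal prediction and an
    inter-view prediction of the same (greyscale) intermediate frame.

    r_temporal / r_interview hold one reliability value per block.
    Values below `threshold` are zeroed, so that prediction contributes
    nothing to the block; the surviving values are normalised into the
    weights of a weighted average.
    """
    h, w = p_temporal.shape
    out = np.zeros((h, w), dtype=np.float64)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            i, j = by // block, bx // block
            rt = r_temporal[i, j] if r_temporal[i, j] >= threshold else 0.0
            rv = r_interview[i, j] if r_interview[i, j] >= threshold else 0.0
            total = rt + rv
            if total == 0.0:
                wt, wv = 0.5, 0.5      # neither prediction trusted: plain average
            else:
                wt, wv = rt / total, rv / total
            sl = (slice(by, by + block), slice(bx, bx + block))
            out[sl] = wt * p_temporal[sl] + wv * p_interview[sl]
    return np.rint(out).astype(p_temporal.dtype)
```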
Further, preferably for a next intermediate frame in the video sequence, the method determines weights for the frame from a reliability measure generated for a non-key frame for the second one of the 5 second cameras, in which each block of the intermediate frame is associated to two reliability values generated by the Wyner-Ziv decoding, in which one of the reliability values is determined from a temporal prediction using adjacent key frames from the second set of frames of the second one of the second cameras, and the other reliability value is determined based on inter-view prediction from a key frame from the second set 10 of frame of the first one of the second cameras. Typically, the two reliability values associated to a block of the intermediate frame are mapped to the determined weights. Alternatively, the reliability values are firstly subject to a preset threshold to determine the weights for the block of the intermediate frame of the first camera, such that any spatial or temporal reliability values below the preset threshold is reset to zero and such 15 that the corresponding interpolation makes no contribution to the final interpolation of that particular block. Desirably, the first camera is a conventional camera and the second camera comprises at least one and typically two distributed video coding (DVC) cameras. Other aspects are also disclosed. 20 BRIEF DESCRIPTION OF THE DRAWINGS At least one embodiment of the present invention will now be described hereinafter with reference to the drawings, in which: Fig. I is a prior-art method for frame-rate up-sampling; 2829491_1 947504_specilodge -5 Fig. 2 is a block diagram of an exemplary configuration of a video coding system, having more than one camera, with which the arrangements to be described may be practiced; Fig. 3A is a schematic block diagram of a conventional camera; 5 Fig. 3B is a schematic block diagram of a camera that employs Distributed Video Coding (DVC) schemes according to the present disclosure; Fig. 3C is a schematic block diagram of a pixel-domain Wyner-Ziv encoder; Fig. 4A is a schematic block diagram of a video decoder 400 of Fig. 2; Fig. 4B is a schematic block diagram of a correlation estimator 450 of Fig. 4A; 10 Fig. 5 is a schematic block diagram of a pixel-domain Wyner-Ziv decoder (4540 and 4590) of Fig. 4B; Fig. 6 is a frame-rate up-sampling method to be described in this patent specification; Fig. 7 is a flow diagram of an interpolation mode selector 460 of Fig. 4A; 15 Fig. 8 is a flow diagram of a frame rate up-sampler 420 of Fig. 4A; Figs. 9A and 9B collectively form a schematic block diagram of a computer system in which the arrangements shown in Figs. 3A, 3B, 3C, 4A, 4B, and 5 may be implemented; Fig. 10A illustrates video streams arising from the camera arrangement of Fig.2; 20 and Fig. 10B schematically illustrates the manner of upsampling and thus the construction of intermediate frames according to the present disclosure. DETAILED DESCRIPTION INCLUDING BEST MODE Methods, apparatus, and computer program products are disclosed for processing 25 digital images each comprising a plurality of pixels. In the following description, 2829491_1 947504_specilodge -6 numerous specific details, including image/video compression/encoding formats and the like, are set forth. However, from this disclosure, it will be apparent to those skilled in the art that modifications and/or substitutions may be made without departing from the scope and spirit of the invention. 
In other circumstances, specific details may be omitted 5 so as not to obscure the invention. Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears. 10 Video Coding System with Multiple Cameras Fig. 2 shows a configuration of a video coding system 200 comprising three independent video cameras 221, 222a, and 222b for capturing images of a scene 210. The scene 210 comprises a 3D spherical object and a 3D square object, for purposes of illustration only. The camera 221 is a conventional video camera generating raw pixel 15 data or entropy encoded data at a frame rate R; using a system 300 of Fig. 3A. In this configuration, the other two cameras 222a and 222b are hybrid DVC cameras, one physically positioned on each side of the conventional video camera 221, and each capturing the scene 210 at a frame rate R 2 which is higher than that of the conventional camera 221, so that R; < R 2 . Each DVC camera 222a and 222b uses a system 350 shown 20 in Fig. 3B to perform Wyner-Ziv (WZ) encoding, which generates error correction bits using bitwise error correction methods. The cameras 221, 222a, and 222b each have a corresponding stream output 330, 395a and 395b and together form an independent encoder 220 of the system 200. 2829491_1 947504_specilodge -7 Data transmission is performed independently from each video camera 221, 222a, and 222b to a joint video decoder 400 via a transmission medium or a storage device, unitarily depicted at 230 in Fig. 2. In a DVC scheme, decoding of the image data from the cameras 221, 222a and 5 222b is performed jointly at the joint video decoder 400 of Fig. 2. Detail of the joint video decoder 400 will be described later with reference to Fig. 4A. The joint video decoder 400 evaluates the degrees of temporal and inter-view correlations using the conventionally encoded key-frames and error correction bits from the hybrid DVC cameras 222a and 222b of Fig. 2. The inter-view correlations arise from the spatial 10 view-point difference between the cameras 222a and 222b. Based on the computed degrees of temporal and spatial/inter-view correlations, the joint video decoder 400 determines the best interpolation mode for frame interpolations and performs frame-rate up-sampling to the received video from the conventional video camera 221. The output of the joint video decoder 400 is a video 250 formed by frames as if captured by the 15 conventional video camera 221 but at the higher frame rate R 2 of the DVC cameras 222a and 222b. The frames of the video stream 250 therefore replicate the resolution of the camera 221 at the frame rate of the cameras 222a and 222b. Fig. I0A illustrates the nature of the video streams 330, 395a and 395b output from the cameras 221, 222a and 222b and which are subject to storage or transmission, 20 or both via the medium 230. As can be seen from Fig. 10A, the conventional camera 221 outputs a stream 330 of a first set of frames [F,, Fn 1 1, Fn+ 2 ... ], typically at a relatively high resolution and at a relatively low frame rate. The DVC cameras 222a and 222b output streams 395a and 395b of second sets of frames respectively at relatively low resolution and at a relatively high frame rate. 
The frames output from the DVC 25 cameras in each case are alternate intra-encoded frames (I frames or key frames) and 2829491 1 947504_specilodge -8 Wyner-Ziv (WZ) encoded frames (parity bits used during the prediction of non-key frames). In the example of Fig. 10A, it will be appreciated that the frame rates of the DVC cameras 222a and 222b are the same, and three times the rate of the conventional camera 221. In practice, the disparity in frame rates can be altered depending on the 5 specific application. Some applications (e.g. higher speed TV sports video) may involve the conventional camera 221 to operate at traditional video frame rates of 25 or 30 frames per second, and the DVC frames rates could be at four times the rate, permitting construction of a TV quality image at 100 or 120 frames per second. In other applications (e.g. video surveillance), the conventional camera 221 may be configured to 10 capture one frame per second, whereas the DVC cameras 222 may operate at traditional video rates of 25 or 30 frames per second. This would permit reproduction of a surveillance video at the resolution of the conventional camera at 25 or 30 frames per second. It will be appreciated that significant bandwidth savings may be obtained in each example. 15 Fig. IOB schematically illustrates the (re)construction of the output stream 250 from the input streams 330, 395a and 395b exemplified in Fig. 1OA for one decoding period. A decoding period includes two frames from the conventional camera 221, in which successive decoding periods overlap as shown in Fig. 10B (e.g. one period includes F, and F,+,, and the following period includes F", 1 and Fn+ 2 ). The WZ frames 20 produced by the DVC cameras are used to exploit the temporal correlation between adjacent frames captured by a camera, such as two successive I frames captured by camera 222b, and the inter-view correlation between two views (e.g. two cameras at different locations). In this example there are two types of inter-view correlation, each associated with its own two views with one between a DVC camera, such as camera 25 222a, and the conventional camera 221 while the other type is between the two DVC 2829491_1 947504_speci_lodge -9 cameras 222a and 222b. Temporal correlation is determined by applying error correction bits to the temporal side information (SI) interpolated from two adjacent I frames. Inter-view correlation is determined by applying error correction bits to the inter-view side information (SI) extrapolated from the F, and I frames. Information 5 from these two correlations is combined to generate a block-based fusion mask and a set of optimal weights for temporal or interview predictions. The intermediate frame Fi is interpolated adaptively from the nearest frames, based on the fusion mask and the optimal weights. As can be seen from Fig. lOB, adjacent I frames in the same input stream are 10 able to be related by motion compensation temporal interpolation (MCTI) forming a temporal prediction P 1 (non-key frame) for each WZ frame. Time adjacent WZ and I frames from the two DVC input streams are able to be related by disparity compensation view prediction (DCVP) based on the estimated view-point difference "dv" between a frame Fn from the conventional camera 221 and the time adjacent I frame from a DVC 15 camera. 
These relations exist because of both the like spatial relationship of the DVC cameras (they both view the same scene distinguished only by a small spatial separation from the conventional camera 221) and because their relatively high frame rates, compared with the conventional camera 221, provide that temporal differences will be minor. By performing DCVP based on the predetermined view-point difference dv 20 between a DVC camera (222a, 222b) and the conventional camera 221, a spatial prediction P 2 , non-key frame, is generated at the location of each WZ frame. The WZ error correction bits may then be used to determine the reliability of the temporal and spatial predictions, as seen in Fig. 10B. The arrangements to be described select either one of the predictions P, and P 2 , or a manner in which they can be combined to generate 2829491_1 947504_specilodge -10 the associated intermediate frame F,,_i interpolated from the adjacent full resolution frames Fn, Fn, 1 and an I frame from a DVC camera. Whilst Figs. IOA and lOB illustrate substantial synchronicity between the DVC cameras, such is not essential. For example, the DVC cameras may operate at different 5 frame rates. In such a situation, arbitration will be required within the decoding process to identify those WZ and I frames of the slower DVC camera that should be matched or combined with those of the faster DVC camera. This situation is more likely, not through distinctly different frame rates, but rather through minor variations in clock speeds and phasing variations in the production of alternate WZ and I frames. 10 Computer Implementation Fig. 3A shows schematically the relevant portions of the conventional camera 221, whereas Figs. 3B and 3C show schematically the relevant portions of the DVC cameras 222a and 222b. Those cameras are typically formed by stand-alone devices which have embedded systems configured to perform the relevant encoding functions 15 300 or 350. Those embedded systems may operate using software, or hardware, or a combination of the two. Typically, the joint video decoder 400 is implemented in a more substantial computing device, such as a stand-alone computer, a set-top box or the like, where the decoding functions are most likely performed in software, although the computing device may be supplemented by specific hardware devices where speed of 20 operation or special purposes make such desirable. In some implementations, the cameras 221, 222a and 222b may each be traditional cameras (simple image capture devices) and the relevant encoding processes 300 or 350 may be performed by a computer device to which they each couple. This is depicted in Figs. 9A and 9B, where the encoder 300, the encoder 350, as well as the 25 joint video decoder 400, may each be implemented as software, using one or more 2829491_1 947504_speci_lodge - II application programs executable within a computer system 900. The software may be stored in a computer readable medium, including the storage devices described hereinafter, for example. The software is loaded into the computer system 900 from the computer readable medium, and then executed by the computer system 900. A 5 computer readable medium having such software or computer program recorded on the medium is a computer program product. In Fig. 9A, three cameras 927a, 927b and 927c, partly replicating the cameras 221, 222a and 222b, are seen to provide inputs to the computer system 900. As seen in Fig. 
9A, the computer system 900 is formed by a computer module 10 901, input devices such as a keyboard 902, a mouse pointer device 903, a scanner 926, the cameras 927, and a microphone 980, and output devices including a printer 915, a display device 914 and loudspeakers 917. An external Modulator-Demodulator (Modem) transceiver device 916 may be used by the computer module 901 for communicating to and from a communications network 920 via a connection 921. The 15 network 920 may be a wide-area network (WAN), such as the Internet or a private WAN. Where the connection 921 is a telephone line, the modem 916 may be a traditional "dial up" modem. Alternatively, where the connection 921 is a high capacity (e.g. cable) connection, the modem 916 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 920. 20 The computer module 901 typically includes at least one processor unit 905, and a memory unit 906 for example formed from semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 901 also includes a number of input/output (I/O) interfaces including an audio-video interface 907 that couples to the video display 914, loudspeakers 917 and microphone 980, an I/O 25 interface 913 for the keyboard 902, mouse 903, scanner 926, camera 927 and optionally 2829491_1 947504_speci_lodge - 12 a joystick (not illustrated), and an interface 908 for the external modem 916 and printer 915. In some implementations, the modem 916 may be incorporated within the computer module 901, for example within the interface 908. The computer module 901 also has a local network interface 911 which, via a connection 923, permits coupling of 5 the computer system 900 to a local computer network 922, known as a Local Area Network (LAN). As also illustrated, the local network 922 may also couple to the wide network 920 via a connection 924, which would typically include a so-called "firewall" device or device of similar functionality. The interface 911 may be formed by an Etherneti" circuit card, a Bluetoothm wireless arrangement or an IEEE 802.11 wireless 10 arrangement. The interfaces 908 and 913 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 909 are provided and typically include a hard disk drive (HDD) 910. 15 Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 912 is typically provided to act as a non-volatile source of data. Portable memory devices, such optical disks (e.g. CD-ROM, DVD), USB-RAM, and floppy disks for example may then be used as appropriate sources of data to the system 900. 20 The components 905 to 913 of the computer module 901 typically communicate via an interconnected bus 904 and in a manner that results in a conventional mode of operation of the computer system 900 known to those in the relevant art. Examples of computers on which the described arrangements can be practised include IBM-PC's and compatibles, Sun Sparcstations, Apple MacTM or a like computer systems evolved 25 therefrom. 2829491_1 947504_speci_lodge -13 By means of the storage devices 909 and the networks 920 and 922, the computer system 900 with the cameras 927 can operate as the independent encoder 220, as well as the storage/transmission medium 230 of Fig. 2. 
Further, where the joint decoder 400 is implemented within the computer system 900, the computer system 900 5 can operate as a receiver of encoded distributed video formed by the streams 330, 395a and 395b, for example sourced from cameras 221, 222a and 222b coupled to the networks 920 and 922. The method of distributed video coding which images comprise of a plurality of pixels may be implemented using the computer system 900 wherein the processes may 10 be implemented as one or more software application programs 933 executable within the computer system 900. In particular, the steps of the method of processing of digital images each comprising a plurality of pixels are effected by instructions 931 in the software 933 that are carried out within the computer system 900. The software instructions 931 may be formed as one or more code modules, each for performing one 15 or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the methods of processing of digital images each comprising a plurality of pixels and a second part and the corresponding code modules manage a user interface between the first part and the user. 20 The software 933 is generally loaded into the computer system 900 from a computer readable medium and is then typically stored in the HDD 910, as illustrated in Fig. 9A, or the memory 906, after which the software 933 can be executed by the computer system 900. In some instances, the application programs 933 may be supplied to the user encoded on one or more CD-ROM 925 and read via the corresponding drive 25 912 prior to storage in the memory 910 or 906. Alternatively the software 933 may be 2829491_1 947504_specilodge - 14 read by the computer system 900 from the networks 920 or 922 or loaded into the computer system 900 from other computer readable media. Computer readable storage media refers to any storage medium that participates in providing instructions and/or data to the computer system 900 for execution and/or processing. Examples of such 5 storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 901. Examples of computer readable transmission media that may also participate in the provision of software, application programs, instructions 10 and/or data to the computer module 901 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like. The second part of the application programs 933 and the corresponding code 15 modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 914. Through manipulation of typically the keyboard 902 and the mouse 903, a user of the computer system 900 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications 20 associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 917 and user voice commands input via the microphone 980. Fig. 
9B is a detailed schematic block diagram of the processor 905 and a "memory" 934. The memory 934 represents a logical aggregation of all the memory 2829491_1 947504_specilodge - 15 devices (including the HDD 910 and semiconductor memory 906) that can be accessed by the computer module 901 in Fig. 9A. When the computer module 901 is initially powered up, a power-on self-test (POST) program 950 executes. The POST program 950 is typically stored in a ROM 5 949 of the semiconductor memory 906. A program permanently stored in a hardware device such as the ROM 949 is sometimes referred to as firmware. The POST program 950 examines hardware within the computer module 901 to ensure proper functioning, and typically checks the processor 905, the memory (909, 906), and a basic input-output systems software (BIOS) module 951, also typically stored in the ROM 949, for correct 10 operation. Once the POST program 950 has run successfully, the BIOS 951 activates the hard disk drive 910. Activation of the hard disk drive 910 causes a bootstrap loader program 952 that is resident on the hard disk drive 910 to execute via the processor 905. This loads an operating system 953 into the RAM memory 906 upon which the operating system 953 commences operation. The operating system 953 is a system level 15 application, executable by the processor 905, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface. The operating system 953 manages the memory (909, 906) to ensure that each process or application running on the computer module 901 has sufficient memory in 20 which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 900 must be used properly so that each process can run effectively. Accordingly, the aggregated memory 934 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the 25 computer system 900 and how such is used. 2829491_1 947504_specijodge - 16 The processor 905 includes a number of functional modules including a control unit 939, an arithmetic logic unit (ALU) 940, and a local or internal memory 948, sometimes called a cache memory. The cache memory 948 typically includes a number of storage registers 944-947 in a register section. One or more internal buses 941 5 functionally interconnect these functional modules. The processor 905 typically also has one or more interfaces 942 for communicating with external devices via the system bus 904, using a connection 918. The application program 933 includes a sequence of instructions 931 that may include conditional branch and loop instructions. The program 933 may also include 10 data 932, which is used in execution of the program 933. The instructions 931 and the data 932 are stored in memory locations 928-930 and 935-937 respectively. Depending upon the relative size of the instructions 931 and the memory locations 928-930, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 930. Alternately, an instruction may be 15 segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 928-929. 
In general, the processor 905 is given a set of instructions which are executed therein. The processor 905 then waits for a subsequent input, to which it reacts to by executing another set of instructions. Each input may be provided from one or more of a 20 number of sources, including data generated by one or more of the input devices 902, 903, data received from an external source across one of the networks 920, 922, data retrieved from one of the storage devices 906, 909 or data retrieved from a storage medium 925 inserted into the corresponding reader 912. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve 25 storing data or variables to the memory 934. 2829491_1 947504_specilodge -17 The disclosed arrangements use input variables 954, which are stored in the memory 934 in corresponding memory locations 955-958. The arrangements produce output variables 961, which are stored in the memory 934 in corresponding memory locations 962-965. Intermediate variables may be stored in memory locations 959, 960, 5 966 and 967. The register section 944-947, the arithmetic logic unit (ALU) 940, and the control unit 939 of the processor 905 work together to perform sequences of micro operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 933. Each fetch, decode, and execute cycle 10 comprises: (a) a fetch operation, which fetches or reads an instruction 931 from a memory location 928; (b) a decode operation in which the control unit 939 determines which instruction has been fetched; and 15 (c) an execute operation in which the control unit 939 and/or the ALU 940 execute the instruction. Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 939 stores or writes a value to a memory location 932. 20 Each step or sub-process in the processes of Figs. 3A, 3B, 3C, 4A, 4B, 5, 7, and 8 is associated with one or more segments of the program 933, and is performed by the register section 944-947, the ALU 940, and the control unit 939 in the processor 905 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 933. 2829491_1 947504_specilodge - 18 The encoders 300, 350 and the joint video decoder 400 of Fig. 2 may alternatively be implemented in dedicated hardware such as one or more integrated circuits. Such dedicated hardware may include Field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), graphic processors, digital signal 5 processors, or one or more microprocessors and associated memories. This may include embedded devices within the cameras 927 which would otherwise have comparable functions to the arrangements performed solely in software upon a so-called desk top computer. In one implementation, the encoders 300, 350 and the joint video decoder 400 10 are implemented within the camera 927, where the encoders 300, 350 and the decoder 400 may be implemented as software being executed by a processor of the camera 927, or may be implemented using dedicated hardware within the camera 927. In another implementation, only the encoders 300, 350 are implemented within a camera. The encoders 300, 350 may be implemented as software executing in a 15 processor of the camera 927, or implemented using dedicated hardware within the camera 927. 
Conventional Video Camera As noted hereinbefore, the conventional video camera 221 of Fig. 2 may compress the captured images using a conventional compression method prior to data 20 transmission using the system 300 as shown in Fig. 3A. In the system 300 of Fig. 3A, the input image (input video frame) 310 is read by a conventional video encoder 320 implemented using the processor 905 of Fig. 9A. The compression method used by the conventional video encoder 320 may be JPEG, JPEG2000, Motion JPEG, Motion JPEG2000, MPEG1, MPEG2, MPEG4, or H.264 to produce the compressed video 25 stream 330, seen in Fig.3A as well as in Fig. 2. 2829491_1 947504_speci_lodge -19 DVC-based Video Camera The other cameras 222a and 222b of Fig. 2 perform both conventional and Wyner-Ziv coding on the corresponding input video frames 360, and each generates a compressed DVC bit stream 395 as shown in Fig. 3B and also seen in Fig. 2. The 5 system 350 of Fig. 3B comprises of a frame splitter module 365, an intra-frame encoder 370, a Wyner-Ziv encoder 380, and a stream merger module 390. The input video frames 360 are first read by a frame splitter module 365 to divide the input video frames 360 into two frame subsets 367 and 369. In the exemplary embodiment, the video frames 367 and 369 represent the key and non-key frames respectively of the input video 10 360. The key frames 367 are encoded conventionally by an intra-frame encoder 370 based on compression methods such as JPEG, JPEG 2000, or H.26x (intra) to produce the compress key frames 387. In parallel to the intra-frame encoder 370, a Wyner-Ziv (WZ) encoder 380 encodes the non-key frames 369 according to a Wyner-Ziv coding scheme and generates error correction bits 389. In the exemplary implementation, the 15 WZ encoder 380 operates directly on pixel values and can therefore be referred to as a pixel domain Wyner-Ziv encoder. Detail of the pixel domain Wyner-Ziv encoder will be described later with reference to Fig. 3C. Alternatively, the WZ encoder 380 may be a transform-domain Wyner-Ziv encoder. This transform-domain encoder is substantially identical to the pixel-domain Wyner-Ziv encoder 380 of Fig. 3C (described below) 20 except that the transform-domain encoder has an additional pre-processing step. This pre-processing step includes (i) transforming the pixel values of the non-key frames 369 to transform coefficients and (ii) forming coefficient bands by concatenating coefficients of the same importance. The transformation is preferably a DCT-based transform. Alternative transforms, such as wavelet transforms and Fourier transform, can also be 25 used. 2829491_1 947504_specilodge - 20 Wyner-Ziv Encoder Fig. 3C shows a schematic block diagram of a pixel-domain Wyner-Ziv encoder 380 which may be used by the DVC cameras 222a and 222b of Fig. 2. The encoder 380 comprises a quantizer module 3810, a bit plane extractor module 3820, and an error 5 correction code generator module 3830. Non-key frames 369 input to the WZ encoder 380 are firstly processed by the quantizer module 3810, which reduces the bit depth of the non-key frames 369 to generate a quantized image 3815. Desirably, the quantizer module 3810 is a uniform quantizer which can be implemented efficiently by performing bitwise right shift 10 operations. The quantization step size is fixed to 16, allowing as many as 16 quantization bins for 8-bit greyscale images. The quantized pixel values 3815 representing the non-key frames 369 are then extracted by a bit plane extractor module 3820. 
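A minimal sketch of the uniform quantiser and of the bit-plane extraction it feeds (described in more detail in the following paragraphs) is given below, assuming 8-bit greyscale numpy frames and the fixed quantisation step size of 16. The subsequent error-correction (LDPC) stage is not shown.

```python
import numpy as np

def quantize_uniform(frame, step=16):
    """Uniform quantiser sketch: a fixed step of 16 reduces an 8-bit
    greyscale frame to 16 bins, implemented as a bitwise right shift
    (16 = 2**4, so shift by 4 bits)."""
    shift = int(np.log2(step))
    return frame.astype(np.uint8) >> shift            # bin indexes 0..15

def extract_bitplanes(quantized, nbits=4):
    """Bit-plane extraction sketch: scan from the most significant bit
    plane down to the least significant one, concatenating one bit per
    pixel into a flat bit stream per plane (returned MSB plane first)."""
    flat = quantized.reshape(-1)
    planes = []
    for b in range(nbits - 1, -1, -1):                # MSB plane first
        planes.append(((flat >> b) & 1).astype(np.uint8))
    return planes

# Illustrative usage with a random 8-bit frame:
# q = quantize_uniform(np.random.randint(0, 256, (64, 64), dtype=np.uint8))
# bitstreams = extract_bitplanes(q)   # 4 planes, each holding 64*64 bits
```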
The bit plane extractor module 3820 performs the step of forming bit streams from the quantized pixel values of each non-key 15 frame 3815 on a bit-plane-by-bit-plane basis. Preferably, bit stream scanning starts on the most significant bit plane of each non-key frame 3815. The most significant bits of the image 3815 are concatenated to form a bit stream containing only the most significant bits. In a second pass, the scanning concatenates the second most significant bits of all quantized pixels. This process repeats in this manner until the least significant 20 bit plane is completed. This generates a single bit stream 3825 including each bit plane of the quantized image 3815. The bit plane extractor module 3820 is followed by an error correction code (ECC) generator 3830 configured to generate error correction bits for each bit plane individually, starting with the most significant bit plane. These error correction bits 25 from each bit plane are concatenated together to form an output bit stream 389 of error 2829491_1 947504_ speci_lodge -21 correction bits. Preferably, the error correction method used by the generator 3830 is Low Density Parity Check (LDPC) codes. Alternative methods such as turbo codes, Reed-Solomon codes, or a combination of the aforementioned error correction codes can be employed. 5 In an alternate implementation, the quantization step size used by the quantizer 3810 may be determined statistically from image data and is transmitted directly to the joint video decoder 400, thus permitting skipping of processing by the bit plane generator 3820 and the ECC generator 3830, providing for the quantized image to be merged by the merger 390 into the stream 395. In a further implementation, each non 10 key frame 369 is divided into blocks. Each block may be quantized using a different quantization step size, and the step size may be re-computed at the decoder 400 to minimize the bit rate of the encoder 350 of Fig. 3B. Joint Video Decoder Fig. 4A shows a schematic block diagram of the joint video decoder 400, which 15 comprises of a conventional video decoder 410, a frame rate up-sampler 420, a correlation estimator 450, described below in relation to Fig. 4B, and an interpolation mode selector 460, described below in relation to Fig. 7. The conventional video decoder 410 decodes the compressed video stream 330 generated by the system 300 to reconstruct each frame of decoded video 415. The correlation estimator 450 decodes the 20 conventionally encoded images (i.e., key frames) from the received DVC bit streams 395a(L) and 395b(R) from the DVC cameras 222a and 222b, positioned to the left (L) and right (R) respectively of the camera 221, and determines the degrees of temporal and inter-view/spatial correlations between images within and across the DVC cameras 222a and 222b using the conventionally encoded images and parity bits received from the two 25 cameras. The outputs of the correlation estimator 450 are the decoded frames 425, 2829491_1 947504_speciJodge -22 including both I and non-key frames and a pair of reliability measures 435 of the decoded non-key frames, which consist of a temporal reliability measure and an inter view reliability measure, representing the degrees of temporal and inter-view correlations in the compressed videos 395a(L) and 395b(R). The best interpolation 5 mode for each non-key frame is then determined by an interpolation mode selector 460. This is desirably performed on a block-by-block basis and is represented in Fig. 
4A by a block-based fusion mask 455. Finally, the frame rate up-sampler 420 performs image interpolations between the decoded video frames 415 from the conventional video decoder 410 and the decoded video frames 425 from the correlation estimator 450 based 10 on the block-based fusion mask 455 output from the interpolation mode selector 460. The output of system 400 is the video 250 at frame rate R 2 , which is equivalent to the received video 330 from the conventional camera 221, except at a higher frame rate. Correlation Estimator Fig. 4B is a schematic block diagram of a correlation estimator 450 in the joint 15 video decoder 400 of Fig. 4A. The estimator 450 receives compressed DVC bit stream 395a(L) and 395b(R) from the DVC cameras 222a and 222b of Fig. 2 as input. The estimator 450 performs conventional and Wyner-Ziv decoding for key-frames and non key frames respectively to generate the decoded video frames 425, captured by the DVC cameras 222a and 222b, and the reliability measures 435 of the decoded non-key frames, 20 which indicates the degrees of temporal and inter-view correlations in the decoded video frames 425. The decoded video frames 425 and the associated reliability measures of the decoded non-key frames 435 form the outputs of estimator 450. A stream splitter 4510 firstly reads the compressed DVC bit stream 395a(L) from the DVC camera 222a and performs the inverse operation of the stream merger 390 of 25 Fig. 3B to separate the stream 395a(L) into compressed key frames 4515 and error 2829491_1 947504_speci_lodge - 23 correction bits 4525 generated from the non-key frames. An intra-frame decoder 4520 performs the inverse operation of the intra-frame encoder 370 of Fig. 3B to retrieve images 4535 from the compressed key frames 4515. Similarly, a stream splitter 4560, which is substantially identical to the stream splitter 4510, reads the compressed DVC 5 bit stream 395b(R) from the DVC camera 222b and divides the compressed DVC bit stream 395b(R) into compressed key frames 4565 and error correction bits 4575 generated by the DVC camera 222b. The compressed key frames 4565 are then decoded by an intra-frame decoder 4570, which is again substantially identical to the intra-frame decoder 4520, to form images 4585. 10 A temporal inter-view predictor 4530 generates a first prediction of the non-key frames from adjacent decoded key frames 4535 captured by the DVC camera 222a based on one or more temporal interpolation methods. In an exemplary implementation, MCTI is performed. Alternatively, bi-linear interpolation may be used. The temporal inter view predictor 4530 also generates a second prediction of the non-key frames from 15 decoded key frames 4585 captured by the DVC camera 222b based on inter-view interpolation. This is achieved by performing DCVP in the exemplary implementation. Alternatively, bi-linear interpolation may be used. These two predictions collectively form an output 4545 of the temporal inter-view predictor 4530. A Wyner-Ziv decoder 4540 that follows reconstructs non-key frames 4542 based 20 on the two predictions 4545 using the error correction bits 4525. The Wyner-Ziv decoder 4540 associates each decoded non-key frame with a corresponding reliability measure 4544 which indicates the confidence of the decoded non-key frame. Detail of the Wyner-Ziv decoder 4540 will be described later with reference to Fig. 5. 
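The two predictions supplied to the Wyner-Ziv decoder as side information can be illustrated with the following sketch. The simple frame average standing in for MCTI and the single global disparity vector standing in for disparity compensation view prediction (DCVP) are deliberate simplifications for illustration, not the exemplary implementation itself.

```python
import numpy as np

def temporal_side_info(key_prev, key_next):
    """Temporal side information sketch: average of the two adjacent
    decoded key frames (a stand-in for full MCTI)."""
    blend = (key_prev.astype(np.float64) + key_next) / 2.0
    return np.rint(blend).astype(key_prev.dtype)

def interview_side_info(other_view_key_frame, dv):
    """Inter-view side information sketch: shift the key frame decoded
    from the neighbouring camera by a single global disparity vector
    dv = (dy, dx). A per-block disparity search would be used in
    practice; np.roll also wraps around the frame edges, which a real
    implementation would handle explicitly."""
    dy, dx = dv
    return np.roll(np.roll(other_view_key_frame, dy, axis=0), dx, axis=1)
```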
In an exemplary implementation, a temporal inter-view predictor 4580 and 25 associated Wyner-Ziv decoder 4590 are substantially identical to the temporal inter-view 2829491.1 947504_speci_lodge - 24 predictor 4530 and the Wyner-Ziv decoder 4540 respectively. The temporal inter-view predictor 4580 computes the temporal and inter-view predictions of the non-key frames captured by the DVC camera 222b from adjacent key frames 4585 and the decoded key frames 4535 captured by the DVC camera 222a. The predictor 4580 outputs the 5 predictions 4595 of non-key frames which are corrected by the Wyner-Ziv decoder 4590 to reconstruct the decoded non-key frames 4592 using the error correction bits 4575 and to also generate a pair of reliability measures 4594 corresponding to the decoded non key frames 4592. The decoded key frames and non-key frames collectively from the intra-frame 10 decoders 4520, 4570 and the Wyner-Ziv decoders 4540, 4590 form the decoded video frames 425 of the estimator 450. The decoded video frames 425 together with the reliability measures 435 associated to the decoded non-key frames form the outputs of the estimator 450. Wyner-Ziv Decoder 15 Exemplary Wyner-Ziv decoders 4540, 4590 described in Fig. 4B are now described in detail hereinafter with reference to a WZ decoder system 500 of Fig. 5. The WZ decoder system 500 includes a quantizer module 510, a bit plane extractor module 520, an error correction code decoder module 530, an image reconstructor module 540, and a syndrome checker module 550. The WZ decoder system 500 takes two inputs. 20 The first input is a prediction of a non-key frame 4545 (or 4595) (i.e., the frame is predicted from temporally adjacent key frames 4535 or predicted from decoded key frames 4585 generated by the DVC camera 222b), which is used as the side information for Wyner-Ziv decoding. The second input of the WZ decoder system 500 is the error correction bits 4525 or 4575 sent from the system 350 of Fig. 3B. 2829491_1 947504_speci_lodge - 25 In the first step, the quantizer module 510 reduces the bit depth of the inter-view prediction 437 to generate a quantized image 515. Preferably, the quantizer module 510 is a uniform quantizer and is substantially identical to the module 3810 in Fig. 3C. The quantized pixel values are then extracted from the quantized image 515 by the bit plane 5 extractor module 520. In the second step, the bit plane extractor module 520 performs the step of forming bit streams from the quantized pixel values of the quantized image 515 on a bit plane-by-bit-plane basis. Preferably, the bit plane extractor module 520 is substantially identical to the module 3820 in Fig. 3C. The extraction processing begins from the most 10 significant bit plane to the least significant bit plane and generates one bit stream for each bit plane of the quantized image 515. In the next step, the error correction code decoder module 530 performs the operation of bitwise error correction to correct prediction errors in the output of the bit plane extractor module 520. This operation is performed on a bit-plane-by-bit-plane 15 basis, starting with the most significant bit plane. Preferably, a LDPC decoder is employed as the decoder 530. The LDPC decoder performs iterative decoding on each bit plane separately using belief propagation techniques such as Soft-Output Viterbi Algorithm (SOVA), Maximum A-Posteriori Algorithm (MAP), and a variant of MAP. 
In an alternative implementation, the error correction code decoder 530 may employ 20 turbo codes, Reed-Solomon codes, or a combination of these error correction codes. In the exemplary implementation, each decoded bit (or decoded bit sequence if SOVA is employed) is associated to a log likelihood ratio (LLR) value, which indicates the confidence of the decoded bit (or decoded bit sequence in the case of SOVA). These LLR values are combined together on per pixel basis and output as reliability measures 25 4544 (or 4594) for pixels of the non-key frame 4545(or 4595). 2829491_1 947504_speci_lodge - 26 The next step in the decoding process is performed by the image reconstructor module 540. The image reconstructor module 540 takes the decoded bit planes from the error correction code decoder module 530. The decoded bits corresponding to the same spatial location are concatenated together to reconstruct a quantized version of the image 5 captured by the DVC camera 222a (or 222b), being a quantized image 545. Each element of the quantized image 545 is a coset index and is used to correct the approximation errors in the predicted non-key frame 4545 (or 4595). The syndrome checker module 550 then compares the pixel values of the predicted non-key frame 4545 (or 4595) against the coset indexes 545 from the image 10 reconstructor module 540 to generate the decoded non-key frame 4542 (or 4592). The syndrome checker module 550 operates on a pixel-by-pixel basis. For a given pixel location (i, j) in the predicted non-key frame 4545 (or 4595), if the pixel value Yij is within the quantized bin (interval) Xij from the quantized image 545, then the final pixel value Y'ij of the decoded non-key frame 4542 (or 4592) takes the value of Yij. If Yj lies 15 outside the quantized bin (interval) Xj, then the syndrome checker module 550 clips the reconstruction towards the boundary of the quantized bin (interval) Xij closest to Yij. This process repeats until all pixels in the decoded non-key frame 4542 (or 4592) are determined by the module 550. The decoded non-key frame 4542 (or 4592) represents an improved approximation of the predicted non-key frame 4545 (or 4595) of the image 20 captured by the DVC camera 222a (or 222b). Interpolation mode Selector Fig. 7 shows the method steps 700 performed by the interpolation mode selector 460 of Fig. 4A. In the method 700, the interpolation mode selector 460 receives the reliability measures 435 of the decoded non-key frames from the Wyner-Ziv decoders 25 4540, 4590 as input and selects the best interpolation mode based on the reliability 2829491_1 947504_speci_lodge - 27 measures 435. In an exemplary implementation, two interpolation modes are available for the selection. The first interpolation mode is called temporal interpolation. This mode uses temporal correlation (i.e., motions) between adjacent key frames within a video to interpolate a new frame. The temporal interpolation works well if motions are 5 linear and if there is little overlap among the objects in the scene. The second interpolation mode is called temporal-view hybrid interpolation. This mode takes into accounts both the temporal correlation between adjacent key frames within a video, and the inter-view structural correlation between the frames captured at the same instant by different cameras. The temporal-view hybrid interpolation mode can cope better with 10 nonlinear motions and occlusion, but operates at a higher computational cost. 
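Before the steps of the interpolation mode selector are walked through, the reconstruction and syndrome-check clipping described above can be sketched as follows, assuming the fixed quantisation step of 16 and numpy arrays for the prediction and the decoded coset indexes.

```python
import numpy as np

def syndrome_check(prediction, coset_index, step=16):
    """Reconstruction / syndrome-check sketch: for each pixel, keep the
    predicted value if it already falls inside the decoded quantisation
    bin, otherwise clamp it to the nearest boundary of that bin."""
    lo = coset_index.astype(np.int32) * step          # inclusive lower bound
    hi = lo + step - 1                                # inclusive upper bound
    return np.clip(prediction.astype(np.int32), lo, hi).astype(prediction.dtype)
```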
In a first step 4620 of the method 700, for each block of a non-key frame, the interpolation mode selector 460 extracts the reliability measures 435 derived from the adjacent key frames (e.g. 4535 of Fig. 4B) and from the decoded key frames generated by a different DVC camera (e.g. 4585), in a block-by-block basis. Desirably, the block 15 size is eight-by-eight pixels. In the next step 4630, the temporal reliability measure is compared against the inter-view reliability measure. If temporal reliability measure is equal or higher than that of the inter-view reliability measure, then the method 700 proceeds to step 4650 to select temporal interpolation as the best interpolation method for the current block of the 20 non-key frame. In contrast, if temporal reliability measure is lower than the inter-view reliability measure, this implies that the current block of the non-key frame cannot be predicted reliably by temporal interpolation from adjacent key frames (e.g. 4535) due to for example occlusions. If so, the method 700 proceeds to step 4640 and selects the weighted average of inter-view prediction and temporal interpolation as the best 25 interpolation method. 2829491_1 947504_specilodge - 28 In the next step 4660, the interpolation mode selector 460 determines whether all blocks of the current non-key frame are processed. If the interpolation method of any blocks in the current non-key frame is not defined, the process returns to step 4620 to compute the best interpolation method for the next block to be processed. Otherwise, 5 the method 700 proceeds and terminates in step 4670. The output of the interpolation mode selector 460 is a block-based fusion mask 455 for each non-key frame where each block has an interpolation mode selected from a predefined set of interpolation mode. Frame Rate Up-sampler Fig. 8 shows the flow diagram of the operations 800 performed by the frame rate 10 up-sampler 420 of Fig. 4A. The frame rate up-sampler 420 takes three inputs: the decoded video 415 from the conventional video decoder 410, the decoded videos 425 generated by the correlation estimator 450 for the videos captured by the DVC cameras 222a and 222b of Fig. 2, and the block-based fusion mask 455 generated by the interpolation mode selector 460. The purpose of the frame rate up-sampler is to generate 15 the view-point difference frames with the resolution of the decoded video 415 to provide intermediate frames of the up-sampled video frames 250. In a first step 4220 of the method 800, the frame rate up-sampler 420 divides the intermediate frame to be interpolated into fixed size blocks in a similar manner performed by the interpolation mode selector 460. 20 In the next step 4230, the frame rate up-sampler 420 selects a block of the intermediate frame to be interpolated next. Then the operation of the frame rate up-sampler 420 proceeds to step 4240 to perform temporal interpolation based on the adjacent frames of the decoded video 415 received from the conventional video decoder 410. 2829491_1 947504_speci_lodge - 29 In step 4250, the frame rate up-sampler 420 retrieves the best interpolation mode for the current block from the block-based fusion mask 455 generated by the interpolation mode selector 460. If the fusion mask 455 indicates that temporal interpolation is the best interpolation mode in step 4260, then the process 800 proceeds 5 to step 4295. 
Otherwise, the frame rate up-sampler 420 proceeds to step 4270, in which the frame rate up-sampler 420 performs inter-view interpolation from the decoded images 425 generated by the correlation estimator 450. This occurs based on the view-point difference between the different views of the predicted frames.

In the next step 4280, the frame rate up-sampler 420 determines the weights for the temporal and inter-view interpolations based on the reliability measures 4544, 4594 output from the Wyner-Ziv decoders (4540, 4590).

In the exemplary implementation, the weights are determined according to the reliability measures of the corresponding non-key frame 369 from one of the DVC cameras 222a, 222b. For example, in Fig. 6 the frame rate up-sampler 420 determines the weights for the frame Fi1 from the reliability measures 4544 generated for the non-key frame at the location of WZ11 of the DVC camera 1 (222a). Each block of the non-key frame Fi1 is associated with two reliability values generated by the Wyner-Ziv decoder 4540. One of the reliability values is determined from a temporal prediction using the adjacent key frames I11 and I12 from the DVC camera 1 (222a). The other reliability value is determined based on inter-view prediction from frame I21 of the DVC camera 2 (222b).

Similarly, for the next frame Fi2 in the video sequence, the frame rate up-sampler 420 determines the weights for the frame Fi2 from the reliability measures 4594 generated for the non-key frame at the location of WZ22 of the DVC camera 2 (222b). Again, each block of the intermediate frame Fi2 is associated with two reliability values generated by the Wyner-Ziv decoder 4590. One of the reliability values is determined from a temporal prediction using the adjacent key frames I21 and I22 from the DVC camera 2 (222b). The other reliability value is determined based on inter-view prediction from frame I12 of the DVC camera 1 (222a).

In addition, in the exemplary implementation, the two reliability values associated with a block of an intermediate frame (Fi1 or Fi2) are mapped to a pair of weights (e.g., using a lookup table). The determined weights may be normalised. Alternatively, the reliability values are first subjected to a preset threshold to determine optimal weights for the block of the intermediate frame of the conventional camera. Any reliability value (spatial or temporal) below the preset threshold is reset to zero, and therefore the corresponding interpolation method makes no contribution to the final interpolation of that particular block.

Returning to Fig. 8, in step 4290 the current block is then interpolated from the adjacent frames of the decoded video 415 and the decoded images from the correlation estimator 450, based on a weighted average of temporal and inter-view interpolations using the weights determined in step 4280. This results in the temporal-view hybrid interpolation.

Then, in the next step 4295 of the method 800, the frame rate up-sampler 420 determines whether all the blocks of the current frame have been interpolated. If all blocks of the current frame are processed, then the process 800 proceeds to and terminates in step 4298. Otherwise, the process 800 returns to step 4230 until all the blocks of the current frame are processed. The output of the step 4298 is the frame rate up-sampled video 250 at frame rate R2.
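By way of illustration only, the per-block weighting and fusion described for steps 4280 and 4290 might be sketched as follows. The function and variable names, the use of per-block reliability values in place of the lookup-table mapping mentioned above, the simple normalisation, and the fall-back to the temporal prediction when both reliabilities are below the threshold are assumptions introduced for clarity and are not details of the specification.

```python
import numpy as np

def fuse_block(temporal_block, interview_block, r_temporal, r_interview, threshold=0.0):
    """Blend the temporal and inter-view predictions of one block using its
    two reliability values, with the thresholding variant described above."""
    # Reliability values below the preset threshold are reset to zero so that
    # the corresponding interpolation contributes nothing to this block.
    w_t = r_temporal if r_temporal >= threshold else 0.0
    w_v = r_interview if r_interview >= threshold else 0.0
    if w_t + w_v == 0.0:
        return temporal_block                 # assumed fallback: keep the temporal prediction
    w_t, w_v = w_t / (w_t + w_v), w_v / (w_t + w_v)   # normalise the pair of weights
    return w_t * temporal_block + w_v * interview_block

# Toy example on an 8x8 block: the inter-view prediction is twice as reliable
# as the temporal one, so it receives two thirds of the weight.
t = np.full((8, 8), 90.0)
v = np.full((8, 8), 120.0)
print(fuse_block(t, v, r_temporal=1.0, r_interview=2.0)[0, 0])   # -> 110.0
```

In the arrangement described above, the fused blocks would then be assembled into the intermediate frames (Fi1, Fi2) according to the block-based fusion mask 455.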
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (20)

1. A method of frame rate up-sampling using multiple cameras based on distributed source coding techniques, said method comprising the steps of:
receiving a first set of frames from a first camera at a first frame rate;
receiving at least one second set of frames and error correction bits generated from a corresponding at least one second camera at a second frame rate higher than the first frame rate;
determining the degree of temporal correlation between frames of said second set of frames using said error correction bits;
determining the degree of correlation between said first and said second sets of frames using said error correction bits; and
performing frame rate up-sampling for said first camera based on said degrees of correlation determined temporally and spatially between said first and second sets of frames.
2. A method according to claim 1 wherein the degree of temporal correlation is determined using temporal interpolation.
3. A method according to claim 1 wherein the degree of temporal correlation is determined using temporal-view hybrid interpolation.
4. A method according to claim 1 wherein the determining of the degree of temporal correlation further comprises determining an interpolation mode based on a temporal reliability measure and selecting one of temporal interpolation and temporal inter-view hybrid interpolation to determine the degree of temporal correlation.
5. A method according to claim 4 wherein the temporal inter-view hybrid interpolation comprises:
performing temporal interpolation based on adjacent key frames;
performing inter-view interpolation based on a view-point difference between the set of frames of said first camera and at least one of the set of frames of said second camera;
determining weights for each of the temporal interpolation and the inter-view interpolation; and
averaging the temporal interpolation and the inter-view interpolation using the determined weights to obtain the temporal inter-view hybrid interpolation.
6. A method according to claim 5 wherein the weights are determined using reliability measures of a non-key frame, said reliability measures being generated based on the error correction bits.
7. A method according to claim 6 wherein weights for a first intermediate frame located between frames of the first set of frames are determined from the reliability measure generated for a non-key frame of a first one of the second cameras, said non-key frame being for the first one of the second cameras, said reliability measure being generated based on said error correction bits.
8. A method according to claim 7 wherein each block of the first intermediate frame is associated with two reliability values generated by Wyner-Ziv decoding of at least said error correction bits.
9. A method according to claim 8 wherein one of the reliability values is determined from a temporal prediction using adjacent key frames from the at least one second set of frames, and the other reliability value is determined based on inter-view prediction from a key frame of the at least one second set of frames of a second one of the second cameras.
10. A method according to claim 9 comprising, for a next intermediate frame in the video sequence, determining weights for the frame from a reliability measure generated for a non-key frame for the second one of the second cameras, in which each block of the intermediate frame is associated with two reliability values generated by the Wyner-Ziv decoding, in which one of the reliability values is determined from a temporal prediction using adjacent key frames from the second set of frames of the second one of the second cameras, and the other reliability value is determined based on inter-view prediction from a key frame from the second set of frames of the first one of the second cameras.
11. A method according to claim 10 wherein the two reliability values associated with a block of the intermediate frame are mapped to the determined weights.
12. A method according to claim 10 wherein the reliability values are firstly subject to a preset threshold to determine the weights for the block of the intermediate frame of the first camera, such that any spatial or temporal reliability value below the preset threshold is reset to zero and such that the corresponding interpolation makes no contribution to the final interpolation of that particular block.
13. A method according to claim 1 wherein the first camera is a conventional camera and the second cameras comprise distributed video coding (DVC) cameras.
14. A method of video decoding comprising the method of any one of the preceding claims.
15. A video coding method substantially as described herein with reference to any one of the embodiments as that embodiment is illustrated in the drawings.
16. A distributed video decoder comprising:
an input for receiving a first set of frames from a first camera at a first frame rate;
an input for receiving at least one second set of frames and error correction bits generated from a corresponding at least one second camera at a second frame rate higher than the first frame rate;
a correlation estimator for determining each of a degree of temporal correlation between frames of said second set of frames using said error correction bits and a degree of correlation between said first and said second sets of frames using said error correction bits; and
a frame rate up-sampler for frame rate up-sampling frames from the first set of frames for said first camera based on said degrees of correlation determined temporally and spatially between said first and second sets of frames.
17. A distributed video decoder substantially as described herein with reference to any one of the embodiments as that embodiment is illustrated in the drawings.
18. A distributed video coding system comprising:
a first camera for providing a first set of frames at a first frame rate;
at least one second camera for providing a corresponding at least one second set of frames and error correction bits generated at a second frame rate higher than the first frame rate;
at least one of a transmission medium and a storage medium for communicating the frames and error correction bits from the cameras; and
a decoder for receiving communicated frames and error correction bits, the decoder comprising:
a correlation estimator for determining a degree of temporal correlation between frames of said second set of frames using said error correction bits and a degree of correlation between said first and said second sets of frames using said error correction bits; and
a frame rate up-sampler for frame rate up-sampling for said first set of frames based on said degrees of correlation determined temporally and spatially between said first and second sets of frames.
19. A distributed video coding system substantially as described herein with reference to the drawings.
20. A computer readable storage medium having a computer program recorded thereon, the program being executable by computer apparatus to decode distributed video coded information, the program comprising:
code for receiving a first set of frames from a first camera at a first frame rate;
code for receiving at least one second set of frames and error correction bits generated from a corresponding at least one second camera at a second frame rate higher than the first frame rate;
code for determining the degree of temporal correlation between frames of said second set of frames using said error correction bits;
code for determining the degree of correlation between said first and said second sets of frames using said error correction bits; and
code for performing frame rate up-sampling for said first camera based on said degrees of correlation determined temporally and spatially between said first and second sets of frames.

Dated this 13th day of July 2010
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
Spruson & Ferguson
AU2010202963A 2010-07-13 2010-07-13 Frame rate up-sampling for multi-view video coding using distributing video coding principles Abandoned AU2010202963A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2010202963A AU2010202963A1 (en) 2010-07-13 2010-07-13 Frame rate up-sampling for multi-view video coding using distributing video coding principles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2010202963A AU2010202963A1 (en) 2010-07-13 2010-07-13 Frame rate up-sampling for multi-view video coding using distributing video coding principles

Publications (1)

Publication Number Publication Date
AU2010202963A1 true AU2010202963A1 (en) 2012-02-02

Family

ID=45812074

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2010202963A Abandoned AU2010202963A1 (en) 2010-07-13 2010-07-13 Frame rate up-sampling for multi-view video coding using distributing video coding principles

Country Status (1)

Country Link
AU (1) AU2010202963A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10708587B2 (en) 2011-08-30 2020-07-07 Divx, Llc Systems and methods for encoding alternative streams of video for playback on playback devices having predetermined display aspect ratios and network connection maximum data rates
US10931982B2 (en) 2011-08-30 2021-02-23 Divx, Llc Systems and methods for encoding and streaming video encoded using a plurality of maximum bitrate levels
US11611785B2 (en) 2011-08-30 2023-03-21 Divx, Llc Systems and methods for encoding and streaming video encoded using a plurality of maximum bitrate levels
US20140003523A1 (en) * 2012-06-30 2014-01-02 Divx, Llc Systems and methods for encoding video using higher rate video sequences
US10452715B2 (en) 2012-06-30 2019-10-22 Divx, Llc Systems and methods for compressing geotagged video
US10148989B2 (en) 2016-06-15 2018-12-04 Divx, Llc Systems and methods for encoding video content
US10595070B2 (en) 2016-06-15 2020-03-17 Divx, Llc Systems and methods for encoding video content
US11483609B2 (en) 2016-06-15 2022-10-25 Divx, Llc Systems and methods for encoding video content
US11729451B2 (en) 2016-06-15 2023-08-15 Divx, Llc Systems and methods for encoding video content

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application