AU2008255262A1 - Performing a display transition - Google Patents

Performing a display transition

Info

Publication number
AU2008255262A1
Authority
AU
Australia
Prior art keywords
camera
transition
display
viewport
video stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU2008255262A
Other versions
AU2008255262B2 (en)
Inventor
William Simpson-Young
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2008255262A
Publication of AU2008255262A1
Application granted
Publication of AU2008255262B2
Ceased
Anticipated expiration


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/152Multipoint control units therefor

Description

S&F Ref: 885516

AUSTRALIA - PATENTS ACT 1990 - COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): William Simpson-Young
Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Performing a display transition

The following statement is a full description of this invention, including the best method of performing it known to me/us:

PERFORMING A DISPLAY TRANSITION

FIELD OF INVENTION

The current invention relates to video conferencing and, in particular, to a method and apparatus for performing a display transition. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for performing a display transition.

DESCRIPTION OF BACKGROUND ART

In most conventional multi-participant video conferencing systems, tracking a participant or any other object is done by adjusting the pan, tilt and zoom of a camera. For example, a camera of the video conferencing system may be zoomed into a participant while he or she is speaking. Further, camera pan or tilt may be adjusted to show a next speaker. The disadvantage of this approach is that the picture displayed during the change from one participant to another can be disconcerting to the end-user if the camera is operated too quickly. On the other hand, if the camera is operated too slowly, then an end-user may miss viewing appropriate moments, such as the speaker introducing himself/herself or starting his/her speech.

In other video conferencing systems, a low resolution panoramic view of a meeting may be received. A participant may select different areas of interest in the panoramic view and obtain those areas, for example, in higher resolution. Camera control operations such as pan, tilt or zoom may be performed on these different regions. The disadvantage of this approach is that the end user has to manually initiate a change from one area of interest to another. Otherwise, the user may need to have several active windows showing different areas of the panoramic view, which can result in wasted bandwidth. Some video conferencing systems may alternatively choose to receive the panoramic view in higher resolution. However, again, receiving the panoramic view in higher resolution will result in wasted bandwidth.

In some other video conferencing systems, editing of multiple video streams is performed to provide a change of view from one participant to another. However, this approach requires the use of multiple cameras or video sources. Further, many video streams obtained at run-time need to be analysed and edited in real-time to obtain an output stream. Such analysis and editing increases the bandwidth required and hence is not scalable as the number of participants increases.

In yet other conventional video conferencing systems, editing of one video stream is performed to show many participants. For example, a picture-in-picture output may be produced. In this case, a wide angle view of a video conference may be taken first. Following that, editing may be performed to create an output video stream displaying only one of the participants in close-up as a picture-in-picture display, or displaying participants one at a time. The disadvantage of this approach is that the original view is a wide angle shot.
As such, display of individual participants will be in lower resolution, which has an impact on picture quality and end-user experience.

SUMMARY OF THE INVENTION

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.

According to one aspect of the present invention there is provided a method of performing a display transition from a first object to a second object using a camera, the method comprising the steps of:
displaying a first viewport comprising the first object, using a video stream generated by the camera;
maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and
performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.

According to another aspect of the present invention there is provided an apparatus for performing a display transition from a first object to a second object using a camera, the apparatus comprising:
means for displaying a first viewport comprising the first object, using a video stream generated by the camera;
means for maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and
means for performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.

According to still another aspect of the present invention there is provided a computer readable medium having recorded thereon a computer program for performing a display transition from a first object to a second object using a camera, the program comprising:
code for displaying a first viewport comprising the first object, using a video stream generated by the camera;
code for maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and
code for performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.

According to still another aspect of the present invention there is provided a system for performing a display transition from a first object to a second object using a camera, the system comprising:
a memory for storing data and a computer program; and
a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
displaying a first viewport comprising the first object, using a video stream generated by the camera;
maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and
performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.
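The claimed sequence of steps can be summarised in code. The following is a minimal sketch in Python; the camera, display, detect and render_transition interfaces, and the crop helper, are hypothetical placeholders introduced for illustration and are not part of the specification.

```python
import numpy as np

def crop(frame: np.ndarray, box) -> np.ndarray:
    """Extract a viewport (x, y, w, h) from a video frame."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

def perform_display_transition(camera, display, detect, render_transition, effect):
    """Assumed interfaces: camera.read() returns a frame, camera.zoom_out_step()
    widens the view, detect(frame) returns bounding boxes, and
    render_transition(...) yields transition images for the chosen effect."""
    # Display a first viewport comprising the first object.
    frame = camera.read()
    boxes = detect(frame)
    display.show(crop(frame, boxes[0]))

    # Maintain the first viewport while controlling the camera so as to
    # include the first object and the second object within the video stream.
    while len(boxes) < 2:
        camera.zoom_out_step()
        frame = camera.read()
        boxes = detect(frame)
        display.show(crop(frame, boxes[0]))

    # Perform the display transition from the first object to the second
    # object, according to the transition effect.
    first, second = crop(frame, boxes[0]), crop(frame, boxes[1])
    for image in render_transition(first, second, effect):
        display.show(image)
    display.show(second)
```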
Other aspects of the invention are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS

One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Figs. 1a and 1b collectively form a schematic block diagram showing a video conferencing system, upon which the various arrangements described can be practiced;
Fig. 2 is a schematic block diagram of the hardware and software architecture of a VC-Unit software application program executing on the system of Figs. 1a and 1b;
Fig. 3 is a schematic block diagram of a graphical user interface;
Fig. 4 is a flow diagram showing a method of processing an input video stream generated by a camera of the system of Figs. 1a and 1b;
Fig. 5 is a schematic flow diagram showing a method of performing a display transition from a first object to a second object;
Fig. 6 is a schematic flow diagram showing a method of detecting an object;
Fig. 7 is a schematic flow diagram showing a method of controlling a camera;
Fig. 8 is a schematic flow diagram showing a method of generating one or more display transition images;
Fig. 9a shows a viewport created after detecting a first object in accordance with one example;
Fig. 9b shows an actual view of the object of Fig. 9a as seen in a video window of the graphical user interface of Fig. 3;
Fig. 10a shows a camera view after the camera is zoomed out to cover a second object, in accordance with the example of Fig. 9a;
Fig. 10b shows an actual view of the object of Fig. 9a as seen in the video window;
Fig. 11a shows the camera view of Fig. 10a with a second viewport;
Fig. 11b shows an actual view of the first object and second object of Fig. 11a displayed in accordance with a display transition effect;
Fig. 12a shows the camera view of Fig. 11a without the viewport of Fig. 9a;
Fig. 12b shows an actual view of the second object of Fig. 10a, as seen in the video window;
Fig. 13a shows a camera view covering the second object of Fig. 10a;
Fig. 13b shows an actual view of the second object as seen in the video window; and
Fig. 14 is a schematic diagram of a timeline labelled in accordance with the example of Figs. 9a to 13b.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.

A method 500 (see Fig. 5) of performing a display transition from a first object to a second object will be described below with reference to Figs. 1 to 14. The method 500 provides a film-like transition between the display of objects, including participants or other content in a video conference. The method 500 uses a single camera 127 (see Fig. 1a). Each of the objects may be, for example, a face, head, upper body, or group of faces, heads or upper bodies, a whiteboard, or a document within a conference scene.

A "display transition" as described below is a change of content being viewed in a video conference. A "transition period" as described below is a time period between viewing different content in a video conference. Display transitions in a video conference may be shown with a transition effect such as cross fade, wipe or fade to black.
The term "camera view" as used below refers to a view as captured by a network camera (e.g., the camera 127) in a video conferencing system 100 (see Fig. 1a). The term "viewport" as used below is equivalent to, or a subset of, a camera view, and the contents of the viewport are displayed in a video window 301 of a graphical user interface (GUI) 300 as seen in Fig. 3.

Figs. 1a and 1b collectively form a schematic block diagram of a video conferencing system 100, upon which the various arrangements described can be practiced.

As seen in Fig. 1a, the video conferencing system 100 is formed by a computer module 101, input devices such as a keyboard 102, a mouse pointer device 103, a network camera 127, a scanner 126 and a microphone 180, and output devices including a display unit 114 and loudspeakers 117. In another implementation, the microphone 180 may be a microphone-array, or the system 100 may comprise a plurality of microphones. An external Modulator-Demodulator (Modem) transceiver device 116 may be used by the computer module 101 for communicating to and from other video conferencing systems 180, 185 and 190, via a communications network 120 and a connection 121. The network 120 may be a wide-area network (WAN), such as the Internet or a private WAN. Where the connection 121 is a telephone line, the modem 116 may be a traditional "dial-up" modem. Alternatively, where the connection 121 is a high capacity (e.g., cable) connection, the modem 116 may be a broadband modem. A wireless modem may also be used for wireless connection to the network 120.

The computer module 101 typically includes at least one processor unit 105, and a memory unit 106, for example formed from semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The module 101 also includes a number of input/output (I/O) interfaces, including an audio-video interface 107 that couples to the video display 114, loudspeakers 117 and microphone 180, an I/O interface 113 for the keyboard 102, mouse 103 and camera 127, and an interface 108 for the external modem 116. In some implementations, the modem 116 may be incorporated within the computer module 101, for example within the interface 108. The module 101 also has a local network interface 111 which, via a connection 123, permits coupling of the video conferencing system 100 to a local computer network 122, known as a Local Area Network (LAN). As also illustrated, the local network 122 may also couple to the wide network 120 via a connection 124, which would typically include a so-called "firewall" device or device of similar functionality. The interface 111 may be formed by an Ethernet circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement.

The interfaces 108 and 113 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 109 are provided and typically include a hard disk drive (HDD) 110. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 112 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD), USB-RAM and floppy disks, for example, may then be used as appropriate sources of data to the system 100.
The components 105 to 113 of the computer module 101 typically communicate via an interconnected bus 104 and in a manner which results in a conventional mode of operation of the computer system 100 known to those in the relevant art. The computer 101 may be formed by a computer module such as, for example, IBM PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems evolved therefrom.

The method 500 and other methods described herein may be implemented using the video conferencing system 100, wherein the processes of Figs. 4 to 14, to be described, may be implemented as a software application program 133 executable within the video conferencing system 100. The software application program 133 will be referred to below as "a video conferencing unit (VC-Unit) software application program". The steps of the described methods are effected by instructions 131 in the software that are carried out within the video conferencing system 100. The software instructions 131 may be formed as one or more code modules 201, 204, 205, 206, 207, 212 and 213, as seen in Fig. 2, each for performing one or more particular tasks.

As seen in Fig. 2, the network camera 127 generates an input video stream representing a scene comprising the participants or other content in a video conference. The VC-Unit software application program 133 processes the input video stream, and the display unit 114 displays the processed video stream. The processed video stream is also transmitted by the processor 105, via the network 120, to remote participants using one or more of the remote video conferencing systems 180, 185 and 190. The microphone 180 captures audio from the participants of the video conference. The audio is also transmitted as a digital signal, via the network 120, to remote participants using one or more of the remote video conferencing systems 180, 185 and 190. The speakers 117 play any audio received from remote participants, via the network 120.

As seen in Fig. 2, an embodiment of the code modules forming the VC-Unit software application program 133 comprises a graphical user interface (GUI) module 201, a Transition Controller module 204, a Camera Controller module 205, an Image Processor module 206, an Object Detector module 207, a Gesture Detector module 212, an Audio Processor module 213 and a VC-Unit application program module 209. The VC-Unit software application program 133 may be launched with an executable being executed by the processor 105. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the video conferencing system 100 from the computer readable medium, and then executed by the computer system 100. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 100 preferably effects an advantageous apparatus for implementing the described methods.

The VC-Unit software application program 133 is typically stored in the HDD 110 or the memory 106 of the computer module 101. The software is loaded into the video conferencing system 100 from a computer readable medium, and then executed by the video conferencing system 100. Thus, for example, the software may be stored on an optically readable CD-ROM medium 125 that is read by the optical disk drive 112.
A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the video conferencing system 100 preferably effects an advantageous apparatus for implementing the described methods.

In some instances, the VC-Unit software application program 133 may be supplied to the user encoded on one or more CD-ROMs 125 and read via the corresponding drive 112, or alternatively may be read by the user from the networks 120 or 122. Still further, the software can also be loaded into the video conferencing system 100 from other computer readable media. Computer readable storage media refers to any storage medium that participates in providing instructions and/or data to the video conferencing system 100 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 101. Examples of computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 101 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

Fig. 1b is a detailed schematic block diagram of the processor 105 and a "memory" 134. The memory 134 represents a logical aggregation of all the memory modules (including the HDD 109 and semiconductor memory 106) that can be accessed by the computer module 101 in Fig. 1a.

When the computer module 101 is initially powered up, a power-on self-test (POST) program 150 executes. The POST program 150 is typically stored in a ROM 149 of the semiconductor memory 106. A hardware device such as the ROM 149 is sometimes referred to as firmware. The POST program 150 examines hardware within the computer module 101 to ensure proper functioning, and typically checks the processor 105, the memory (109, 106), and a basic input-output systems software (BIOS) module 151, also typically stored in the ROM 149, for correct operation. Once the POST program 150 has run successfully, the BIOS 151 activates the hard disk drive 110. Activation of the hard disk drive 110 causes a bootstrap loader program 152 that is resident on the hard disk drive 110 to execute via the processor 105. This loads an operating system 153 into the RAM memory 106, upon which the operating system 153 commences operation. The operating system 153 is a system level application, executable by the processor 105, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

The operating system 153 manages the memory (109, 106) in order to ensure that each process or application running on the computer module 101 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 100 must be used properly so that each process can run effectively.
Accordingly, the aggregated memory 134 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 100 and how such is used.

The processor 105 includes a number of functional modules, including a control unit 139, an arithmetic logic unit (ALU) 140, and a local or internal memory 148, sometimes called a cache memory. The cache memory 148 typically includes a number of storage registers 144-146 in a register section. One or more internal busses 141 functionally interconnect these functional modules. The processor 105 typically also has one or more interfaces 142 for communicating with external devices via the system bus 104, using a connection 118.

The VC-Unit software application program 133 includes a sequence of instructions 131 that may include conditional branch and loop instructions. The VC-Unit software application program 133 may also include data 132 which is used in execution of the VC-Unit software application program 133. The instructions 131 and the data 132 are stored in memory locations 128-130 and 135-137, respectively. Depending upon the relative size of the instructions 131 and the memory locations 128-130, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 130. Alternately, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 128-129.

In general, the processor 105 is given a set of instructions which are executed therein. The processor 105 then waits for a subsequent input, to which the processor 105 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 102, 103, data received from an external source across one of the networks 120, 122, data retrieved from one of the memory 106, 109, or data retrieved from a storage medium 125 inserted into the corresponding reader 112. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 134.

The described methods use input variables 154 that are stored in the memory 134 in corresponding memory locations 155-158. The described methods produce output variables 161 which are stored in the memory 134 in corresponding memory locations 162-165. Intermediate variables may be stored in memory locations 159, 160, 166 and 167.

The register section 144-146, the arithmetic logic unit (ALU) 140, and the control unit 139 of the processor 105 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 133. Each fetch, decode, and execute cycle comprises:
(a) a fetch operation, which fetches or reads an instruction 131 from a memory location 128;
(b) a decode operation in which the control unit 139 determines which instruction has been fetched; and
(c) an execute operation in which the control unit 139 and/or the ALU 140 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed.
Similarly, a store cycle may be performed by which the control unit 139 stores or writes a value to a memory location 132.

Each step or sub-process in the processes of Figs. 4 to 8 is associated with one or more segments of the VC-Unit software application program 133, and is performed by the register section 144-147, the ALU 140, and the control unit 139 in the processor 105 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the VC-Unit software application program 133.

The described methods may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.

GUI

The GUI module 201 provides a user of the system 100 an ability to choose different transition effects and display the processed video stream. In one implementation, the GUI module 201 generates the GUI 300 shown in Fig. 3. The GUI 300 may be rendered or otherwise represented upon the display 114. Through manipulation of typically the keyboard 102 and the mouse 103, a user of the video conferencing system 100 and the VC-Unit software application program 133 may manipulate the GUI 300 in a functionally adaptable manner to provide controlling commands and/or input to the VC-Unit software application program 133 associated with the GUI 300. Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 117 and user voice commands input via the microphone 180. The GUI 300 may be implemented using C# and the .NET framework.

The GUI 300 comprises an outer window 303, which encapsulates a video panel 302 and an option panel 309. The video panel 302 comprises the video window 301 to display the processed video stream. The option panel 309 comprises buttons labelled Connect 311, End 312 and Disconnect 308, and a single-select list box 310 labelled Transition Effects. The list box 310 has buttons labelled Wipe 307, Cross Fade 306 and Fade To/From Black 305, representing transition effects that can be selected by the user.

Upon selection of the Connect button 311, using the pointer device 103, for example, the processor 105 triggers an event for the user to connect to the camera 127 and obtain the processed video stream. The processed video stream includes a transition display, as will be described in detail below. Upon selection of the Disconnect button 308, the processor 105 stops video processing and discontinues display of the video stream. Upon selection of the End button 312, the processor 105 cleans up all resources used by the VC-Unit software application program 133 and shuts the program 133 down.

In one embodiment, there may be many video windows, similar to the video window 301, displaying video streams generated by different cameras. Typically, these video streams represent different participants in a multi-participant video conference.
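The specification describes the GUI 300 as implemented with C# and .NET; for consistency with the other sketches in this description, the following is a rough Python/tkinter analogue of the described layout (outer window 303, video panel 302 with video window 301, option panel 309 with the transition-effect list box 310 and the Connect, Disconnect and End buttons). The widget arrangement and callbacks are illustrative assumptions only.

```python
import tkinter as tk

root = tk.Tk()
root.title("VC-Unit")                          # plays the role of outer window 303

video_panel = tk.Frame(root)                   # video panel 302
video_panel.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
video_window = tk.Label(video_panel, text="video window 301",
                        bg="black", fg="white", width=48, height=16)
video_window.pack(fill=tk.BOTH, expand=True)   # video window 301

options = tk.Frame(root)                       # option panel 309
options.pack(side=tk.RIGHT, fill=tk.Y)
effects = tk.Listbox(options, selectmode=tk.SINGLE, height=3)
for effect in ("Wipe", "Cross Fade", "Fade To/From Black"):
    effects.insert(tk.END, effect)             # single-select list box 310
effects.pack()

tk.Button(options, text="Connect",
          command=lambda: print("connect to camera")).pack(fill=tk.X)
tk.Button(options, text="Disconnect",
          command=lambda: print("stop video processing")).pack(fill=tk.X)
tk.Button(options, text="End", command=root.destroy).pack(fill=tk.X)

root.mainloop()
```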
The Transition Controller module 204 manages a (889857vl) 885516_FinalFiled - 12 sequence of operations to perform display transitions by delegating tasks to the modules 201, 205, 206, 207, 212 and 213 in the VC-Unit software application program module 209. The method 500 of performing a display transition from a first object to a second object, will now be described below with reference to Fig. 5. The method 500 is implemented as 5 software in the form of the Transition Controller module 204 resident on the hard disk drive 110 and being controlled in its execution by the processor 105. The method 500 will be described by way of example with reference to a video conference taking place in a room. The video conference comprises a plurality of participants and other content. The scene within the room is captured by the camera 127, which generates an input ) video stream. The input to the method 500 is the input video stream generated by the camera 127. As described above, the video stream generated by the camera 127 represents a scene comprising the participants or other content in the example video conference. The generated input video stream may be stored in the memory 106 and/or the hard disk drive 110. 5 The method 500 begins at a first step 520, where the Transition Controller module 204, under execution of the processor 105, performs the step of detecting a first object in the input video stream generated by the camera 127, using an object detection method. In particular, at step 520, the Transition Controller module 204 requests the Object Detector module 207 to detect a face or other objects such as whiteboard, document or group of faces in the input 0 video stream using face or other object detection. The Object Detector module 207 returns details of at least the first object or face detected in the video stream to the Transition Controller module 204. The details returned by the Object Detector module 207 include rectangular co-ordinates of a rectangular bounding box surrounding the first object. In one implementation, a method 600 (see Fig. 6) of detecting an object is invoked by the Object Z5 Detector module 207 to detect the first object. At the next step 525, a first viewport is created for the detected first object, by the Transition Controller module 204. The viewport displays the first object and is created based on the rectangular co-ordinates received from the Object Detector module 207. In step 530, the Transition Controller module 204, under the execution of the processor 30 105, performs camera operations to adjust the camera view of the camera 127 to cover the viewport determined in step 525. In one implementation, the Transition Controller module 204 requests the Camera Controller module 205 to adjust the zoom of the camera 127, for (889857vl) 885516_FinalFiled - 13 example, by zooming into the first object, to cover the viewport displaying the first object detected at step 520. The Transition Controller module 204, under the execution of the processor 105, communicates with the GUI module 201 to perform the step of displaying the viewport 5 comprising the first object in the video window 301. The Transition Controller module 204 and the GUI module 201 use the video stream generated by the camera 127 from the zoomed in position of the camera 127, to display the viewport. Accordingly, the viewport determined in step 525 is displayed in the video window 301. 
A method 700 of controlling the camera, as executed by the Camera Controller module 205 at step 530, will be described in detail below 3 with reference to Fig. 7. At the next step 532, the processor 105 receives a trigger indicating that a change in the object of interest is required. An example of a trigger may be that the person (e.g., the first object) stops speaking for a period of time. Another example of a trigger may be that the processor 105 detects audio from another location in the room. Yet another example of a 5 trigger may be that the processor 105 has detected that a new participant has joined the video conference. In the next step 535, in one implementation, the Transition Controller module 204 requests the Camera Controller module 205 to adjust the camera zoom of the camera 127. For example, the Camera Controller module 205 may perform the step of zooming out the camera 0 127 to cover a larger area which includes the first object and potentially a second object. In this instance, the Transition Controller module 204 performs the step of maintaining the viewport displaying the first object while Camera Controller module 205 is controlling the camera 127 to generate a further video stream including the first object and potentially the second object. The viewport displaying the first object is maintained and the first object is 25 displayed in the video window 301 in lower resolution. The dimensions of the viewport are adjusted while the viewable range of the camera 127 is increased, so that the view of the first object on the display of the video window 301 is effectively unchanged except for a reduction in resolution. Thus, the user viewing the input video stream from the camera 114 is largely unaware of the movement of the camera 127, except for some reduction in the image 30 resolution. In the next step 540, the Transition Controller module 204, under execution of the processor 105, performs the step detecting a second object in the input video stream. In one implementation, a method 600 (see Fig. 6) of detecting an object is invoked at step 540 to (889857v1) 885516_FinalFiled - 14 detect the second object. As an example, the second object may be the face of an adjacent participant to the first object in the room where the video conference is occurring. In another implementation, the second object may be detected at step 540 using audio detection. In particular, audio received from the microphone 180 may be analysed using the 5 Audio Processor module 213. Using the microphone 180, the direction of the audio, for example, from a participant who is currently speaking, may be detected by the Audio Processor module 213. Thereafter, the Transition Controller module 204 may perform camera operations so that the camera 127 points to the location of the second object. In another implementation, the system 100 may comprise a number of microphones (e.g., D 180). In this instance, the microphones communicate with the camera 127 when the microphones are active. The microphone 180 is "active" when a participant of the conference speaks on the microphone 180. The camera 127 may be operated to cover the location of the active microphone 180 to detect the second object. In another implementation, gestures of the first object (e.g., hand direction of a person), 5 may be analysed to locate the second object. 
In this instance, the first person (i.e., the first object) may be pointing to the second person (i.e., the second object) or to a document or whiteboard adjacent to him/her in the room where the video conference is occurring. The Gesture Detector module 212 may perform the step of analysing gestures of the first object to detect the second object. In one implementation, gestures can be detected by determining 0 bounding rectangles around the upper body and the hands of the first person (i.e., first object), and their relative geometric positions. In yet another implementation, disparity maps obtained from a pair of images in a video sequences may be used, together with colours, to identify and track the upper body components. A disparity map may be obtained by measuring the difference between image blocks at the same position in the pair of images. Z5 In still another implementation, a zoomed out view of the camera 180 covers the whole scene within the room and the detection of lip movements of the participants in the room, may be used to locate the second object. In still another implementation, the second object may be located from a previously captured panoramic image covering the entire conference scene including the first and second 30 objects. In this instance, knowledge of locations of participants within the conference is available from a panoramic image captured by the camera 127. A decision may be made to operate the camera 127 to a particular location to cover the second object based on the knowledge of locations of participants. The particular location can be determined using, for (889857v1) 885516_FinalFiled - 15 example, speech location, detection of lip movements and gestures of the first object, as described above. In still another implementation, objects may be located in pre-determined locations and the position of the camera 180 may be adjusted by the processor 105 to capture images from those 5 predetermined locations in a round robin fashion. For example, the camera 180 may have various presets configured to point to locations where participants or other objects are present within the room. Accordingly, the second object may be detected by operating the camera 127 to point to a pre-determined location. Once the locations of the second object are determined, the Transition Controller module 3 204 requests the Object Detector module 207 to detect face or other objects in the input video stream obtained from the camera 127. The Transition Controller module 204 also performs the step of creating a second viewport at step 540 to cover the region of the second object to display the second object. In another implementation, the second viewport is created by using the rectangular co-ordinates returned from the Object Detector module 207 over the second 5 object. Continuing the method 500, at the next step 550, the Transition Controller module 204, under execution of the processor 105, communicates with the Image Processor module 206 providing the Image Processor module 206 with the first and second viewports, displaying their respective detected first and second objects, along with a transition effect. The transition 0 effect may be represented by a transition effect parameter. 
In still another implementation, a zoomed-out view of the camera 127 covers the whole scene within the room, and detection of lip movements of the participants in the room may be used to locate the second object.

In still another implementation, the second object may be located from a previously captured panoramic image covering the entire conference scene including the first and second objects. In this instance, knowledge of the locations of participants within the conference is available from a panoramic image captured by the camera 127. A decision may be made to operate the camera 127 to a particular location to cover the second object based on the knowledge of the locations of participants. The particular location can be determined using, for example, speech location, detection of lip movements, and gestures of the first object, as described above.

In still another implementation, objects may be located in pre-determined locations and the position of the camera 127 may be adjusted by the processor 105 to capture images from those pre-determined locations in a round robin fashion. For example, the camera 127 may have various presets configured to point to locations where participants or other objects are present within the room. Accordingly, the second object may be detected by operating the camera 127 to point to a pre-determined location.

Once the location of the second object is determined, the Transition Controller module 204 requests the Object Detector module 207 to detect the face or other objects in the input video stream obtained from the camera 127. The Transition Controller module 204 also performs the step of creating a second viewport at step 540 to cover the region of the second object, to display the second object. In another implementation, the second viewport is created by using the rectangular co-ordinates returned from the Object Detector module 207 over the second object.

Continuing the method 500, at the next step 550, the Transition Controller module 204, under execution of the processor 105, communicates with the Image Processor module 206, providing the Image Processor module 206 with the first and second viewports, displaying their respective detected first and second objects, along with a transition effect. The transition effect may be represented by a transition effect parameter. The transition effect parameter represents whether the transition effect is, for example, a "cross fade" between the first object and the second object, a "wipe" between the first object and the second object, "fade to black" of the first object fading to black, or "fade from black" of the second object fading from black. As described in detail below, depending on the selected transition effect, images are generated to provide the effect.

Also at step 550, the Transition Controller module 204 executes the step of performing the display transition from the first object to the second object, according to the transition effect, so as to display the second object using the video stream generated by the camera 127, for example, in the zoomed-out position. The display transition is performed using transition images, as will be described in detail below. The display transition occurs between the first viewport created at step 525 and the second viewport created at step 540, using the video stream generated by the camera 127.

In another implementation, the outlines of the detected first and second objects and the transition effect are passed to the Image Processor module 206 at step 550. The viewports and the first and second objects are provided as images, as input to the Image Processor module 206. The Image Processor module 206 generates one or more display transition images based on the input image(s) received. A method 800 of generating one or more display transition images, as performed by the Image Processor module 206 at step 550, will be described in detail below with reference to Fig. 8. The generated display transition images show the transition effect. Also at step 550, the Transition Controller module 204 provides the display transition images to the GUI module 201, and the GUI module 201 displays the transition images in sequence in the video window 301 of the GUI 300.

The time taken to display the display transition from the first viewport to the second viewport (including displaying the transition images) by the Transition Controller module 204 is defined as the transition time Tx. Fig. 14 shows a timeline 1400 where the transition time is equal to (t4 - t2). Typically, the display transition starts when the second viewport is determined, because the Image Processor module 206 can then perform the transition making use of an available image corresponding to the second object. For some transition effects, such as fade to/from black, the transition may start before the availability of the image corresponding to the second object.

The display transition time may vary based on a number of factors. In one implementation, the display transition time may be a user-defined period specified in seconds. In a second implementation, the display transition time may depend on the type of transition and/or the type of objects in the viewport. The time taken to perform the display transition may also be dependent on the number of objects involved in the display transition. For example, transitioning between pages of a document might take longer, to allow the user to read the entire page, when compared to transitioning from one person to a next person. Further, transitioning from one person to another person may be dependent on factors such as the length of time that the person is talking or involved in a conversation.
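Given a chosen transition time Tx, the transition images displayed in sequence can be scheduled as a series of blend weights. A minimal sketch, assuming a linear schedule and an illustrative frame rate of 25 fps (neither is fixed by the specification):

```python
def transition_alphas(transition_seconds, fps=25):
    """Blend weights, from 0.0 (first object) to 1.0 (second object), for
    the transition images displayed in sequence over the transition time Tx."""
    n = max(1, int(round(transition_seconds * fps)))
    return [i / (n - 1) if n > 1 else 1.0 for i in range(n)]

# e.g. a 2-second transition at 25 fps yields 50 weights from 0 to 1
alphas = transition_alphas(2.0)
```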
In the next step 560, the Transition Controller module 204, under execution of the processor 105, displays the second object, shown in the second viewport, in the video window 301. In one implementation, as the camera 127 remains zoomed out as in the previous step 550, a lower resolution view of the second object is displayed in the video window 301 of the GUI 300.

In the next step 570, the Transition Controller module 204 communicates with the Camera Controller module 205 to adjust the zoom of the camera 127 (e.g., zoom in) to cover the second detected object, in accordance with the method 700. Also at step 570, the Transition Controller module 204, under the execution of the processor 105, communicates with the GUI module 201 to display the second object, as captured by the zoomed-in position of the camera 127, in the video window 301.

In one embodiment, step 535 of zooming out the camera 127 may be executed in parallel with step 540 of detecting a second object. In an example of such an embodiment, in one thread the controls of the camera 127 are adjusted to zoom out gradually to cover, for example, the entire video conference room, whilst in another thread the second object is detected. The zoom out operation of the camera 127 in the first thread stops when the second object is detected in the second thread. A sketch of this arrangement is given below.

In one embodiment, the step 550 of generating one or more display transition images may be executed in parallel with the step 570 of adjusting the zoom of the camera 127 to zoom into the second object. In such an embodiment, the images associated with the first object, along with the images associated with the second object, are used to produce the display transition images. As the display transition images with transition effects are being generated and displayed in the video window 301, the camera 127 is operated to zoom in to the second object. In this instance, as soon as the zoom in operation of the camera 127 is complete, the content being displayed in the video window 301 changes from the display of transition images (i.e., showing the transition effect) to a zoomed-in image of the second object.

In one embodiment, the step 525 of determining the first viewport is not required. After detecting the first object in step 520, a frame containing the zoomed-in first object may be saved in the memory 106 and/or the hard disk drive 110. Images associated with the first object from the saved zoomed-in frame, along with the images associated with the second object, may then be used to produce display transition images at step 550.

In one embodiment, the Transition Controller module 204 performs the step of detecting motion of the second object while step 550 is in progress. The motion detection is performed to determine if the second object has moved from the camera view while performing the display transition. In such an embodiment, before step 550, a record representing the number and type of objects is maintained within the memory 106 and/or the hard disk drive 110. Then, object detection is performed on the input video stream, in parallel to step 550 on a separate thread. If the number and type of objects differ from those present before step 550, then the Transition Controller module 204 infers that the second object has moved. If the second object has moved, then the Transition Controller module 204 may pass control to step 540 to detect another second object.
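The parallel zoom-out and detection embodiment described above (step 535 running alongside step 540) can be sketched with two threads and a shared event; the camera and detect interfaces are hypothetical placeholders.

```python
import threading
import time

stop_zoom = threading.Event()

def zoom_out_gradually(camera):
    """Thread 1: widen the camera view step by step until told to stop."""
    while not stop_zoom.is_set():
        camera.zoom_out_step()            # assumed camera interface
        time.sleep(0.1)                   # illustrative pacing

def watch_for_second_object(camera, detect):
    """Thread 2: watch the stream and signal thread 1 once a second
    bounding box (the second object) appears."""
    while not stop_zoom.is_set():
        if len(detect(camera.read())) > 1:
            stop_zoom.set()

# threading.Thread(target=zoom_out_gradually, args=(camera,)).start()
# threading.Thread(target=watch_for_second_object, args=(camera, detect)).start()
```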
Camera Controller The Camera Controller module 205 is used to handle different camera operations such as 5 connect to camera, zoom in and out, pan and tilt. The Camera Controller module 205 receives requests from the Transition Controller module 204. The Camera Controller module 205 is initialized when the VC-unit software application program 133 starts up. The Camera Controller module 205 may also receive communication or commands from the microphone 180 or the Audio Processor module 213, indicating the location of a participant within the D room that is currently speaking. The Camera Controller module 205 may further instruct the camera 127 to cover the location as indicated by the microphone 180. The method 700 of controlling the camera 127 as executed at some of the steps of method 500 such as steps 535 and 570, will now be described in detail below with reference to Fig. 7. The method 700 may be executed as software in the form of the Camera Controller module 5 205 resident on hard disk drive 110 and being controlled in its execution by the processor 105. The method 700 describes the steps performed by the Camera Controller module 205, required to control the camera 127, when the Camera Controller module 205 receives commands from the Transition Controller module 204. The method 700 begins at a first step 701, where the Camera Controller module 205, under 0 execution of the processor 105, waits for a command from the Transition Controller module 204. In the next step 702, if the Camera Controller module 205 receives a command sent from the Transition Controller module 204 and the command is a valid camera command, the method 700 proceeds to step 704. Otherwise, the method 700 proceeds to step 703. At step 704, the Camera Controller module 205 processes the command. In step 704, the Camera 25 Controller module 205, in one implementation, sends hypertext transport protocol (HTTP) requests to the camera 127, to connect to the camera 127, and perform camera control operations such as pan, tilt and zoom. In some other implementations, a real-time transport protocol (RTP) or some other proprietary protocol requests may be used to connect to the camera 127 and perform camera control operations. 30 In step 704, after processing the command, the method 700 returns to step 701 where the Camera Controller module 205 waits to process other camera commands. If a shutdown command is received by the Camera Controller module 205 at 702, then the method 700 is shutdown in step 703. (889857v1) 885516_FinalFiled -19 Object Detector The Object Detector module 207 helps to detect faces of participants or other objects such as white boards or documents in the input video stream captured by the camera 127. The Object Detector module 207 may be used to identify the faces and other objects between i which the display transition is to occur. The method 600 may be executed as software in the form of the Object Detector module 207 resident on hard disk drive 110 and being controlled in its execution by the processor 105. The method 600 begins at a first step 601, where the Object Detector module 207, under execution of the processor 105, receives the input video stream from the camera 127 and ) decodes the video stream. In the next step 602, an array of "ObjectRect" objects is initialized within the memory 106. An ObjectRect object is an object having coordinates for a bounding rectangle encapsulating a face or other object. 
Object Detector

The Object Detector module 207 helps to detect faces of participants, or other objects such as whiteboards or documents, in the input video stream captured by the camera 127. The Object Detector module 207 may be used to identify the faces and other objects between which the display transition is to occur. The method 600 may be executed as software in the form of the Object Detector module 207 resident on the hard disk drive 110 and being controlled in its execution by the processor 105.

The method 600 begins at a first step 601, where the Object Detector module 207, under execution of the processor 105, receives the input video stream from the camera 127 and decodes the video stream. In the next step 602, an array of "ObjectRect" objects is initialized within the memory 106. An ObjectRect object is an object having coordinates for a bounding rectangle encapsulating a face or other object.

In the next step 603, if the Object Detector module 207 determines that there are any faces or other objects detected in the input video stream, then the method 600 proceeds to step 604. Otherwise, the method 600 proceeds to step 607. Any suitable face and object detection methods may be used in step 603. In one implementation, feature-based face detection is used at step 603. In accordance with such a method, different parts of the face, namely the eyes, nose and mouth, are detected, and then the geometric relationships of the parts are measured to detect the face.

In yet another implementation, a colour histogram may be used to detect the face. Different regions of the face have different facial features, and hence the colour histograms of these regions have certain characteristics. By concatenating the colour histograms of different regions and by identifying their spatial relationship, faces are detected. Objects such as documents or a whiteboard, in one implementation, may be detected by shape or feature detection.

In one embodiment, people can be detected using human body detection (e.g., by detecting a typical shape of a head with connected shoulders and torso). In another embodiment, people or objects may be detected by subtracting a background image from each image (or performing other known forms of background modelling) to detect foreground people or objects. In this description, when the term "face" is used, it should be understood that other representations of a person (e.g., head or upper body) can be used.

If an object, face or head is detected, in step 604 the Object Detector module 207, under execution of the processor 105, determines the rectangle co-ordinates of the bounding rectangle enclosing the detected object or face. In the next step 605, the Object Detector module 207 creates an ObjectRect object using the rectangle co-ordinates obtained in the previous step 604. In step 606, the ObjectRect array is updated with the ObjectRect object newly created at step 605. Following step 606, the method 600 returns to step 603 to process other objects. If there are no more faces or objects to be processed, the method 600 proceeds to step 607. In step 607, the ObjectRect array is returned to the Transition Controller module 204.

The Transition Controller module 204 may use the rectangle coordinates of the bounding rectangle enclosing the detected face or other object to create a viewport on the video stream and display the viewport in the video window 301. In other embodiments, the position and/or size of the face or object can be detected, represented and processed using different geometrical representations, including polygons representing the shape of the face or object, bitmap masks (e.g., representing the pixels or blocks corresponding to the image region of the face or object), a centroid position together with the width and height of the face or object, or any other representation of the face or of the position and/or size of the object.
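Step 603's face detection and the ObjectRect array built in steps 602-607 can be approximated with an off-the-shelf detector. The sketch below uses an OpenCV Haar cascade purely as a stand-in; the specification itself describes feature-based and colour-histogram detection rather than any particular library.

```python
import cv2

def detect_object_rects(frame_bgr):
    """Return bounding rectangles (x, y, w, h) for detected faces, playing
    the role of the ObjectRect array returned at step 607. The Haar
    cascade is an illustrative stand-in detector."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return [tuple(r) for r in cascade.detectMultiScale(grey, 1.1, 5)]
```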
Image Processor

The Image Processor module 206 produces transition effect images, depending on the transition effect requested and the input images. The Transition Controller module 204 passes the images of the identified objects, between which the display transition is to be performed, and the transition effect to the Image Processor module 206.

The method 800 of generating one or more display transition images, as performed by the Image Processor module 206 at step 550, will be described in detail below with reference to Fig. 8. The method 800 may be implemented as software in the form of the Image Processor module 206 resident on the hard disk drive 110 and being controlled in its execution by the processor 105.

The method 800 begins at a first step 801, where the Image Processor module 206 receives the images of the identified objects from the Transition Controller module 204. In step 802, parameters, such as the transition effect parameter, are received from the Transition Controller module 204. As described above, the transition effect parameter is a parameter representing whether the transition effect is wipe, fade to black, fade from black or cross fade. The images and the transition effect parameters are stored in the memory 106 and/or the hard disk drive 110.

In the next step 803, if the Image Processor module 206 determines that the transition effect is wipe, then the method 800 proceeds to step 805. Alternatively, if the Image Processor module 206 determines that the transition effect is fade to/from black at step 803, then the method 800 proceeds to step 804. In another alternative, if the Image Processor module 206 determines that the transition effect is cross fade at step 803, then the method 800 proceeds to step 806.

At step 805, the Image Processor module 206 generates transition images associated with the first object and the second object to give the effect of a wipe. In this instance, at least one of the transition images comprises both the first and second objects. A wipe is a gradual spatial transition from one image to another image. Accordingly, the images generated at step 805 produce a gradual spatial transition from an image of the first object to an image of the second object.

At step 806, the Image Processor module 206 generates a sequence of transition images producing a visual effect of a cross fade. A cross fade video transition gradually fades out a first image while fading in a second image. In this instance, at least one of the transition images comprises both the first and second objects. Accordingly, the transition images generated at step 806 produce a gradual fade from the image of the first object to the image of the second object.

At step 804, the Image Processor module 206 generates a sequence of transition images based on the image of the first object, which gives the visual effect of the image of the first object fading to black. Also at step 804, the Image Processor module 206 generates a sequence of transition images based on the image of the second object, which gives the visual effect of the image of the second object fading from black.

The method 800 concludes at the next step 807, where the Image Processor module 206 returns the display transition images generated at steps 804, 805 and 806 (such as fade to/from black images, cross fade images or wipe images) to the Transition Controller module 204.
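The three branches of method 800 can be sketched as a single image-generation routine. This is a minimal illustration, assuming equally sized input images, a linear schedule, and an illustrative count of n frames per transition; none of these specifics come from the specification.

```python
import numpy as np

def transition_images(first, second, effect, n=25):
    """Generate n transition images between two equally sized images,
    mirroring method 800: 'wipe' slides the second image over the first
    left to right (step 805), 'crossfade' blends them (step 806), and
    'fade' passes through black between them (step 804)."""
    assert n >= 2
    first = first.astype(np.float32)
    second = second.astype(np.float32)
    frames = []
    for i in range(n):
        a = i / (n - 1)                              # 0.0 -> 1.0 progress
        if effect == "crossfade":
            img = (1 - a) * first + a * second       # gradual blend
        elif effect == "wipe":
            img = first.copy()                       # gradual spatial reveal
            edge = int(a * first.shape[1])
            img[:, :edge] = second[:, :edge]
        else:  # "fade": to black over the first half, from black thereafter
            src = first if a < 0.5 else second
            img = src * abs(2 * a - 1)
        frames.append(img.astype(np.uint8))
    return frames
```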
A method 400 of processing the input video stream generated by the camera 127 will now be described with reference to Fig. 4. The method 400 may be implemented as a further software application, in addition to the VC-Unit software application program 133. The further software application may be resident on the hard disk drive 110 and be controlled in its execution by the processor 105.

In an initial step 410, the processor 105 launches the VC-Unit software application program 133. In one implementation, the VC-Unit software application program 133 may be an executable.

In a next step 420, the GUI module 201 displays the graphical user interface 300 on the display 114. The GUI module 201 also initializes the GUI 300. As described above, through manipulation of typically the keyboard 102 and the mouse 103, a user of the video conferencing system 100 and the VC-Unit software application program 133 may manipulate the GUI 300 in a functionally adaptable manner to provide controlling commands and/or input to the VC-Unit software application program 133 associated with the GUI 300. Typically, a separate thread is created to handle such user input. On some operating systems, this is called the user interface (UI) thread. This thread accepts and processes user input.

In the next step 430, the processor 105 detects selection of one of the Wipe button 307, the Cross Fade button 306 and the Fade To/From Black button 305, representing the transition effect selected by the user.

In the following decision step 440, the processor 105 waits to detect further user input. If the processor 105 detects selection of the connect button 311 at step 440, then the method 400 proceeds to step 450. In step 450, the VC-Unit software application program module 209 communicates with the Camera Controller module 205 to connect to the camera 127. In the next step 460, the input video stream generated by the camera 127 is received.

In step 470, the Transition Controller module 204 executes the method 500 to perform the display transition. Typically, the video stream is received and processed on a separate thread which is different from the UI thread, as sketched below. While the input video stream is received and processed in step 470, the method 400 returns to step 440 to receive further user input. As described above, at step 550 of the method 500, the Transition Controller module 204 provides the display transition images to the GUI module 201, and the GUI module 201 displays the transition images in the video window 301 of the GUI 300.

If the processor 105 detects selection of the end button 312 at step 440, then the method 400 proceeds to step 491. In step 491, the VC-Unit software application program 133 is shut down.

If the processor 105 detects selection of the disconnect button 308 at step 440, then the method 400 proceeds to step 480. At step 480, the VC-Unit software application program module 209 causes the Camera Controller module 205 to discontinue the video streaming from the camera 127.

In the next step 490, the VC-Unit software application program module 209 causes the Camera Controller module 205 to disconnect from the camera 127. The method 400 then proceeds to step 440 to wait for further user input.

The methods described above will now be described in further detail by way of example. Fig. 14 shows a timeline 1400 representing the state of the graphical user interface 300 at various time instants t1 1401, t2 1402, t3 1403, t4 1404 and t5 1405. In particular, Fig. 14 shows the state of the video window 301. The timeline 1400 shows the changes that the video window 301 undergoes through the application of the steps of the method 500. These changes are shown in finer detail in Fig. 9, Fig. 10, Fig. 11, Fig. 12 and Fig. 13, which form a sequence of five figures illustrating one particular scenario.
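Before turning to the example of Fig. 14, the two-thread arrangement described above for steps 440 to 470 can be sketched as follows; the device index, the stop event and the per-frame callback are hypothetical stand-ins rather than the actual interfaces of the VC-Unit software application program 133.

```python
import threading
import cv2

def stream_worker(source, stop_event, process_frame):
    """Receive and process the input video stream off the UI thread
    (cf. step 470); process_frame stands in for the per-frame work of
    the Transition Controller module."""
    capture = cv2.VideoCapture(source)
    try:
        while not stop_event.is_set():
            ok, frame = capture.read()
            if not ok:
                break
            process_frame(frame)
    finally:
        capture.release()

# The UI thread remains free to service further input (cf. step 440).
stop_event = threading.Event()
worker = threading.Thread(
    target=stream_worker,
    args=(0, stop_event, lambda frame: None),  # device 0; no-op processing
    daemon=True)
worker.start()
# On disconnect (cf. step 480): stop_event.set(); worker.join()
```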
In Fig. 14, states 1406, 1407, 1408, 1409 and 1410 represent the state of the display of the video window 301 of Fig. 3 at times t1 1401, t2 1402, t3 1403, t4 1404 and t5 1405, respectively, on the timeline 1400.

In state 1406, the video window 301 displays a first object 901 (see Fig. 9a) on the display 114, as at step 530 of the method 500. Fig. 9a shows a camera view 902 on the object 901. The object 901 is a person, or the face of a person, in the video conference, which the camera view 902 has currently captured. Fig. 9a also shows a viewport 903 created (as at step 525) after detecting the object 901 (as at step 520). The viewport 903 encompasses the detected object 901. Fig. 9b represents an actual view 904 as seen by the user in the video window 301 displayed on the display 114. The actual view 904 corresponds to the viewport 903. Further, object 905 is the view on the display 114 of the object 901. Accordingly, steps 520 and 525 of the method 500 lead to the display of the actual view 904 at state 1406 in the timeline 1400.

In state 1407, the video window 301 displays an object 1006 representing the first object 901 at a lower resolution. For example, Fig. 10a shows a camera view 1001 after the camera 127 is zoomed out (as at step 535). The camera view 1001 includes the viewport 903 on the first object 901. The viewport 903 is maintained after the camera 127 is zoomed out (as at step 535). Fig. 10a also shows a second object 1003 detected (as at step 540) during the video conference, which the camera view 1001 has currently captured. Fig. 10b represents the actual view 1005 as seen by the user on the video window 301, which is the same view as the viewport 903. The actual view 1005 includes the object 1006, which is the representation of the object 901 (as at step 560) on the display 114. The object 1006 may be of a lower resolution than the object 905, since the camera 127 may be zoomed out (as at step 535) to capture both the object 1003 and the object 901. Application of steps 530, 532, 535 and 540 of the method 500 leads to the display of the actual view 1005 at state 1407 in the timeline 1400. State 1412 indicates that t2 1402 is the start of the display transition.

In state 1408 of Fig. 14, the video window 301 shows the transition effect. Fig. 11a shows the camera view 1101 covering the objects 901 and 1003. The camera view 1101 includes the viewport 903 covering the object 901 (or person) and a second viewport 1105 covering the object 1003 (or person). As described above, the first viewport 903 is created, as at step 525, after the object 901 has been detected at step 520. The second viewport 1105 is created, as at step 540, after the object 1003 has been detected.

Fig. 11b shows an actual view 1107 that the user sees in the video window 301. The view 1107 includes an object 1106 displayed in accordance with the display transition effect (as at step 550 of the method 500). For example, the object 1106 may be a wipe image generated using the two objects 901 and 1003. This display transition effect object 1106 is displayed in the video window 301 (as at step 550). Application of step 550 of the method 500 leads to the display of state 1408 in the timeline 1400.
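A minimal sketch, under the assumption that a viewport is a simple (x, y, width, height) crop of the camera frame, of how the actual views of Figs. 9b to 13b could be rendered from the corresponding camera views; render_viewport is a hypothetical helper introduced here for illustration, and the sketch also shows why the displayed object loses resolution once the camera zooms out.

```python
import cv2

def render_viewport(camera_frame, viewport, window_size):
    """Crop the viewport (x, y, width, height) out of the camera frame
    and scale the crop to the (width, height) of the video window.
    After the camera zooms out (cf. step 535) the same viewport covers
    fewer source pixels, so the upscaled result (e.g., object 1006) is
    effectively of lower resolution; zooming back in on one object
    (cf. step 570) reverses the effect."""
    x, y, w, h = viewport
    crop = camera_frame[y:y + h, x:x + w]
    return cv2.resize(crop, window_size, interpolation=cv2.INTER_LINEAR)
```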
In state 1409 of Fig. 14, the video window 301 displays an object 1204, representing the second object 1003, at a lower resolution (as at step 560). Fig. 12a shows the camera view 1001 covering the objects 901 and 1003. The camera view 1001 includes the viewport 1105 covering the second object 1003 (or person). Fig. 12b shows an actual view 1203 that the user sees in the video window 301. The actual view 1203 includes the object 1204 representing the view of the object 1003 in the viewport 1105, typically in a lower resolution. Application of step 560 of the method 500 leads to the display of state 1409 in the timeline 1400.

In state 1410, the video window 301 displays an object 1305, representing the second object 1003, at a higher resolution (as at step 570). Fig. 13a shows a camera view 1301 covering the second object 1003. The camera view 1301 includes the viewport 1105 covering the object 1003.

Fig. 13b shows an actual view 1304 that the user sees in the video window 301. The actual view 1304 includes the object 1305 representing the actual view of the object 1003 in the viewport 1105 as seen by the user in the video window 301. The object 1305 is typically displayed in a higher resolution than the object 1204, since the camera view 1301 is zoomed in to cover the object 1003 only. Accordingly, application of step 570 of the method 500 leads to the display of the state 1410, corresponding to Fig. 13b, in the timeline 1400.

Industrial Applicability

The arrangements described are applicable to the computer and data processing industries, and particularly to the image processing industry.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (24)

1. A method of performing a display transition from a first object to a second object using a camera, the method comprising the steps of: displaying a first viewport comprising the first object, using a video stream generated by the camera; maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.
2. The method according to claim 1, further comprising the step of creating a second viewport comprising the second object to display the second object.
3. The method according to claim 1, further comprising the step of creating the first viewport comprising the first object to display the first object.
4. The method according to claim 2, wherein the display transition occurs between the first viewport and the second viewport using the further video stream generated by the camera, while maintaining the display of the first viewport.
5. The method according to claim 1, wherein at least one image displayed during the display transition comprises both the first and second object.
6. The method according to claim 1, further comprising the step of zooming into the first object to display the first viewport.
7. The method according to claim 1, further comprising the step of detecting at least the second object using an object detection method.
8. The method according to claim 7, wherein the object detection method is face detection.
9. The method according to claim 7, wherein the second object is detected using audio detection.
10. The method according to claim 7, further comprising the step of analysing gestures of the first object to detect the second object.
11. The method according to claim 7, wherein the second object is detected based on a captured panoramic image of a scene including the first and second objects.
12. The method according to claim 7, wherein the second object is detected by operating the camera to a pre-determined location.
13. The method according to claim 1, further comprising the step of zooming out the camera to cover both the first and the second object.
14. The method according to claim 1, wherein the time taken to perform the display transition is dependent on the type of transition and the number of objects involved in the display transition.
15. The method according to claim 1, wherein the transition effect is a cross fade between the first object and the second object.
16. The method according to claim 1, wherein the transition effect is a wipe between the first object and the second object.
17. The method according to claim 1, wherein the transition effect is a fade to black of the first object and the second object.
18. The method according to claim 1, wherein the transition effect is a fade from black.
19. The method according to claim 1, further comprising the step of detecting motion of the second object.
20. The method according to claim 1, wherein the camera is part of a video conferencing system.
21. An apparatus for performing a display transition from a first object to a second object using a camera, the apparatus comprising: means for displaying a first viewport comprising the first object, using a video stream generated by the camera; means for maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and means for performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.
22. A computer readable medium having recorded thereon a computer program for performing a display transition from a first object to a second object using a camera, the program comprising: code for displaying a first viewport comprising the first object, using a video stream generated by the camera; code for maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and code for performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.
23. A system for performing a display transition from a first object to a second object using a camera, the system comprising: a memory for storing data and a computer program; and a processor coupled to the memory for executing the computer program, the computer program comprising instructions for: displaying a first viewport comprising the first object, using a video stream generated by the camera; maintaining the first viewport displaying the first object while controlling the camera so as to include the first object and the second object within the video stream; and performing a display transition from the first object to the second object, according to a transition effect, based on the first and second object, so as to display the second object using the video stream generated by the camera.
24. A method of performing a display transition from a first object to a second object using a camera, the method being substantially as hereinbefore described with reference to any one of the embodiments as that embodiment is shown in the accompanying drawings.

DATED this 11th Day of December 2008
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
AU2008255262A 2008-12-12 2008-12-12 Performing a display transition Ceased AU2008255262B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2008255262A AU2008255262B2 (en) 2008-12-12 2008-12-12 Performing a display transition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2008255262A AU2008255262B2 (en) 2008-12-12 2008-12-12 Performing a display transition

Publications (2)

Publication Number Publication Date
AU2008255262A1 true AU2008255262A1 (en) 2010-07-01
AU2008255262B2 AU2008255262B2 (en) 2010-07-08

Family

ID=42289178

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2008255262A Ceased AU2008255262B2 (en) 2008-12-12 2008-12-12 Performing a display transition

Country Status (1)

Country Link
AU (1) AU2008255262B2 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014047090A2 (en) * 2012-09-21 2014-03-27 Cisco Technology, Inc. Transition control in a videoconference
WO2014047090A3 (en) * 2012-09-21 2014-08-21 Cisco Technology, Inc. Transition control in a videoconference
US9148625B2 (en) 2012-09-21 2015-09-29 Cisco Technology, Inc. Transition control in a videoconference
GB2570708A (en) * 2018-02-05 2019-08-07 Nokia Technologies Oy Switching between multidirectional and limited viewport video content
US10869025B2 (en) 2018-02-05 2020-12-15 Nokia Technologies Oy Switching between multidirectional and limited viewport video content
CN113613070A (en) * 2021-10-08 2021-11-05 北京奇艺世纪科技有限公司 Face video processing method and device, electronic equipment and storage medium
CN113613070B (en) * 2021-10-08 2022-01-18 北京奇艺世纪科技有限公司 Face video processing method and device, electronic equipment and storage medium
US11589006B1 (en) * 2021-10-19 2023-02-21 Plantronics, Inc. Dynamic camera presets

Also Published As

Publication number Publication date
AU2008255262B2 (en) 2010-07-08

Similar Documents

Publication Publication Date Title
JP6966421B2 (en) Composite and scaling angle-separated subscenes
US9565369B2 (en) Adaptive switching of views for a video conference that involves a presentation apparatus
US8963951B2 (en) Image processing apparatus, moving-image playing apparatus, and processing method and program therefor to allow browsing of a sequence of images
CN107105315A (en) Live broadcasting method, the live broadcasting method of main broadcaster's client, main broadcaster's client and equipment
CN108989830A (en) A kind of live broadcasting method, device, electronic equipment and storage medium
US11769231B2 (en) Methods and apparatus for applying motion blur to overcaptured content
US9197856B1 (en) Video conferencing framing preview
WO2020259401A1 (en) Imaging method and device for apparatus, storage medium, and electronic apparatus
JP2004064784A (en) Method for providing multi-resolution video to plural users, computer program product, and apparatus
US20190208124A1 (en) Methods and apparatus for overcapture storytelling
EP3688982A1 (en) Processing video including a physical writing surface
AU2008255262B2 (en) Performing a display transition
CN114296949A (en) Virtual reality equipment and high-definition screen capturing method
WO2018112498A1 (en) Method and system for rendering an object in a virtual view
CN112929750B (en) Camera adjusting method and display device
Foote et al. One-man-band: A touch screen interface for producing live multi-camera sports broadcasts
AU2015258346A1 (en) Method and system of transitioning between images
AU2016203135A1 (en) Method, system and apparatus for selecting a video frame
JP3954439B2 (en) Video recording system, program, and recording medium
AU2008264173A1 (en) Splitting a single video stream into multiple viewports based on face detection
JP2008090570A (en) Information processor and information processing method
CN115423728A (en) Image processing method, device and system
WO2022226745A1 (en) Photographing method, control apparatus, photographing device, and storage medium
CN108133517A (en) A kind of method and terminal that outdoor scene are shown under virtual scene
WO2022105757A1 (en) Image processing method and apparatus

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired