WO2002087218A2 - Navigable camera array and associated viewer - Google Patents

Navigable camera array and associated viewer

Info

Publication number
WO2002087218A2
WO2002087218A2
Authority
WO
WIPO (PCT)
Prior art keywords
camera
user
array
cameras
images
Application number
PCT/US2002/013004
Other languages
English (en)
Other versions
WO2002087218A3 (fr)
Inventor
Scott Sorokin
Andrew H. Weber
David C. Worley
Gary Gendel
Keith J. Hanna
Rakesh Kumar
Supun Samarasekera
Original Assignee
Kewazinga Corp.
Sarnoff Corporation
Application filed by Kewazinga Corp. and Sarnoff Corporation
Priority to AU2002307545A1
Publication of WO2002087218A2
Publication of WO2002087218A3

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/21805: Source of audio or video content, e.g. local disk arrays, enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N 23/62: Control of parameters via user interfaces
    • H04N 23/63: Control of cameras or camera modules by using electronic viewfinders
    • H04N 23/698: Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects
    • H04N 5/2627: Studio circuits for obtaining an image composed of images from a temporal image sequence, for providing a spin image effect, 3D stop motion effect or temporal freeze effect
    • H04N 5/2628: Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation

Definitions

  • the present invention relates to a telepresence system and, more particularly, to a navigable camera array telepresence system and method of using same for comparing two or more images.
  • Static venues, such as museums, and dynamic venues or events, such as music concerts.
  • the viewing of such venues is limited by time, geographical location, and the viewer capacity of the venue. For example, potential visitors to a museum may be prevented from viewing an exhibit due to the limited hours the museum is open. Similarly, music concert producers must turn back fans due to the limited seating of an arena. In short, limited access to venues reduces the revenue generated.
  • the broadcast resulting from these editorial and production efforts provides viewers with limited enjoyment.
  • the broadcast is typically based on filming the venue from a finite number of predetermined cameras.
  • the broadcast contains limited viewing angles and perspectives of the venue.
  • the viewing angles and perspectives presented in the broadcast are those selected by a producer or director during the editorial and production process; there is no viewer autonomy.
  • Although the broadcast is often recorded for multiple viewings, it has limited content life because each viewing is identical to the first. Because each showing looks and sounds the same, viewers rarely come back for multiple viewings.
  • This system has several drawbacks. For example, in order for a viewer's perspective to move through the venue, the moving vehicle must be actuated and controlled. In this regard, operation of the system is complicated. Furthermore, because the camera views are contiguous, typically at right angles, changing camera views results in a discontinuous image.
  • 360 degree camera systems also suffer from drawbacks. In particular, such systems limit the user's view to 360 degrees from a given point perspective. In other words, 360 degree camera systems provide the user with a panoramic view from a single location. Only if the camera system was mounted on a moving vehicle could the user experience simulated movement through an environment.
  • U.S. Patent No. 5,187,571 for Television System For Displaying Multiple Views of A Remote Location, issued February 16, 1993, describes a camera system similar to the 360 degree camera systems described above. The system described allows a user to select an arbitrary and continuously variable section of an aggregate field of view. Multiple cameras are aligned so that each camera's field of view merges contiguously with those of adjacent cameras, thereby creating the aggregate field of view.
  • the aggregate field of view may expand to cover 360 degrees.
  • In order to create the aggregate field of view, the cameras' views must be contiguous. In order for the camera views to be contiguous, the cameras have to share a common point perspective, or vertex.
  • the system of U.S. Patent No. 5,187,571 limits a user's view to a single point perspective, rather than allowing a user to experience movement in perspective through an environment.
  • A viewer (e.g., an electronic graphical user interface) is provided for use in selecting images and/or views from an array of cameras, each of which has an associated view of an environment and an associated output representing the view.
  • the input to the matrix viewer can be either a raw set of pre-synthesized image data, or a set of original image data together with a set of flow-fields.
  • The matrix viewer allows the user to navigate the data both in space and in time, with the use of two slider controls, a single graphical control (e.g., the four-quadrant button described above), and the like.
  • The matrix viewer thus works generally according to the following four steps.
  • The second step generally involves mapping the desired view given by the user interface, and the next likely desired view, onto the source data.
  • The third step uses the mapping performed in the second step to fetch the data for the desired view and the next (anticipated) desired view into local memory.
  • The fourth step involves processing the data for the desired view as required, displaying the view on the screen, and continuing to fetch data for the next desired view.
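  • The four-step viewer loop described above can be sketched as follows. This is a minimal illustration only: the source-data layout, the function names, and the slider-driven (space, time) view are assumptions made for the example, and the first step, which the excerpt does not spell out, is assumed to be reading the desired view from the user-interface controls.

```python
# Illustrative sketch of the viewer loop described above; SOURCE, map_to_source,
# fetch and show_view are assumed names, not taken from the application text.

SOURCE = {(cam, t): f"frame(cam={cam}, t={t})" for cam in range(24) for t in range(100)}
cache = {}                                        # local memory for fetched view data

def map_to_source(view):
    """Step 2: map a desired (space, time) view onto source-data indices."""
    space, time = view
    return round(space), round(time)

def fetch(key):
    """Step 3: fetch the data for a view into local memory (cached)."""
    if key not in cache:
        cache[key] = SOURCE[key]
    return cache[key]

def show_view(desired_view, predicted_next_view):
    """Step 4: process and display the desired view while prefetching the next."""
    frame = fetch(map_to_source(desired_view))
    print("displaying", frame)                    # stands in for rendering to the screen
    fetch(map_to_source(predicted_next_view))     # keep fetching the anticipated view

# Step 1 (assumed): read the desired and predicted views from the slider controls.
show_view(desired_view=(3.2, 10.0), predicted_next_view=(4.2, 11.0))
```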
  • Figure 1 is an overall schematic of one embodiment of the present invention.
  • Figure 2a is a perspective view of a camera and a camera rail section of the array according to one embodiment of the present invention.
  • Figures 2b-2d are side plan views of a camera and a camera rail according to one embodiment of the present invention.
  • Figure 2e is a top plan view of a camera rail according to one embodiment of the present invention.
  • Figure 3 is a perspective view of a portion of the camera array according to one embodiment of the present invention.
  • Figure 4 is a perspective view of a portion of the camera array according to an alternate embodiment of the present invention.
  • Figure 5 is a flowchart illustrating the general operation of the user interface according to one embodiment of the present invention.
  • Figure 6 is a flowchart illustrating in detail a portion of the operation shown in Figure 5.
  • Figure 7a is a perspective view of a portion of one embodiment of the present invention illustrating the arrangement of the camera array relative to objects being viewed.
  • Figures 7b-7g illustrate views from the perspectives of selected cameras of the array in Figure 7a.
  • Figure 8 is a schematic view of an alternate embodiment of the present invention.
  • Figure 9 is a schematic view of a server according to one embodiment of the present invention.
  • Figure 10 is a schematic view of a server according to an alternate embodiment of the present invention.
  • Figure 11 is a top plan view of an alternate embodiment of the present invention.
  • Figure 12 is a flowchart illustrating in detail the image capture portion of the operation of the embodiment shown in Figure 11.
  • Figure 13 is a schematic illustrating an array of one embodiment of the present invention.
  • Figure 14 is a flowchart illustrating the image capture process of one embodiment of the present invention.
  • Figure 15 is a schematic illustrating the logical arrangement of frames of an image according to one embodiment of the present invention.
  • Figure 16 is a flowchart illustrating the playback process of one embodiment of the present invention.
  • Figure 17 is a schematic view representing a display according to one embodiment of the present invention.
  • Figure 18a-c are schematics illustrating the logical relationship among frames according to one embodiment of the present invention.
  • Figure 19 is a schematic illustrating the logical arrangement of frames according to one embodiment of the present invention.
  • Figure 20 is a flowchart illustrating the process of harmonizing the duration of images according to one embodiment of the present invention.
  • Figure 21 is a schematic of a viewer according to one embodiment of the present invention.
  • Figure 22 is a schematic illustrating system components according to one embodiment of the present invention.
  • Figure 23 is a schematic illustrating the processing flow associated with capturing images, creating "on-the-fly" tweened images and making such images available to a viewer according to one embodiment of the present invention.
  • Figure 24 is a schematic illustrating the processing flow associated with capturing images, creating "pre-tweened" images and making such images available to a viewer according to one embodiment of the present invention.
  • the present invention relates to a viewer for use in selecting images and/or cameras of a telepresence system, such as that disclosed in International Application Serial No. PCT/US00/28652, assigned to Kewazinga Corporation (the "Kewazinga Application"), hereby incorporated herein by reference.
  • the telepresence system includes an array of cameras, the outputs of which are electronically provided to one or more users, each with its own viewer, in response to user inputs, such that the users can simultaneously and independently navigate through the array.
  • the outputs of these microcameras are linked by tiny (less than half the width of a human hair) Vertical Cavity Surface Emitting Lasers (VCSELs) to optical fibers, fed through area net hubs, buffered on server arrays or server farms (either for recording or (instantaneous) relay) and sent to viewers at remote terminals, interactive wall screens, or mobile image appliances (like Virtual Retinal Displays).
  • The system uses the multiplicity of positioned cameras to move the viewer's perspective from camera node to adjacent camera node in a way that provides the viewer with a sequential visual and acoustical path throughout the extent of the array. This allows the viewer to fluidly track or dolly through a 3-dimensional remote environment, to move through an event, and to make autonomous real-time decisions about where to move and when to linger.
  • a telepresence system 100 is shown in Fig. 1.
  • the telepresence system 100 generally includes an array 10 of cameras 14 coupled to a server 18, which in turn is coupled to one or more users 22 each having a user interfaced/display device 24.
  • the operation and functionality of the embodiment described herein is provided, in part, by the server and user interface/display device. While the operation of these components is not described by way of particular code listings or logic diagrams, it is to be understood that one skilled in the art will be able to arrive at suitable implementations based on the functional and operational details provided herein. Furthermore, the scope of the present invention is not to be construed as limited to any particular code or logic implementation.
  • The camera array 10 is conceptualized as being in an X, Z coordinate system. This allows each camera to have an associated, unique node address comprising an X and a Z coordinate (X, Z).
  • a coordinate value corresponding to an axis of a particular camera represents the number of camera positions along that axis the particular camera is displaced from a reference camera.
  • the X axis runs left and right, and the Z axis runs down and up.
  • Each camera 14 is identified by its X, Z coordinate. It is to be understood, however, that other methods of identifying cameras 14 can be used.
  • the array is three dimensional, located in an X, Y, Z coordinate system.
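  • A node-address table for the two-coordinate addressing of the present embodiment might be built as sketched below; the rail count, cameras per rail, and camera naming are illustrative assumptions, and a third (Y) coordinate could be added for the three-dimensional case.

```python
# Sketch: assign each camera a unique (X, Z) node address, where each coordinate
# is the number of camera positions the camera is displaced from a reference
# camera along that axis. The layout values are assumptions for illustration.

def build_node_table(num_rails, cameras_per_rail):
    """Return a dict mapping (x, z) node addresses to camera identifiers."""
    nodes = {}
    for z in range(num_rails):             # Z axis: rail number (down/up)
        for x in range(cameras_per_rail):  # X axis: position along the rail (left/right)
            nodes[(x, z)] = f"camera-{x}-{z}"
    return nodes

nodes = build_node_table(num_rails=5, cameras_per_rail=24)
assert nodes[(0, 0)] == "camera-0-0"       # the reference camera
assert (23, 4) in nodes                    # right-most camera on the last rail
```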
  • the array 10 comprises a plurality of rails 12, each rail 12 including a series of one or more cameras 14.
  • The outputs from the cameras 14 are coupled to the server 18 by means of local area hubs 16.
  • The local area hubs 16 gather the outputs and, when necessary, amplify the outputs for transmission to the server 18. In an alternate embodiment, the local area hubs 16 multiplex the outputs for transmission to the server 18.
  • Any of a number of communication links 15 may be employed; the communication links 15 may take the form of fiber optics, cable, satellite, microwave transmission, the internet, and the like.
  • an electronic storage device 20 is also coupled to the server 18.
  • the server 18 transfers the outputs to the electronic storage device 20.
  • The electronic (mass) storage device 20 transfers each camera's output onto a storage medium or means, such as CD-ROM, DVD, fluorescent multilayered disk (FMD), tape, platter, disk array, or the like.
  • the output of each camera 14 is stored in particular locations on the storage medium associated with that camera 14 or is stored with an indication to which camera 14 each stored output corresponds. For example, the output of each camera 14 is stored in contiguous locations on a separate disk, tape, CD-ROM, or platter.
  • the camera output may be stored in a compressed format, such as JPEG, which is a standard format for storing still color and grayscale photographs in bitmap form, MPEG1, which is a standard format for storing video output with a resolution of 30 frames per second, MPEG2, which is a standard format for storing video output with a resolution of 60 frames per second (typically used for high bandwidth applications such as HDTV and DVD-ROMs), and the like.
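  • One way to store each camera's output in known, per-camera locations is sketched below; the directory layout, the file naming, and the assumption that frames arrive already JPEG-compressed are illustrative choices, not details taken from the application.

```python
# Sketch: keep each camera's output together under its own directory so a frame
# can be retrieved later by (node address, frame number). Paths are assumptions.
from pathlib import Path

def frame_path(root, node, frame_no):
    x, z = node
    return Path(root) / f"camera_{x}_{z}" / f"frame_{frame_no:06d}.jpg"

def store_frame(root, node, frame_no, jpeg_bytes):
    path = frame_path(root, node, frame_no)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(jpeg_bytes)            # frame assumed to be JPEG-compressed already

def load_frame(root, node, frame_no):
    return frame_path(root, node, frame_no).read_bytes()
```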
  • the server 18 receives output from the cameras 14 in the array.
  • the server 18 processes these outputs for either storage in the electronic storage device 20, transmission to the users 22 or both.
  • Although the server 18 is configured to provide the functionality of the system 100 in the present embodiment, it is to be understood that other processing elements may provide the functionality of the system 100.
  • the user interface device is a personal computer programmed to interpret the user input and transmit an indication of the desired current node address, buffer outputs from the array, and provide other of the described functions.
  • the system 100 can accommodate (but does not require) multiple users 22.
  • Each user 22 has associated therewith a user interface device including a user display device (collectively 24).
  • user 22-1 has an associated user interface device and a user display device in the form of a computer 24-1 having a monitor and a keyboard.
  • User 22-2 has associated therewith an interactive wall screen 24-2 which serves as a user interface device and a user display device.
  • the user interface device and the user display device of user 22-3 includes a mobile audio and image appliance 24-3.
  • a digital interactive TV 24-4 is the user interface device and user display device of user 22-4.
  • user 22-5 has a voice recognition unit and monitor 24-5 as the user interface and display devices.
  • user interface devices and user display devices are merely exemplary; for example, other interface devices include a mouse, touch screen, biofeedback devices, as well as those identified in U.S. Provisional Patent Application Serial No. 60/080,413 and the like.
  • Each user interface device 24 has associated therewith user inputs. These user inputs allow each user 22 to move or navigate independently through the array 10. In other words, each user 22 enters inputs to generally select which camera outputs are transferred to the user display device.
  • Each user display device includes a graphical representation of the array 10. The graphical representation includes an indication of which camera in the array is the source of the output being viewed.
  • The user inputs allow each user not only to select particular cameras, but also to select relative movement or navigational paths through the array 10. It is to be understood that as used herein a path is defined by both cameras and time. As such, two users navigating through the same series of cameras may navigate different paths, provided the users do not access all cameras simultaneously. In other words, a linear series of a plurality of cameras provides for a plurality of paths.
  • Each user 22 may be coupled to the server 18 by an independent communication link.
  • each communication link may employ different technology.
  • the communication links include an internet link, a microwave signal link, a satellite link, a cable link, a fiber optic link, a wireless link, and the like.
  • the array 10 provides several advantages. For example, because the array 10 employs a series of cameras 14, no individual camera, or the entire array 10 for that matter, need be moved in order to obtain a seamless view of the environment. Instead, the user navigates through the array 10, which is strategically placed through and around the physical environment to be viewed. Furthermore, because the cameras 14 of the array 10 are physically located at different points in the environment to be viewed, a user is able to view changes in perspective, a feature unavailable to a single camera that merely changes focal length.
  • The video chips used in microcameras may be CMOS active pixel sensors (APS), CCDs, and the like, and are produced in a mainstream manufacturing process by several companies, including Photobit, Pasadena, CA; Sarnoff Corporation, Princeton, NJ; and VLSI Vision, Ltd., Edinburgh, Scotland.
  • One specific suitable camera is the analog color CCD camera manufactured by Sanyo Electric Co. Ltd. under the tradename VCC-5974. The outputs of such analog cameras are provided to video capture boards, such as the Meteor-II, which include an analog-to-digital converter for converting analog NTSC video.
  • The capture boards also receive a video synchronizing signal, noted below, so that the output of each camera is synchronized, with each captured frame of one camera corresponding to that of the other. From the capture boards, the camera output is then provided to one or more servers or processing elements for processing.
(b) Structure of the Array
  • the camera array 10 of the present embodiment comprises a series of modular rails 12 carrying cameras 14.
  • the structure of the rails 12 and cameras 14 will now be discussed in greater detail with reference to Figs. 2a through 2d.
  • Each camera 14 includes registration pins 34.
  • the cameras 14 utilize VCSELs to transfer their outputs to the rail 12. It is to be understood that the present invention is not limited to any particular type of camera 14, however, or even to an array 10 consisting of only one type of camera 14.
  • Each rail 12 includes two sides, 12a and 12b, at least one of which, 12b, is hingeably connected to the base 12c of the rail 12.
  • the base 12c includes docking ports 36 for receiving the registration pins 34 of the camera 14.
  • Each rail 12 further includes a first end 38 and a second end 44.
  • the first end 38 includes, in the present embodiment, two locking pins 40 and a protected transmission relay port 42 for transmitting the camera outputs.
  • the second end 44 includes two guide holes 46 for receiving the locking pins 40, and a transmission receiving port 48.
  • the first end 38 of one rail 12 is engageable with a second end 44 of another rail 12. Therefore, each rail 12 is modular and can be functionally connected to another rail to create the array 10.
  • each rail 12 includes communication paths for transmitting the output from each camera 14.
  • a cable couples each camera to the server.
  • the array 10 is shown having a particular configuration, it is to be understood that virtually any configuration of rails 12 and cameras 14 is within the scope of the present invention.
  • the array 10 may be a linear array of cameras 14, a 2- dimensional array of cameras 14, a 3-dimensional array of cameras 14, or any combination thereof.
  • the array 10 need not be comprised solely of linear segments, but rather may include curvilinear sections.
  • individual rails support a single camera and include varying degree of freedom extension spacers on either end of the rail to change the spacing between cameras or change the angle between adjacent cameras.
  • These spacers comprise linear or rotary actuators or electrostrictive polymers controlled by one of the system servers.
  • the array 10 is supported by any of a number of support means.
  • the array 10 can be fixedly mounted to a wall or ceiling; the array 10 can be secured to a moveable frame that can be wheeled into position in the environment or supported from cables.
  • Fig. 3 illustrates an example of a portion of the array 10.
  • The array 10 comprises five rows of rails 12a through 12e.
  • Each of these rails 12a-12e is directed towards a central plane, which substantially passes through the center row 12c. Consequently, for any object placed in the same plane as the middle row 12c, a user would be able to view the object essentially from the bottom, front, and top.
  • the rails 12 of the array 10 need not have the same geometry.
  • some of the rails 12 may be straight while others may be curved.
  • Fig. 4 illustrates the camera alignment that results from utilizing curved rails. It should be noted that rails in Fig. 4 have been made transparent so that the arrangement of cameras 14 may be easily seen.
  • each rail is configured in a step-like fashion or an arc with each camera above (or below) and in front of a previous camera.
  • the user has the option of moving forward through the environment.
  • The spacing of the cameras 14 depends on the particular application, including the objects being viewed, the focal length of the cameras 14, and the speed of movement through the array 10. In general, the closer the cameras and the greater the overlap in views, the more seamless the transition between camera views. In one embodiment the distance between cameras 14 can be approximated by analogy to the distance between exposed frames taken by a motion picture camera dollying linearly through an environment. In general, the speed of movement of the projector through the environment divided by the frames exposed per unit of time results in a frame-distance ratio.
  • a conventional movie camera records twenty-four frames per second. When such a camera is moved linearly through an environment at two feet per second, a frame is taken approximately every inch.
  • a frame of the projector is analogous to a camera 14 in the present invention.
  • Just as one frame exposed per inch results in a movie having a seamless view of the environment, so too does one camera 14 per inch.
  • the cameras 14 are spaced approximately one inch apart, thereby resulting in a seamless view of the environment.
  • the spacing between cameras is greater than one inch, provided the fields of view of adjacent cameras overlap. Again, the greater the degree of overlap, the more seamless the progression between adjacent camera views.
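  • The frame-distance analogy above reduces to a simple ratio of dolly speed to frame rate; the short calculation below just reproduces the two-feet-per-second, twenty-four-frames-per-second example.

```python
# Frame-distance ratio: distance travelled per exposed frame, i.e. the
# approximate spacing between cameras 14 in the analogy above.
def spacing_inches(speed_feet_per_second, frames_per_second):
    return speed_feet_per_second * 12.0 / frames_per_second

print(spacing_inches(2.0, 24))   # 1.0 -> roughly one frame, hence one camera, per inch
```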
  • the spacing between cameras may be further increased by generating synthetic or mixed images between contiguous cameras.
  • the linear spacing between cameras becomes less important in a curved array, where the angular displacement between cameras is more important.
  • the array is in a 180 degree arc, with cameras placed at five degree intervals, directed towards the center of the arc.
  • As the radius of the arc increases, the linear distance between the cameras also increases; however, the angular displacement, five degrees, and the overlap in fields of view remain the same. Because the overlap in field of view remains, the system maintains the seamless progression from camera to adjacent camera.
  • the array comprises an arc of cameras.
  • the arc extends 110 degrees, with a radius of nine feet, and the cameras placed at approximately seven and a half degree intervals around the arc.
  • the arc has a radius of fifteen feet, with the cameras located every sixteen inches.
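  • For curved arrays the relevant quantity is the arc length between adjacent cameras, the radius times the angular interval; the short calculation below checks the two arc examples given above.

```python
import math

def arc_spacing_inches(radius_feet, interval_degrees):
    """Linear distance along the arc between adjacent cameras."""
    return radius_feet * 12.0 * math.radians(interval_degrees)

def interval_degrees(radius_feet, spacing_inches):
    """Angular displacement between adjacent cameras for a given linear spacing."""
    return math.degrees(spacing_inches / (radius_feet * 12.0))

print(arc_spacing_inches(9, 7.5))   # ~14.1 inches between cameras on the nine-foot arc
print(interval_degrees(15, 16))     # ~5.1 degrees between cameras on the fifteen-foot arc
```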
  • In step 110, the user is presented with a predetermined starting view of the environment corresponding to a starting camera.
  • the operation of the system is controlled, in part, by software residing in the server.
  • the system associates each camera in the array with a coordinate.
  • the system is able to note the coordinates of the starting camera node.
  • When the user determines that they want to move or navigate through the array, the user enters a user input through the user interface device 24.
  • the user inputs of the present embodiment generally include moving to the right, to the left, up, or down in the array. Additionally, a user may jump to a particular camera in the array. In alternate embodiments, a subset of these or other inputs, such as forward, backward, diagonal, over, and under, are used.
  • the user interface device transmits the user input to the server in step 120.
  • decoding the input generally involves determining whether the user wishes to move to the right, to the left, up, or down in the array.
  • the server 18 proceeds to determine whether the input corresponds to moving to the user's right in the array 10. This determination is shown in step 140. If the received user input does correspond to moving to the right, the current node address is incremented along the X axis in step 150 to obtain an updated address.
  • the server 18 determines whether the input corresponds to moving to the user's left in the array 10 in step 160. Upon determining that the input does correspond to moving to the left, the server 18 then decrements the current node address along the X axis to arrive at the updated address. This is shown in step 170.
  • the server 18 determines whether the input corresponds to moving up in the array. This determination is made in step 180. If the user input corresponds to moving up, in step 190, the server 18 increments the current node address along the Z axis, thereby obtaining an updated address.
  • the server 18 determines whether the received user input corresponds to moving down in the array 10. This determination is made in step 200. If the input does correspond to moving down in the array 10, in step 210 the server 18 decrements the current node address along the Z axis.
  • In step 220, the server 18 determines whether the received user input corresponds to jumping or changing the view to a particular camera 14. As indicated in Figure 5, if the input corresponds to jumping to a particular camera 14, the server 18 changes the current node address to reflect the desired camera position. Updating the node address is shown as step 230. In an alternate embodiment, the input corresponds to jumping to a particular position in the array 10, not identified by the user as being a particular camera but by some reference to the venue, such as stage right.
  • The server 18 may decode the received user inputs in any of a number of ways, including in any order. For example, in an alternate embodiment the server 18 first determines whether the user input corresponds to up or down. In another alternate, preferred embodiment, user navigation includes moving forward, backward, to the left and right, and up and down through a three dimensional array.
  • In step 240, the server 18 causes a message signal to be transmitted to the user display device 24, causing a message to be displayed to the user 22 that the received input was not understood. Operation of the system 100 then continues with step 120, and the server 18 awaits receipt of the next user input.
  • the server 18 After adjusting the current node address, either by incrementing or decrementing the node address along an axis or by jumping to a particular node address, the server 18 proceeds in step 250 to adjust the user's view. Once the view is adjusted, operation of the system 100 continues again with step 120 as the server 18 awaits receipt of the next user input. hi an alternate embodiment, the server 18 continues to update the node address and adjust the view based on the received user input. For example, if the user input corresponded to "moving to the right", then operation of the system 100 would continuously loop through steps 140, 150, and 250, checking for a different input. When the different input is received, the server 18 continuously updates the view accordingly.
  • Fig. 6 is a more detailed diagram of the operation of the system according to steps 140, 150, and 250 of Fig. 5. Moreover, it is to be understood that while Fig. 6 describes more detailed movement in one direction, i.e., to the right, the same detailed movement can be applied in any other direction.
  • the determination of whether the user input corresponds to moving to the right actually involves several determinations. As described in detail below, these determinations include moving to the right through the array 10 at different speeds, moving to the right into a composited additional source output at different speeds, and having the user input overridden by the system 100.
  • The present invention allows a user 22 to navigate through the array 10 at different speeds. Depending on the speed, the server 18 will apply an algorithm that controls the transition between camera outputs either at critical speed (n nodes per unit of time), under critical speed (n-1 nodes per unit of time), or over critical speed (n+1 nodes per unit of time).
  • speed of movement through the array 10 can alternatively be expressed as the time to switch from one camera 14 to another camera 14.
  • the server 18 makes the determination whether the user input corresponds to moving to the right at a critical speed.
  • the critical speed is preferably a predetermined speed of movement through the array 10 set by the system operator or designer depending on the anticipated environment being viewed. Further, the critical speed depends upon various other factors, such as focal length, distance between cameras, distance between the cameras and the viewed object, and the like.
  • the speed of movement through the array 10 is controlled by the number of cameras 14 traversed in a given time period. Thus, the movement through the array 10 at critical speed corresponds to traversing some number, "n", camera nodes per millisecond, or taking some amount of time, "s", to switch from one camera 14 to another.
  • the server 18 increments the current node address along the X axis at n nodes per millisecond.
  • A movie projector records twenty-four frames per second. Analogizing between the movie projector and the present invention, at critical speed the user traverses (and the server 18 switches between) approximately twenty-four cameras 14 per second, or a camera 14 approximately every 0.04167 seconds.
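  • Put numerically, the chosen speed fixes how long each camera's output stays on screen; the helper below reproduces the twenty-four-cameras-per-second figure and its neighbours. The constant and function name are assumptions for illustration.

```python
# Critical-speed bookkeeping: the number of nodes traversed per second sets the
# dwell time of each intermediate camera's output on the display.

def seconds_per_camera(nodes_per_second):
    return 1.0 / nodes_per_second

CRITICAL_NODES_PER_SECOND = 24            # by analogy to twenty-four film frames per second
print(seconds_per_camera(CRITICAL_NODES_PER_SECOND))       # ~0.04167 s per camera
print(seconds_per_camera(CRITICAL_NODES_PER_SECOND - 1))   # under critical speed: longer dwell
print(seconds_per_camera(CRITICAL_NODES_PER_SECOND + 1))   # over critical speed: shorter dwell
```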
  • the user 22 may advance not only at critical speed, but also at over the critical speed, as shown in step 140b, or at under the critical speed, as shown in step 140c.
  • the server 18 increments the current node address along the X axis by a unit of greater than n, for example, at n + 2 nodes per millisecond.
  • the step of incrementing the current node address at n + 1 nodes per millisecond along the X axis is shown in step 150b.
  • the server 18 proceeds to increment the current node address at a variable less than n, for example, at n - 1 nodes per millisecond. This operation is shown as step 150c.
  • The shape of the array 10 can also be electronically scaled, and the system 100 designed with a "center of gravity" that will ease a user's image path back to a "starting" or "critical position" node or ring of nodes, either when the user 22 releases control or when the system 100 is programmed to override the user's autonomy; that is to say, the active perimeter or geometry of the array 10 can be pre-configured to change at specified times or intervals in order to corral or focus attention in a situation that requires dramatic shaping.
  • the system operator can, by real-time manipulation or via a pre-configured electronic proxy sequentially activate or deactivate designated portions of the camera array 10. This is of particular importance in maintaining authorship and dramatic pacing in theatrical or entertainment venues, and also for implementing controls over how much freedom a user 22 will have to navigate through the array 10.
  • the system 100 can be programmed such that certain portions of the array 10 are unavailable to the user 22 at specified times or intervals.
  • the server 18 makes the determination whether the user input corresponds to movement to the right through the array but is subject to a navigation control algorithm.
  • the navigation control algorithm causes the server 18 to determine, based upon navigation control factors, whether the user's desired movement is permissible.
  • the navigation control algorithm determines whether the desired movement would cause the current node address to fall outside the permissible range of node coordinates.
  • the permissible range of node coordinates is predetermined and depends upon the time of day, as noted by the server 18.
  • the navigation control factors include time.
  • permissible camera nodes and control factors can be correlated in a table stored in memory.
  • The navigation control factors include time as measured from the beginning of a performance being viewed, also as noted by the server.
  • The system operator can dictate from where in the array a user will view certain scenes. In another alternate embodiment, the navigation control factor is speed of movement through the array. For example, the faster a user 22 moves or navigates through the array, the wider the turns must be.
  • the permissible range of node coordinates is not predetermined.
  • the navigation control factors and, therefore, the permissible range is dynamically controlled by the system operator who communicates with the server via an input device.
  • the server 18 further proceeds, in step 150d, to increment the current node address along a predetermined path.
  • By incrementing the current node address along a predetermined path, the system operator is able to corral or focus the attention of the user 22 on the particular view of the permissible cameras 14, thereby maintaining authorship and dramatic pacing in theatrical and entertainment venues.
  • the server 18 does not move the user along a predetermined path. Instead, the server 18 merely awaits a permissible user input and holds the view at the current node. Only when the server 18 receives a user input resulting in a permissible node coordinate will the server 18 adjust the user's view.
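  • One way the navigation control algorithm could be realized is a table correlating a control factor (time, in this sketch) with the permissible range of node coordinates; the table contents and helper names below are illustrative assumptions only.

```python
# Sketch: a permissibility table keyed by time into the performance, giving the
# range of node coordinates a user may occupy. Ranges are assumptions.

PERMISSIBLE_RANGES = [
    # (start_second, end_second, x_range, z_range)
    (0,    600, range(0, 10), range(0, 3)),
    (600, 1200, range(5, 20), range(0, 5)),
]

def is_permissible(node, seconds_into_performance):
    x, z = node
    for start, end, xs, zs in PERMISSIBLE_RANGES:
        if start <= seconds_into_performance < end:
            return x in xs and z in zs
    return False

# If the requested move is impermissible, the server either holds the current
# view or advances the user along a predetermined path instead.
print(is_permissible((7, 1), 30))    # True
print(is_permissible((15, 1), 30))   # False
```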
  • the user 22 may, at predetermined locations in the array 10, choose to leave the real world environment being viewed. More specifically, additional source outputs, such as computer graphic imagery, virtual world imagery, applets, film clips, and other artificial and real camera outputs, are made available to the user 22. In one embodiment, the additional source output is composited with the view of the real environment. In an alternate embodiment, the user's view transfers completely from the real environment to that offered by the additional source output.
  • the additional source output is stored (preferably in digital form) in the electronic storage device 20.
  • the server 18 transmits the additional source output to the user interface/display device 24.
  • the server 18 simply transmits the additional source output to the user display device 24.
  • the server 18 first composites the additional source output with the camera output and then transmits the composited signal to the user interface/display device 24.
  • the server 18 makes the determination whether the user input corresponds to moving in the array into the source output. If the user 22 decides to move into the additional source output, the server 18 adjusts the view by substituting the additional source output for the updated camera output identified in either of steps 150a-d.
  • the server 18 proceeds to adjust the user's view in step 250.
  • the server 18 "mixes" the existing or current camera output being displayed with the output of the camera 14 identified by the updated camera node address. Mixing the outputs is achieved differently in alternate embodiments of the invention. In the present embodiment, mixing the outputs involves electronically switching at a particular speed from the existing camera output to the output of the camera 14 having the new current node address.
  • the camera outputs are synchronized.
  • a synchronizing signal from a "sync generator” is supplied to the cameras and/or the processors capturing the camera output.
  • the sync generator may take the form of those used in video editing and may comprise, in alternate embodiments, part of the server, the hub, and/or a separate component coupled to the array.
  • the server 18 switches camera outputs approximately at a rate of 24 per second, or one every 0.04167 seconds. If the user 22 is moving through the array 10 at under the critical speed, the outputs of the intermediate cameras 14 are each displayed for a relatively longer duration than if the user is moving at the critical speed. Similarly, each output is displayed for a relatively shorter duration when a user navigates at over the critical speed. In other words, the server 18 adjusts the switching speed based on the speed of the movement through the array 10.
  • the user may navigate at only the critical speed.
  • Alternatively, mixing the outputs is achieved by compositing the existing or current output and the updated camera node output. In yet another embodiment, mixing involves dissolving the existing view into the new view. In still another alternate embodiment, mixing the outputs includes adjusting the frame refresh rate of the user display device. Additionally, based on the speed of movement through the array, the server may add motion blur to convey a realistic sense of speed. In yet another alternate embodiment, the server causes a black screen to be viewed instantaneously between camera views. Such an embodiment is analogous to the blank film between frames in a movie reel. Furthermore, although not always advantageous, such black screens reduce the physiologic "carrying over" of one view into a subsequent view.
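  • Mixing by electronic switching, as in the present embodiment, amounts to stepping through the intermediate camera nodes and holding each output for a duration set by the navigation speed; the sketch below shows that sequencing for movement along the X axis, with names and the dwell-time convention assumed for illustration.

```python
# Sketch: mix by switching through the intermediate cameras between the current
# and updated node addresses, dwelling on each for a speed-dependent duration.

def switching_plan(current_x, updated_x, z, nodes_per_second):
    """Return [(node, seconds_on_screen), ...] for movement along the X axis."""
    step = 1 if updated_x > current_x else -1
    dwell = 1.0 / nodes_per_second
    return [((x, z), dwell) for x in range(current_x + step, updated_x + step, step)]

# Moving right from camera (7, 0) to camera (11, 0) at critical speed (24 nodes/s):
for node, dwell in switching_plan(7, 11, 0, 24):
    print(node, round(dwell, 5))      # each intermediate output shown for ~0.04167 s
```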
  • the user inputs corresponding to movements through the array at different speeds may include either different keystrokes on a keypad, different positions of a joystick, positioning a joystick in a given position for a predetermined length of time, and the like.
  • the decision to move into an additional source output may be indicated by a particular keystroke, joystick movement, or the like.
  • mixing may be accomplished by "mosaicing" the outputs of the intermediate cameras 14.
  • U.S. Pat. No. 5,649,032 entitled System For Automatically Aligning Images To Form A Mosaic Image to Peter J. Burt et al. discloses a system and method for generating a mosaic from a plurality of images and is hereby incorporated by reference.
  • the server 18 automatically aligns one camera output to another camera output, a camera output to another mosaic (generated from previously occurring camera output) such that the output can be added to the mosaic, or an existing mosaic to a camera output.
  • the present embodiment utilizes a mosaic composition process to construct (or update) a mosaic.
  • the mosaic composition comprises a selection process and a combination process.
  • the selection process automatically selects outputs for incorporation into the mosaic and may include masking and cropping functions to select the region of interest in a mosaic.
  • the combination process combines the various outputs to form the mosaic.
  • the combination process applies various output processing techniques, such as merging, fusing, filtering, output enhancement, and the like, to achieve a seamless combination of the outputs.
  • The resulting mosaic is a smooth view that combines the constituent outputs such that temporal and spatial information redundancy are minimized in the mosaic. In one embodiment of the present invention, the mosaic may be formed as the user moves through the system and the output image displayed close to real time. In another embodiment, the system may form the mosaic from a predetermined number of outputs or during a predetermined time interval, and then display the images pursuant to the user's navigation through the environment.
  • the server 18 enables the output to be mixed by a "tweening" process.
  • One example of the tweening process is disclosed in U.S. Pat. No. 5,259,040 entitled Method For Determining Sensor Motion And Scene Structure And Image Processing System Therefor to Keith J. Hanna, herein incorporated by reference. Tweening enables the server 18 to process the structure of a view from two or more camera outputs of the view.
  • the server monitors the movement among the intermediate cameras 14 through a scene using local scene characteristics such as brightness derivatives of a pair of camera outputs.
  • a global camera output movement constraint is combined with a local scene characteristic constancy constraint to relate local surface structures with the global camera output movement model and local scene characteristics.
  • the method for determining a model for global camera output movement through a scene and scene structure model of the scene from two or more outputs of the scene at a given image resolution comprises the following steps:
  • step (c) resetting the initial estimates of the local scene models and the image sensor motion model using the new value of one of the models determined in step (b); (d) determining a new value of the second of the models using the estimates of the models determined in step (b) by minimizing the difference between the measured error in the outputs and the error predicted by the model;
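  • The excerpt above only outlines the iterative estimation at the heart of the cited tweening method. As a much simpler stand-in, the sketch below synthesizes an intermediate view from two adjacent camera frames and a precomputed flow field (the "original image data together with a set of flow-fields" input mentioned earlier) by warping both frames part of the way and cross-fading them. This is not the method of U.S. Pat. No. 5,259,040; the flow-field convention and all names are assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def tween(img_a, img_b, flow_ab, t):
    """Approximate a view a fraction t of the way from camera A to camera B.

    img_a, img_b : (H, W) float arrays (grayscale for simplicity).
    flow_ab      : (H, W, 2) array; flow_ab[y, x] = (dy, dx) displacement of the
                   pixel at (y, x) in A to its corresponding position in B.
    """
    h, w = img_a.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Backward-warp A partway along the flow, B the remaining way back, then blend.
    warped_a = map_coordinates(img_a, [yy - t * flow_ab[..., 0], xx - t * flow_ab[..., 1]],
                               order=1, mode="nearest")
    warped_b = map_coordinates(img_b, [yy + (1 - t) * flow_ab[..., 0], xx + (1 - t) * flow_ab[..., 1]],
                               order=1, mode="nearest")
    return (1 - t) * warped_a + t * warped_b

# Toy demonstration: B is A shifted one pixel to the right, flow is (0, 1) everywhere.
h, w = 4, 6
a = np.arange(h * w, dtype=float).reshape(h, w)
b = np.roll(a, 1, axis=1)
flow = np.zeros((h, w, 2)); flow[..., 1] = 1.0
halfway = tween(a, b, flow, 0.5)   # content shifted roughly half a pixel
```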
  • an embodiment of the present invention monitors the user movement among live cameras or storage nodes.
  • the server 18 also transmits to the user display device 24 outputs from some or all of the intermediate cameras, namely those located between the current camera node and the updated camera node.
  • Fig. 7a illustrates a curvilinear portion of an array 10 that extends along the X axis or to the left and right from the user's perspective.
  • the coordinates that the server 18 associates with the cameras 14 differ only in the X coordinate. More specifically, for purposes of the present example, the cameras 14 can be considered sequentially numbered, starting with the left-most camera 14 being the first, i.e., number "1".
  • the X coordinate of each camera 14 is equal to the camera's position in the array.
  • Particular cameras will be designated 14-X, where X equals the camera's position in the array 10 and, thus, its associated X coordinate.
  • Figs. 7a-7g illustrate possible user movement through the array 10.
  • The environment to be viewed includes three objects 702, 704, 706, the first and second of which include numbered surfaces. As will be apparent, these numbered surfaces allow a better appreciation of the change in user perspective.
  • In Fig. 7a, six cameras 14-2, 14-7, 14-11, 14-14, 14-20, 14-23 of the array 10 are specifically identified.
  • The boundaries of each camera's view are identified by the pair of lines 14-2a, 14-7a, 14-11a, 14-14a, 14-20a, 14-23a, radiating from each identified camera 14-2, 14-7, 14-11, 14-14, 14-20, 14-23, respectively.
  • The user 22 navigates through the array 10 along the X axis such that the images or views of the environment are those corresponding to the identified cameras 14-2, 14-7, 14-11, 14-14, 14-20, 14-23.
  • the present example provides the user 22 with the starting view from camera 14-2. This view is illustrated in Fig. 7b.
  • Because the server 18 has been programmed to recognize the "7" key as corresponding to moving or jumping through the array to camera 14-7, the server 18 changes the X coordinate of the current camera node address to 7, selects the output of camera 14-7, and adjusts the view or image sent to the user 22. Adjusting the view, as discussed above, involves mixing the outputs of the current and updated camera nodes. Mixing the outputs, in turn, involves switching intermediate camera outputs into the view to achieve the seamless progression of the discrete views of cameras 14-2 through 14-7, which gives the user 22 the look and feel of moving around the viewed object.
  • the user 22 now has another view of the first object 702.
  • the view from camera 14-7 is shown in Fig. 7c.
  • the server 18 would omit some or all of the intermediate outputs.
  • the user 22 indicates to the system 100 a desire to navigate to the right at critical speed.
  • the server 18 causes the mixing of the output of camera 14-11 with that of camera 14-7. Again, this includes switching into the view the outputs of the intermediate cameras (i.e., 14-8, 14-9, and 14-10) to give the user 22 the look and feel of navigating around the viewed object.
  • the user 22 is thus presented with the view from camera 14-11, as shown in Fig. 7d.
  • the user 22 enters a user input, for example, "alt-right arrow,” indicating a desire to move to the right at less than critical speed.
  • the server 18 increments the updated camera node address by n-1 nodes, namely 3 in the present example, to camera 14-14.
  • The outputs from cameras 14-11 and 14-14 are mixed, and the user 22 is presented with a seamless view associated with cameras 14-11 through 14-14.
  • Fig. 7e illustrates the resulting view of camera 14-14.
  • the user 22 enters a user input such as "shift-right arrow," indicating a desire to move quickly through the array 10, i.e., at over the critical speed.
  • the server 18 interprets the user input and increments the current node address by n+2, or 6 in the present example.
  • the updated node address thus corresponds to camera 14-20.
  • the server 18 mixes the outputs of cameras 14-14 and 14-20, which includes switching into the view the outputs of the intermediate cameras 14-15 through 14-19.
  • the resulting view of camera 14-20 is displayed to the user 22.
  • the user 22 now views the second object 704.
  • the user 22 desires to move slowly through the array 10. Accordingly, the user 22 enters "alt-right arrow" to indicate moving to the right at below critical speed.
  • When the server 18 interprets the received user input, it updates the current camera node address along the X axis by 3, to camera 14-23.
  • the server 18 then mixes the outputs of camera 14-20 and 14-23, thereby providing the user 22 with a seamless progression of views through camera 14-23.
  • the resulting view 14-23a is illustrated in Fig. 7g.
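  • The camera sequence of Figs. 7b-7g can be reproduced by applying the increments described above to the X coordinate; taking critical speed as n = 4 nodes per move is an inference from the 14-7 to 14-11 step, and the keystroke labels are those used in the example.

```python
# Reproduce the navigation of Figs. 7b-7g as increments of the X coordinate.
# n = 4 nodes per move at critical speed is inferred from the 14-7 -> 14-11 step.

N = 4
moves = [
    ("press '7' (jump)",             lambda x: 7),
    ("right arrow (critical speed)", lambda x: x + N),
    ("alt-right arrow (under)",      lambda x: x + N - 1),
    ("shift-right arrow (over)",     lambda x: x + N + 2),
    ("alt-right arrow (under)",      lambda x: x + N - 1),
]

x = 2                                # starting view: camera 14-2 (Fig. 7b)
for label, step in moves:
    x = step(x)
    print(f"{label:32s} -> camera 14-{x}")
# Visits cameras 14-7, 14-11, 14-14, 14-20 and 14-23 (Figs. 7c through 7g).
```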
  • devices other than cameras may be interspersed in the array.
  • These other devices, such as motion sensors and microphones, provide data to the server(s) for processing.
  • Output from motion sensors or microphones is fed to the server(s) and used to scale the array.
  • Permissible camera nodes are those near the sensor or microphone having a desired output, e.g., where there is motion or sound.
  • navigation control factors include output from other such devices.
  • The outputs from the sensors or microphones are provided to the user.
  • the system 800 generally includes an array of cameras 802 coupled to a server 804, which, in turn, is coupled to one or more user interface and display devices 806 and an electronic storage device 808.
  • a hub 810 collects and transfers the outputs from the array 802 to the server 804.
  • The array 802 comprises modular rails 812 that are interconnected. Each rail 812 carries multiple cameras 814 and a microphone 816 centrally located on the rail 812.
  • the system 800 includes microphones 818 that are physically separate from the array 802. The outputs of both the cameras 814 and microphones 816, 818 are coupled to the server 804 for processing.
  • the server 804 receives the sound output from the microphones 816, 818 and, as with the camera output, selectively transmits sound output to the user. As the server 804 updates the current camera node address and changes the user's view, it also changes the sound output transmitted to the user.
  • The server 804 has stored in memory an associated range of camera nodes with a given microphone, namely the cameras 814 on each rail 812 are associated with the microphone 816 on that particular rail 812. In the event a user attempts to navigate beyond the end of the array 802, the server 804 determines the camera navigation is impermissible and instead updates the microphone node output to that of the microphone 818 adjacent to the array 802.
  • The server 804 might include a database in which camera nodes in a particular area are associated with a given microphone. For example, a rectangular volume defined by the (X, Y, Z) coordinates (0,0,0), (10,0,0), (10,5,0), (0,5,0), (0,0,5), (10,0,5), (10,5,5) and (0,5,5) is associated with a given microphone. It is to be understood that selecting one of the series of microphones based on the user's position (or view) in the array provides the user with a sound perspective of the environment that coincides with the visual perspective.
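  • Associating a microphone with a region of camera nodes can be done with a simple bounding-box lookup, as sketched below; the box repeats the rectangular-volume example above, and the identifiers and fallback behaviour are assumptions.

```python
# Sketch: pick the microphone whose region (an axis-aligned box of node
# coordinates) contains the camera node currently being viewed.

MICROPHONE_REGIONS = {
    # microphone id: ((x_min, y_min, z_min), (x_max, y_max, z_max))
    "mic-816": ((0, 0, 0), (10, 5, 5)),   # the rectangular volume from the example above
}

def microphone_for_node(node):
    x, y, z = node
    for mic, ((x0, y0, z0), (x1, y1, z1)) in MICROPHONE_REGIONS.items():
        if x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1:
            return mic
    return None   # e.g., fall back to a microphone 818 adjacent to the array

print(microphone_for_node((4, 2, 1)))     # mic-816
```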
  • The server 902, electronic storage device 20, array 10, users (1, 2, 3, . . . N) 22-1 through 22-N, and associated user interface/display devices 24-1 through 24-N are shown therein.
  • the server 902 includes, among other components, a processing means in the form of one or more central processing units (CPU) 904 coupled to associated read only memory (ROM) 906 and a random access memory (RAM) 908.
  • The ROM 906 is for storing the program that dictates the operation of the server 902, and the RAM 908 is for storing variables and values used by the CPU 904 during operation.
  • user interface/display devices 24 are also coupled to the CPU 904. It is to be understood that the CPU may, in alternate embodiments, comprise several processing units, each performing a discrete function.
  • a memory controller 910 Coupled to both the CPU 904 and the electronic storage device 20 is a memory controller 910.
  • the memory controller 910 under direction of the CPU 904, controls accesses (reads and writes) to the storage device 20.
  • the memory controller 910 is shown as part of the server 902, it is to be understood that it may reside in the storage device 20.
  • The CPU 904 receives camera outputs from the array 10 via bus 912. As described above, the CPU 904 mixes the camera outputs for display on the user interface/display device 24. Which outputs are mixed depends on the view selected by each user 22. Specifically, each user interface/display device 24 transmits across bus 914 the user inputs that define the view to be displayed. Once the CPU 904 mixes the appropriate outputs, it transmits the resulting output to the user interface/display device 24 via bus 916. As shown, in the present embodiment, each user 22 is independently coupled to the server 902.
  • the bus 912 also carries the camera outputs to the storage device 20 for storage.
  • the CPU 904 directs the memory controller 910 to store the output of each camera 14 in particular locations of memory in the storage device 20.
  • the CPU 904 When the image to be displayed has previously been stored in the storage device 20, the CPU 904 causes the memory controller 910 to access the storage device 20 to retrieve the appropriate camera output. The output is thus transmitted to the CPU 904 via bus 918 where it is mixed. Bus 918 also carries additional source output to the CPU 904 for transmission to the users 22. As with outputs received directly from the array 10, the CPU 904 mixes these outputs and transmits the appropriate view to the user interface/display device 24.
  • FIG. 10 shows a server configuration according to an alternate embodiment of the present invention.
  • the server 1002 generally comprises a control central processing unit (CPU) 1004, a mixing CPU 1006 associated with each user 22, and a memory controller 1008.
  • the control CPU 1004 has associated ROM 1010 and RAM 1012.
  • each mixing CPU 1006 has associated ROM 1014 and RAM 1016.
  • the camera outputs from the array 10 are coupled to each of the mixing CPUs 1 through N 1006-1, 1006-N via bus 1018.
  • each user 22 enters inputs in the interface/display device 24 for transmission (via bus 1020) to the control CPU 1004.
  • the control CPU 1004 interprets the inputs and, via buses 1022-1, 1022-N, transmits control signals to the mixing CPUs 1006-1, 1006-N instructing them which camera outputs received on bus 1018 to mix.
  • the mixing CPUs 1006-1, 1006-N mix the outputs in order to generate the appropriate view and transmit the resulting view via buses 1024-1, 1024-N to the user interface/display devices 24-1, 24-N.
• each mixing CPU 1006 multiplexes outputs to more than one user 22. Indications of which outputs are to be mixed and transmitted to each user 22 come from the control CPU 1004.
• the bus 1018 couples the camera outputs not only to the mixing CPUs 1006-1, 1006-N, but also to the storage device 20.
  • the storage device 20 stores the camera outputs in known storage locations.
  • the control CPU 1004 causes the memory controller 1008 to retrieve the appropriate images from the storage device 20. Such images are retrieved into the mixing CPUs 1006 via bus 1026. Additional source output is also retrieved to the mixing CPUs 1006-1, 1006-N via bus 1026.
  • the control CPU 1004 also passes control signals to the mixing CPUs 1006-1, 1006-N to indicate which outputs are to be mixed and displayed.
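• The division of labor between the control CPU and the per-user mixing CPUs might be sketched as follows (Python, with hypothetical class names; the actual hardware and protocol are not specified here): the control element interprets each user's navigation input and instructs that user's mixing element which adjacent camera outputs to mix.

    # Illustrative sketch only; not the patent's implementation.
    class MixingUnit:
        def __init__(self, camera_outputs):
            self.camera_outputs = camera_outputs      # camera index -> latest frame

        def mix(self, cam_a, cam_b, weight=0.5):
            # Placeholder for tweening/mixing two adjacent camera outputs.
            return ("mixed", self.camera_outputs[cam_a],
                    self.camera_outputs[cam_b], weight)

    class ControlUnit:
        def __init__(self, num_cameras):
            self.num_cameras = num_cameras
            self.current_node = {}                    # user id -> camera index

        def handle_input(self, user_id, direction, mixer):
            node = self.current_node.get(user_id, 0)
            if direction == "right":
                node = min(node + 1, self.num_cameras - 2)
            elif direction == "left":
                node = max(node - 1, 0)
            self.current_node[user_id] = node
            # Tell the user's mixing unit which adjacent outputs to mix.
            return mixer.mix(node, node + 1)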
• the outputs of cameras are provided to networked (e.g., via an Ethernet) personal computers, for example one capture computer per pair of adjacent cameras and one control computer. In one embodiment, where analog video cameras are used, each capture computer also includes two video capture boards — one per camera coupled to the capture computer. Each capture computer also provides the mixing functionality, such as tweening, between the cameras coupled thereto. Furthermore, the control computer causes each capture computer to receive the output from a camera adjacent to one directly coupled to the capture computer so that the capture computer may mix the outputs of the camera directly coupled to the capture computer and the adjacent camera.
• the control computer coordinates the operation of the capture computers and other components as described herein.
  • the system retrieves from the array (or the electronic storage device) and simultaneously transmits to the user at least portions of outputs from two cameras.
  • the server processing element mixes these camera outputs to achieve a stereoscopic output.
  • Each view provided to the user is based on such a stereoscopic output.
  • the outputs from two adjacent cameras in the array are used to produce one stereoscopic view.
• referring to Figs. 7a-7g, one view is the stereoscopic view from cameras 14-1 and 14-2.
  • the next view is based on the stereoscopic output of cameras 14-2 and 14-3 or two other cameras.
  • the user is provided the added feature of a stereoscopic seamless view of the environment.
  • the present invention allows multiple users to simultaneously navigate through the array independently of each other.
• the systems described above distinguish between inputs from the multiple users and select a separate camera output appropriate to each user's inputs.
• the server tracks the current camera node address associated with each user by storing each node address in a particular memory location associated with that user.
  • each user's input is differentiated and identified as being associated with the particular memory location with the use of message tags appended to the user inputs by the corresponding user interface device.
  • two or more users may choose to be linked, thereby moving in tandem and having the same view of the environment.
• one such input includes identifying another user by his/her code to serve as a "guide".
  • the server provides the outputs and views selected by the guide user to both the guide and the other user selecting the guide. Another user input causes the server to unlink the users, thereby allowing each user to control his/her own movement through the array.
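• A minimal sketch of this linking behavior, assuming hypothetical names, is shown below: a linked user is simply served whatever view its guide has selected until it unlinks.

    # Illustrative sketch of "guide" linking between users.
    class ViewLinker:
        def __init__(self):
            self.view_of = {}    # user id -> that user's currently selected view
            self.guide_of = {}   # follower id -> guide id

        def link(self, follower, guide):
            self.guide_of[follower] = guide

        def unlink(self, follower):
            self.guide_of.pop(follower, None)

        def view_for(self, user):
            # A linked user receives the guide's view; others receive their own.
            guide = self.guide_of.get(user)
            return self.view_of.get(guide if guide is not None else user)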
  • a user may also wish to navigate forward and backward through the environment, thereby moving closer to or further away from an object.
• the use of a zoom lens would entail robotic control by a single user and preclude the simultaneous viewing of different fields of view at that camera node by multiple users.
• One embodiment that solves this problem of preventing multiple users from simultaneously viewing different fields of view from the same camera position in the array entails creating different field of view options at a single camera position.
• the different field of view options are created with clusters of cameras at each position in the array, each camera having a different field of view lens but substantially the same vertex in the array. In one embodiment, the cameras at the same position have essentially the same vertex by employing beam splitters and/or mirrors to enable the different field of view cameras to be physically positioned away from the vertex in the array, yet have each camera field of view from the same perspective or vertex.
  • each camera and its associated output has an address, a storage location where the camera outputs are being stored, and is accessible based on user inputs indicating which field of view or relative field of view (zoom in or zoom out) the user desires to receive.
  • Fig. 11 illustrates a top plan view of another embodiment enabling the user to move left, right, up, down, forward or backwards through the environment.
• a plurality of cylindrical arrays (12-1 - 12-n) of differing diameters, each comprising a series of cameras 14, may be situated around an environment comprising one or more objects 1100, one cylindrical array at a time.
  • Cameras 14 situated around the object(s) 1100 are positioned along an X and Z coordinate system.
  • an array 12 may comprise a plurality of rings of the same circumference positioned at different positions (heights) throughout the z-axis to form a cylinder of cameras 14 around the object(s) 1100.
• each camera in each array 12 has an associated, unique storage node address comprising an X and Z coordinate - i.e., array1(X, Z).
  • a coordinate value corresponding to an axis of a particular camera represents the number of camera positions along that axis the particular camera is displaced from a reference camera.
  • the X axis runs around the perimeter of an array 12, and the Z axis runs down and up.
  • Each storage node is associated with a camera view identified by its X, Z coordinate.
  • the outputs of the cameras 14 are coupled to one or more servers for gathering and transmitting the outputs to the server 18.
  • each camera requires only one storage location.
• the camera output may be stored in a logical arrangement, such as a matrix of n arrays, wherein each array has a plurality of (X, Z) coordinates. In one embodiment, the node addresses may comprise a specific coordinate within an array - i.e., Array1(Xn, Zn), Array2(Xn, Zn) through Arrayn(Xn, Zn).
  • a cylindrical array 12-1 is situated around the object(s) located in an environment 1100.
  • the view of each camera 14 is transmitted to server 18 in step 1220.
  • the electronic storage device 20 of the server 18 stores the output of each camera 14 at the storage node address associated with that camera 14. Storage of the images may be effectuated serially, from one camera 14 at a time within the array 12, or by simultaneous transmission of the image data from all of the cameras 14 of each array 12.
• cylindrical array 12-1 is removed from the environment (step 1240).
• in step 1250, a determination is made as to the availability of additional cylindrical arrays 12 of differing diameters to those already situated. If additional cylindrical arrays 12 are desired, the process repeats beginning with step 1210. When no additional arrays 12 are available for situating around the environment, the process of inputting images into storage devices 20 is complete (step 1260). At the end of the process, a matrix of addressable stored images exists.
  • a user may navigate through the environment. Navigation is effectuated by accessing the input of the storage nodes by a user interface device 24.
• the user inputs generally include moving around the environment or object 1100 by moving to the left or right, moving higher or lower along the z-axis, moving through the environment closer or further from the object 1100, or some combination of moving around and through the environment. For example, a user may access the image stored in the node address Array3(0,0) to view an object from the camera previously located at coordinate (0,0) of Array3.
• the user may move directly forward, and therefore closer to the object 1100, by accessing the image stored in Array2(0,0) and then the image stored in Array1(0,0). To move further away from the object and to the right and up, the user may move from the image stored in node address Array1(0,0) and access the image stored in node address Array2(1,1), followed by accessing the image stored in node address Array3(2,2), and so on.
  • a user may, of course, move among arrays and/or coordinates by any increments changing the point perspective of the environment with each node. Additionally, a user may jump to a particular camera view of the environment. Thus, a user may move throughout the environment in a manner similar to that described above with respect to accessing output of live cameras.
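• The navigation just described can be sketched as simple arithmetic on (array, X, Z) node addresses; the following Python fragment is illustrative only, and the array, camera and row counts are assumed values, not taken from the patent.

    NUM_ARRAYS = 3          # hypothetical number of concentric cylindrical arrays
    CAMS_PER_RING = 36      # hypothetical cameras around each ring (X axis)
    ROWS_PER_RING = 5       # hypothetical camera rows along the Z axis

    def move(node, direction):
        array, x, z = node
        if direction == "right":
            x = (x + 1) % CAMS_PER_RING          # around the perimeter
        elif direction == "left":
            x = (x - 1) % CAMS_PER_RING
        elif direction == "up":
            z = min(z + 1, ROWS_PER_RING - 1)
        elif direction == "down":
            z = max(z - 1, 0)
        elif direction == "forward":             # closer to the object: smaller ring
            array = max(array - 1, 1)
        elif direction == "backward":            # further away: larger ring
            array = min(array + 1, NUM_ARRAYS)
        return (array, x, z)

    # e.g. from the node Array3(0,0), moving forward yields Array2(0,0)
    assert move((3, 0, 0), "forward") == (2, 0, 0)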
• This embodiment allows users to access images that are stored in storage nodes as opposed to accessing live cameras. Moreover, this embodiment provides a convenient system and method to allow a user to move forward and backward in an environment.
• although each storage node is associated with a camera view identified by its X, Z coordinate of a particular array, other methods of identifying camera views and storage nodes can be used. For example, other coordinate systems, such as those noting angular displacement from a fixed reference point as well as coordinate systems that indicate relative displacement from the current camera node, may be used.
• the camera arrays 12 may be shapes other than cylindrical. Moreover, it is not essential, although often advantageous, that the camera arrays 12 surround the entire environment.
• the foregoing user inputs, namely move clockwise, move counter-clockwise, up, down, closer to the environment, and further from the environment, are merely general descriptions of movement through the environment.
  • movement in each of these general directions is further defined based upon the user input.
• the output generated by the server to the user may be mixed when moving among adjacent storage nodes associated with environment views (along the x axis, z axis, or among juxtaposed arrays) to generate seamless movement throughout the environment. Mixing may be accomplished by, but is not limited to, the processes described above.
  • an array according to the present invention may be used to capture virtually any image for any purpose.
  • One particular use of one embodiment of the present invention is to compare multiple images.
  • the present invention can allow for a comparison from any one of multiple point perspectives at any given reference point of time.
• Exemplary embodiments, which will now be described with reference to Figs. 15-17, provide a training aid that compares the images of the swings of two golfers: a training professional and a player/trainee.
  • the array is generally in the form of a geodesic dome 1305 having an opening for a golfer to enter and hit a ball. More specifically, the array extends approximately 270° in a horizontal band, 180° in a vertical band from side to side and 150° in a vertical band from the rear at the ground, forward towards the opening.
  • the array not only includes cameras 1310, but also lights 1315, a greenscreen background covering 1320, a greenscreen background flooring 1325, and a supporting rail structure 1330. As is known in the art, other color backgrounds can be used.
• the plurality of cameras 1310 populate the interior of the dome 1305, supported by the greenscreen 1320 and/or rails 1330. As described in greater detail below, the green covering 1320 and flooring 1325 allow for easier processing of the images.
• the cameras 1310 can be logically organized in rows; for example, the lowest row 1335 can be designated row0, the second row from the bottom 1340 can be designated row1, and the third row from the bottom 1345 can be designated row2. Additionally, the cameras 1310 in each row can be logically numbered, for example, sequentially from the right of the array, clockwise to the left. As described below, such logical arrangement facilitates processing of images and navigation through the array. In alternate embodiments, the cameras 1310 are mounted in configurations other than rows, such as geometric or random patterns, preferably so that the image captured by one camera 1310 overlaps the image captured by each adjacent camera 1310.
  • the array can be coupled to one or more processing elements, storage devices, user interface devices, and other components according to any one of the configurations described above with reference to Figures 1 and 8-10 and equivalents thereto.
• the images of the professional's swing are stored in one storage device and the images of the trainee's swing are stored in a second storage device.
  • the images of the two swings are stored in different layers, levels or partitions within a single storage device, such as a fluorescent multi-layer disk.
• Each of the two storage devices is coupled in parallel to and can be accessed in parallel by the server.
  • the cameras 1310 are coupled to the electronic storage devices so the images may be stored and the server is coupled to the storage devices so images can be retrieved from storage, processed and restored in the storage devices.
  • a user interface device is also coupled to the server so the images can be transmitted to the user.
• each camera 1310 operates at approximately thirty frames per second. In an alternate embodiment, the cameras 1310 capture the image at sixty frames per second.
  • the image from each camera 1310 and for each frame is then processed to separate the image from the background. More specifically, the server (or dedicated processor) mattes out the image from the solid background 1320 (step 1410).
  • Such a process is generally known as bluescreening, matting, keying or chromakeying out the image and can be performed by any of a number of known processes, including those provided by the Ultimatte Corporation under the trade name ULTIMATTE, and by PixelCom J. V. under the trade name PPJMATTE. As will be appreciated by those skilled in the art, matting out the image is preferable for better display of the images.
  • the server then digitally stores the matted or keyed out image of each frame from each camera 1310 in an electronic storage device (step 1415).
  • the outputs (or images) captured in each frame of each camera 1310 are temporarily stored.
  • the server then processes the temporarily stored frames to matte/key out the golfer's image from each frame and stores the matted/keyed out image, preferably writing over the original (non-keyed) frames.
• the server processes the frames, keying out the golfer's image, in real time. In such an embodiment, no temporary image need be stored. In another embodiment, no matting process is performed.
  • Figure 15 depicts one example of a logical representation and addressing scheme of one golfer's swing as stored in one storage device without storing any mixed images. Taking thirty frames per second and the average golf swing lasting less than three seconds, approximately ninety frames will be stored for each camera. As logically shown, each frame from each camera is stored at a unique location or address in the storage device. In this embodiment, the first and second (right most) digits of the address indicate frame number, the third and fourth digits indicate camera number, and the fifth and sixth digits indicate row number.
• the first frame (frame 1), taken by the first camera in the first row (row1(1)), is stored at address 01 01 01.
• the third frame (frame 3), taken by the second camera in the second row (row2(2)), is stored at address 02 02 03.
  • the addresses can be represented in any notation, such as hexadecimal or binary, and the addresses may or may not be contiguous.
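• The six-digit addressing scheme of Figure 15 can be expressed compactly as follows (a sketch, with assumed helper names): counting from the right, the first pair of digits encodes the frame, the second pair the camera, and the third pair the row.

    def encode_address(row, camera, frame):
        # Written form "RR CC FF": row, then camera, then frame.
        return row * 10000 + camera * 100 + frame

    def decode_address(address):
        frame = address % 100
        camera = (address // 100) % 100
        row = address // 10000
        return row, camera, frame

    # frame 1 from camera 1 in row 1 -> 01 01 01
    assert encode_address(1, 1, 1) == 10101
    # frame 3 from camera 2 in row 2 -> 02 02 03
    assert encode_address(2, 2, 3) == 20203
    assert decode_address(20203) == (2, 2, 3)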
  • the same logical arrangement is used for the storage of the second golfer's swing in the second storage device.
  • the foregoing discussion is a logical description of storing images for accessing certain portions thereof.
  • the images are video streams, rather than separate, discrete frames.
  • the playback of the images will now be described with reference to Figures 16 and 17 and continuing reference to Figures 13 and 15.
• the user selects playback on the user terminal (step 1605) and the playback begins. More specifically, the system begins by providing the user a default starting view of the professional and trainee (step 1610). In the present embodiment, the images of the professional and the trainee are displayed side-by-side, as shown in Figure 17, from the same camera 1310 at frame 1. Determination of the first frame is described in greater detail below.
  • the user may begin navigating the stored images.
  • the user enters a user input via the user input device, and the server receives and interprets the input in a manner as described above with reference to Figures 5 and 6 (step 1615).
  • the server then accesses and updates in parallel the trainee image (step 1620a) and the professional image (step 1620b).
  • the user inputs include moving to the left or right and up or down in the array; further, each directional movement can be forward in time, at the same point in time, or backward in time. Such movement is achieved by accessing and, where appropriate, stringing together the frames taken by the cameras.
  • navigating through the array can be based on the logical arrangement and addressing scheme of frames: to move to the left to the next camera 1310, the third digit of the address of the image to be viewed is incremented; to move up to the next row, the fifth digit of the address is incremented; to move forward in time to the next frame, the first digit of the address is incremented.
• the next image is that associated with frame 1 of row1(2) (i.e., the image stored at address 01 02 01), and then the image associated with frame 1 of row1(3) (i.e., the image stored at address 01 03 01).
• the next image could be that associated with frame 2 of row2(2) (i.e., the image stored at address 02 02 02), and then the image associated with frame 3 of row3(3) (i.e., the image stored at address 03 03 03).
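• Under that addressing scheme, each navigation step reduces to address arithmetic; the sketch below (assumed helper name) shows how moving to the next camera, row or frame corresponds to incrementing the camera, row or frame pair, and how the increments can be combined for diagonal or time-advancing moves.

    def step(address, d_row=0, d_camera=0, d_frame=0):
        # Address layout "RR CC FF": row * 10000 + camera * 100 + frame.
        return address + d_row * 10000 + d_camera * 100 + d_frame

    start = 10101                                                  # 01 01 01
    left_same_time = step(start, d_camera=1)                       # 01 02 01
    left_and_forward = step(start, d_camera=1, d_frame=1)          # 01 02 02
    up_left_forward = step(start, d_row=1, d_camera=1, d_frame=1)  # 02 02 02

    assert left_same_time == 10201
    assert up_left_forward == 20202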
  • the viewer accesses the desired frame or time perspective of the images based on the synchronization of the image streams and/or time codes embedded in the streams.
• the server provides an updated view to the user (step 1625). Images of both the professional and trainee are updated synchronously. Changes to the user's view are applied to both the professional's and the trainee's images. Operation of the present embodiment is made efficient by using the same addressing scheme in both the storage device containing the professional's images and the storage device containing the trainee's images. In other words, each frame from each camera is stored at the same address in different storage devices. Therefore, the server receives the user input, determines the next appropriate camera frame/output and corresponding address, mixes the last frame with the updated frame and causes the image stored at that address in each storage device to be provided to the user. Having displayed the view, the server awaits the next user input (step 1615).
  • the server continuously updates the view based on the previously entered user input until the user enters a different input.
  • the playback preferably occurs at the same rate as the image capture occurred, namely thirty frames per second in the present embodiment. Therefore, when the selected user input is "forward in time" (from any camera(s)), the view is essentially a video playback at the actual speed of the swings. It should be understood that the present invention is independent of the type of cameras and the capture and playback rates.
  • the present embodiment thus allows for enhanced comparison of images and, consequently, improved training.
  • the trainee's swing can be compared to that of the professional in many ways.
• the swings can be compared at a single point in time, such as at the top of the trainee's back swing, and from any perspective provided by the array, such as front, back, top, etc.
  • the swings can be compared through sequential points in time, throughout a portion or the entirety of the swings, and from a changing perspective.
  • the swings can be compared at actual speed over and over again, each time from a new perspective.
  • the present embodiment allows two images to be compared at any point in time from any perspective.
• in one alternate embodiment, the images are displayed one overlaid on top of another. In one alternate embodiment utilizing overlaid images, the images are displayed with differing luminance levels.
• the professional swing image, which remains constant, can be captured and stored with no change in luminance level.
• the trainee swing image, on the other hand, can be stored with a lesser luminance level so that it can be overlaid on top of the professional swing image. In such an embodiment, the camera outputs are temporarily stored in the storage device and retrieved by the server;
  • the server not only processes the outputs to matte out the image (if desired), but also adjusts the luminance level of each image.
• the server then stores the processed outputs for later retrieval during playback. In related embodiments, the luminance levels are adjusted at different points during the system operation, such as when originally retrieved from the cameras or just prior to outputting to the user interface display device.
  • the user may separately control the views of the professional's and the trainee's swings.
  • the server discriminates between two sets of user inputs — one relating to each of the two images.
  • the opening in the dome allows the golfers to take a realistic swing and hit an actual ball. Where a greater range of viewing is desired, however, the array need not include an opening for the ball to travel. Instead, the golfers can be completely enclosed in a dome of cameras (entering by way of a door having cameras mounted thereon), thereby allowing viewing from 360°.
  • the server mixes the camera frames/images by electronically switching between frames/images. However, in alternate embodiments the server mixes the frames/images in any of the manners described above. For example, in one embodiment, mixing includes creating a "tweened" image from the output of adjacent cameras. The tweened image can be created and stored, or depending upon available processing power, created in real time as the view is being presented to the user.
  • Figure 18a illustrates the logical relationship of real and mixed images according to one embodiment in which the mixed images are synthesized images that are the product of images (output) from adjacent cameras.
• the logical arrangement of frames containing the real and mixed images can best be illustrated in the three dimensional representation in which the first axis represents sequential frames, the second axis represents sequential rows, and the third axis represents sequential cameras in each row.
  • sequential frames of the same camera are illustrated along the horizontal axis (i.e., left to right)
• adjacent rows are illustrated along the vertical axis
• adjacent cameras in the same row are illustrated along the axis extending into the page.
  • frames containing real images are illustrated as squares and bear the same logical address as corresponding frames identified in Figure 15.
  • Synthesized frames created by mixing outputs from the same point in time, from two adjacent cameras, in the same row are represented by triangles; synthesized frames created by mixing outputs from the same point in time, from corresponding cameras in adjacent rows are indicated by circles; and synthesized frames created by mixing outputs from the same point in time, from a camera in a given row and from the next camera in an adjacent row are indicated by diamonds.
  • the asterisk indicates a synthesized frame created by mixing the outputs from adjacent cameras, in adjacent rows taken at subsequent points in time (i.e., adjacent frames).
  • the mixed images are labeled with the logical notation wherein an apostrophe (') adjacent to either the second or third pair of digits signifies that the image was created by mixing outputs of adjacent cameras in the same row or corresponding cameras in adjacent rows, respectively.
• the notation 01' 01 01 refers to the image created by mixing frames from 01 01 01 and 02 01 01
• 01' 01' 01 refers to the image created by mixing the frames 01 01 01 and 02 02 01
• 01' 01' 01' refers to the image created by mixing the frames 01 01 01 and 02 02 02.
• frame 01' 01' 01 may be created by mixing frames 02 01 01 and 01 02 01, or by mixing 01 01 01, 02 01 01, 01 02 01 and 02 02 01.
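• How such a synthesized ("primed") frame is produced is not constrained to any one method; the sketch below simply averages the contributing real frames pixel-by-pixel as a stand-in for whatever mixing or tweening process is actually used (NumPy arrays stand in for decoded frames, and the names are assumptions).

    import numpy as np

    def mix_frames(*frames):
        # Blend two or more real frames into one synthesized frame (simple average).
        stack = np.stack([f.astype(np.float32) for f in frames])
        return stack.mean(axis=0).astype(np.uint8)

    # e.g. the frame denoted 01' 01 01 could be produced from the real frames
    # stored at addresses 01 01 01 and 02 01 01:
    frame_01_01_01 = np.zeros((480, 640, 3), dtype=np.uint8)      # placeholder image
    frame_02_01_01 = np.full((480, 640, 3), 255, dtype=np.uint8)  # placeholder image
    tween = mix_frames(frame_01_01_01, frame_02_01_01)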
• although Figure 18a illustrates only two successive frames of each of two adjacent cameras in each of two adjacent rows, it is to be understood that the logical depiction is readily extensible to multiple frames, cameras and rows. Having described the logical relationship of frames containing real images and synthesized frames containing mixed images, exemplary user navigation will be described with reference to Figures 18b and 18c, which use the same notation as Figure 18a, and continuing reference to Figure 13.
• a user navigating the array from the first camera in row 1 and moving to the left at the same point in time is sequentially provided the images of frames 01 01 01, 01 01' 01, and 01 02 01.
• moving up in the array at the same point in time, the user is sequentially provided frames 01' 02 01 and 02 02 01.
• moving forward in time from the same camera, the user is sequentially provided the image of frame 02 02 02 and subsequent frames, 02 02 03, 02 02 04, et seq.
• the system identifies one or more reference points of the swings and uses such reference points to synchronize the swings and/or adjust the playback speed of the swings. In such embodiments, the system includes a user interface device, through which a user can manually indicate a reference point of a swing, or any number of motion measuring devices, such as motion detectors, range finders, electronic tags (mounted on the golfer or golf club) and the like.
  • various points in the swing can be identified, including the beginning of movement of the golf club during the back swing, the change of direction of the golf club at the end of the back swing, contact of the golf club and the golf ball, the end of the follow-through, when the golf club comes to rest, and the like.
• Manual indications, as well as indications received from such movement measuring means, of the various points in the swing may be used to synchronize the swings of the professional and the trainee.
  • the two identified reference frames are used as synchronizing points for the swings.
  • the reference point is the beginning of the back swing
  • such reference frames are used as the first frame in the playback and all navigation is performed relative to the two reference frames.
  • a user is able to compare the swings to determine whether the trainee is swinging too fast or too slow.
  • point-by-point comparison of the swings becomes difficult as the swings diverge and lack synchronization.
  • Use of multiple reference points permit this system to synchronize the swings and compensate for the different swing speeds, thereby allowing essentially point-by-point comparison of the swings.
• these thirty additional frames are preferably mixed images created from successive frames of each camera that are uniformly interspersed among the frames of each camera containing real images.
• the logical arrangement of the frames containing real images and frames containing mixed images of the foregoing example is illustrated in Figure 19.
  • Interspersed among the sixty frames containing real images of the professional swing are thirty frames of mixed images. More specifically, the thirty mixed images are uniformly interspersed between every other pair of frames; a mixed image has been created between frames 1 and 2, not between frames 2 and 3, between frames 3 and 4, not between frames 4 and 5, and so forth.
  • such mixed images created from successive frames from the same camera can be combined in the same embodiment as mixed images created from frames from different cameras.
  • such mixed images interspersed for the purpose of adjusting the speed of the image are used to create other mixed images.
• the mixed images that are interspersed for adjusting the speed of the swing are indicated by an "X", and (using the notation of Figure 18a) mixed images 01 01 01' and 02 01 01' are used to create mixed image 01' 01 01'.
  • the system first captures and stores the image of the professional's swing and the image of the trainee's swing (step 2010). The system then receives a user input via a user interface device indicating the user's desire to harmonize the speeds of two swings (step 2020). The system then proceeds to create the necessary mixed images.
• the system receives indications via the motion measuring device coupled to the system (e.g., server) noting both the beginning and end of the first swing (step 2030).
• These user indications correspond to particular points in time relative to the start of recording, which, in turn, correspond to particular reference frames that the system tags. In alternate embodiments, the system automatically identifies the beginning and end of each swing by input from any of a number of motion measuring devices, such as motion detectors, range finders, electronic tags and the like, and in other embodiments via manual input via a user interface device during playback of the images.
  • beginning and end points of a swing need not be precisely defined, but are preferably selected so that the points correspond to the same part of the two swings.
  • the beginning may be the beginning of the golfer's back swing and the end may be when the golf club comes to rest after the golfer's follow-through.
  • the number of frames in the faster swing is subtracted from the number of frames in the slower swing, resulting in the number of mixed images to be added to the faster swing (step 2060).
• in the example above, where the slower swing included ninety frames and the faster swing sixty frames, thirty mixed frames must be added to the faster swing.
  • the system must also determine the composition of the mixed images (step 2070).
  • the system must determine the "location" of the mixed images.
  • the system evenly intersperses the frames containing the mixed images.
  • the location of the frames is determined by dividing the number of additional mixed images to be added into the number of frames containing real images of the faster swing.
  • sixty original frames divided by thirty additional mixed images equals one added mixed image every two original frames. Where the division results in a non-integer, even distribution can be approximated by rounding the result to the next highest integer.
  • Each mixed image comprises the product of mixing the two adjacent frames containing real images.
  • the system proceeds to create and store the mixed images (step 2080).
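• The arithmetic of steps 2060-2080 can be summarized in the following sketch (function names are assumptions): compute how many mixed frames are needed, how often to intersperse them, and then insert a mixed frame between the chosen pairs of original frames.

    def harmonize(slow_len, fast_len):
        # Return (mixed frames to add to the faster swing, spacing in original frames).
        extra = slow_len - fast_len                  # e.g. 90 - 60 = 30 mixed frames
        spacing = -(-fast_len // extra)              # ceil(60 / 30) = every 2 frames
        return extra, spacing

    def intersperse(fast_frames, spacing, mix):
        # Insert a mixed frame between every `spacing`-th pair of originals,
        # e.g. between frames 1 and 2, 3 and 4, ... when spacing == 2.
        out = []
        for i, frame in enumerate(fast_frames, start=1):
            out.append(frame)
            if i % spacing == 1 and i < len(fast_frames):
                out.append(mix(frame, fast_frames[i]))   # mix frame i with frame i+1
        return out

    extra, spacing = harmonize(90, 60)   # -> (30, 2): one mixed frame every 2 frames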
  • the present invention includes other manners of harmonizing the speed of the two swings. For example, in alternate embodiments rather than interleaving mixed images into the faster swing, blank frames are inserted or repeat frames are inserted. In still other alternate embodiments, the system accounts for the different speeds by adjusting the playback speed based on the ratio of the lengths of the swings. For example, in the context of the example of Figure 19, the playback speed of the professional swing (sixty frames) to trainee swing (ninety frames) is two-thirds (60 frames/90 frames) that of the trainee.
  • the system adjusts the playback speed by accessing and/or refreshing the frames at different rates.
  • a number of frames (equal to the number otherwise to be added to the faster swing in the above embodiments) from the slower swing are dropped from the image.
  • the system and method for adjusting the speed of a swing may be separately applied to portions of a swing, thereby synchronizing discrete portions of swings.
  • the different durations of the professional's and trainee's backswings may be harmonized so that upon playback both images arrive at the end of the backswing at the same time.
• the remainder of the swing (i.e., the downswing and follow-through) may be harmonized in the same manner.
• the process of Figure 20 is performed based on the beginning and end of each portion of the swing to be synchronized.
• One exemplary addressing scheme is that of the embodiment of Figure 15, wherein successive images are stored at known, continuous addresses. In alternate embodiments, the system includes various forms of a linked list of frame addresses.
• each data element in the linked list points to a frame as well as the previous and successive frame in each of the variable dimensions, such as those illustrated in Figure 18a, including up and down, diagonal, left and right, and forward and back in time. In other such embodiments, the data elements in the linked list point to either the previous or successive frame in a subset of those dimensions.
  • frames taken from cameras at the boundaries of the array are linked to frames taken at the opposite boundary. For example, the frames from the last camera in a given row of the array of Figure 13 are linked to frames from the first camera in the same row.
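• A linked-list arrangement of this kind might look like the following sketch (structure and names are assumptions): each element points to its neighbors in the navigable dimensions, and the cameras of a row wrap around so that the last camera links to the first.

    class FrameNode:
        def __init__(self, row, camera, frame):
            self.row, self.camera, self.frame = row, camera, frame
            self.links = {}          # direction -> neighboring FrameNode (or None)

    def build_row(row, num_cameras, num_frames):
        nodes = {(c, f): FrameNode(row, c, f)
                 for c in range(num_cameras) for f in range(num_frames)}
        for (c, f), node in nodes.items():
            node.links["forward"] = nodes.get((c, f + 1))   # None at the last frame
            node.links["back"] = nodes.get((c, f - 1))      # None at the first frame
            # left/right wrap around the row: the last camera links to the first
            node.links["left"] = nodes[((c + 1) % num_cameras, f)]
            node.links["right"] = nodes[((c - 1) % num_cameras, f)]
        return nodes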
  • the exemplary embodiments described herein relating to harmonizing the speed and duration of images are concerned with harmonizing two images, the present invention can be used to harmonize multiple images by utilizing the process described with reference to Figure 19 to add frames to all but the longest image. Furthermore, it is to be understood that although the embodiments described herein intersperse a single frame containing a mixed image between frames containing real images, in alternate embodiments multiple frames containing mixed images are interspersed between frames containing real images.
• images captured and processed according to the present invention may be stored on a portable storage medium, such as a CD-ROM, and played back by a user on hardware separate from that which was used to capture and process the images. In such an embodiment, the playback hardware includes software providing the playback functionality, including the ability to interpret user inputs and, in response thereto, locate and display appropriate frames.
  • the playback software locates the frames in any number of ways, including accessing a mapping or linked list of the frames which is stored on the storage medium.
  • the user interface devices of the foregoing embodiments may be thought of as a matrix player, serving as the interface to the user to manipulate and display the matrix of real and artificial (i.e., synthesized) video captured over time from any one of the foregoing embodiments.
• the matrix viewer allows the user to navigate through a path over the camera array (and the matrix of real and artificial frames) and to generate continuous video along this path. In short, the primary goal is for the viewer to transition through the views in a smooth and intuitive fashion.
• the matrix viewer performs different levels of image processing in different embodiments. For example, in certain embodiments where the user navigates both images captured from cameras (i.e., "real" images) and artificial images from perspectives in-between the camera images (i.e., "tweened" images), the matrix viewer receives both the real image streams and the tweened image streams, the tweened image streams preferably having been previously generated by another processing device of the system. In other embodiments, the matrix viewer receives the real image streams and information necessary to generate the tweened images.
• the embodiment in which the matrix viewer receives real images and information necessary to create artificial tweened images will now be described in greater detail.
  • the matrix viewer includes a rendering engine software component that uses pre-computed flow fields between camera positions to synthesize one or more in-between viewpoints.
• the method for view synthesis consists of a flow-based warping step and an image fusion step, as described more fully below. The distance to the synthesized view relative to the distance between the two viewpoints is used as a weight to compute the flow field to the new view. These flow fields are then used to warp the video onto the new view's frame of reference. These warped frames can then be combined in many ways. If the system uses only two views (camera inputs) to create the in-between or tweened view, a weighted average (where the weights correspond to the relative distances from the original frames) of the warped frames yields good results.
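• A compact sketch of that warp-and-blend step is given below (Python with NumPy and OpenCV; the names, the backward-warp approximation and the omission of the validity masks discussed later are all simplifying assumptions): the flow field is scaled by the fractional distance to the synthesized view, each real view is warped by its scaled flow, and the warped frames are averaged with distance-based weights.

    import numpy as np
    import cv2   # used here only for remap-based warping

    def warp(image, flow, alpha):
        # Warp `image` by a fraction `alpha` of the dense flow field (backward-sampling
        # approximation; a production system might forward-splat or invert the flow).
        h, w = flow.shape[:2]
        grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (grid_x + alpha * flow[..., 0]).astype(np.float32)
        map_y = (grid_y + alpha * flow[..., 1]).astype(np.float32)
        return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)

    def synthesize(view_a, view_b, flow_ab, flow_ba, alpha):
        # Blend the two warped views; weights reflect distance to each original view.
        warped_a = warp(view_a, flow_ab, alpha)          # A pushed toward the new view
        warped_b = warp(view_b, flow_ba, 1.0 - alpha)    # B pushed toward the new view
        return ((1.0 - alpha) * warped_a + alpha * warped_b).astype(np.uint8)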
  • a nearest neighbor interpolation of the flow fields yields relatively quick rendering results
• improved results can be obtained by first doubling the flow field resolutions before a nearest neighbor interpolation is performed
• alternatively, bilinear or even higher order interpolants can be used.
  • Flow fields may contain inaccuracies at occlusion and image boundaries. Based on forward and backward flow consistency and based on correlation measures between a frame and its warped counterpart, mask images of reliable flow areas may be generated. The rendering engine can then use these masks to decide which areas of each video frame contribute towards the final view presented by the viewer. It should be understood, however, that in many applications certain levels of inaccuracies are acceptable.
  • FIG. 21 An exemplary design schematic of a matrix viewer is shown in Figure 21.
  • the matrix viewer separates the user interface from the rendering or processing engine.
  • the primary software component of the rendering system receives user input or guidance (movement) requests and render requests from the main software component providing the graphical user interface (GUI).
  • the matrix server returns a real or synthesized view.
• the navigation controls are tied to a keyboard or a joystick, and the GUI sends these as directional and velocity requests to the Viewpoint Controller.
  • the Viewpoint Controller buffers these requests and pre-fetches the data (as described below) as needed in preparation for a request from the Callback Renderer.
  • the GUI requests new views through a callback function to the Callback Renderer. Ideally this callback is made from a timer loop within the GUI. This enables rendering at a fixed frame rate.
  • the Callback Renderer requests the position information and the corresponding data from the Viewpoint Controller and generates the new view.
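• The interaction just described might be sketched as follows (class and method names are illustrative assumptions, not the actual component API): the GUI forwards directional/velocity requests to a viewpoint controller, and a timer-driven callback asks the renderer for a new view at a fixed frame rate.

    import time

    class ViewpointController:
        def __init__(self):
            self.position = 0.0          # fractional position along the camera path
            self.velocity = 0.0

        def on_user_request(self, direction, speed):
            self.velocity = speed if direction == "right" else -speed
            # ... buffer the request and pre-fetch frames / flow data here ...

        def current_position(self):
            self.position += self.velocity
            return self.position

    class CallbackRenderer:
        def __init__(self, controller):
            self.controller = controller

        def render(self):
            pos = self.controller.current_position()
            # ... fetch the real or synthesized view for `pos` and blit it ...
            return pos

    def gui_timer_loop(renderer, fps=15):
        # The GUI requests new views from a timer loop at a fixed frame rate.
        for _ in range(3):               # a few iterations for illustration
            renderer.render()
            time.sleep(1.0 / fps)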
• the matrix viewer of the present embodiment preferably runs on a user interface/display device including at least the processing power of a Pentium III Class Windows NT machine with 256 MB of memory; however, other computers, operating systems and processing devices are suitable.
  • the system preferably includes a high-speed disk that can accommodate the multiple video streams and the flow fields that are generated. It is also preferable to have a graphics card capable of blitting the rendered video frames. It is further preferred that a dual processor system be used with multi-threading available with the matrix Server.
• for illustrative purposes, a linear camera array is used, thereby restricting tweening to horizontal neighbors, although in alternate embodiments tweening may be in two or more planes or directions, depending upon the array configuration.
  • the system of the present embodiment includes multiple personal computers (PCs). In addition to one PC reserved for performing controlling functions, each of the other PCs is utilized for capturing and processing real images.
• each such capture PC includes two video capture cards or "frame grabbers," such as those provided by the Matrox Corporation under the tradename METEOR, and captures the video streams from two video cameras.
  • the controller PC performs controlling functions, such as driving and synchronizing the operation of the capture PCs.
  • the sequence of cameras in the linear array are logically numbered from left to right.
  • the exemplary system configuration is illustrated in Fig. 22.
  • the present embodiment uses a nearest neighbor interpolation of the flow fields to generate new views. Only two adjacent camera nodes are used at a time for tweening, and these nodes are limited to horizontal neighbors.
  • the flow fields are pre-computed, preprocessed and provided to the matrix viewer so that the matrix viewer synthesizes new artificial views "on-the-fly.” (See Fig. 23).
  • both real and artificial views are pre-generated, and the matrix viewer presents user-selected views.
  • the matrix viewer generates the flow field data and artificial images necessary to present the user-selected views. Playback can be set for any of a number of frames/second values, such as 15 frames/second.
• when moving among viewpoints (i.e., real or artificial perspectives), the matrix viewer may also utilize a mask with the flow field to create a better synthesis of the views.
• the system may also use the Katmai (floating point) instruction set (preferably on Pentium III or better systems) to improve the performance of the flow field interpolation.
  • each capture PC preferably saves the digitized images to disk (referred to in Fig. 23 as ".KMF" file) and converts each frame into bitmap (BMP) format file.
  • the result is a series of BMP files for each camera real image stream.
• the system provides seamless navigation not only between each pair of cameras associated with a given capture PC (e.g., cameras 1 and 2), but also between adjacent cameras associated with different capture PCs (e.g., cameras 2 and 3). Consequently, each capture PC (other than the PC associated with the last two cameras) provides to the capture PC associated with the next two cameras the BMP files associated with the adjacent camera.
  • the first capture PC provides the second capture PC with the BMP files representing the real image stream from camera 2.
  • each capture PC has three series of BMP files.
• where the array is circular, the real images from the last camera must be mixed with the real images from the first camera to provide an arc or tweening path completely around the circular array. Accordingly, the BMP files associated with the last camera will be copied to the capture PC associated with the first camera.
  • Each capture PC proceeds to calculate both forward and reverse flow field data for each frame in the series of BMP files. More specifically, the capture PC calculates forward and reverse flow fields between the BMP files received from the adjacent capture PC and the BMP files associated with the first camera coupled to this capture PC and between the BMP files associated with the first camera coupled to this capture PC and the BMP files associated with the second camera associated with this capture PC.
• the flow field computations are performed on a frame-by-frame (or BMP file-by-BMP file) basis for all frames (or BMP files).
  • the cameras are synchronized during capture and the flow fields are generated between frames taken at the same time instant. For example, the first frame of camera 1 and the first frame of camera 2 are used to generate flow field data.
• forward flow field refers to the flow field from the lower numbered camera (i.e., left) to the higher numbered camera (i.e., right), and reverse flow field refers to the flow field from the higher numbered camera to the lower numbered camera.
  • each capture PC will contain seven series of files: two series of BMP files representing the real image streams from two cameras associated with the capture PC; one series of BMP files received from an adjacent capture PC; two forward flow field data files and two reverse flow field data files.
  • the flow field data files can be a sequence of BMP files.
  • the capture PCs preferably convert the BMP files and the flow field files into the same multimedia file format, such as Audio Video Interleaved (AVI) format established by the Microsoft Corporation or other video format.
  • the BMP files are merged into AVI format files.
  • the capture PCs perform the conversion in any of a number of ways, including using a distributed system of components, such as Distributed Component Object Model (DCOM).
  • the real image AVI files and flow field AVI files are utilized by the matrix viewer to permit seamless navigation by the end user.
  • the software components comprising the matrix viewer are installed on one of the capture PCs. As such, all AVI files are transferred to the capture PC having the matrix viewer.
  • one or more matrix viewers reside on separate user interface or processing devices coupled to one or more capture PCs by any now known or hereafter known technologies and protocols, including the Internet, Local Area Networks, Wide Area Networks, wireless transmission, and the like.
  • the matrix viewer also utilizes a camera graph file (referred to as the kwz file in the figures) to permit navigation through the images by an end user.
• the camera graph file is loaded once by the matrix viewer at startup and informs the matrix viewer of the layout of the cameras in the array. More specifically, the file includes "nodes", which indicate actual camera positions, and "arcs", which indicate navigation paths between nodes or cameras. As shown in the following exemplary file format (where the bracketed text is a comment), the camera graph file includes various other information; a hypothetical sample file follows the field list below.
  • Path "string" [where string is the prefix path for all AVI or other video files]
• FrameRate <integer> [the integer number of the frame rate override]
• Nodes <integer> [the integer number of camera nodes]
• Node X <integer> [horizontal (X) coordinate of camera position in the array, counting each tweened image as a node]
• Y <integer> [vertical (Y) coordinate of camera position in the array; in present example with linear array, always equal to 0]
• Arcs <integer> [number of arcs in array; e.g., in a linear array of six cameras there are five arcs]
• Node 1 <integer> [number of first index node; i.e., node defining beginning of arc]
• Node 2 <integer> [number of second index node; i.e., node defining end of arc]
• FileF "string" [where string is the name of file containing forward flow data for arc]
• FileR "string" [where string is the name of file containing reverse flow data for arc]
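• For illustration only, a hypothetical camera graph (kwz) file for a linear array of three cameras, with one tweened viewpoint between each pair of adjacent cameras, might read as follows; the exact syntax, file names and values are assumptions based on the field list above, not an actual file from the patent.

    Path "capture/"
    FrameRate 15
    Nodes 3
    Node X 0
    Y 0                          [camera 1]
    Node X 2
    Y 0                          [camera 2; X counts the tweened viewpoint at X=1]
    Node X 4
    Y 0                          [camera 3]
    Arcs 2
    Node 1 1
    Node 2 2
    FileF "flow_1_2_fwd.avi"
    FileR "flow_1_2_rev.avi"
    Node 1 2
    Node 2 3
    FileF "flow_2_3_fwd.avi"
    FileR "flow_2_3_rev.avi"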
  • one or more nodes may include a reference to another node and, consequently, to the file associated with that other node.
• the AVI stream of the referred-to node is segmented equally among the referred-to node and the referring nodes.
• each such referring node also includes an indication of the starting frame of the corresponding segment of the stream. For example, if the referred-to node has an AVI file of 1000 frames and there are nine nodes referring to the node, then each node will have an AVI segment of 100 frames (1000 divided by (9+1)).
• an arc can reference another arc's flow field files so that the flow field file can be segmented.
  • the starting frame of the flow field file is also similarly specified.
  • the matrix viewer proceeds to retrieve real images and generate artificial images in response to user inputs.
• the matrix viewer computes correspondences between the images acquired from one camera and the images acquired from the other cameras.
  • the correspondences are preferably computed between frames from adjacent views acquired at the same time instant.
  • correspondences may be computed between frames at different time instants or non-adjacent cameras. (See, e.g., Fig. 18) Having performed the correspondence, the matrix viewer transforms the correspondence mappings such that the mappings point to the desired virtual viewpoint.
• the desired virtual viewpoint is determined based on the number of virtual camera perspectives between each pair of real cameras (as identified in the camera graph file) and interpolating to the virtual position of each virtual camera perspective.
  • the matrix viewer warps or shifts the pixels in each image using the transformed mappings so that all pixels are in the coordinate system of the desired synthesized viewpoint.
  • tweening involves the computation of correspondences between images acquired from different camera views. Dense correspondences are preferred throughout the image since it is desired to display a complete image, and not just recover pose or some other information that typically requires fewer correspondences, although the density of correspondences often depends on the application. Correspondences are generated between at least two, preferably adjacent, views or cameras. However there are advantages in resolving occlusion problems if correspondences are computed between more than two views. As will be understood by one skilled in the art, there are several problems that can occur in the computation of correspondences.
• one feature of this algorithm is iterative refinement: correspondences are computed, one image is warped toward the other using those correspondences, and correspondences are then re-computed. This compute/warp process is repeated several times, which greatly improves the accuracy of the correspondences.
• a second feature of this algorithm is coarse-to-fine refinement. In this process, the compute/warp procedure is performed first at coarse image resolutions and then refined at finer scales. This improves the ability to deal with longer range motion, and also increases robustness in areas of the image that are textureless or contain aperture problems.
  • the basic flow algorithm can be modified to enhance performance. These performance enhancements include the use of a sliding local window for the correspondence calculation, rather than the use of a fixed window. Differently shaped windows have also been explored [20] and may be used in alternate embodiments. This provides enhanced performance at occlusion boundaries.
  • the flow computation can be seeded by a parametric alignment step where a global affine (or quadratic, projective) transform is computed between the frames. [4,7,10,15].
  • This brings the images into rough alignment and reduces the range of matching needed to be done by the optic flow computation for sub-pixel alignment.
  • One primary feature of the flow method is that the only constraint exploited is image matching. There are no three-dimensional (3D) constraints imposed.
• because image matching is the primary constraint (other than smoothness) that is imposed, correspondence and the resulting tweened results are typically relatively good in typical textured areas. Performance at occlusion boundaries is also acceptable, given that the camera spacing is sufficiently close.
  • a second advantage is computational efficiency. The algorithm does not require complex 3D calculation and is therefore relatively fast.
  • a third advantage is the fact that the flow process does not require any intensive camera calibration or setup procedures.
• in the plane-and-parallax method, 3D shape in the scene is represented by two components.
  • the first component is a real or virtual 3D planar parametric surface in the scene, while the remaining residual 3D shape is represented by a non-parametric surface [12].
• for example, a scene of objects lying on a floor would be represented by a 3D planar parametric surface corresponding to the floor, and then by a residual non-parametric surface that is a direct function of the heights of the objects above the floor.
  • both the parametric and non-parametric surfaces are computed simultaneously [12].
  • the surfaces are recovered sequentially.
  • This method has several advantages and disadvantages over flow and other 3D recovery methods (discussed later).
  • An advantage over other 3D recovery methods is that the initial computation of the planar surface brings features into closer correspondence resulting in more accurate calculation of the non-parametric surface.
  • the advantage over the flow method is that 3D information is recovered and this can be used to resolve occlusion problems (discussed in the selection/merging section).
  • camera pose estimation preferably either has to be recovered during the computation, or has to be provided accurately to the algorithm.
  • the recovery of pose in the algorithm may not be sufficiently robust to deal with general scenes. Pose could be recovered in a calibration step for every camera in the system, however lens distortion and other factors would also require modeling.
  • the plane and parallax algorithm is also significantly more computationally intensive than the basic flow algorithm. This algorithm can also be extended to compute 3D shape using more than a pair of images, using the constraint that the shape is constant.
  • the ego-motion method for 3D-shape recovery [2,18] can also be used and is similar to the plane-and-parallax method, except a single depth map represents the 3D shape of the scene.
• the advantage is that it is simpler than the plane-and-parallax methods to combine results from several image pairs to resolve occlusion boundaries.
• a disadvantage is that long-range non-parametric correspondence is performed, and this is slightly less accurate than the parametric and non-parametric methods used in the plane-and-parallax methods.
  • This algorithm can also be extended to compute 3D shape using more than a pair of images, using the constraint that the shape is constant [17].
  • a related algorithm uses correlation rather than the brightness constraint to compute correspondence [16].
  • the correlation approach offers the ability to perform long-range correspondence without resorting to coarse scales in the pyramid, which may blur features excessively. It is to be understood that any of the above three sets of alignment methods for computing correspondences between frames as well as essentially any other heretofore [e.g., 1,5,13,14,19] or hereafter known methods may be used. However, for illustrative purposes, only the foregoing three methods are discussed in detail. The methods for computing plane and parallax, and other 3D methods are less robust in computation as compared to flow computation, and require more calibration (such as lens distortion [8] and camera pose calculations) and setup procedures. This becomes a very significant procedure when large numbers of cameras are involved.
  • the disadvantage of using flow is that 3D information on the location of occluding and dis-occluding areas is not exploited.
• the flow-based methods allow the synthesis of new views from viewpoints which lie in straight lines between the captured views when using two cameras. In contrast, the 3D methods compute depth and allow the generation of new views from arbitrary view-points which need not lie on the straight line between the view-points.
• image based correspondence methods may fail when there is little or no texture in the scene and when there is a lot of occlusion present. Some of this is mitigated by using image pyramids and multiple images to do alignment. However, there can still be cases when these methods fail.
• These errors can be dealt with in a variety of ways, including, for example: (1) post-production editing of correspondence maps in regions of error, whereby the correspondence maps are examined by an operator and corrected in areas of error with a simple editing tool; and (2) active sensing methods, which either project textured light patterns (in the non-visible spectrum) or detect 3D range, to provide more information for computing the correspondence maps. The active sensing methods can be used to acquire the background scene and/or during imaging of the live event.
  • Various change detection algorithms such as those developed at Sarnoff Corporation [e.g. 13], may be used to detect foreground objects in the live scene.
  • the final correspondence maps between frames of the live scene are computed by intelligently selecting between the correspondences computed for frames imaging the background scene and the correspondences computed for frames of the live scene using optic flow or one of the other methods.
  • the next step is transforming the correspondence mappings.
  • the results of the algorithms described above can be used to compute a new mapping between each image and the synthetic view. If 3D information is available, then the approach is to compute an intermediate pose between two or more camera positions, and to compute the flow field produced by combining that pose and the depth map or 3D representation recovered at that camera position. If only 2D information is available, then the approach is to compute the mapping as a fraction of the flow field from one image to the next.
  • Pixels are then warped using bi-linear or bi-cubic warping methods so the pixels from the processed images are in the coordinate system of the synthetic viewpoint. If 3D information such as a depth map is available, then pixels will not be warped from locations where the depth map indicates that there is a dis-occlusion or occlusion. This area will be flagged so that a selection process can later choose the best intensity from other images to produce a result. In certain embodiments, however, limited occlusion is acceptable.
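To make the 2D case concrete, the following is a minimal Python sketch, not taken from the document, of scaling a precomputed flow field by a fraction and backward-warping the source image with bilinear interpolation; the function name, the backward-mapping convention, and the optional validity mask are assumptions made purely for illustration.

```python
import numpy as np

def synthesize_view(src, flow, alpha, valid=None):
    """Backward-warp `src` toward a synthetic viewpoint.

    src   : (H, W, C) float array, the real source image.
    flow  : (H, W, 2) correspondence field toward the neighbouring camera,
            stored as (dx, dy) per pixel.
    alpha : fraction of the way toward the neighbouring camera (0..1).
    valid : optional (H, W) bool mask; False marks occluded/dis-occluded
            pixels that should not be filled from this source.
    Returns the warped image and a mask of the pixels that were filled.
    """
    h, w = src.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)

    # Sample the source at a fraction of the full inter-camera displacement.
    sx = xs + alpha * flow[..., 0]
    sy = ys + alpha * flow[..., 1]

    # Bilinear interpolation of the four neighbouring source pixels.
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx = (sx - x0)[..., None]
    fy = (sy - y0)[..., None]
    out = ((1 - fx) * (1 - fy) * src[y0, x0]
           + fx * (1 - fy) * src[y0, x0 + 1]
           + (1 - fx) * fy * src[y0 + 1, x0]
           + fx * fy * src[y0 + 1, x0 + 1])

    filled = np.ones((h, w), dtype=bool)
    if valid is not None:
        filled &= valid            # leave occluded areas to the selection step
        out[~filled] = 0.0
    return out, filled
```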
  • the correspondence algorithms that have been described can be performed both forwards and backwards on the same image pair, and can also be performed between different image pairs, both temporally and spatially. These correspondence results are preferably combined.
  • Real images are warped to the synthetic or artificial view-point. These images can be combined by a variety of methods to create the new synthetic image [3, 11]. If 3D depth is computed at each image, then the image with the nearest scene-point is used to create the new image. This prevents occluded regions from being rendered from the new viewpoint. If more than one image renders this scene point, then these images may be combined by average, trimmed mean or median operations.
  • One effective combination method is to combine both backward and forward results on the same image pair.
  • a second effective combination method is to measure the residual error after warping by each (original) flow field and to weight or discard those pixels derived from flow fields with significant local error. More specifically, an alignment quality mask can be used together with forward/backward computation of flow to select and combine the intensities appropriately.
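A minimal sketch of the selection/merging idea described in the last two items, assuming each candidate has already been warped to the synthetic viewpoint and is accompanied by a per-pixel residual-error map; the outlier-rejection rule and inverse-error weighting below are illustrative choices, not the document's prescribed method.

```python
import numpy as np

def merge_candidates(images, errors, error_floor=1e-3, reject_factor=3.0):
    """Combine several warped candidate images into one synthetic view.

    images : list of (H, W, C) candidates already warped to the synthetic view.
    errors : list of (H, W) per-pixel residual alignment errors.
    Pixels whose error greatly exceeds the local median error are discarded;
    the survivors are blended with inverse-error weights.
    """
    imgs = np.stack(images)            # (N, H, W, C)
    errs = np.stack(errors)            # (N, H, W)

    med = np.median(errs, axis=0, keepdims=True)
    keep = errs <= reject_factor * (med + error_floor)   # discard outliers

    weights = keep / (errs + error_floor)                 # inverse-error weights
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12
    return (weights[..., None] * imgs).sum(axis=0)
```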
  • the objective for the display step is to show the tweened images in real-time.
  • the flow computation can be computed in non-real-time, and some parts of any flow-field selection and merging procedure may be computed in non real-time.
  • the flow field can also be quantized both spatially and in bit-depth in order to reduce the required I/O bandwidth from disk (or other storage) into the display device.
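As one hypothetical realization of that quantization, a flow field stored as floats could be spatially subsampled and packed into 8-bit integers with a per-field scale before being written to disk; the step size and int8 packing below are assumptions, not values taken from the document.

```python
import numpy as np

def quantize_flow(flow, spatial_step=2):
    """Subsample a (H, W, 2) float flow field and pack it into int8."""
    coarse = flow[::spatial_step, ::spatial_step]          # spatial quantization
    scale = max(np.abs(coarse).max(), 1e-6) / 127.0
    packed = np.round(coarse / scale).astype(np.int8)      # bit-depth quantization
    return packed, scale

def dequantize_flow(packed, scale, spatial_step=2):
    """Recover an approximate full-resolution flow field for warping."""
    coarse = packed.astype(np.float32) * scale
    # nearest-neighbour upsampling back toward the original grid
    return coarse.repeat(spatial_step, axis=0).repeat(spatial_step, axis=1)
```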
  • the image warping preferably occurs in real-time.
  • all AVI files need not be transferred to a single computer or user interface device having a matrix viewer installed thereon. In one such alternate embodiment, the files are streamed from associated capture PCs over a network connection to the capture PC or other device containing the matrix viewer.
  • multiple end users, each having a user interface device, are coupled to the capture system via a network, such as the Internet.
  • Each end user interface device includes a camera graph file as well as the matrix viewer software.
  • the AVI files consisting of the real image streams and the flow field data are streamed to each end user.
  • each end user interface device engages in two-way communication with a processing system for transferring the real image and flow field data (e.g., server 18, 804, 902) so that the processing system is aware of what camera view each end user is currently viewing.
  • the processing system is able to provide each end user with a limited number of AVI streams that are most likely to be needed by each end user. For example, if a given end user is currently viewing the output from camera 3, the processing system provides the end user with the AVI files logically surrounding camera 3, namely the flow field data files associated with the arcs between cameras 2 and 3 and cameras 3 and 4, as well as the AVI files representing the real images from cameras 2 and 4.
  • a greater or fewer number of AVI files surrounding the end user's current perspective may be provided depending upon available bandwidth.
  • the processing system can anticipate the user's navigation and continuously provide the necessary AVI files.
  • the capture and processing system provides each end user with the necessary files within a window (e.g., a number of real and/or virtual camera positions in any one or more directions from the currently viewed position). In the present embodiment, the center of the window is preferably the user's current view.
  • the streaming data provided to the end user is weighted toward the current direction of movement.
  • For example, if the end user is navigating to the left, the system provides streams/files associated with more cameras and/or flow fields to the left of the end user's current view than cameras to the right.
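One way to picture the windowing and direction weighting described above is the sketch below; the window size, bias amount, and function name are assumptions made purely for illustration.

```python
def streams_to_send(current_cam, num_cams, window=2, direction=0, bias=1):
    """Pick the camera streams and flow-field arcs to stream to a user.

    current_cam : index of the camera the user is currently viewing.
    num_cams    : total number of real cameras in the array.
    window      : neighbours included on each side of the current view.
    direction   : -1 (moving left), 0 (at rest), +1 (moving right).
    bias        : extra neighbours added on the side the user is moving toward.
    """
    left = window + (bias if direction < 0 else 0)
    right = window + (bias if direction > 0 else 0)

    lo = max(0, current_cam - left)
    hi = min(num_cams - 1, current_cam + right)

    cameras = list(range(lo, hi + 1))
    # flow-field files are associated with the arcs between adjacent cameras
    arcs = [(c, c + 1) for c in range(lo, hi)]
    return cameras, arcs

# e.g. a user viewing camera 3 and panning right in a 10-camera array:
print(streams_to_send(current_cam=3, num_cams=10, direction=+1))
```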
  • the matrix viewer preferably limits the speed of navigation among real and virtual perspectives so that fewer AVI files need to be provided to each end user.
  • the matrix viewer automatically causes the end user's view to stop on the perspective of the next real camera in the direction the end user was navigating.
  • This feature not only provides the user with clear (real) images when the user is not traversing the array, but also allows the system to conserve processing power and to better anticipate and provide the necessary AVI files for the end user's further navigation. It also minimizes the perception by the user of any ambient artifacts, which may be less apparent while the user is navigating between cameras, but more apparent when his/her perspective motion path has come to rest. Thus, stopping on an artifact-free real camera position ensures that any transient artifacts will be less persistent and less noticeable to the user.
  • the matrix viewer of the present embodiment allows for the controlling of time and viewpoint, and warping of imagery in response to user inputs.
  • the input to the matrix viewer can be either a raw set of pre-synthesized image data, or a set of original image data together with a set of flow-fields.
  • the matrix viewer allows the user to navigate the data both in space and in time, with the use of 2 slider controls, a single graphical control (e.g., the four-quadrant button described above) and the like.
  • the matrix viewer thus works generally according to the following four steps.
  • Before being read into the matrix viewer, the data is first organized. The simplest organization would be to take all of the data and to store it sequentially in one file. Where the Operating System (OS) used cannot read or write the data file in a timely fashion because it is too large (e.g., many viewpoints and/or a long sequence), the data is preferably pre-organized. Since real-time playback of the data is preferred, the accessing and retrieval of data from disk is preferably optimized. Accordingly, in the present embodiment the data is split into several smaller files, and data from adjacent viewpoints is grouped together in the same file. By producing several smaller files, the problem with the OS file size limitation is solved. In certain embodiments, the data (e.g., AVI) files are stacked, with one file stored in a related file.
  • Such stacked files may include, for example, different temporal periods of the same viewpoint.
  • image data associated with a number of contiguous viewpoints for a predetermined (relatively short) period of time are stored together.
  • the input data is preferably organized in order to minimize the number of O/S files that remain open when the data is read, and also so that the next likely temporal image that is to be displayed is likely to be contained within the same file as the current desired view.
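A hypothetical sketch of such a grouping: adjacent viewpoints and a short run of frames land in the same chunk file, so a small spatial or temporal step by the user usually stays within a file that is already open. The chunk sizes and naming scheme are assumptions, not taken from the document.

```python
def chunk_file_name(view, frame, views_per_file=4, frames_per_file=30):
    """Map a (viewpoint, frame) pair to the chunk file that holds it."""
    view_chunk = view // views_per_file
    time_chunk = frame // frames_per_file
    return f"chunk_v{view_chunk:03d}_t{time_chunk:04d}.dat"

# Neighbouring viewpoints at the same moment land in the same file:
print(chunk_file_name(view=5, frame=42))   # chunk_v001_t0001.dat
print(chunk_file_name(view=6, frame=43))   # chunk_v001_t0001.dat
```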
  • the second step generally involves mapping the desired view given by the User Interface, and the next likely desired view, onto the source data.
  • the user can control the desired view using the GUI, for example, two slider bars (one for spatial input and one for temporal input).
  • the temporal slider bar is either manually controlled or computer controlled such that it can play back a sequence automatically in time.
  • the position slider bar is controlled manually.
  • the positions of the two slider bars are fed to an indexing system, such as, for example, a locally stored file like the .kwz graph file noted above.
  • the byte offset corresponds to the frame number or temporal position in the stream, as each frame in such embodiments contains the same amount of data.
  • other methods of determining the position within the file and appropriate frame may be used, such as an index.
  • the mapping is performed not only for the desired frame, but also for the next expected frame in the temporal sequence. This is to allow prefetching to be performed, to optimize the speed performance of the viewer.
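Under the fixed-frame-size assumption above, the mapping can reduce to simple arithmetic; the following sketch (frame size, chunking constant, and file-naming scheme are illustrative assumptions) returns the file and byte offset for the desired frame and for the next expected frame so that it can be prefetched.

```python
FRAME_BYTES = 640 * 480 * 3      # assumed fixed size of one stored frame
FRAMES_PER_FILE = 30             # assumed temporal chunk size per file

def map_desired_view(view, frame):
    """Map a (viewpoint, frame) request to (file name, byte offset)."""
    file_name = f"view{view:03d}_t{frame // FRAMES_PER_FILE:04d}.avi"
    byte_offset = (frame % FRAMES_PER_FILE) * FRAME_BYTES
    return file_name, byte_offset

# Map both the desired frame and the next expected frame for prefetching.
current = map_desired_view(view=3, frame=42)
prefetch = map_desired_view(view=3, frame=43)
print(current, prefetch)
```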
  • the mapping function is architected to allow navigation along the temporal axis as well as, depending upon the configuration of the array, in one or more spatial dimensions.
  • the third step uses the mapping performed in the second step to fetch the data for the desired view and the next (anticipated) desired view into local memory. More specifically, the mapping performed in the second step is used to determine the file that should be opened (if it is not open already), and the (byte or other) offset from the beginning of the file from which data should be read. It is important to note that, at least for a WINDOWS-based OS, it can take a relatively long time just to open a file, even without reading a byte of data into memory. As a result, it is advantageous to minimize the number of files that are required to be opened when a sequence is played. This can be done in the first step by grouping data from adjacent viewpoints into the same file.
  • the fourth step involves processing the data for the desired view as required, displaying the view on the screen, and continuing to fetch data for the next desired view. If the data being read contained pre-synthesized imagery, then there is very little processing that is required to display the image data. The data is simply read from computer memory into the display memory. On the other hand, if the imagery has not been pre-synthesized, then the original image data and meta-data (in the present embodiments, flow fields were used) are processed by the processor/CPU to synthesize a new image using existing image synthesis algorithms. While the processor/CPU is performing its processing, the I/O module will continue to fetch data for the next desired view from memory. It is desirable to display imagery in real-time, which in this particular system corresponds to 30 frames per second. Thus, the viewer includes a timing function module, which ensures that the update rate does not exceed this rate.
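A minimal sketch of such a timing function, assuming a callback-style fetch and display interface; the function names and the way the 30 fps cap is applied are illustrative, and a real implementation would overlap I/O with display on a separate thread.

```python
import time

TARGET_FPS = 30.0
FRAME_PERIOD = 1.0 / TARGET_FPS

def play(frame_keys, fetch, synthesize_and_show):
    """Display a sequence of frames no faster than TARGET_FPS.

    fetch(key)                -> raw frame data (stands in for the I/O module)
    synthesize_and_show(data) -> processes the data and updates the display
    """
    for i, key in enumerate(frame_keys):
        start = time.monotonic()
        data = fetch(key)
        # In a full implementation the I/O module would already be fetching
        # frame_keys[i + 1] on a separate thread while this frame is shown.
        synthesize_and_show(data)
        elapsed = time.monotonic() - start
        if elapsed < FRAME_PERIOD:          # timing function: cap the update rate
            time.sleep(FRAME_PERIOD - elapsed)

# e.g. play(range(90), fetch=lambda k: k, synthesize_and_show=print)
```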
  • Such an embodiment preferably includes a GUI that allows a producer to control the acquisition, storage and processing of the video.
  • the system continuously acquires video output from each camera, saving the video of each camera to a particular memory unit or drive. Once the memory is used up, the system overwrites the previously stored video in the same memory.
  • the producer enters an input causing the system to stop the process of overwriting the captured video, freezing a predetermined amount of video (e.g., fixed number of frames or seconds) within the memory.
  • the producer may then view the stored video and select all or a portion of the video for processing. Selection of the video portion to be processed is achieved through electronic tagging of the beginning frame and ending frame of the portion to be processed. In certain circumstances, both the beginning and end points may be a single frame.
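The continuous capture-and-overwrite behaviour amounts to a ring buffer per camera; the class below is a hypothetical sketch of the overwrite, freeze, and begin/end tagging steps, with the capacity and method names chosen only for illustration.

```python
from collections import deque

class CaptureBuffer:
    """Ring buffer that keeps only the most recent frames of one camera."""

    def __init__(self, capacity_frames):
        self.frames = deque(maxlen=capacity_frames)  # oldest frames overwritten
        self.frozen = False

    def add(self, frame):
        if not self.frozen:
            self.frames.append(frame)                # overwrite once full

    def freeze(self):
        """Producer input: stop overwriting the captured video."""
        self.frozen = True

    def select(self, begin, end):
        """Tag the begin/end frames of the portion to be processed."""
        return list(self.frames)[begin:end + 1]      # handed off for processing

# e.g. 10 seconds of 30 fps video per camera:
buf = CaptureBuffer(capacity_frames=300)
```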
  • the system proceeds to process the video as discussed above, creating files representative of the real images, the flow field data, and eventually the artificial views.
  • the GUI for one embodiment of the matrix viewer consists of two slider bars.
  • One bar controls the temporal advancement of the image sequence from whatever real or synthetic camera node it rests upon or is linked to. It moves the frame sequence forward or backward in time. In certain embodiments, the speed of playback is also selected by the end user. Such a temporal input adjusts the rate of the matrix viewer timer used to generate the images.
  • the other bar controls camera perspective, enabling movement from camera node to camera node, between the real and intervening synthetic camera positions, so that the user can fluidly guide his/her perspective around and through the viewpath (the aggregate available perspective motion path views within the camera array, both real and synthetic).
  • Each location of the slider bar is mapped to a camera node. For example, the leftmost position of the slider bar corresponds to the leftmost camera, and the rightmost position of the slider bar corresponds to the rightmost camera. Spaced proportionally in between the two extremes are locations corresponding to each of the remaining real and artificial cameras/viewpoints.
  • the matrix viewer maps or correlates the position of the slider bar to the desired real camera and its associated stream or a virtual camera and its associated stream (which depending on the embodiment may need to be dynamically generated based on flow field data).
  • the number of camera nodes can be pre-programmed into the viewer, or the camera graph file provides an indication of the number of camera nodes. Based on the timer noted above, the matrix viewer generates the desired image.
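The proportional slider-to-node mapping can be sketched as follows; the normalized 0.0–1.0 slider range and the rounding rule are assumptions made for illustration.

```python
def slider_to_node(position, num_nodes):
    """Map a slider position in [0.0, 1.0] to a real or synthetic camera node.

    Position 0.0 corresponds to the leftmost node, 1.0 to the rightmost,
    with the remaining nodes spaced proportionally in between.
    """
    position = min(max(position, 0.0), 1.0)
    return round(position * (num_nodes - 1))

# e.g. 7 real cameras with 3 synthetic views between each pair -> 25 nodes
print(slider_to_node(0.5, num_nodes=25))   # 12, the centre node
```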
  • the GUI integrates both the temporal and the viewpoint sliders into a single control enabling the user to navigate both dimensionally and in time using a single, less complicated, and less time-consuming gesture.
  • the GUI includes a grid having four quadrants: the top-left quadrant relates to movement forward in time and to the left in the array; the top right quadrant relates to movement forward in time and to the right in the array; the bottom right quadrant relates to movement back in time and to the right in the array; and the bottom left quadrant relates to movement back in time and to the left in the array.
  • Other multi-sectioned or partitioned buttons could be used in which the sections correspond to a temporal movement and directional movement.
  • the end user indicates movement through the array by placing a mouse cursor or other indicator in the grid: the further the indicator is from the center axis, the more pronounced the movement or navigation. More specifically, as the user moves the indicator further to the right, the user navigates faster through the array to the right. Likewise, movement of the indicator further to the left corresponds to faster movement to the left through the array. With regard to temporal aspects of navigating, the further the user moves the indicator to the top of the grid, the faster the image is played back, forward in time. Likewise, the further to the bottom the user moves the indicator, the faster the image is played back, back in time. It is to be understood that other arrangements in which the single location of an indicator correlates into two playback variables (e.g., direction of navigation and temporal aspect of playback) may be used.
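Reading the single indicator position as a 2D offset from the grid centre yields the two playback variables; in the hypothetical sketch below, the horizontal component sets navigation speed and direction through the array, and the vertical component sets playback speed and direction in time (the gain constants are illustrative).

```python
def grid_to_velocity(x, y, center=(0.5, 0.5), max_cams_per_sec=4.0, max_fps=60.0):
    """Translate an indicator position on the four-quadrant grid into
    a spatial velocity (camera nodes per second, +right/-left) and a
    temporal velocity (frames per second, +forward/-backward).

    x, y are normalized grid coordinates in [0, 1], with y measured upward.
    The farther the indicator is from the centre, the faster the motion.
    """
    dx = x - center[0]
    dy = y - center[1]
    spatial_velocity = 2.0 * dx * max_cams_per_sec   # right of centre -> move right
    temporal_velocity = 2.0 * dy * max_fps           # above centre -> play forward
    return spatial_velocity, temporal_velocity

# top-left quadrant: forward in time, to the left in the array
print(grid_to_velocity(0.2, 0.9))   # (-2.4, 48.0)
```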
  • the GUI can also be implemented with a joystick-like control, a position-sensor/control technology, or any wired or wireless mouse control function.
  • By clicking on an ergonomically placed feature add/drop function button, such a joystick-like control will also be able to combine or drop either of the slider parameters so that it can also function as a controller of either of the individual slider bars by itself.
  • the relevant camera positions within an array can then be activated or highlighted and scaled, based on real camera position data from position sensors in the real world camera environment.
  • the user can click on a camera position within the map to view a particular camera perspective or slide a cursor along a highlighted path to fluidly guide his/her perspective motion path throughout the accessible viewpath.
  • This viewpath can be
  • the user inputs correspond to one of the real camera streams.
  • Navigation among the streams is accomplished by jumping from a frame or time perspective in one stream to the desired frame or time perspective in a second, selected stream, based on the user inputs.
  • Such embodiments can also utilize the feature described above, by which a fixed number of streams are provided to the end user based on the user's current viewing perspective (i.e., camera being viewed) and/or navigational direction.

Abstract

According to the present invention, a user interface allows the control of time and viewpoint, and the warping of imagery in response to user inputs. The input to the interface can be either a raw set of pre-synthesized image data, or a set of original image data together with a set of flow fields. In either case, the interface allows the user to navigate the data both in space and in time, for example, by means of two slider controls, a single graphical control (such as a four-quadrant button whose quadrants correspond to the functions left, right, forward in time, and back in time), and the like. In one embodiment of the invention, the interface thus works generally according to the following four steps. The data is first organized before being read into the interface. The second step generally involves mapping the desired view given by the interface, and the next likely desired view, onto the source data. The third step uses the mapping performed in the second step to fetch the data for the desired view and the next (anticipated) desired view into local memory. The fourth step involves processing the data for the desired view as required, displaying the view on the screen, and continuing to fetch data in anticipation of the next desired view.
PCT/US2002/013004 2001-04-20 2002-04-22 Ensemble de cameras maniables et indicateur associe WO2002087218A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002307545A AU2002307545A1 (en) 2001-04-20 2002-04-22 Navigable camera array and viewer therefore

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US28520101P 2001-04-20 2001-04-20
US60/285,201 2001-04-20

Publications (2)

Publication Number Publication Date
WO2002087218A2 true WO2002087218A2 (fr) 2002-10-31
WO2002087218A3 WO2002087218A3 (fr) 2004-04-15

Family

ID=23093200

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/013004 WO2002087218A2 (fr) 2001-04-20 2002-04-22 Ensemble de cameras maniables et indicateur associe

Country Status (2)

Country Link
AU (1) AU2002307545A1 (fr)
WO (1) WO2002087218A2 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6393163B1 (en) * 1994-11-14 2002-05-21 Sarnoff Corporation Mosaic based image processing system
US20020063775A1 (en) * 1994-12-21 2002-05-30 Taylor Dayton V. System for producing time-independent virtual camera movement in motion pictures and other media
US6393144B2 (en) * 1994-12-29 2002-05-21 Worldscape, L.L.C. Image transformation and synthesis methods
US6304284B1 (en) * 1998-03-31 2001-10-16 Intel Corporation Method of and apparatus for creating panoramic or surround images using a motion sensor equipped camera
US20020047895A1 (en) * 2000-10-06 2002-04-25 Bernardo Enrico Di System and method for creating, storing, and utilizing composite images of a geographic location

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7027083B2 (en) 2001-02-12 2006-04-11 Carnegie Mellon University System and method for servoing on a moving fixation point within a dynamic scene
US7102666B2 (en) 2001-02-12 2006-09-05 Carnegie Mellon University System and method for stabilizing rotational images
US7106361B2 (en) 2001-02-12 2006-09-12 Carnegie Mellon University System and method for manipulating the point of interest in a sequence of images
WO2003036565A2 (fr) * 2001-10-23 2003-05-01 Carnegie Mellon University Systeme et procede permettant d'obtenir une video de plusieurs points de fixation mobiles dans une scene dynamique
WO2003036565A3 (fr) * 2001-10-23 2004-02-12 Univ Carnegie Mellon Systeme et procede permettant d'obtenir une video de plusieurs points de fixation mobiles dans une scene dynamique
DE10310636A1 (de) * 2003-03-10 2004-09-30 Mobotix Ag Überwachungsvorrichtung
US7801331B2 (en) 2003-03-10 2010-09-21 Mobotix Ag Monitoring device
EP1739957A1 (fr) * 2005-06-28 2007-01-03 Dynaslice AG Système et méthode d'enrégistrement et reproduction des images
US8249153B2 (en) 2007-06-12 2012-08-21 In Extenso Holdings Inc. Distributed synchronized video viewing and editing
WO2018030242A3 (fr) * 2016-08-09 2018-05-31 Sony Corporation Système à caméras multiples, caméra, procédé de traitement de caméra, appareil de confirmation et procédé de traitement d'appareil de confirmation
US20190020820A1 (en) * 2016-08-09 2019-01-17 Sony Corporation Multi-camera system, camera, processing method of camera, confirmation apparatus, and processing method of confirmation apparatus
CN109565562A (zh) * 2016-08-09 2019-04-02 索尼公司 多相机系统、相机、相机的处理方法、确认装置以及确认装置的处理方法
CN109565563A (zh) * 2016-08-09 2019-04-02 索尼公司 多相机系统、相机、相机的处理方法、确认装置以及确认装置的处理方法
US10708500B2 (en) 2016-08-09 2020-07-07 Sony Corporation Multi-camera system, camera, processing method of camera, confirmation apparatus, and processing method of confirmation apparatus for capturing moving images
US11323679B2 (en) 2016-08-09 2022-05-03 Sony Group Corporation Multi-camera system, camera, processing method of camera, confirmation apparatus, and processing method of confirmation apparatus
CN113572975A (zh) * 2020-04-29 2021-10-29 华为技术有限公司 视频播放方法、装置及系统、计算机存储介质
CN113572975B (zh) * 2020-04-29 2023-06-06 华为技术有限公司 视频播放方法、装置及系统、计算机存储介质

Also Published As

Publication number Publication date
AU2002307545A1 (en) 2002-11-05
WO2002087218A3 (fr) 2004-04-15

Similar Documents

Publication Publication Date Title
US6741250B1 (en) Method and system for generation of multiple viewpoints into a scene viewed by motionless cameras and for presentation of a view path
US6674461B1 (en) Extended view morphing
US7224382B2 (en) Immersive imaging system
Uyttendaele et al. Image-based interactive exploration of real-world environments
KR101203243B1 (ko) 상호작용적 시점 비디오 시스템 및 프로세스
Aliaga et al. Plenoptic stitching: a scalable method for reconstructing 3d interactive walk throughs
Kanade et al. Virtualized reality: Concepts and early results
US7613999B2 (en) Navigable telepresence method and systems utilizing an array of cameras
AU761950B2 (en) A navigable telepresence method and system utilizing an array of cameras
JP4153146B2 (ja) カメラアレイの画像制御方法、及びカメラアレイ
JP5406813B2 (ja) パノラマ画像表示装置およびパノラマ画像表示方法
US6084979A (en) Method for creating virtual reality
US20020190991A1 (en) 3-D instant replay system and method
US20070248283A1 (en) Method and apparatus for a wide area virtual scene preview system
US20030076413A1 (en) System and method for obtaining video of multiple moving fixation points within a dynamic scene
WO2001028309A2 (fr) Procede et systeme permettant de comparer plusieurs images au moyen d'un reseau de cameras navigable
WO2002011431A1 (fr) Systeme video et procede de commande associe
WO1995007590A1 (fr) Processeur d'images variant dans le temps et dispositif d'affichage
JP2002544742A (ja) デジタルイメージ撮影システム
JP2014529930A (ja) ネイティブ画像の一部の選択的キャプチャとその表示
WO2002087218A2 (fr) Ensemble de cameras maniables et indicateur associe
Kanade et al. Virtualized reality: perspectives on 4D digitization of dynamic events
Kanade et al. Virtualized reality: Being mobile in a visual scene
US20230353717A1 (en) Image processing system, image processing method, and storage medium
US11706375B2 (en) Apparatus and system for virtual camera configuration and selection

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP