WO2000079799A2 - Method and apparatus for composing and viewing image sequences - Google Patents

Method and apparatus for composing and viewing image sequences

Info

Publication number
WO2000079799A2
Authority
WO
WIPO (PCT)
Prior art keywords
media
scene
media object
objects
media objects
Prior art date
Application number
PCT/US2000/017372
Other languages
English (en)
Other versions
WO2000079799A3 (fr)
WO2000079799A9 (fr)
Inventor
Sassan Pejhan
Bing-Bing Chai
Iraj Sodagar
John Festa
Original Assignee
Sarnoff Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sarnoff Corporation
Publication of WO2000079799A2
Publication of WO2000079799A3
Publication of WO2000079799A9

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81: Monomedia components thereof
    • H04N21/8146: Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23412: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44012: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455: Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85: Assembly of content; Generation of multimedia applications
    • H04N21/854: Content authoring
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Definitions

  • the invention relates to multi-media authoring. More particularly, the invention relates to a method and apparatus for providing an architecture for object-based multimedia authoring and composition.
  • Video clips are becoming abundant on many INTERNET web sites and have been available on CD-ROMs for many years now. Unlike other traditional media, such as audio and text, video clips in their raw format can become prohibitively large computer files, consuming storage and bandwidth at unacceptably high rates. A substantial amount of research has therefore been performed over the past 30 years to develop efficient video compression algorithms.
  • Several standards, including MPEG (-1, -2, -4), H.261 and H.263 have been developed. Almost all digital video sequences, whether on the web, on CD-ROMs or on local hard disks, are stored in one compressed format or another.
  • Because compressed video relies on interframe prediction, a given frame may not be decodable independently of other frames in the sequence.
  • The operations that are simple to implement for compressed streams include Play, Stop/Pause, Slow Motion and Rewind, which are currently performed by most standard software decoders.
  • The challenging operations include random frame access, fast forward, playing in reverse and jumping to the next scene change.
  • Brute force solutions to these challenges could be implemented using a very powerful computer, or when dealing with very low resolution and/or relatively short sequences.
  • For example, the Fast Forward control could be implemented by decoding and displaying the clip at two or three times the natural speed. With high resolutions and long sequences, however, this is not a practical option, particularly in cases where the video is being streamed over a network.
  • What is needed is an authoring or composing tool that provides an efficient and user-friendly graphical interface for composing a multi-media scene from various multimedia "objects" (e.g., video clips, audio clips, still images, animation and the like), with functions that allow a user to easily select and integrate these objects from an object library into the scene. VCR-type controls as discussed below can be adapted as additional functions of such an authoring or composing tool.
  • One embodiment of the present invention is a method and apparatus for multi-media authoring.
  • the invention relates to a method and apparatus for providing an architecture for object-based multimedia authoring and composition.
  • The architecture of the multi-media composer allows for automatic addition of a newly added multi-media object into the existing scene description. Namely, when a multi-media object is inserted into a scene composition area of the multi-media composer, relevant information associated with the multi-media object is automatically added to an existing scene description (or a new scene description will be created if no scene description currently exists). Various types of object-oriented (OO) programming "objects" are created to implement this feature. Additionally, other features or functions within a composed scene are supported by the present architecture of the multi-media composer. These functions include, but are not limited to, resizing and positioning of multi-media objects; organizing the z-order of multi-media objects; implementing background transparency of multi-media objects; and providing scene playback of multi-media objects. Thus, the present invention discloses a novel and useful multi-media composer that can be employed to efficiently generate multi-media contents.
  • FIG. 1 depicts a block diagram of a video sequence encoder in accordance with the present invention that produces a compressed bitstream and an associated auxiliary file;
  • FIG. 2 depicts a file structure for a first embodiment of an auxiliary file;
  • FIG. 3 depicts a file structure for a second embodiment of an auxiliary file;
  • FIG. 4 depicts a file structure for a third embodiment of an auxiliary file;
  • FIG. 5 depicts a block diagram of a decoder for decoding bitstreams produced by the encoder of FIG. 1;
  • FIG. 6 depicts a block diagram of a client and server for streaming, decoding, and displaying remote bitstreams produced by the encoder of FIG. 1;
  • FIG. 7 depicts a block diagram of a multi-media composer and viewer of the present invention;
  • FIG. 8 depicts a view of the multi-media composer and viewer of the present invention on a display;
  • FIG. 9 depicts a flowchart of a method for composing a multi-media scene using object oriented programming elements;
  • FIG. 10 depicts a flowchart of a method for resizing and positioning one or more multi-media objects within a composition scene of the present invention;
  • FIG. 11 depicts a flowchart of a method for organizing the z-order of one or more multi-media objects within a composition scene of the present invention;
  • FIG. 12 depicts a flowchart of a method for implementing background transparency of one or more multi-media objects within a composition scene of the present invention; and
  • FIG. 13 depicts a flowchart of a method for implementing scene playback of one or more multi-media objects within a composition scene of the present invention.
  • The major video coding techniques/standards in use today include H.263, geared towards low bit-rate video; MPEG-1, developed for CD-ROM applications at around 1.5 Mbits/s; and MPEG-2, designed for very high quality video (HDTV) at around 10 Mbits/s. Although there are major differences among these three standards, they are all based on the same basic principles described below.
  • Each frame of video can be one of three types: Intra-coded (I) frames (i.e., anchor frames), Predicted (P) frames and Bi-directionally predicted (B) frames.
  • I-frames are encoded very much like still images (i.e., JPEG) and achieve compression by reducing spatial redundancy: a Discrete Cosine Transform (DCT) operation is applied to 8x8 blocks of pixels within the frame, starting from the top, left block and moving to the right and down the rows of pixels. To complete the encoding of an I-frame, the DCT coefficients are then quantized and entropy encoded.
  • P-frames are predicted from a preceding I- or P-frame: each 16x16 MacroBlock (MB) in a P-frame is matched to the closest MB of the frame from which it is to be predicted, and the difference between the two MBs is then computed and encoded, along with the motion vectors. As such, both temporal and spatial redundancy are reduced.
  • B-frames are coded in a manner similar to P-frames except that B-frames are predicted from both past and future I- or P-frames.
  • I-frames are much larger than P or B frames, but they have the advantage of being decodable independent of other frames.
  • P and B frames achieve higher compression ratios, but they depend on the availability of other frames in order to be decoded.
  • The first embodiment of the invention for implementing VCR-type controls generates a small, separate auxiliary 'vcr' file for each compressed video sequence (bitstream). This auxiliary file contains key information about the associated bitstream that enables efficient implementation of VCR-type controls.
  • For each compressed video sequence, an associated auxiliary file is generated, e.g., with the same prefix as the compressed file name but with a 'vcr' suffix. This auxiliary file primarily contains information about the position of Intra-coded frames (I-Frames) within the compressed bitstream.
  • Fig. 1 depicts a block diagram of a video sequence encoder system 100 containing an encoder 102, an auxiliary file generator 104 and a storage device 108 that operate in accordance with the present invention.
  • The encoder 102 encodes a video sequence in a conventional manner (e.g., as described above), but also produces I-frame information for use by the auxiliary file generator 104. This information pertains to the location of the I-frames within the encoded bitstream 110, e.g., the position of the I-Frame with respect to the beginning (or end) of the bitstream.
  • the auxiliary file generator 104 produces an auxiliary file 106 for each encoded bitstream 110.
  • Video sequences may be encoded at either a variable or constant frame rate.
  • the former may occur when encoders drop frames, in an irregular fashion, in order to achieve a constant bit-rate.
  • I-Frames may or may not occur at fixed intervals.
  • Such aspects of the coding process are not specified by the standards but are left to implementers. For some applications, it may make sense to insert I-Frames at fixed intervals (e.g., every 30th frame can be an I-Frame). For other applications, implementers may decide to insert an I-Frame only whenever there is a scene change - something which may occur at irregular time intervals.
  • the auxiliary file has a different format depending on whether I-Frames are inserted at fixed or variable intervals.
  • FIG. 2 illustrates the format of an auxiliary file 200 for use with a bitstream having a fixed I-frame interval.
  • the auxiliary file 200 contains a field 202, e.g., one byte, at the head of the file indicating the size of the fixed interval.
  • A field 204, e.g., four bytes for every I-frame, is then included to indicate the offset from the beginning (or the end) of the bitstream at which each I-frame is located.
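  • As an illustration of this file layout, a minimal sketch (not taken from the patent; the function names are hypothetical, and the field sizes simply follow the description above) of writing and reading such an auxiliary file might look like this:

```python
import struct

def write_fixed_interval_aux_file(path, i_frame_interval, i_frame_offsets):
    """Write a 'vcr'-style auxiliary file for a fixed I-frame interval.

    Layout assumed from the description: a 1-byte interval field (field 202)
    followed by a 4-byte byte offset (field 204) for every I-frame.
    """
    with open(path, "wb") as f:
        f.write(struct.pack("B", i_frame_interval))        # field 202
        for offset in i_frame_offsets:
            f.write(struct.pack(">I", offset))             # field 204, one per I-frame

def read_fixed_interval_aux_file(path):
    """Return (interval, list_of_offsets) from the auxiliary file."""
    with open(path, "rb") as f:
        interval = struct.unpack("B", f.read(1))[0]
        offsets = []
        while (chunk := f.read(4)):
            offsets.append(struct.unpack(">I", chunk)[0])
    return interval, offsets

if __name__ == "__main__":
    write_fixed_interval_aux_file("clip.vcr", 30, [0, 40960, 81500])
    print(read_fixed_interval_aux_file("clip.vcr"))  # (30, [0, 40960, 81500])
```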
  • auxiliary file 200 of FIG. 2 is augmented with additional information to become the auxiliary file 300 of FIG. 3.
  • The first field 302 of auxiliary file 300 is still that of the I-frame interval, but the field value is set to 0 (or some other special code) to indicate a variable I-frame interval.
  • For each I-frame, auxiliary file 300 then contains a field 306 containing a 2-byte frame number and a field 308 containing the 4-byte offset information.
  • Field 304, which indicates the total number of frames in the entire sequence, can optionally be added to the auxiliary file 300 (placed right after the frame interval field 302). As will be described below, this optional information can help speed up the implementation of the random frame access control.
  • a one-bit Scene Change Indicator (SCI) field can be inserted in the auxiliary file for each I-Frame, indicating whether there has been a scene change or not from the previous I-Frame.
  • One way of inserting this field is to add another one-byte field 310 for each I-frame, with the first bit serving as the SCI and the other bits reserved for future use.
  • Alternatively, the first bit of the 4-byte offset field 308 can be designated as the SCI field 312, with the remaining 31 bits used for the offset, as shown in FIG. 4. Since the file format 300 for variable I-frame intervals is a superset of the one for fixed I-frame intervals (format 200), it could be used for both cases. This makes the implementation of the invention slightly easier.
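  • A sketch of how the per-I-frame record of this third format could be packed and unpacked (hypothetical helper names; the 2-byte frame number, 31-bit offset and 1-bit SCI follow the description above) is given below:

```python
import struct

SCI_MASK = 0x80000000  # first (most significant) bit of the 4-byte field

def pack_iframe_record(frame_number, byte_offset, scene_change):
    """Pack one per-I-frame record: 2-byte frame number (field 306) followed by
    a 4-byte word whose first bit is the SCI (field 312) and whose remaining
    31 bits hold the bitstream offset (field 308)."""
    assert byte_offset < SCI_MASK, "offset must fit in 31 bits"
    word = byte_offset | (SCI_MASK if scene_change else 0)
    return struct.pack(">HI", frame_number, word)

def unpack_iframe_record(data):
    frame_number, word = struct.unpack(">HI", data)
    scene_change = bool(word & SCI_MASK)
    byte_offset = word & 0x7FFFFFFF
    return frame_number, byte_offset, scene_change

if __name__ == "__main__":
    rec = pack_iframe_record(frame_number=300, byte_offset=123456, scene_change=True)
    print(len(rec), unpack_iframe_record(rec))  # 6 (300, 123456, True)
```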
  • The additional 2 bytes per I-frame will, however, make the auxiliary files larger for the case of a fixed I-frame interval. Whether the trade-off is worth it or not is a choice for implementers to make and will vary from one case to another. All in all, however, the size of the auxiliary files generated is negligible compared to the size of the compressed video file. For the fixed I-frame interval case, the size is basically four bytes multiplied by the number of I-frames. If I-frames are inserted as frequently as even three times a second (i.e., once every tenth frame at 30 frames per second), then the auxiliary file adds 12 bytes (96 bits) per second.
  • the additional storage required for the auxiliary file is negligible.
  • For the variable I-frame interval case, the size of the auxiliary file is approximately six bytes multiplied by the number of I-frames. That translates into 144 bits/s, assuming three I-frames per second on average.
  • FIG. 5 and FIG. 6 depict block diagrams of two different systems ("players") for playback of compressed video bitstreams with VCR-type controls.
  • FIG. 5 depicts a player 500 that operates to playback locally stored video files.
  • This player 500 comprises a User Interface/Display 502, a decoder 504, an auxiliary file processor 506 and local storage 108.
  • the user interacts with the system through the user interface (e.g., a graphical user interface that has various "buttons" for VCR controls).
  • the decoded bitstream may also be displayed here or on a separate display.
  • the decoder 504 operates like any standard decoder, except that when VCR commands are issued, it interacts with the auxiliary file processor 506 to determine the location in the bitstream from where the decoder needs to start decoding.
  • the auxiliary file processor 506 in turn retrieves that information from the auxiliary file.
  • Both the bitstream and the associated auxiliary file are stored locally on the storage device 108.
  • FIG. 6 depicts a system 600 where the bitstream and associated auxiliary file are stored remotely on a server 602.
  • the bitstream is streamed to the player 601 over a network 612.
  • When a VCR command is issued, the decoder 604 relays this command over the network 612 to the server 602.
  • A buffer 610 is located between the network 612 and the decoder 604.
  • the server 602 then interacts with the auxiliary file processor 606, which now resides on the server 602, to determine the location within the bitstream from which the server should start transmission.
  • In the simplest case, the decoder 504 or 604 operates in a conventional manner without needing to retrieve any information from the auxiliary file, i.e., the decoder sequentially selects frames for decoding and display.
  • In another case, the system 500 or 600 needs to decode frames as in the usual play mode, using the interframe predictions, but without displaying the decoded frames until the desired frame is reached. As such, the decoder 504/604 blocks display of the decoded frames until the selected frame is decoded.
  • All other cases require special handling. First, the I-frame prior to the selected frame has to be identified, given the current frame number being decoded and the fact that the first frame in the sequence is an I-frame. If the I-frame interval is fixed, this I-frame is easily determined.
  • The offset of that I-frame will be read from the auxiliary file 200 and provided to the decoder 504/604. Since there is a fixed size for the auxiliary file header and a fixed-size field (4-byte field 204) for each I-frame, determining the offset is trivial.
  • The bitstream pointer that selects frames for decoding in the decoder 504/604 would then be moved according to the offset retrieved from the auxiliary file.
  • The I-frame and the subsequent P-frames would be decoded but not displayed until the selected frame is decoded.
  • For the variable I-frame interval case, the decoder has to determine whether or not there is an I-Frame which preceded the frame of interest. To this end, it has to look up the 2-byte frame numbers (field 306) in the auxiliary file 300 and extract the appropriate I-Frame accordingly. To speed up the search for the appropriate I-Frame, the field 304 indicating the total number of frames in the entire sequence is used. As such, the server 602 compares the number of the frame to be decoded with the total number of frames in the sequence and determines an estimate of where in the auxiliary file to start the I-frame search.
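  • A minimal sketch of this lookup is shown below (hypothetical helper names; the patent estimates a starting point for the search from the total-frame-count field 304, whereas the sketch simply binary-searches an in-memory table of the per-I-frame records):

```python
from bisect import bisect_right

def preceding_iframe_fixed(selected_frame, i_frame_interval, offsets):
    """Fixed I-frame interval: the preceding I-frame index is a simple division,
    and its bitstream offset is read directly from the offset table (field 204)."""
    i_index = selected_frame // i_frame_interval
    return i_index * i_frame_interval, offsets[i_index]

def preceding_iframe_variable(selected_frame, records):
    """Variable interval: 'records' is a list of (frame_number, offset) tuples
    (fields 306/308) sorted by frame number; a binary search finds the last
    I-frame at or before the selected frame."""
    frame_numbers = [frame for frame, _ in records]
    pos = bisect_right(frame_numbers, selected_frame) - 1
    if pos < 0:
        raise ValueError("no I-frame precedes the selected frame")
    return records[pos]

if __name__ == "__main__":
    # Fixed interval of 30 frames: frame 95 is decoded starting from I-frame 90.
    print(preceding_iframe_fixed(95, 30, offsets=[0, 4000, 8100, 12500]))
    # Variable interval: I-frames at frames 0, 42 and 300; frame 100 starts at 42.
    print(preceding_iframe_variable(100, [(0, 0), (42, 5200), (300, 61000)]))
```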
  • The scenarios described above apply to both local and remote files.
  • For local files, the player (client) 500 has to perform the tasks indicated; for remote files, the client 601 sends a request to a server 602 for a random frame, and the server 602 has to look up the auxiliary file and resume transmission from the appropriate place in the bitstream.
  • To jump to the next (or previous) scene change, the auxiliary file is scanned to find the next/previous I-Frame that has the scene change bit set to TRUE. That frame is decoded and displayed and the clip starts playing from that point onwards.
  • Algorithms for detecting scene changes are well-known in the art. An algorithm that is representative of the state of the art is disclosed in Shen et al., "A Fast Algorithm for Video Parsing Using MPEG Compressed Sequences", International Conference on Image Processing, Vol. 2, pp. 252-255, October 1995.
  • The auxiliary file information can also be used to provide a fast forward effect in the decoded video.
  • the Fast Forward operation can simply be viewed as a special case of random access. Running a video clip at, say, three times its natural speed by skipping two out of every three frames is equivalent to continuously making 'random' requests for every third frame (i.e., requesting frame 0, 3, 6, 9, and so on). Every time a frame is requested, the random frame access operation described above first determines the position of the nearest preceding I-frame just as before. The frames from that I-frame to the selected frame are decoded but not displayed. As such, only the requested frames are displayed, i.e., every third frame.
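  • Under that view, a fast-forward loop is little more than repeated random access; a rough sketch (hypothetical function names) follows:

```python
def fast_forward(total_frames, speed_factor, decode_from_iframe_to):
    """Fast Forward at 'speed_factor' x: request every speed_factor-th frame as a
    'random' access; decode_from_iframe_to(n) is assumed to decode from the
    nearest preceding I-frame up to frame n and return the decoded frame."""
    for n in range(0, total_frames, speed_factor):
        frame = decode_from_iframe_to(n)   # intermediate frames decoded, not displayed
        display(frame)                     # only every speed_factor-th frame is shown

def display(frame):
    print(f"displaying frame {frame}")

if __name__ == "__main__":
    # 3x fast forward over a 12-frame clip: frames 0, 3, 6 and 9 are displayed.
    fast_forward(12, 3, decode_from_iframe_to=lambda n: n)
```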
  • the invention includes two embodiments for implementing Reverse play (both normal and fast speed).
  • the first embodiment which is simpler but less efficient, is to view the reverse play as a special case of random frame access.
  • the server or local decoder invokes the random frame access mechanism described above.
  • The number of the 'random' frame to be retrieved is decremented by one or N each time, depending on the playback speed. This scheme is inefficient due to the existence of predicted frames. To see why, consider the following case: to display frame 9 of a Group of Pictures whose only I-frame is frame 0, frames 0 through 9 must all be decoded; to then display frame 8, frames 0 through 8 must be decoded again, and so on, so that most frames in the group end up being decoded many times.
  • the second embodiment involves caching in memory (cache 508 or 608) all the frames in a Group of Pictures (GOP) when the Reverse control is invoked.
  • the decoder 504/604 can decode all frames between 0 and 9, cache them in memory (508/608 in FIGS. 5 and 6), and then display them in reverse order (from 9 to 0). While this would be much more efficient than the first embodiment, this embodiment does have the drawback of consuming significant amounts of memory, if the GOP is large and/or if the resolution of the video is high.
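  • A sketch of this second embodiment (hypothetical decoder and display interfaces; the GOP boundaries would in practice come from the auxiliary file) might be:

```python
def play_gop_in_reverse(gop_start, gop_end, decode_frame, display):
    """Decode every frame of a Group of Pictures once, cache the decoded frames,
    then display them in reverse order (e.g., frames 0..9 shown as 9..0)."""
    cache = [decode_frame(n) for n in range(gop_start, gop_end + 1)]  # cache 508/608
    for frame in reversed(cache):
        display(frame)

if __name__ == "__main__":
    play_gop_in_reverse(0, 9,
                        decode_frame=lambda n: f"frame {n}",
                        display=print)
```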
  • FIG. 7 depicts a block diagram of a multi-media composer and viewer 700 of the present invention.
  • The multi-media composer and viewer 700 can be tailored to handle MPEG-compliant multi-media contents (e.g., MPEG-4 and the like), thereby earning the name "MPEG Authoring Tool" (MAT).
  • Although the multi-media composer and viewer 700 is described below in view of MPEG-4 multi-media contents, it should be understood that the present invention is not so limited and can be adapted to other multi-media contents that are compliant with other standards, such as MPEG-2, MPEG-7, JPEG, ATSC, H.263 and the like.
  • the multi-media composer and viewer 700 provides both composing as well as viewing functions and capabilities.
  • the viewing capabilities as described above can be deployed in the present multi-media composer and viewer 700, whereas composing functions and capabilities are now described below.
  • the MPEG Authoring Tool provides a graphical interface for the composition of MPEG-4 scene descriptions as well as a facility for viewing the playback of the new scene.
  • Various elements or features of graphical user interface design are employed, e.g., point-and-click selection, positioning, resizing, and z-order positioning of multi-media objects in the scene.
  • FIG. 8 depicts a view 800 of the multi-media composer and viewer of the present invention on a display, e.g., a computer screen.
  • one such element is the ability to drag-and-drop a file, e.g., from Windows Explorer (a trademark of Microsoft Corporation of Redmond, Washington) onto the scene composition area 810 using a thumbnail bar 820 having a plurality of selectable multi-media objects 822-828.
  • The scene composition area 810 also contains a selection status area 812 that displays various specific information relating to a particular multi-media object, e.g., its relative location in the scene composition area 810, its width, its height, its display order ("Z-order") and the like.
  • Although drag-and-drop is a common feature, in general the usual result of a drag-and-drop operation is only to open a new "document window" within the parent application. In the case of the present MAT, the drag-and-drop operation results in the automatic addition of a new multi-media object into the existing scene description.
  • Each object is represented by an instance of a media class (e.g., a specific video object can be represented as an instance of a Video Object Class). As such, it includes information regarding the position, scale, Z-order, the times at which that specific object appears in, or disappears from, the scene, and so on.
  • the scene description mechanism aggregates all of that information (for all the objects in the scene) into one single "Scene Description" file.
  • An example of such a file is an MPEG-4 Binary Format for Scenes (BIFS) file. Every time a new object is added to a scene (or existing attributes of an existing object within a scene, such as position, size or z-order, are modified), the scene description file is also modified to account for the newly added object (or the changes to the existing object).
  • To generate the scene description, the present invention can cycle through all the objects in the composed scene, thereby collecting their attributes and writing them into a scene description file using the BIFS syntax.
  • The update to the scene description file can occur immediately after an object is deposited into the scene composition area or when the object is modified within the scene composition area.
  • Alternatively, the update mechanism can be implemented in accordance with a schedule, e.g., every 15 minutes, or when the user saves the composed scene.
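  • A rough sketch of that cycle is shown below; the class and attribute names are hypothetical, and the output is illustrative BIFS-like text rather than the actual binary BIFS encoding:

```python
from dataclasses import dataclass

@dataclass
class RenderedMediaObject:
    """Hypothetical per-object attributes collected by the composer."""
    name: str
    x: int
    y: int
    width: int
    height: int
    z_order: int

def write_scene_description(objects, path):
    """Cycle through every object in the composed scene and write its attributes
    out in a BIFS-like textual form; a real implementation would emit MPEG-4
    BIFS (binary) instead of this illustrative text."""
    with open(path, "w") as f:
        f.write("Group {\n")
        for obj in sorted(objects, key=lambda o: o.z_order):
            f.write(f"  # {obj.name}\n")
            f.write(f"  Transform2D {{ translation {obj.x} {obj.y}\n")
            f.write(f"    size {obj.width} {obj.height} }}\n")
        f.write("}\n")

if __name__ == "__main__":
    scene = [RenderedMediaObject("logo.jpg", 10, 10, 64, 64, z_order=2),
             RenderedMediaObject("clip.mp4", 0, 0, 352, 288, z_order=1)]
    write_scene_description(scene, "scene.txt")  # rewritten whenever the scene changes
```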
  • The multi-media composer and viewer of the present invention supports various types of multi-media objects, e.g., MPEG-4 object-based video, H.263 video, raw YUV-encoded video, an audio stream, and VTC still images.
  • The MAT employs object-oriented (OO) design principles to achieve source code modularity and design extensibility for future additions.
  • FIG. 9 illustrates a flowchart of a method 900 for composing a scene using object oriented programming elements.
  • Method 900 starts in step 905 and proceeds to step 910, where method 900 obtains a "media object" to be inserted into a scene composition area, e.g., via a point-and-click operation.
  • scenes are composed of one or more "media objects", each of which represents, for instance, a single multi-media object, e.g., an MPEG-4 video stream, audio stream, or VTC still image.
  • In OO terminology, a media object would be called an "entity class" since it is responsible for maintaining state information about a real-world thing.
  • A media object contains information about the source of the multi-media object (e.g., the URL, or local file path) as well as information about the intrinsics of the data, such as image width, height, and color depth.
  • In step 920, method 900 generates a "rendered media object" associated with said media object.
  • The rendered media object is a separate object (another "entity class") which encapsulates knowledge of a single instance of the visual representation of the media object.
  • The rendered media object contains information such as the scale at which the media object is drawn and the spatial location of the object within the scene, and may also contain temporal information such as the rendered frame rate (which may be different from the intrinsic frame rate).
  • In step 930, method 900 generates a "media object accessor" associated with said media object.
  • This third object, the "media object accessor", is responsible for accessing the data referenced by the media object and providing it to the rendering subsystem for display.
  • A media object accessor would be classified as a "boundary class" in OO terminology.
  • The advantage of this third object is that it encapsulates knowledge of how to access a media object from its source repository, and hides the details of disk or network access from the rest of the system.
  • Media object accessors are responsible for maintaining state information such as the last accessed position in the video stream. Media object accessors are also responsible for raising an alert when the end of the stream is reached.
  • Method 900 then ends in step 935.
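  • A compact sketch of how these three roles might be separated is given below; the class and attribute names are illustrative assumptions, not the patent's source code:

```python
from dataclasses import dataclass

@dataclass
class MediaObject:
    """Entity class: where the data lives and what it intrinsically is."""
    source_url: str
    width: int
    height: int
    color_depth: int

@dataclass
class RenderedMediaObject:
    """Entity class: one visual instance of a MediaObject in the scene."""
    media: MediaObject
    x: int = 0
    y: int = 0
    scale: float = 1.0
    rendered_fps: float = 25.0   # may differ from the intrinsic frame rate

class MediaObjectAccessor:
    """Boundary class: hides disk/network access and tracks the read position."""
    def __init__(self, media: MediaObject):
        self.media = media
        self.position = 0            # last accessed position in the stream

    def next_frame(self):
        if self.position >= 100:     # placeholder end-of-stream condition
            raise EOFError("end of stream reached")
        self.position += 1
        return f"frame {self.position} of {self.media.source_url}"

if __name__ == "__main__":
    clip = MediaObject("file:///clips/intro.mp4", 352, 288, 24)
    rendered = RenderedMediaObject(clip, x=20, y=40, scale=0.5)
    accessor = MediaObjectAccessor(clip)
    print(rendered, accessor.next_frame())
```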
  • The illustrative multi-media composer and viewer 700 comprises a processor (CPU) 730, a memory 740, e.g., random access memory (RAM), a composer and/or viewer 720, and various input/output devices 710 (e.g., a keyboard, a mouse, an audio recorder, a camera, a camcorder, a video monitor, any number of imaging devices or storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive).
  • the composer 722 and viewer 724 can be implemented jointly or separately. Additionally, the composer 722 and viewer 724 can be physical devices that are coupled to the CPU 730 through a communication channel. Alternatively, the composer 722 and viewer 724 can be represented by one or more software applications (or even a combination of software and hardware, e.g., using application specific integrated circuits (ASIC)), where the software is loaded from a storage medium, (e.g., a magnetic or optical drive or diskette) and operated by the CPU in the memory 740 of the computer. As such, the composer 722 and viewer 724 (including associated data structures) of the present invention can be stored on a computer readable medium, e.g., RAM memory, magnetic or optical drive or diskette and the like.
  • Various multi-media information is stored in and retrieved from a storage device 710 of the multi-media composer and viewer 700.
  • the multimedia information may include, but is not limited to, various image sequences such as complete movies, movie clips or shots, advertising clips, music videos, audio streams, animation clips and the like.
  • the image sequences may or may not include audio streams or data streams, e.g., closed captioning and the like.
  • the core task associated with graphically positioning and resizing multi-media objects in an apparatus and method such as the MAT is tracking and recording movements of a pointing device, e.g., mouse movements, known as “mouse-hit testing".
  • a common technique for creating "hotspots" in a graphical image involves overlaying invisible graphical "widgets" over each desired hotspot, and using the widgets' built-in mouse-hit testing facilities which are returned to the parent application via software "events".
  • FIG. 10 depicts a flowchart of a method 1000 for resizing and positioning one or more media objects within a composition scene of the present invention.
  • Method 1000 starts in step 1005 and proceeds to step 1010 where method 1000 overlays each media object with a transparent widget.
  • each media object rendered in the scene is overlaid by a transparent widget known as an "image control", which is responsible for alerting the MAT which multi-media object the user has currently selected with the mouse.
  • Method 1000 then generates a separate "resizer" object that utilizes several such widgets in order to facilitate positioning and resizing.
  • The unique architecture of this "resizer object" provides various novel features of the present invention.
  • Method 1000 then ends in step 1025.
  • the "resizer” object is constructed of eight "label controls", one "image control”, and one "shape control", all of which are standard graphical interface components provided with MS Visual Basic. Once a user selects an object in the scene, the resizer control is made visible and overlaid on the selected object. Once this is accomplished, all subsequent mouse events that occur within the region enclosed by the resizer control are captured by the resizer object.
  • the resizer control consists of a transparent rectangular region overlaid on the selected object, along with eight "handles” which are smaller opaque boxes positioned at the corners and mid-points of each of the sides of the central rectangle, shown with reference numeral 814 in FIG. 8.
  • the central rectangle consists of an invisible image control, which is capable of receiving mouse events while maintaining transparency with its background, and a shape control that is needed to provide the user with a visual cue to the placement of the central rectangle.
  • Two controls are needed here because while the image control provides mouse-tracking ability, it does not provide the feature rich visual outlining capabilities of the shape control, which, in turn, provides no mouse-tracking features.
  • Each of the handles is composed of a label control, which like the image control provides mouse tracking, but unlike the image control, provides for good visual feedback to the user.
  • the operation of the resizer control then is to capture the mouse events raised by these graphical elements, resize and reposition them according to the user's requests, and repackage those events into custom-generated events, alerting the parent software application of the resizer object that the size or position of the resizer graphical components have changed. It is then up to the parent application of the resizer object to resize/reposition the rendered media object currently associated with the resizer object, and update the data stored in the rendered media object accordingly.
  • Separating the resizer functionality into a separate object enables the source code to be re-used across many different applications, thus reducing future development time.
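  • The original resizer is built from Visual Basic controls; the following language-neutral sketch (all names hypothetical) only illustrates the event-repackaging idea described above:

```python
class Resizer:
    """Captures low-level mouse events from its handles and central rectangle,
    resizes/repositions its own graphics, and re-raises a single custom event
    so the parent application can update the associated rendered media object."""

    def __init__(self, on_changed):
        self.on_changed = on_changed      # callback supplied by the parent application
        self.x, self.y, self.w, self.h = 0, 0, 100, 100

    def handle_dragged(self, handle, dx, dy):
        # Only the south-east handle is sketched here; a full version would
        # cover all eight handles.
        if handle == "SE":
            self.w += dx
            self.h += dy
        self.on_changed(self.x, self.y, self.w, self.h)

    def rectangle_dragged(self, dx, dy):
        # Dragging the central rectangle moves the whole object.
        self.x += dx
        self.y += dy
        self.on_changed(self.x, self.y, self.w, self.h)

if __name__ == "__main__":
    # The parent application would resize/reposition the rendered media object here.
    resizer = Resizer(on_changed=lambda x, y, w, h: print("object now at", x, y, w, h))
    resizer.handle_dragged("SE", dx=15, dy=10)   # user drags the corner handle
    resizer.rectangle_dragged(dx=5, dy=0)        # user drags the whole object
```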
  • MPEG-4 BIFS scene description generation in the MAT is accomplished upon request of the user, by simply indexing through the list of rendered media objects in the scene, and passing size and position information from each rendered media object to the BIFS- generating subsystem.
  • The MPEG-4 BIFS scene description uses its own VRML-like syntax to describe the objects in a scene and the relationships among them.
  • The MAT automatically translates the information from each "rendered media object" to BIFS syntax. It also generates the relationships using BIFS syntax.
  • Z-order can be defined as the sequential overlay order of multi-media objects in the z-axis, i.e., the axis coming out of a screen. For example, if multi-media objects overlap, then the z-order defines the relative positioning of the multi-media objects, e.g., which multimedia object is in the front and which multi-media object is in the back and so on.
  • FIG. 11 depicts a flowchart of a method 1100 for organizing the z- order of one or more multi-media objects within a composition scene of the present invention.
  • Method 1100 starts in step 1105 and proceeds to step 1110, where a z-order location is defined for a multi-media object. For example, selecting a multi-media object to define its z-order, e.g., as being in the front of a plurality of media objects, and so on.
  • In step 1120, method 1100 shuffles the z-order of the multi-media object to its proper order.
  • The difficulty in graphically positioning objects in the z-order lies in the Visual Basic limitation on setting the z-order of image controls. Setting the z-order is limited to positioning an image control at either the back or the front of the z-order, but "shuffling" an image control forward or backward one position is not supported. To address this limitation, method 1100 is capable of providing this feature.
  • Step 1120 involves shuffling an existing z-order of multi-media objects (if one already exists) such that the z-order of multi-media objects is at a point where the desired multi-media object should be positioned.
  • the desired multi-media object is then inserted at this point in front or behind this location and the modified z-order is reshuffled forward or backward to arrive at a new z-order with the newly added multi-media object in the proper position.
  • The following pseudo-code illustrates the process of shuffling an object back one position in the z-order:
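  • The pseudo-code itself does not survive in this text; a minimal sketch of the shuffle, assuming the ordered list of rendered media object references described later in this section (list index = z-order position, index 0 being the front), might be:

```python
def shuffle_back_one(z_list, index):
    """Move the object at position 'index' back one position in the z-order.

    'z_list' is an ordered list of rendered media object references in which the
    ordinal position represents the z-order (index 0 = front). Moving back one
    position means swapping the object with the neighbour behind it.
    """
    if index < len(z_list) - 1:                       # already at the back? do nothing
        z_list[index], z_list[index + 1] = z_list[index + 1], z_list[index]
    return z_list

if __name__ == "__main__":
    print(shuffle_back_one(["logo", "video", "caption"], 0))
    # ['video', 'logo', 'caption'] : 'logo' has moved back one position
```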
  • The steps performed by method 1100 are not detectable by the user. Namely, in operation, a user will simply drag and drop a multi-media object from a thumbnail bar 820 onto the scene composition area 810. Upon detection of one or more existing multi-media objects at the desired location, the selection status area 812 will query what z-order is desired (e.g., 1, 2 ... n) for the currently selected multi-media object. The user can then type a z-order value into the selection status area 812. Alternatively, in one embodiment, the user can simply click on a second button on a multi-button mouse to have the selected multi-media object simply shuffle backward.
  • In step 1130, method 1100 tracks the z-order of each media object in accordance with its rendered media object.
  • Each rendered media object is associated with an image control in order to accomplish mouse-hit testing as discussed above. Since Visual Basic provides a means to adjust the z-order of image controls, adjusting the z-order of each image control in alignment with its associated rendered media object is sufficient to distinguish which multi-media object has been selected by the user. This is valid since the image control at the top of the z-order will trap the mouse movement events, effectively shielding the image controls behind it.
  • Tracking the z-order of each object programmatically is then a matter of maintaining an ordered list or array in which each element of the list is a reference to a single rendered media object, and the ordinal position in which the reference appears in the list represents its position in the z-order. Shuffling the references in the list enables efficient management of z-order position tracking as discussed above.
  • Rendering of object-based MPEG-4 encoded video allows an object of arbitrary shape, which is situated within a rectangular region (i.e., a "bounding box"), to be rendered onto a background such that the area that is inside the bounding box but outside the arbitrarily shaped "object" region is transparent to the background.
  • FIG. 8 shows multi-media object 826 within a bounding box having an object of arbitrary shape 826b and a background 826a. It should be noted that a multi-media object may have multiple objects of arbitrary shape.
  • The composer of the scene can elect to have the background 826a made transparent.
  • This feature allows the background of a selected multi-media object to be selectively replaced with another background.
  • Various techniques are available for accomplishing this effect; these techniques often require both the object of arbitrary shape to be rendered and a second "negative" image, which is used as a mask.
  • the original image and the mask together define the arbitrarily shaped "object” region as well as the transparent bounding box.
  • the mask serves to negate or make transparent all other regions of the multimedia object with the exception of the arbitrarily shaped "object” region of interest.
  • other masks can be generated such that only those selected "object” regions will be visible.
  • a fourth image is required in order to achieve an animated effect.
  • This fourth image stores a copy of the destination image to be replaced by the object of arbitrary shape, so that it may be restored prior to moving the object of arbitrary shape to a new position on the screen.
  • FIG. 12 depicts a flowchart of an illustrative method 1200 for implementing background transparency of one or more multi-media objects within a composition scene of the present invention.
  • Method 1200 starts in step 1205 and proceeds to step 1210, where method 1200 copies the region of the destination image to be overwritten to a temporary, off-screen buffer.
  • In step 1220, method 1200 renders the mask image onto the destination image stored in the buffer using a pixel-by-pixel logical "AND" operation, such that the mask creates a "hole" in the destination image to be filled by the object of arbitrary shape.
  • In step 1230, method 1200 renders the multi-media object image onto the destination image stored in the buffer using a pixel-by-pixel "XOR" operation, such that the "object portion" (i.e., the object of arbitrary shape) of the multi-media object fills in the hole, while the uniform non-object region surrounding the object portion remains inside the bounding box and does not corrupt the existing background, thereby giving the illusion of transparency.
  • In step 1240, the "composite" region stored within the buffer is then copied back onto the scene composition area, i.e., the composite region is returned to the destination image.
  • Method 1200 then ends in step 1245.
  • A fifth image buffer is required, since the background of MPEG-4 object-encoded video often contains artifacts of the encoding process which must be removed prior to rendering the object onto the background.
  • In that case, the multi-media object image is first copied to an off-screen "staging" buffer and then combined with the inverse of the mask image using a logical AND operation to remove artifacts in the non-object region of the object image. The remainder of the technique is analogous to the technique already described above.
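  • A sketch of the complete per-region composite is shown below, using flat lists of 8-bit pixel values for brevity (the buffer names mirror the steps above, and the staging step handles the encoding artifacts just described; all function names are hypothetical):

```python
def composite_object(dest, obj, mask, inv_mask):
    """Composite an arbitrarily shaped object onto a destination region.

    All arguments are flat lists of 8-bit pixel values of equal length.
    mask     : 0 inside the object region, 255 outside (cuts the 'hole').
    inv_mask : 255 inside the object region, 0 outside (cleans the object image).
    """
    saved = dest[:]                                       # off-screen copy (step 1210)
    staged = [o & i for o, i in zip(obj, inv_mask)]       # remove background artifacts
    holed = [d & m for d, m in zip(dest, mask)]           # AND: punch hole (step 1220)
    composite = [h ^ s for h, s in zip(holed, staged)]    # XOR: drop object in (step 1230)
    return composite, saved                               # composite copied back (step 1240)

if __name__ == "__main__":
    dest = [10, 20, 30, 40]          # background pixels
    obj = [200, 7, 210, 3]           # object image; 7 and 3 are encoding artifacts
    mask = [0, 255, 0, 255]          # object occupies pixels 0 and 2
    inv_mask = [255, 0, 255, 0]
    print(composite_object(dest, obj, mask, inv_mask))
    # ([200, 20, 210, 40], [10, 20, 30, 40]) : object pixels fill the hole,
    # background pixels outside the object region are untouched
```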
  • FIG. 13 depicts a flowchart of a method 1300 for implementing scene playback of one or more multi-media objects within a composition scene of the present invention.
  • Method 1300 starts in step 1305 and proceeds to step 1310 where a "media object renderer" object is generated for each rendered media object as discussed above.
  • the "media object renderer" object is associated with each rendered media object and is responsible for rendering the media object in the scene according to the specifications in the rendered media object. Recall that rendered media objects are responsible for maintaining position and scaling information about a media object.
  • Next, method 1300 generates a "media object player" object for each rendered media object as discussed above.
  • This separate "media object player" object serves two purposes: it provides timing (i.e., rate control) of the playback for a single rendered media object, and it provides a graphical interface to the user enabling interaction with the playback using VCR-style functions as discussed above. Separating the user interface/rate control functionality from the rendering functionality primarily serves to facilitate playback of composite multi-media objects. For instance, the media object player object for the composed scene makes use of services provided by independent media object renderers for each object in the scene and simply coordinates the timing of the rendering of each object. Method 1300 then ends in step 1325.
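  • A schematic of this split between rate control and rendering is sketched below (hypothetical class names; a real player would drive rendering from a proper media clock rather than a simple sleep loop):

```python
import time

class MediaObjectRenderer:
    """Draws one rendered media object; knows nothing about timing."""
    def __init__(self, name):
        self.name = name

    def render_frame(self, frame_number):
        print(f"{self.name}: rendering frame {frame_number}")

class MediaObjectPlayer:
    """Owns rate control and the VCR-style user interface; for a composed scene
    it simply coordinates the timing of each object's independent renderer."""
    def __init__(self, renderers, fps=25.0):
        self.renderers = renderers
        self.frame_period = 1.0 / fps
        self.paused = False

    def play(self, num_frames):
        for n in range(num_frames):
            if self.paused:                       # VCR-style Pause
                break
            for renderer in self.renderers:       # same clock tick for every object
                renderer.render_frame(n)
            time.sleep(self.frame_period)

if __name__ == "__main__":
    scene_player = MediaObjectPlayer([MediaObjectRenderer("video clip"),
                                      MediaObjectRenderer("still image")], fps=5)
    scene_player.play(num_frames=3)
```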

Abstract

The invention relates to a method and apparatus (700) for composing and viewing multi-media content. More particularly, the invention relates to an architecture for object-based multimedia authoring and composition.
PCT/US2000/017372 1999-06-23 2000-06-23 Procede et appareil de composition et de visualisation de sequences d'image WO2000079799A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US33861499A 1999-06-23 1999-06-23
US09/338,614 1999-06-23
US55144600A 2000-04-18 2000-04-18
US09/551,446 2000-04-18

Publications (3)

Publication Number Publication Date
WO2000079799A2 (fr) 2000-12-28
WO2000079799A3 WO2000079799A3 (fr) 2001-06-28
WO2000079799A9 WO2000079799A9 (fr) 2001-07-26

Family

ID=26991277

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/017372 WO2000079799A2 (fr) 1999-06-23 2000-06-23 Procede et appareil de composition et de visualisation de sequences d'image

Country Status (1)

Country Link
WO (1) WO2000079799A2 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002037857A2 (fr) * 2000-10-31 2002-05-10 Koninklijke Philips Electronics N.V. Procede et dispositif de composition de scenes video comportant des elements graphiques
WO2005076218A1 (fr) * 2004-01-30 2005-08-18 Telefonaktiebolaget Lm Ericsson (Publ) Etablissement de priorites dans un flux de donnees
EP2242057A3 (fr) * 2009-04-14 2010-12-01 MaxT Systems Inc. Édition vidéo à distance multi-utilisateurs
EP2276235A1 (fr) * 1999-01-29 2011-01-19 Sony Electronics Inc. Système et procédé de zoomage vidéo
US8261179B2 (en) 2009-07-16 2012-09-04 Benevoltek, Inc. Web page hot spots
WO2013088224A3 (fr) * 2011-12-12 2013-11-07 Ologn Technologies Ag Systèmes et procédés de transmission d'un contenu visuel

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0660220A1 (fr) * 1993-12-23 1995-06-28 International Business Machines Corporation Opérations de déplacement et de tombée dans une interface utilisateur graphique
US5659793A (en) * 1994-12-22 1997-08-19 Bell Atlantic Video Services, Inc. Authoring tools for multimedia application development and network delivery
US5900872A (en) * 1995-05-05 1999-05-04 Apple Computer, Inc. Method and apparatus for controlling the tracking of movable control elements in a graphical user interface

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0660220A1 (fr) * 1993-12-23 1995-06-28 International Business Machines Corporation Opérations de déplacement et de tombée dans une interface utilisateur graphique
US5659793A (en) * 1994-12-22 1997-08-19 Bell Atlantic Video Services, Inc. Authoring tools for multimedia application development and network delivery
US5900872A (en) * 1995-05-05 1999-05-04 Apple Computer, Inc. Method and apparatus for controlling the tracking of movable control elements in a graphical user interface

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"MPEG-4 Authoring Tools Let Pros, Consumers Create Multimedia for Web Pages, TV, HDTV" SARNOFF PRESS RELEASES, [Online] 10 December 1998 (1998-12-10), XP002155140 Retrieved from the Internet: <URL:http://www.sarnoff.com/sarnoff_story/ press/1997_and_1998/121098.htm> [retrieved on 2000-12-11] *
GERKEN P ET AL: "MPEG-4 PC - AUTHORING AND PLAYING OF MPEG-4 CONTENT FOR LOCAL AND BROADCAST APPLICATIONS" PROCEEDINGS OF THE EUROPEAN CONFERENCE ON MULTIMEDIA APPLICATIONS,SERVICES AND TECHNIQUES, 26 May 1999 (1999-05-26), XP000961402 *
SARNOFF CORPORATION: "Create, Edit, Tag, Multimedia Scenes" SARNOFF CORPORATION, [Online] XP002155375 Princeton, USA Retrieved from the Internet: <URL:http://www.sarnoff.com/tech_realworld/broadcast/bc_studio/author.pdf> [retrieved on 2000-12-12] *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2276235A1 (fr) * 1999-01-29 2011-01-19 Sony Electronics Inc. Système et procédé de zoomage vidéo
WO2002037857A2 (fr) * 2000-10-31 2002-05-10 Koninklijke Philips Electronics N.V. Procede et dispositif de composition de scenes video comportant des elements graphiques
WO2002037857A3 (fr) * 2000-10-31 2002-07-18 Koninkl Philips Electronics Nv Procede et dispositif de composition de scenes video comportant des elements graphiques
WO2005076218A1 (fr) * 2004-01-30 2005-08-18 Telefonaktiebolaget Lm Ericsson (Publ) Etablissement de priorites dans un flux de donnees
US7843959B2 (en) 2004-01-30 2010-11-30 Telefonaktiebolaget Lm Ericsson Prioritising data elements of a data stream
EP2242057A3 (fr) * 2009-04-14 2010-12-01 MaxT Systems Inc. Édition vidéo à distance multi-utilisateurs
JP2013118678A (ja) * 2009-04-14 2013-06-13 Avid Technology Inc マルチ・ユーザ遠隔ビデオ編集
US8818172B2 (en) 2009-04-14 2014-08-26 Avid Technology, Inc. Multi-user remote video editing
US8261179B2 (en) 2009-07-16 2012-09-04 Benevoltek, Inc. Web page hot spots
WO2013088224A3 (fr) * 2011-12-12 2013-11-07 Ologn Technologies Ag Systèmes et procédés de transmission d'un contenu visuel

Also Published As

Publication number Publication date
WO2000079799A3 (fr) 2001-06-28
WO2000079799A9 (fr) 2001-07-26

Similar Documents

Publication Publication Date Title
US7174055B2 (en) Image information describing method, video retrieval method, video reproducing method, and video reproducing apparatus
CN101960844B (zh) 用于编码供包含在媒体文件中的应用加强轨道的系统和方法
KR100906957B1 (ko) 서브-프레임 메타데이터를 이용한 적응 비디오 프로세싱
US7953315B2 (en) Adaptive video processing circuitry and player using sub-frame metadata
JP3719933B2 (ja) 階層的ディジタル動画要約及び閲覧方法、並びにその装置
KR100912599B1 (ko) 풀 프레임 비디오 및 서브-프레임 메타데이터를 저장하는이동가능한 미디어의 프로세싱
US7893999B2 (en) Simultaneous video and sub-frame metadata capture system
KR100915367B1 (ko) 서브-프레임 메타데이터를 생성하는 영상 처리 시스템
KR102010513B1 (ko) 레코딩된 비디오를 재생하기 위한 방법 및 장치
EP1871109A2 (fr) Serveur de distribution de métadonnées de sous-trame
WO2000022820A1 (fr) Procede et dispositif permettant des commandes du type vcr de sequences video numeriques comprimees
KR100846770B1 (ko) 동영상 부호화 방법 및 이에 적합한 장치
JP2006254366A (ja) 画像処理装置、カメラシステム、ビデオシステム、ネットワークデータシステム、並びに、画像処理方法
US7305171B2 (en) Apparatus for recording and/or reproducing digital data, such as audio/video (A/V) data, and control method thereof
WO2000079799A2 (fr) Procede et appareil de composition et de visualisation de sequences d'image
JP3072971B2 (ja) ビデオ・オン・デマンドシステムとそれを構成するビデオサーバ装置及び端末装置
JPH1032826A (ja) 動画像処理装置
JPH11205753A (ja) 動画閲覧方法及び装置
JPH09219838A (ja) Mpeg映像再生装置および方法
Venkatramani et al. Frame architecture for video servers
KR19980075476A (ko) 디브이디 시스템의 앵글 선택 방법

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): CA IL JP KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): CA IL JP KR

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

AK Designated states

Kind code of ref document: C2

Designated state(s): CA IL JP KR

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

COP Corrected version of pamphlet

Free format text: PAGES 1/9-9/9, DRAWINGS, REPLACED BY NEW PAGES 1/6-6/6; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP