EP1157338A1 - Web-based video-editing method and system - Google Patents

Web-based video-editing method and system

Info

Publication number
EP1157338A1
Authority
EP
European Patent Office
Prior art keywords
multimedia
client
server
network
processing commands
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99965296A
Other languages
English (en)
French (fr)
Inventor
Max E. Orlov
Dmitri D. Chklovskii
Michael Samoilov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Javu Technologies Inc
Original Assignee
Javu Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Javu Technologies Inc filed Critical Javu Technologies Inc
Publication of EP1157338A1
Legal status: Withdrawn

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27Server based end-user applications
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34Indicating arrangements 
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/231Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N21/23103Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion using load balancing strategies, e.g. by placing or distributing content on different disks, different memories or different servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording

Definitions

  • This invention relates generally to multimedia software and more particularly to libraries for use in building processing-intensive multimedia software for Web-based video-editing applications.
  • The multimedia research community has traditionally focused its efforts on the compression, transport, storage and display of multimedia data. These technologies are fundamentally important for applications such as video conferencing and video-on-demand. The results of these efforts have made their way into many commercial products. For example, JPEG and MPEG, described below, are ubiquitous standards for image and audio/video compression. There are, however, problems in content-based retrieval and understanding, video production, and transcoding for heterogeneity and bandwidth adaptation.
  • The lack of a high-performance toolkit that can be used to build processing-intensive multimedia applications is hindering development in multimedia applications. In particular, in the area of video-editing, large volumes of data need to be stored, accessed and manipulated in an efficient manner. Solutions to the problems of storing video data include client-server applications and editing over the World Wide Web (Web).
  • Web: World Wide Web
  • GIF: Graphics Interchange Format
  • JPEG: Joint Photographic Experts Group
  • MPEG: Motion Picture Experts Group
  • The MPEG standard has four types of image coding for processing: the I-frame, the P-frame, the B-frame and the D-frame (from an early version of MPEG, but absent in later standards).
  • The I-frame (Intra-coded image) is self-contained, i.e. coded without any reference to other images.
  • The I-frame is treated as a still image, and MPEG uses the JPEG standard to encode it. Compression in MPEG is often executed in real time and the compression rate of I-frames is the lowest within the MPEG standard. I-frames are used as points for random access in MPEG streams.
  • The P-frame (Predictive-coded frame) requires information from the previous I-frame in an MPEG stream, and/or all of the previous P-frames, for encoding and decoding. Coding of P-frames is based on the principle that areas of the image shift instead of change in successive images.
  • The B-frame (Bi-directionally predictive-coded frame) requires information from both the previous and the following I-frame and/or P-frame in the MPEG stream for encoding and decoding.
  • B-frames have the highest compression ratio within the MPEG standard.
  • The D-frame (DC-coded frame) is intra-frame encoded.
  • The D-frame is absent in more recent versions of the MPEG standard; however, applications are still required to deal with D-frames when working with the older MPEG versions.
  • D-frames consist only of the lowest frequencies of an image.
  • D-frames are used for display in fast-forward and fast-rewind modes. These modes could also be accomplished using a suitable order of I-frames.
  • Video information encoding is accomplished in the MPEG standard using DCT (discrete cosine transform). This technique represents waveform data as a weighted sum of cosines, as illustrated in the sketch below.
  • DCT is also used for data compression in the JPEG standard.
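  • As a concrete illustration of the weighted-sum-of-cosines idea, the following is a minimal, self-contained C sketch of a 1-D DCT-II over an 8-sample block. It is illustrative only; the MPEG and JPEG standards apply a two-dimensional 8x8 DCT followed by quantization, and the scaling shown here is just one common normalization.

        /* Minimal 1-D DCT-II: each output coefficient is a weighted
         * sum of cosines over the input samples (illustration only). */
        #include <math.h>
        #include <stdio.h>

        #ifndef M_PI
        #define M_PI 3.14159265358979323846
        #endif

        #define N 8

        static void dct_1d(const double in[N], double out[N])
        {
            for (int k = 0; k < N; k++) {
                double sum = 0.0;
                for (int n = 0; n < N; n++)
                    sum += in[n] * cos(M_PI * (n + 0.5) * k / N);
                /* orthonormal scaling */
                out[k] = ((k == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N)) * sum;
            }
        }

        int main(void)
        {
            double samples[N] = { 52, 55, 61, 66, 70, 61, 64, 73 };
            double coeffs[N];
            dct_1d(samples, coeffs);
            for (int k = 0; k < N; k++)
                printf("C[%d] = %.2f\n", k, coeffs[k]);
            return 0;
        }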
  • A network-based system for processing multimedia information is disclosed.
  • A server, preferably a group of servers, is coupled to the network, and each incorporates a multimedia toolkit or engine enabling creation, editing, viewing, and management of multimedia objects, such as files, including at least one of video, images, sound and animation.
  • Each server includes storage for multimedia objects.
  • A client, preferably a personal computer, and preferably a plurality of clients, having access to the network each incorporate a multimedia-editing interface, preferably a graphic user interface (GUI), enabling the client to send multimedia processing commands through the network to the server from among a predefined set of multimedia processing commands.
  • GUI: graphic user interface
  • A client can send a series of commands at one time, and each of several clients can send commands to be performed on the same object.
  • The multimedia engine in each server acts on received multimedia processing commands from one or more clients by performing corresponding processing operations on multimedia objects previously stored by a client in the server's storage, and the server makes the processed multimedia objects available to clients over the network.
  • The servers are controlled, as by a load-balancing daemon, so that clients are recognized and commands are routed to an appropriate server, each of the servers having access to storage containing the object to be processed, based on various criteria, including performance and feature availability. It is contemplated that authorized clients would be able to control such criteria.
  • FIG. 1 is a block diagram of a Web-based video-editing system according to principles of the invention
  • Figure 2 is a schematic of memory clipping according to the principles of the invention
  • Figure 3 shows stereo samples interleaved in memory
  • Figure 4 shows a toolkit function that performs the picture in picture operation according to principles of the invention
  • FIG. 5 shows the format of an MPEG-1 video stream
  • Figure 6 shows a toolkit function that decodes the I-frames in an MPEG video into RGB images according to principles of the invention.
  • Figure 7 shows a toolkit function which acts as a filter that can be used to copy the packets of a first video stream from a system stream stored in a first BitStream to a second BitStream according to principles of the invention.
  • FIG. 1 shows a client/server Web-based video-editing system 10.
  • A client computer (client) 15 is able to connect to a plurality of servers 20, 22, 24 over the Web 30.
  • The client 15 runs a video-editing user interface 32.
  • The Web 30 is used to connect the client 15 and the servers 20, 22, 24; however, in alternative embodiments, other types of networks could be used to form the client-server connection.
  • The servers 20, 22, 24 store audio and video data, and have multimedia processing applications.
  • The client 15 sends a command for video processing over the Web 30 to at least one of the servers 20, 22, 24.
  • The at least one of the servers 20, 22, 24 receives the command, interprets it, performs the required video processing and sends the result to the client 15.
  • A more specific sequence of operation might be:
  • The user sends a request from the client 15 to a server (A, B or C; 20, 22 or 24), establishing a connection over the Web 30 between the video-editing user interface and the video processing server with multimedia editing toolkit 34.
  • The server, selected for example by a load-balancing daemon, recognizes the user as a client and awaits user commands.
  • The user, utilizing the video-editing user interface 32 (implemented, for example, in cross-platform Java or as a platform-based client), sends a command requesting processing of a certain operation on a specific media object stored at the server.
  • The server verifies the command and locates the requested media object.
  • The server locates an appropriate toolkit (34) which contains libraries for the requested operation and executes the operation, modifying the object.
  • The server sends feedback to the client, for example via media streaming or other transfer protocol, informing it of the executed operation and displaying the results in the user interface. This process is repeated until the user closes the connection; a hypothetical sketch of such an exchange follows.
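  • The patent does not define a wire protocol for this exchange, so the following C sketch is purely hypothetical: it shows a client opening a TCP connection and sending one predefined processing command naming a media object already stored on the server. The command syntax, host address and port are invented for illustration.

        /* Hypothetical client-side command exchange for the command-sending
         * step described above.  The "SHRINK2X2 <object>" syntax, the
         * address and the port are assumptions made only for this sketch. */
        #include <arpa/inet.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main(void)
        {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in srv = { 0 };
            srv.sin_family = AF_INET;
            srv.sin_port = htons(9000);                      /* assumed port    */
            inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr); /* example address */

            if (fd < 0 || connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
                perror("connect");
                return 1;
            }

            /* One command from the predefined set, naming a stored object. */
            const char *cmd = "SHRINK2X2 clip42.mpg\n";
            write(fd, cmd, strlen(cmd));

            /* Read the server's feedback (e.g., a status line). */
            char reply[256];
            ssize_t n = read(fd, reply, sizeof reply - 1);
            if (n > 0) {
                reply[n] = '\0';
                printf("server replied: %s", reply);
            }
            close(fd);
            return 0;
        }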
  • The servers 20, 22, 24 have, as part of the video processing application, a high-performance toolkit (or library) 34 according to the principles of the present invention.
  • The present invention will be described in terms of the MPEG standard; however, the principles of the invention may apply to other data standards.
  • The MPEG standard is an evolving standard, and the principles of the present invention may apply to MPEG standards yet to be developed.
  • Figure 1 describes an implementation of an online editing and browsing system.
  • The client-server application allows a user to edit, view and manage video as well as images, sound, animation and text over the network.
  • This system is of a multi-component structure including one or more server(s) on the backend and one or more client(s).
  • The system concentrates operations on the server side, with commands being sent over the network from client to server.
  • The basic role of the client is to interact with the user.
  • The client incorporates a video-editing interface that allows the user to execute a set of predefined commands, pass them to the server and receive results for display. All or a majority of the actual data processing occurs on the server, which provides the main data processing functionality and handles communication with remote clients.
  • The high-performance toolkit 34 provides code with performance competitive with hand-tuned C code, allows optimizations to be performed without breaking open abstractions, and can be composed by users in unforeseen ways.
  • The present invention provides a toolkit, or API, designed with the following properties.
  • The first property of the toolkit 34 is resource control.
  • Resource control refers to control at the language level of I/O execution and memory allocation, including reduction and/or elimination of unnecessary memory allocation. None of the toolkit routines of this invention implicitly allocate memory or perform I/O.
  • The few primitives in the toolkit that do perform I/O are primitives that load or store BitStream data.
  • The BitStream is the actual stream of multimedia data.
  • The MPEG bitstream will be discussed below. All other toolkit primitives of the invention use the BitStream as a data source. Users have full control over memory utilization and I/O. This feature gives users tight control over performance-critical resources, an essential feature for writing applications with predictable performance.
  • The toolkit also gives users mechanisms to optimize programs using techniques such as data copy avoidance and to structure programs for good cache behavior.
  • The separation of I/O in the present invention has three advantages. First, it makes the I/O method used transparent to toolkit primitives. Generally, conventional libraries use integrated processing and I/O. A library that integrates file I/O with its processing is difficult to use in a network environment, because the I/O behavior of networks is different from that of files. Second, the separation of I/O also allows control of when I/O is performed. It enables, for example, the building of a multithreaded implementation of the toolkit that allows the use of a double-buffering scheme to read and process data concurrently. Third, by isolating the I/O calls, the performance of the remaining functions becomes more predictable. A minimal illustration of this separation follows.
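  • The sketch below illustrates the I/O-separation principle in plain C: the caller performs all file reading and allocation, and the processing routine only transforms bytes already in memory. The function names and the inversion operation are illustrative; this is not the toolkit's API.

        /* Sketch of I/O separation: the caller does the I/O and the
         * allocation; the "primitive" only transforms bytes in memory. */
        #include <stdio.h>
        #include <stdlib.h>

        /* A processing primitive: inverts pixel values in place.
         * It never reads files and never allocates memory. */
        static void invert_bytes(unsigned char *buf, size_t len)
        {
            for (size_t i = 0; i < len; i++)
                buf[i] = (unsigned char)(255 - buf[i]);
        }

        int main(void)
        {
            /* The caller decides when and how I/O happens: here a plain
             * fread, but it could just as well be a network read. */
            FILE *f = fopen("frame.raw", "rb");   /* example input file */
            if (!f) { perror("fopen"); return 1; }

            unsigned char *buf = malloc(65536);   /* caller-controlled allocation */
            if (!buf) { fclose(f); return 1; }
            size_t n = fread(buf, 1, 65536, f);
            fclose(f);

            invert_bytes(buf, n);                 /* pure, I/O-free processing */

            fwrite(buf, 1, n, stdout);            /* caller controls output too */
            free(buf);
            return 0;
        }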
  • The toolkit of this invention provides two mechanisms for sharing memory between abstractions, i.e. data objects. These mechanisms are called clipping and casting.
  • A DCTImage is an array of DCT vectors.
  • A clipped DCTImage is an image that contains a subset of DCT blocks from the decoded I-frame.
  • The user then performs an IDCT (inverse discrete cosine transform) on that clipped image to complete the decoding process.
  • IDCT: inverse discrete cosine transform
  • Casting refers to sharing of memory between objects of different types. Casting is typically used for I/O, because all I/O is done through the BitStream. For instance, when a gray scale image file is read into the BitStream, the headers are parsed, and the remaining data is cast into a ByteImage. Casting avoids unnecessary copying of data.
  • The user allocates and frees all non-trivial memory resources explicitly using new and free primitives, e.g., ByteImageNew and ByteImageFree. Functions never allocate temporary memory. If such memory is required to complete an operation (scratch space, for example), the user must allocate it and pass it to the routine as a parameter. Explicit memory allocation allows the user to reduce or eliminate paging, and makes the performance of the application more predictable.
  • The toolkit ByteCopy operation of the present invention assumes that the source and the destination do not overlap, and it copies the source into the destination. The user must determine whether the source and the destination overlap, and if they do, the user must allocate a temporary ByteImage and use two ByteCopy calls, as sketched below.
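  • The following is a hedged sketch of that pattern, using the primitive names mentioned in the text (ByteNew, ByteCopy, ByteFree and the ByteImage type). The exact signatures shown, with ByteCopy(src, dest) acting on equally sized images, are assumptions made only for this illustration.

        /* Copy src into dest safely when the two may overlap, using only
         * the non-overlapping ByteCopy primitive.  Signatures assumed. */
        void copy_maybe_overlapping(ByteImage *src, ByteImage *dest,
                                    int width, int height, int overlaps)
        {
            if (overlaps) {
                /* Explicit allocation: the toolkit never allocates behind
                 * the user's back, so the scratch image is created here. */
                ByteImage *temp = ByteNew(width, height);
                ByteCopy(src, temp);   /* first hop: source into scratch image  */
                ByteCopy(temp, dest);  /* second hop: scratch image into target */
                ByteFree(temp);        /* explicit release */
            } else {
                ByteCopy(src, dest);   /* fast path for disjoint images */
            }
        }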
  • The second property of the toolkit of this invention is that of having "thin" primitives.
  • The toolkit breaks complex functions into simple functions that can be layered. This feature promotes code reuse and allows optimizations that would otherwise be difficult to exploit.
  • The toolkit provides three primitives: (1) a function to decode the bit stream into three DCTImages, one for each color component (the DCTImage is an array of DCT vectors), (2) a function to convert each DCTImage into a ByteImage (a simple image whose pixels are in the range 0..255), and (3) a function to convert from YUV color space to RGB color space.
  • Exposing this structure has several advantages. First, it promotes code reuse. For instance, the inverse DCT and color space conversion functions are shared by the JPEG and MPEG routines. Second, it allows optimizations that would be difficult to exploit otherwise. One such optimization is compressed domain processing. Another example is decoding a JPEG image to a gray scale image where only one DCTImage needs to be decompressed, the DCTImage representing the gray scale component.
  • Toolkit primitives of the present invention implement special cases of a more general operation.
  • The special cases can be combined to achieve the same functionality as the general operation, and have a simple, fast implementation whose performance is predictable.
  • ByteCopy is one such primitive - only the special case of non-overlapping images is implemented.
  • Shrink2x2 is a specialized function that shrinks the image by a factor of 2 in each dimension. It is implemented by repeatedly adding 4 pixel values together and shifting the result, an extremely fast operation (a self-contained sketch of this technique follows below). Similar implementations are provided for Shrink4x4, Shrink2x1, and Shrink1x2.
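  • The sketch below shows the add-four-pixels-and-shift technique on plain byte arrays; it illustrates the idea rather than reproducing the toolkit's Shrink2x2 code, and it ignores odd image dimensions for brevity.

        /* Each output pixel is the sum of a 2x2 block of input pixels,
         * shifted right by 2 (i.e., averaged).  Plain arrays stand in
         * for the toolkit's ByteImage type. */
        #include <stddef.h>

        void shrink2x2(const unsigned char *src, int src_w, int src_h,
                       unsigned char *dst /* (src_w/2) x (src_h/2) */)
        {
            int dst_w = src_w / 2;
            int dst_h = src_h / 2;

            for (int y = 0; y < dst_h; y++) {
                const unsigned char *row0 = src + (size_t)(2 * y) * src_w;
                const unsigned char *row1 = src + (size_t)(2 * y + 1) * src_w;
                for (int x = 0; x < dst_w; x++) {
                    /* add the four pixels of the 2x2 block, then shift */
                    unsigned sum = row0[2 * x] + row0[2 * x + 1]
                                 + row1[2 * x] + row1[2 * x + 1];
                    dst[(size_t)y * dst_w + x] = (unsigned char)(sum >> 2);
                }
            }
        }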
  • ShrinkBilinear shrinks an image by a factor between 1 and 2 using bilinear interpolation.
  • AudioBuffers store mono or stereo audio data. Stereo samples from the left and right channels are interleaved in memory as shown in Figure 3.
  • One possible design is to provide one primitive that processes the left channel and another that processes the right channel as can be seen in the following code:
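  • A minimal stand-in for that code, assuming interleaved 16-bit samples with the left channel in the even slots and the right channel in the odd slots (as in Figure 3): plain arrays are used in place of the toolkit's AudioBuffer, and the function names and the halve-the-volume operation are illustrative only.

        #include <stddef.h>
        #include <stdint.h>

        /* Halve the volume of the left channel only. */
        void audio_left_scale_half(int16_t *samples, size_t num_frames)
        {
            for (size_t i = 0; i < num_frames; i++)
                samples[2 * i] /= 2;          /* even slots hold left samples */
        }

        /* Halve the volume of the right channel only. */
        void audio_right_scale_half(int16_t *samples, size_t num_frames)
        {
            for (size_t i = 0; i < num_frames; i++)
                samples[2 * i + 1] /= 2;      /* odd slots hold right samples */
        }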
  • The third property of the toolkit of the present invention is that of exposing structure.
  • Most libraries try to hide details of encoding algorithms from the user, providing a simple, high-level API.
  • The present invention exposes the structure of compressed data in two ways.
  • The toolkit exposes intermediate structures in the decoding process. For example, instead of decoding an MPEG frame directly into RGB format, the toolkit breaks the process into three steps: (1) bit stream decoding (including Huffman decoding and dequantization), (2) frame reconstruction (motion compensation and IDCT), and (3) color space conversion.
  • For example, the MpegPicParseP function parses a P-frame from a BitStream and writes the results into three DCTImages and one VectorImage.
  • A second primitive reconstructs pixel data from DCTImage and VectorImage data, and a third converts between color spaces.
  • The toolkit exposes the intermediate data structures, which allows the user to exploit optimizations that are normally impossible. For example, to decode gray scale data, one simply skips the frame reconstruction step on the Cr/Cb planes.
  • Compressed domain processing techniques can be applied on the DCTImage or VectorImage structures.
  • The toolkit of this invention also exposes the structure of the underlying bit stream.
  • The toolkit provides operations to find structural elements in compressed bit streams, such as MPEG, JPEG and GIF. This feature allows users to exploit knowledge about the underlying bit stream structure for better performance. For example, a program that searches for an event in an MPEG video stream might cull the data set by examining only the I-frames initially, because they are easily (and quickly) parsed, and compressed domain techniques can be applied. This optimization can give several orders of magnitude improvement in performance over conventional event-searching methods in some circumstances, but because other libraries hide the structure of the MPEG bit stream from the user, this optimization cannot be used with them. In the present invention, this optimization is trivial to exploit.
  • The user can use the MpegPicHdrFind function to find a picture header, MpegPicHdrParse to decode it, and, if the type field in the decoded header indicates that the picture that follows is an I-frame, can use MpegIPicParse to decode the picture, as sketched below.
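  • The following is a hedged sketch of that culling loop, using the function names given in the text (MpegPicHdrFind, MpegPicHdrParse, MpegIPicParse). The argument lists, return conventions, the header field name and the I_FRAME constant are assumptions made only for this illustration, and the sketch assumes the whole stream is already in the BitStream buffer.

        /* Scan a stream, fully parsing only the I-frames; P- and B-frames
         * are passed over cheaply.  Signatures and field names assumed. */
        void scan_i_frames(BitParser *bp, MpegPicHdr *hdr,
                           DCTImage *dcty, DCTImage *dctu, DCTImage *dctv)
        {
            for (;;) {
                if (MpegPicHdrFind(bp) < 0)       /* no more picture headers */
                    break;
                MpegPicHdrParse(bp, hdr);         /* decode the header */
                if (hdr->type == I_FRAME) {
                    /* Only I-frames are parsed fully, which is what makes
                     * the culling fast. */
                    MpegIPicParse(bp, dcty, dctu, dctv);
                    /* ... apply compressed-domain analysis to dcty here ... */
                }
            }
        }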
  • The toolkit provides a plurality of basic abstractions. These basic abstractions are:
  • DCTImage: a 2D array of elements, each of which is a sequence of (index, value) pairs representing the run-length-encoded DCT blocks found in many block-based compression schemes, such as MPEG and JPEG.
  • AudioBuffer: a 1D array of 8- or 16-bit values.
  • ByteLUT: a look-up table for ByteImages. A ByteLUT can be applied to one ByteImage to produce another ByteImage.
  • AudioLUT: a look-up table for AudioBuffers.
  • BitStream/BitParser: A BitStream is a buffer for encoded data. A BitParser provides a cursor into the BitStream and functions for reading/writing bits from/to the BitStream.
  • A gray-scale image can be represented using a ByteImage.
  • A monochrome image can be represented using a BitImage.
  • An irregularly shaped region can be represented using a BitImage.
  • An RGB image can be represented using three ByteImages, all of the same size.
  • A YUV image in 4:2:0 format can be represented using three ByteImages.
  • The ByteImage that represents the Y plane is twice the width and height of the ByteImages that represent the U and V planes.
  • The DCT blocks in a JPEG image, an MPEG I-frame, or the error terms in an MPEG P- and B-frame can be represented using three DCTImages, one for each of the Y, U and V planes of the image in the DCT domain.
  • The motion vectors in an MPEG P- and B-frame can be represented with a VectorImage.
  • A GIF image can be represented using three ByteLUTs, one for each color map, and one ByteImage for the color-mapped pixel data.
  • 8-bit or 16-bit PCM audio, μ-law or A-law audio data can be represented using an AudioBuffer. The audio can be either stored as a single channel or contain both left and right channels.
  • The toolkit also has abstractions to store encoding-specific information.
  • An MpegPicHdr stores the information parsed from a picture header in an MPEG-1 video bit stream.
  • The full list of header abstractions can be found in Table 1.
  • The following examples illustrate the use of the abstractions in the toolkit and demonstrate writing programs using the toolkit.
  • The first example shows how to use the toolkit to manipulate images.
  • The second example shows how to use the toolkit's primitives and abstractions for MPEG decoding.
  • The third example shows how to use a toolkit filter to demultiplex an MPEG systems stream.
  • A ByteImage consists of a header and a body.
  • The header stores information such as the width and height of the ByteImage and a pointer to the body.
  • The body is a block of memory that contains the image data.
  • A ByteImage can be either physical or virtual.
  • The body of a physical ByteImage is contiguous in memory, whereas a virtual ByteImage borrows its body from part of another ByteImage (called its parent).
  • A virtual ByteImage provides a form of shared memory: changing the body of a virtual ByteImage implicitly changes the body of its parent, as seen in Figure 2.
  • A physical ByteImage is created using ByteNew(w, h).
  • A virtual ByteImage is created using ByteClip(b, x, y, w, h).
  • The rectangular area whose size is w x h and whose top left corner is at (x, y) is shared between the virtual ByteImage and the physical ByteImage.
  • The virtual/physical distinction applies to all image types in the toolkit. For example, a virtual DCTImage can be created to decode a subset of a JPEG image.
  • The steps for creating the PIP (picture-in-picture) effect are as follows: given an input image, (1) scale the image by half, (2) draw a white box slightly larger than the scaled image on the original image, and (3) paste the scaled image into the white box.
  • The code in Figure 4 shows a toolkit function that performs the PIP operation.
  • The function takes in three arguments: image, the input image; borderWidth, the width of the border around the inner image in the output; and margin, the offset of the inner image from the right and bottom edges of the outer image.
  • Lines 5 to 6 of the function query the width and height of the input image.
  • Lines 7 to 10 calculate the position and dimensions of the inner picture.
  • Line 13 creates a new physical ByteImage, temp, which is half the size of the original image.
  • Line 14 shrinks the input image into temp.
  • Line 15 creates a virtual ByteImage slightly larger than the inner picture, and line 18 sets the value of the virtual ByteImage to 255, achieving the effect of drawing a white box.
  • Line 19 de-allocates this virtual image.
  • Line 20 creates another virtual ByteImage, corresponding to the inner picture.
  • Line 21 copies the scaled image into the inner picture using ByteCopy.
  • Lines 22 and 23 free the memory allocated for the ByteImages. An approximate sketch of such a function follows.
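  • The sketch below is an approximate reconstruction of the function described above, assembled from the primitives named in the text (ByteNew, ByteClip, ByteShrink2x2, ByteCopy, ByteFree). ByteGetWidth, ByteGetHeight and ByteSet are assumed names for the query and fill operations the description mentions, and the line numbers in the description refer to the original Figure 4, not to this sketch.

        /* Approximate PIP sketch: scale the image by half, draw a white
         * box near the bottom-right corner, and paste the scaled image
         * into the box.  Several names and signatures are assumed. */
        void MakePIP(ByteImage *image, int borderWidth, int margin)
        {
            int w = ByteGetWidth(image);            /* query input dimensions   */
            int h = ByteGetHeight(image);
            int subW = w / 2;                        /* inner picture dimensions */
            int subH = h / 2;
            int x = w - subW - margin;               /* inner picture position   */
            int y = h - subH - margin;

            ByteImage *temp = ByteNew(subW, subH);   /* physical, half size      */
            ByteShrink2x2(image, temp);              /* scale the input by half  */

            /* Virtual image slightly larger than the inner picture: setting it
             * to 255 draws the white box directly into the parent image. */
            ByteImage *box = ByteClip(image, x - borderWidth, y - borderWidth,
                                      subW + 2 * borderWidth,
                                      subH + 2 * borderWidth);
            ByteSet(box, 255);
            ByteFree(box);

            /* Virtual image for the inner picture; copying the scaled image
             * into it pastes it inside the white box. */
            ByteImage *inner = ByteClip(image, x, y, subW, subH);
            ByteCopy(temp, inner);

            ByteFree(inner);                         /* release the ByteImages   */
            ByteFree(temp);
        }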
  • This example shows how images are manipulated in the toolkit through a series of simple, thin operations. It also illustrates several design principles of the toolkit, namely (1) sharing of memory (through virtual images), (2) explicit memory control (through ByteClip, ByteNew and ByteFree), and (3) specialized operators (ByteShrink2x2).
  • The second example relating to this invention illustrates how to process MPEG video streams using the toolkit.
  • The example program decodes the I-frames in an MPEG video stream into a series of RGB images.
  • The encoded video data is first read into a BitStream.
  • A BitStream is an abstraction for input/output operations; that is, it is a buffer.
  • A BitParser is used to read and write from the BitStream.
  • A BitParser provides functions to read and write data to and from the BitStream, plus a cursor into the BitStream.
  • FIG. 5 shows the format of an MPEG-1 video stream 150.
  • The MPEG video stream 150 consists of a sequence header 152, followed by a sequence of GOPs (groups of pictures) 154, followed by an end-of-sequence marker 156.
  • Each GOP consists of a GOP header 158 followed by a sequence of pictures 160.
  • Each picture consists of a picture header 162, followed by a picture body 164, which is made up of the compressed data required to reconstruct the picture.
  • The sequence header 152 contains information such as the width and height of the video, the frame rate, the aspect ratio, and so on.
  • The GOP header 158 contains the timecode for the GOP.
  • The picture header 162 contains information necessary for decoding the picture, most notably the type of the picture (I, P, B).
  • The toolkit provides an abstraction for each of these structural elements (see Table 1).
  • The toolkit provides five primitives for each structural element: find, skip, dump, parse, and encode. Find positions the cursor in the BitStream just before the element. Skip advances the cursor to the end of the element. Dump moves the bytes corresponding to the element from the input BitStream to the output BitStream, until the cursor is at the end of the header. Parse decodes the BitStream and stores the information into a header abstraction, and Encode encodes the information from a header abstraction into a BitStream. Thus, the MpegPicHdrFind function advances the cursor to the next picture header, and MpegSeqHdrParse decodes the sequence header into a structure. A self-contained sketch of what a find operation involves is given below.
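  • The sketch below illustrates what a find primitive has to do for MPEG-1 video: scan for the byte-aligned 0x000001 start-code prefix and classify the structural element from Figure 5 that each code introduces. It operates on a plain byte buffer rather than the toolkit's BitStream/BitParser abstractions.

        /* Report the MPEG-1 video start codes found in a buffer.
         * 0xB3 = sequence header, 0xB8 = GOP header, 0x00 = picture,
         * 0xB7 = end of sequence. */
        #include <stdio.h>
        #include <stddef.h>

        void list_start_codes(const unsigned char *buf, size_t len)
        {
            for (size_t i = 0; i + 3 < len; i++) {
                if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01) {
                    unsigned char code = buf[i + 3];
                    if (code == 0xB3)
                        printf("%zu: sequence header\n", i);
                    else if (code == 0xB8)
                        printf("%zu: GOP header\n", i);
                    else if (code == 0x00)
                        printf("%zu: picture header\n", i);
                    else if (code == 0xB7)
                        printf("%zu: end of sequence\n", i);
                    /* codes 0x01..0xAF introduce slices and are not shown */
                }
            }
        }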
  • The parsed picture data from an MPEG I-frame is represented using a DCTImage.
  • A DCTImage is similar to a ByteImage, but each "pixel" is an 8x8 DCT-encoded block.
  • The toolkit code in Figure 6 decodes the I-frames in an MPEG video into RGB images.
  • Lines 1 through 5 allocate the data structures needed for decoding.
  • Line 6 attaches the BitParser inbp to the BitStream inbs.
  • The cursor of inbp will be pointing to the first byte of the buffer in inbs.
  • Line 7 fills inbs with 64K of data from the input MPEG video.
  • Line 8 moves the cursor of inbp to the beginning of a sequence header, and line 9 parses the sequence header and stores the information from the sequence header into the structure seqhdr.
  • The vital information, such as the width, the height and the minimum data that must be present to decode a picture, is extracted from the sequence header in lines 10 through 12.
  • Lines 13 through 21 allocate the ByteImages and DCTImages we need for decoding the I-frames.
  • The ByteImages y, u and v store the decoded picture in YUV color space.
  • The ByteImages r, g and b store the decoded picture in RGB color space.
  • The DCTImages dcty, dctu and dctv store compressed (DCT domain) picture data.
  • The main loop in the decoding program (lines 22-46) starts by advancing the BitParser cursor to the beginning of the next marker (line 24) and retrieves the current marker (line 25).
  • The picture header is parsed (line 28) and its type is checked (line 29). If the picture is an I-frame, the I-frame is parsed into three DCTImages (line 30), the DCTImages are converted to ByteImages (lines 31-33), and the ByteImages are converted into RGB color space (line 34).
  • If the marker indicates a GOP header, the header is skipped (which moves the cursor to the end of the GOP header), because information from the GOP header is not needed.
  • UpdateIfUnderflow checks if the number of bytes remaining in inbs is less than vbvsize. If so, the remaining bytes are shifted to the beginning of the buffer and the rest of the buffer is filled with data from the file.
  • Breaking down complex decoding operations like MPEG decoding into "thin" primitives makes the toolkit code highly configurable. For example, by removing lines 32 to 34, the program decodes MPEG I-frames into gray-scale images. By replacing lines 31 to 34 with JPEG encoding primitives, an efficient MPEG I-frame to JPEG transcoder is produced.
  • The third example relating to this invention illustrates filtering out a subset of a BitStream for processing.
  • Filters were designed to simplify the processing of bit streams with interleaved data (e.g., AVI, QuickTime, or MPEG systems streams) .
  • Filters are similar to scatter/gather vectors - they specify an ordered subset of a larger set of data.
  • A common use of filtering is processing MPEG system streams, which include interleaved audio and video (A/V) streams.
  • Each A/V stream is assigned a unique id.
  • Audio streams have ids in the range 0..31; video stream ids are in the range 32..47.
  • The A/V streams are divided up into small (approx. 2 KByte) chunks, called packets.
  • Each packet has a header that contains the id of the stream, the length of the packet, and other information (e.g., a timecode).
  • The toolkit code for building this filter is shown in Figure 7.
  • Lines 2 through 8 allocate and initialize various structures needed by this program.
  • The variable offset stores the byte offset of a packet in the bit stream, relative to the start of the stream.
  • Line 9 advances the cursor to the beginning of the first packet header and updates offset.
  • The main loop (lines 10-18) parses the packet header (line 11) and, if the packet belongs to the first video stream, adds its offset and length to the filter (line 14).
  • EndOfBitstream is a macro that checks the position of the bit stream cursor against the length of the data buffer.
  • The filter can be saved to disk, or used as a parameter to the BitStreamFileFilter or BitstreamDumpUsingFilter functions.
  • The former reads the subset of a file specified by the filter; the latter copies the data subset specified by a filter from one bit stream to another. A generic sketch of this mechanism follows.
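  • The sketch below captures the filter idea in self-contained C: an ordered list of (offset, length) pairs selecting a subset of a larger buffer, plus a routine that copies the selected bytes into a second buffer, much as BitstreamDumpUsingFilter is described as doing. The types and names here are illustrative, not the toolkit's.

        #include <string.h>
        #include <stddef.h>

        typedef struct {
            size_t offset;   /* byte offset of a packet in the source stream */
            size_t length;   /* length of the packet in bytes                */
        } FilterEntry;

        /* Copy every region named by the filter from src into dst, packed
         * back to back.  Returns the number of bytes written to dst. */
        size_t apply_filter(const unsigned char *src,
                            const FilterEntry *filter, size_t num_entries,
                            unsigned char *dst)
        {
            size_t written = 0;
            for (size_t i = 0; i < num_entries; i++) {
                memcpy(dst + written, src + filter[i].offset, filter[i].length);
                written += filter[i].length;
            }
            return written;
        }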
  • This example illustrates how the toolkit can be used to demultiplex interleaved data. It can be easily extended to other formats, such as QuickTime, AVI, MPEG-2 and MPEG-4. Although this mechanism uses data copies, the cost of copying is offset by the performance gain when processing the filtered data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Television Signal Processing For Recording (AREA)
EP99965296A 1998-12-15 1999-12-15 Video-editierungsverfahren und system die sich auf das netz basieren Withdrawn EP1157338A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US11223298P 1998-12-15 1998-12-15
US112232P 1998-12-15
PCT/US1999/029961 WO2000036516A1 (en) 1998-12-15 1999-12-15 Web-based video-editing method and system

Publications (1)

Publication Number Publication Date
EP1157338A1 true EP1157338A1 (de) 2001-11-28

Family

ID=22342789

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99965296A Withdrawn EP1157338A1 (de) 1998-12-15 1999-12-15 Video-editierungsverfahren und system die sich auf das netz basieren

Country Status (5)

Country Link
EP (1) EP1157338A1 (de)
JP (1) JP2002532996A (de)
AU (1) AU3124500A (de)
CA (1) CA2352962A1 (de)
WO (1) WO2000036516A1 (de)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7319536B1 (en) 1999-04-12 2008-01-15 Eastman Kodak Company Techniques for synchronizing any of a plurality of associated multimedia assets in a distributed system
US7042583B1 (en) 1999-04-12 2006-05-09 Eastman Kodak Company Techniques for acquiring a parent multimedia asset (digital negative) from any of a plurality of multiply modified child multimedia assets
US7343320B1 (en) 1999-08-02 2008-03-11 Treyz G Victor Online digital image-based product ordering system
US6904185B1 (en) 1999-12-16 2005-06-07 Eastman Kodak Company Techniques for recursively linking a multiply modified multimedia asset to an original digital negative
WO2001045384A1 (en) * 1999-12-16 2001-06-21 Pictureiq Corporation Techniques for synchronizing any of a plurality of associated multimedia assets in a distributed system
FR2814256B1 (fr) * 2000-09-19 2002-12-06 Jerome Delsol Procede de traitement de documents numerises, et support d'informations numerique obtenu selon ce procede
JP4724733B2 (ja) * 2008-06-06 2011-07-13 株式会社エヌ・ティ・ティ・ドコモ 映像編集システム、映像編集サーバ、通信端末
EP2193828B1 (de) * 2008-12-04 2012-06-13 Disney Enterprises, Inc. Kommunikationshub für Videospielentwicklungssysteme
KR101814747B1 (ko) * 2011-05-12 2018-01-03 에스케이플래닛 주식회사 클라우드 컴퓨팅 기반 영상 제작 제공 방법 및 이를 위한 서비스 장치
KR101814748B1 (ko) * 2016-10-27 2018-01-03 에스케이플래닛 주식회사 클라우드 컴퓨팅 기반의 영상 제작을 지원하는 서비스 장치 및 클라우드 컴퓨팅 기반 영상 제작 제공 방법

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5596705A (en) * 1995-03-20 1997-01-21 International Business Machines Corporation System and method for linking and presenting movies with their underlying source information
US5740388A (en) * 1996-05-10 1998-04-14 Custom Communications, Inc. Apparatus for creating individually customized videos

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0036516A1 *

Also Published As

Publication number Publication date
CA2352962A1 (en) 2000-06-22
WO2000036516A9 (en) 2001-04-12
AU3124500A (en) 2000-07-03
JP2002532996A (ja) 2002-10-02
WO2000036516A1 (en) 2000-06-22

Similar Documents

Publication Publication Date Title
US6320600B1 (en) Web-based video-editing method and system using a high-performance multimedia software library
AU2006211475B2 (en) Digital intermediate (DI) processing and distribution with scalable compression in the post-production of motion pictures
KR100922263B1 (ko) 인텔리전트 멀티미디어 서비스
EP1404086B1 (de) Verfahren und Vorrichtung zur stufenweisen komplementär-Kodierung von Bildern
JP2004274760A (ja) 圧縮されたディジタル画像を通信する方法及び製造物
JP2007020196A (ja) サーバクライアント環境におけるシステム及び方法
JP2004274758A (ja) Jpp−ストリームからjpeg2000符号ストリームへの変換方法及び変換装置
JP2004274759A (ja) 制限されたアクセスとサーバ/クライアント受け渡しを有する圧縮されたディジタル画像の通信方法及び装置
US6859557B1 (en) System and method for selective decoding and decompression
WO2000036516A1 (en) Web-based video-editing method and system
US7724964B2 (en) Digital intermediate (DI) processing and distribution with scalable compression in the post-production of motion pictures
Ooi et al. The Dali multimedia software library
US6944390B1 (en) Method and apparatus for signal processing and recording medium
JP4891335B2 (ja) ハードウェア多標準対応ビデオデコーダ装置
US7437007B1 (en) Region-of-interest editing of a video stream in the compressed domain
US8081093B2 (en) Code transforming apparatus and code transforming method
US20020188440A1 (en) Optimized MPEG-2 encoding for computer-generated output
GB2610397A (en) Encoding and decoding video data
Marschall Integration of digital video into distributed hypermedia systems
Han et al. A gigapixel image browsing scheme in edge computing environment
Bhargava et al. Impacts of codec schemes on multimedia communications
KR100487330B1 (ko) 디지털 비디오의 썸네일 영상 생성 장치
Lei et al. Software-based motion JPEG with progressive refinement for computer animation
JPH10243389A (ja) 動画早見画像作成装置、動画早見画像作成方法および動画データ検索システム
Rosenbaum et al. Progressive raster imagery beyond a means to overcome limited bandwidth

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010525

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20020102