CA2312333A1 - Multimedia compression, coding and transmission method and apparatus - Google Patents

Multimedia compression, coding and transmission method and apparatus

Info

Publication number
CA2312333A1
CA2312333A1
Authority
CA
Canada
Prior art keywords
data
audio
multimedia data
information
multimedia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA 2312333
Other languages
French (fr)
Inventor
Kimihiko E. Sato
Kelly Lee Myers
Original Assignee
KYXPYX TECHNOLOGIES INC.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by KYXPYX TECHNOLOGIES INC. filed Critical KYXPYX TECHNOLOGIES INC.
Priority to CA 2312333 priority Critical patent/CA2312333A1/en
Priority to PCT/CA2001/000893 priority patent/WO2001099430A2/en
Priority to AU67228/01A priority patent/AU6722801A/en
Publication of CA2312333A1 publication Critical patent/CA2312333A1/en
Abandoned legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/65 Transmission of management data between client and server
    • H04N21/658 Transmission by the client directed to the server
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/637 Control signals issued by the client directed to the server or network components
    • H04N21/6377 Control signals issued by the client directed to the server or network components directed to server
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643 Communication protocols
    • H04N21/64322 IP

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A system, method and apparatus for the compression, coding and transmission of multimedia data. This system encompasses the content creation step of capturing or creating the raw multimedia data and converting it into compressed and encoded multimedia data, the transmission step of adaptively requesting variable levels of information from the multimedia storage device based on the capabilities of the display device and the conditions of the communication channel, and the rendering step which presents the multimedia data to the intended audience. The method encodes the multimedia data in such a way that the rendering device can request at any time any variant of the original multimedia data thus optimising the bandwidth that is available, and presenting the best possible rendition of the original source data to the audience. The encoding algorithm is structured such that the compression may not be optimal for storage, but is optimal for parsing, transmission, and display.

Description

MULTIMEDIA COMPRESSION, CODING AND TRANSMISSION METHOD AND
APPARATUS
FIELD OF THE INVENTION
The present invention relates to data compression, coding and transmission. In particular, the present invention relates to multimedia data compression, coding and transmission, such as video and audio multimedia data.
BACKGROUND OF THE INVENTION
A constant aim within video data compression systems is to achieve a higher degree of data compression with a reduced degree of degradation in image quality upon decompression.
Generally speaking, the higher the degree of compression used, the greater will be the degradation in the image when it is decompressed. Similarly, audio compression systems aim to store the best quality reproduction of an audio signal with the least amount of storage space. In both audio and video compression, a model of human perception is used so that the information lost first is the information least likely to be perceived.
Coding and its inverse step, decoding, describe how this compressed video and audio information is stored, sequenced, and categorized. Text, picture, audio and video data are generically termed multimedia data, and the long-term archival storage of this information is termed multimedia storage. The apparatus that captures, encodes and compresses multimedia data is termed the content creation apparatus, and the decoding, decompressing, and display apparatus for multimedia data is termed the display apparatus.
Content creation is the step whereby the multimedia data is created from raw sources and placed into multimedia storage.
Transmission is the step whereby the multimedia data is transmitted from multimedia storage to the display apparatus.
Rendering is the step whereby the multimedia data is converted from the encoded form into a video and audio representation at the display apparatus.
Everybody else, it seems, codes the information into the lowest common denominator format, which most of the world can play using the hardware and infrastructure present. This file format is optimized for lowest bitrate coding, so that the fewest bits are needed in order to reconstruct a representation of the original information. These formats, such as MPEG1, MPEG2, and MPEG4, are designed for small bitrate coding errors, such as are found in any kind of radio transmission (microwave, satellite, terrestrial). These formats are not designed specifically for the Internet, which is unreliable on a packet by packet basis rather than on a bit by bit basis.
A disadvantage of MPEG type information is that it achieves its compression rate by interframe coding. This means that only differences between successive frames are recorded.
The drawback is that a reference frame is needed to which the differences are applied. This means that the stream needs to be coded for the bitrate that it expects to have available. The bitrates that are specified in MPEG documentation are continuous, reliable, and reproducible. The Internet is far from any of these.
Everybody else also starts from television, and tries to reproduce it using the Internet as the transport mechanism for the information. In order to achieve this, the usual way is to use either TCP or RTP transports to send the MPEG coded information across the net. TCP is a protocol-heavy transport, because the aim is to have an exact copy transferred from sender to receiver as fast as possible, but with no guarantee of time.
The creator of the MPEG type file needs to decide at creation time what the size and quality of the image is, and the quality of the audio. If a smaller, lower quality derivative of this image or this audio is required, the creator needs to make a separate file for each variant.
In order to compensate for unreliable and unpredictable transmission channels, such as the Internet, the conventional approach is to use "Pre-Buffering", a term that simply means that prior to commencing playback, enough of the file is received reliably and stored away such that if the network experiences any difficulties, there is a small amount of multimedia data that can be used. In some cases, tens of seconds to several minutes of time is spent collecting pre-buffering data instead of presenting pictures with sound to the viewing and listening audience.
These systems require that the width, height, compression level, and audio quality are all determined at the time of compression. Even if the decoding and decompression apparatus is capable of handling a much higher quality image, the resulting experience for the user has been limited to the least capable playback device. There is no mechanism for altering the experience on the fly based on the capabilities of the display apparatus.
Streaming technology is simply downloading a file and starting the rendering prior to the whole file being available. This is achieved by prebuffering, which means that a certain amount of information is required at the client end before playback can commence. If the playback point ever catches up to the last information received, which happens whenever the conditions of the network are not what is expected, the playback has to stop and further prebuffering is required.
Conventional video and audio compression algorithms and coding systems rely heavily on committee based standards work, such as MPEG2 from the MPEG committee, or H.261 from the ITU-T. These describe a multimedia data file in multimedia storage that can more or less be transmitted error-free and at a reliable rate to the decoding and decompression apparatus.
The encoding typically attempts to determine the differences from both earlier and later video frames. The encoder then stores only a representation of the differences between the earlier frame and the later frame. The audio information is typically interleaved within the same media storage file in order to correctly synchronize the audio playback with the video playback. In conventional systems, the multimedia data that is determined at the encoding step must be transmitted in its entirety to the decoding, decompressing apparatus. When motion prediction algorithms are used in the content creation step, a large amount of computation is required at both content creation and rendering. This means that it is more expensive in hardware costs to do real-time content creation and rendering.
The sizes and limits of the video data are typically restricted to the height to width ratios of standard NTSC and PAL video, or 16:9 wide-screen movie, as these are the standard sources of moving pictures with audio. This should not be the only possible size.
Parallel to the development of video and audio compression, the field of still picture compression, such as JPEG, PNG, and Compuserve GIF, is fairly straightforward. These are usually symmetrical algorithms, so that the content creation step is of roughly equivalent complexity to the rendering step. When still pictures are displayed in succession at a high enough frame rate, the illusion of motion is created. Motion JPEG (MJPEG) is a system used in non-linear video editing that does just that with still JPEG files. This is simply a video storage system, and does not encompass audio as well.
There is a need for a new type of video compression method that overcomes the above-noted deficiencies in the prior art methodologies.
SUMMARY OF THE INVENTION
According to the invention, there is provided a video and audio compression, coding and transmission system, method and apparatus comprising:
- a communication channel coupled to one transmitter device, at least one transmission relay device, and at least one reception device, along with a method of coding and compressing multimedia data, such that there are multiple levels of detail and reproducible coherence in the multimedia data, and such that a redundant, variably encoded set of audio and text information can be sent adaptively with the video in a minimally acknowledged transmission protocol.
Advantageously, the video compression method and apparatus according to the invention allows:
- multimedia data to be requested by the display device and transmitted through an unpredictable transmission channel, adapting to the capabilities of the display device and the reliability of the communication;
- multimedia data to be encoded in such a way that the rendering of audio can continue in some capacity, at a reduced level, for a short period of time when information is sent but not received;
- the transmission to adapt by selectively reducing the amount of data sent, such that the least perceived data, such as high frequency audio, higher frame rate, and possibly even stereo separation, is removed from the transmission first;
- multimedia data to be encoded in such a way that the multiple levels of audio and video can be reduced, with minimal calculation, to the level required for a particular display device and the current communications capacity;
- multimedia data to be encoded such that long term archival storage of the full highest quality video and audio is protected by multiple levels of encryption, in such a way that the lowest representation of the audio and video has minimal or no protection and the highest representation has maximum protection.
The above advantages and features of this invention will be apparent from the following detailed description of illustrative embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Preferred embodiments of the present invention will now be described, by way of example only, with reference to the attached Figures, wherein:
Figure 1 shows an apparatus for the creation, transmission, and rendering of streamed multimedia data;
Figure 2 shows a data format to store multimedia data;
Figure 3 shows a method for creation of multimedia data;
Figure 4 shows a method for front half compression and encoding;
Figure 5 shows a method for the diagonal flip step;
Figure 6 shows a method for front half data packing;
Figure 7 shows a method for front half decompression and decoding;
Figure 8 shows a method for Picture Substream sample;
Figure 9 shows pixel numbering;
Figure 10 shows a tiled representation.
DETAILED DESCRIPTION
The present invention is a system and method that caters to the digital convergence industry.
It is a format and a set of tools for content creation, content transmission, and content presentation. Digital convergence describes how consumer electronics, computers, and communications will eventually come together as parts of the same thing. The key to this happening is that the digital media, the formats for sound, video, and any other related data, are recorded, converted and transmitted from source to destination.
As used herein, the following terms are defined:
Packet = a single unit of data that can be sent unreliably across the Internet.
Frame = one full picture from the video.
Video = a series of frames that need to be shown at a certain rate to achieve the illusion of continuous motion.
A/V = Audio / Video = Audio and video put together into a single stream.
MPEG = Motion Picture Experts Group. This is an ISO standards workgroup that concentrates on moving images.
JPEG = Joint Photographic Experts Group. A similar ISO standards workgroup that concentrates on still images.
The system of the present invention encodes the information at the highest size and quality that will be requested by any client. It is then encoded in a way that makes it easy during the playback for a client to downscale the quality or size based on the conditions of the network.
In order to achieve this, the fundamental starting point everyone else uses, which is to use interframe coded MPEG, had to be abandoned. This instantly releases us from requiring a reliable transport mechanism such as TCP/IP.
The transport requirements are that there is a bidirectional channel, and that the propagation times from sender to receiver, and the reverse, are more or less uniform. We currently use straight unicast and multicast UDP/IP, although we are not limited to these at all.
As most motion picture is originally on film stock, any continuous tone image compression algorithm, such as, but not limited to, the discrete cosine transform (DCT) in JPEG, or the Wavelet image compression algorithm in JPEG2000, can be used to compress the frames.
The audio corresponding to the film is captured separately, and then encoded within the frame data as comment fields. The audio can be encoded by any audio compression algorithm. Using the comment fields for audio has, we believe, not been done before.
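As an illustration only (the patent text gives no code), the following Python sketch splices compressed audio bytes into JPEG COM (comment) segments immediately after the SOI marker; the function name and the marker placement are our assumptions:

```python
import struct

def embed_audio_in_jpeg(jpeg_bytes: bytes, audio_bytes: bytes) -> bytes:
    """Insert audio data as JPEG COM (0xFFFE) segments right after the
    SOI marker. Hypothetical helper, not taken from the patent."""
    assert jpeg_bytes[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    segments = []
    # A COM payload is limited to 65533 bytes (the 16-bit length field
    # counts itself), so long audio is split across several segments.
    for i in range(0, len(audio_bytes), 65533):
        chunk = audio_bytes[i:i + 65533]
        segments.append(b"\xff\xfe" + struct.pack(">H", len(chunk) + 2) + chunk)
    return jpeg_bytes[:2] + b"".join(segments) + jpeg_bytes[2:]
```

Because standard JPEG decoders skip COM segments, ordinary picture tools can still parse such a frame while ignoring the audio payload.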
A further innovation is to flip the image diagonally prior to compression.
What this means is that instead of the sequential information in the file corresponding to the picture information from top left to bottom right of the image by rows, the information corresponds to picture data from top left to bottom right by columns. This introduces further work at both the compression step and the decompression step, but it allows for the capability to do left to right reduction of the image without the server having to decompress the entire image.
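A minimal numpy sketch of the flip and of the column-range reduction it enables (function names are illustrative, not from the patent):

```python
import numpy as np

def diagonal_flip(frame: np.ndarray) -> np.ndarray:
    # frame has shape (height, width, 3); transposing the first two axes
    # turns the original columns into rows, so a row-oriented codec such
    # as JPEG emits the picture in column order.
    return np.ascontiguousarray(frame.transpose(1, 0, 2))

def take_column_range(flipped: np.ndarray, first: int, last: int) -> np.ndarray:
    # Columns of the original image are now rows, so the server can serve
    # a horizontal slice of the picture by slicing rows, without having
    # to decompress the rest of the image.
    return flipped[first:last]
```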
Another coding innovation is to encode a reduced resolution image with a low bitrate coded audio signal as the front half of the encoded frame data. The information required to modify this image into a higher resolution and higher quality image, as well as the corresponding high frequency encoded audio, is encoded as differences in a back half of the encoded data.
The example uses two levels of image resolution, although this algorithm can be extended to have as many levels of resolution as necessary.
The back half components can be encrypted at high security levels, which allows for a lower quality rendition to be available for a lower price, etc.
Example:
Take a motion picture with audio signal at 24 frames a second.
Capture the audio at 44.1KHz Stereo PCM digital data format. This is referred to as the Raw Audio Data (RAD).
Convert the audio signal into a 44.1KHz Stereo PCM audio data file. This is referred to as the Back Half Audio Data (BHAD).
Convert the audio signal into an 11KHz Mono PCM audio data file. This is referred to as the Front Half Audio Data (FHAD).
Capture the video at 24 frames per second into 720 x 576 pixel progressive picture files.
Each pixel is RGB888 (CCIR601), which means that there are 8 bits of precision in the capture of each of the Red, Green, and Blue channels according to a color profile supplied by the digital TV industry committee (CCIR). This is chosen because it is one of the standard MPEG2 video profile sizes and formats. A standard movie encoded to 30 frames per second for television usually goes through a process called 3/2 pulldown, which means that every fourth frame is doubled. Since no extra information is conveyed in the doubled frame, we might as well capture only the original frames. A single frame of this information is referred to as the Raw Video Data (RVD), and all these frames collectively are referred to as the Raw Video Data Stream (RVDS).
Each frame is noise filtered and diagonally flipped to become a new image where the horizontal lines correspond to the columns of the original image. If there is any black band removal on a frame by frame basis, it is done at the same time as this step.
This is referred to as the Flipped Video Data (FVD).
The (FVD) is converted into a new image that is half the width and half the height by a process of collecting every other pixel. It is important that these pixels are collected and not averaged with adjoining pixels. This frame of information is referred to as the Front Half Video Data (FHVD), and is then converted into YUV format. In this example it is the lower right pixel of each 2 by 2 block that is collected.
The pixels that have not been collected into the (FHVD) are collected and encoded. This new representation of the data is referred to as the Back Half Video Data (BHVD), and consists of four planes: the delta left intensity plane (dLYP), the delta right intensity plane (dRYP), the delta U plane (dUP) and the delta V plane (dVP).
These last two steps are detailed as follows.
(a) Divide the FVD into 2 by 2 blocks. These are pixels with identifiers 11, 12, 21 and 22 based on their Cartesian coordinates. The RGB values of each pixel are R(xy), G(xy), and B(xy), where xy is the pixel identifier.
(b) Compute the YUV representation of all four pixels into Y(xy), U(xy) and V(xy) using the matrix conversion formula as follows:
| Y |   |  0% |   | +0.29  +0.59  +0.14 |   | R |
| U | = | 50% | + | -0.14  -0.29  +0.43 | x | G |
| V |   | 50% |   | +0.36  +0.29  -0.07 |   | B |
(c) Calculate the delta values of the YUV data values relative to pixel 22, the bottom right pixel. This gives the following delta values:

dY(11) = Y(11) - Y(22)
dY(12) = Y(12) - Y(22)
dY(21) = Y(21) - Y(22)
dU(11) = U(11) - U(22)
dU(12) = U(12) - U(22)
dU(21) = U(21) - U(22)
dV(11) = V(11) - V(22)
dV(12) = V(12) - V(22)
dV(21) = V(21) - V(22)

(d) Average the delta U values to get dUavg:

dUavg = [ dU(11) + dU(12) + dU(21) ] / 3

(e) Average the delta V values to get dVavg:

dVavg = [ dV(11) + dV(12) + dV(21) ] / 3
(f) Collect all left side Y pixel delta values, dY(11) and dY(21), into a plane, and refer to it as the delta left intensity plane (dLYP).
(g) Collect all upper right Y pixel delta values, dY(12), into a plane, and refer to it as the delta right intensity plane (dRYP).
(h) Collect all dUavg values into a plane and refer to it as the delta U plane (dUP).
(i) Collect all dVavg values into a plane and refer to it as the delta V plane (dVP).
Using our original (720x576) pixel picture size, the flipped image FVD would be (576x720). This would mean that dLYP is (288x720), dRYP is (288x360), dUP is (288x360) and dVP is (288x360) in planar image size. In this example each plane has elements with eight (8) bits of precision. That is for efficiency of implementation in software and should not be a restriction on hardware implementations.
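Steps (a) through (i) can be condensed into a short numpy sketch; the conversion coefficients are copied from the matrix as printed above, and the array layout choices (row-major planes, dLYP rows interleaved) are our assumptions:

```python
import numpy as np

# Matrix and offsets as printed in step (b); they resemble, but do not
# exactly match, the usual CCIR 601 coefficients.
M = np.array([[+0.29, +0.59, +0.14],
              [-0.14, -0.29, +0.43],
              [+0.36, +0.29, -0.07]])
OFFSET = np.array([0.0, 128.0, 128.0])  # "50%" of an 8-bit range for U and V

def split_front_back(fvd: np.ndarray):
    """fvd: flipped frame as a float array of shape (H, W, 3), H and W even.
    Returns the front-half YUV planes and the four back-half delta planes."""
    yuv = fvd @ M.T + OFFSET
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    # 2x2 block corners: 11 top-left, 12 top-right, 21 bottom-left, 22 bottom-right.
    y11, y12 = y[0::2, 0::2], y[0::2, 1::2]
    y21, y22 = y[1::2, 0::2], y[1::2, 1::2]
    fhvd = (y22, u[1::2, 1::2], v[1::2, 1::2])       # collected, never averaged
    dLYP = np.empty((y22.shape[0] * 2, y22.shape[1]))
    dLYP[0::2], dLYP[1::2] = y11 - y22, y21 - y22    # left-side Y deltas, step (f)
    dRYP = y12 - y22                                  # upper-right Y deltas, step (g)
    dUP = (u[0::2, 0::2] + u[0::2, 1::2] + u[1::2, 0::2]) / 3 - u[1::2, 1::2]  # (d), (h)
    dVP = (v[0::2, 0::2] + v[0::2, 1::2] + v[1::2, 0::2]) / 3 - v[1::2, 1::2]  # (e), (i)
    return fhvd, (dLYP, dRYP, dUP, dVP)
```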
Each plane is put through a continuous tone grey scale compression algorithm, such as a single plane JPEG. Prior to this, though, the (FHVD), (dLYP), (dRYP), (dUP), and (dVP) are divided into horizontal bands, which correspond to vertical bands of the original image. If there were four bands with our (720x576) example, then the FVD of (576x720) becomes a (FHVD) of (288x360) consisting of four bands each sized (288x90). It is allowable to have a single band encompassing the entire image, and for efficiency it is suggested that a power of two number of bands be used. The (FHVD) is compressed in three equally sized component planes of YUV using a continuous tone image compression algorithm such as, but not limited to, JPEG. Each of these planes is (288x360).
The (FHVD) and the (FHAD) are interleaved with frame specific information such that the audio data, video data and padding are easily parsable by a server application. This is referred to as the Front Half Data (FHDATA). In the case that JPEG was used, this (FHDATA) should be parsable by any standard JPEG image tool, with any padding, extra information, and audio discarded. This image is of course diagonally flipped, and needs to be flipped back. The (FHAD) is duplicated across a range of successive corresponding frames. This is so that only one of a sequence of successive frames needs to be received in order to reproduce a lower quality continuous audio representation.
The (BHVD) and (BHAD) are stored following the (FHDATA) in a way that lets the server easily pull individual bands of the information out from the data. The (BHAD) is duplicated in a successive range of corresponding frames. This is similar to the (FHAD) in the (FHDATA), but the difference is in how redundant the information is when dealing with high frequency data. The aim is to have some form of audio available as the video is presented. The (BHVD) and (BHAD) interleaved in this form are called the Back Half Data (BHDATA).
A frame header (FRAMEHEADER), the (FHDATA) and the (BHDATA) put together are the complete frame data (FRAMEDATA).
A continuous stream of (FRAMEDATA) can be converted to audio and video. This is referred to as streamed data (STREAMDATA). A subset of (FRAMEDATA) can be constructed by the video server device. This is referred to as subframe data (SUBFRAMEDATA) and a continuous stream of this information decimated accordingly is referred to as subsampled stream data (SUBSTREAMDATA).
A collection of (FRAMEDATA) with a file header (FILEHEADER) is an unpacked media file (MEDIAFILE), and a packed compressed representative of a (MEDIAFILE) is a packed media file (PACKEDMEDIAFILE).
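A hypothetical container layout for (FRAMEDATA), sketched as Python dataclasses; every field name here is ours, since the patent does not fix a byte layout:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FrameHeader:           # FRAMEHEADER: identification and timing
    frame_number: int
    timestamp_ms: int
    band_count: int          # number of horizontal bands per plane

@dataclass
class FrameData:             # FRAMEDATA = FRAMEHEADER + FHDATA + BHDATA
    header: FrameHeader
    fh_data: bytes           # FHDATA: JPEG-parsable image with FHAD in comments
    bh_bands: List[bytes]    # BHDATA: per-band (BHVD)+(BHAD), individually
                             # addressable so the server can pull bands out
```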
The server apparatus will read a (MEDIAFILE), or capture from a live video source, and create a (STREAMDATA) that goes to a relay apparatus.
A client apparatus contacts a relay apparatus and requests a certain (STREAMDATA).
Through a continuous feedback process, the relay will customize a (SUBSTREAMDATA) based on the current instantaneous network conditions, the capabilities of the client apparatus, and specific user requests such as, but not limited to, pan and scan locations.
(SUBFRAMEDATA) is created from the (FRAMEDATA) by a process of decimation, which is the discarding of information selectively. The algorithm for discarding is variable, but the essence is to discard unnecessary information, and least perceivable information first.
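Continuing the FrameData sketch above, a minimal decimation routine under the least-perceivable-first rule might look like this (the byte-budget formulation is our assumption):

```python
from typing import Optional

def size_of(f: FrameData) -> int:
    return len(f.fh_data) + sum(len(b) for b in f.bh_bands)

def decimate(frame: FrameData, budget: int) -> Optional[FrameData]:
    """Fit a frame into a byte budget by shedding detail: back-half bands
    go before the front half, and the whole frame is dropped (costing
    frame rate rather than image integrity) only as a last resort."""
    sub = FrameData(frame.header, frame.fh_data, list(frame.bh_bands))
    while sub.bh_bands and size_of(sub) > budget:
        sub.bh_bands.pop()           # discard high-detail back-half bands first
    if size_of(sub) > budget:
        return None                  # discard the frame entirely
    return sub
```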
Only complete (SUBFRAMEDATA) elements that are reliably received in their entirety are rendered. All others are discarded and ignored.
The rendering step is as follows:
The audio data is pulled from the (SUBFRAMEDATA). If (BHAD) exists, then it is stored accordingly. (FHAD) always exists in a (SUBFRAMEDATA) and is stored accordingly.
The (FHVD), which is always available, is decompressed accordingly into its corresponding YUV planes. This is stored accordingly.
The (BHVD), if it is available, is used to create a decompressed full size image using the following algorithm.
(a) Reverse the continuous tone compression algorithm so that there are reconstructed [dRYP], [dLYP], [dUP], and [dVP] planes (square braces are used to indicate reconstructed values).

(b) Values from each plane and from the (FHVD) are interleaved into a YUYV data block:

From [FHVD]: [Y22], [U22], [V22]
From [dLYP]: [dY11], [dY21]
From [dRYP]: [dY12]
From [dUP]: [dU]
From [dVP]: [dV]

[Y11] = [dY11] + [Y22]
[Y12] = [dY12] + [Y22]
[Y21] = [dY21] + [Y22]
[U1] = [dU] + [U22]
[U2] = [U22]
[V1] = [dV] + [V22]
[V2] = [V22]

[Y11] [U1] [Y21] [V1] is the YUYV representation of the left two pixels, and [Y12] [U2] [Y22] [V2] is the YUYV representation of the right two pixels.

All pixels are collected together into an intermediate frame. This frame is then put through the final reconstruction step of reversing the diagonal flip with another diagonal flip of the picture elements. Following this step the columns of YUYV data calculated above are now rows of YUYV data, in the exact format that computer video card overlay surfaces require.
During this reverse diagonal flip, an optional filtering step can be done to further remove any visual artifacts introduced during compression and decompression.
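A numpy sketch of reconstruction step (b) and the final reverse flip, using the plane layout assumed in the earlier front/back-half sketch:

```python
import numpy as np

def reconstruct(fhvd, dLYP, dRYP, dUP, dVP):
    """Rebuild full-size Y, U, V planes from the front-half planes and the
    reconstructed back-half delta planes, then undo the diagonal flip."""
    y22, u22, v22 = fhvd                    # each of shape (H/2, W/2)
    h2, w2 = y22.shape
    Y = np.empty((h2 * 2, w2 * 2))
    U = np.empty_like(Y)
    V = np.empty_like(Y)
    Y[0::2, 0::2] = dLYP[0::2] + y22        # [Y11] = [dY11] + [Y22]
    Y[1::2, 0::2] = dLYP[1::2] + y22        # [Y21] = [dY21] + [Y22]
    Y[0::2, 1::2] = dRYP + y22              # [Y12] = [dY12] + [Y22]
    Y[1::2, 1::2] = y22
    U[:, 0::2] = np.repeat(dUP + u22, 2, axis=0)   # [U1] = [dU] + [U22]
    U[:, 1::2] = np.repeat(u22, 2, axis=0)         # [U2] = [U22]
    V[:, 0::2] = np.repeat(dVP + v22, 2, axis=0)   # [V1] = [dV] + [V22]
    V[:, 1::2] = np.repeat(v22, 2, axis=0)         # [V2] = [V22]
    # Reverse the diagonal flip: columns of data become rows again, in
    # the row order that video card overlay surfaces expect.
    return Y.T, U.T, V.T
```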
The available image is displayed at the appropriate time in the sequence.
If high quality audio is available, then it is played on the audio device, otherwise the lower quality audio sample is used.
The client monitors the number of frames that it managed to receive and to decompress and process. This is reported back to the server, which then scales up or down the rate and the complexity of the data that is sent.
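A rough sketch of this feedback loop over UDP; the datagram format and the control rule are illustrative assumptions, not taken from the patent:

```python
import socket
import struct

FEEDBACK_FMT = ">II"   # hypothetical datagram: frames received, frames decoded

def report_stats(sock: socket.socket, server: tuple,
                 received: int, decoded: int) -> None:
    # Client side: report how many frames arrived intact and how many
    # were decompressed/processed in time.
    sock.sendto(struct.pack(FEEDBACK_FMT, received, decoded), server)

def adjust_detail(current_bands: int, received: int, sent: int) -> int:
    # Server side: one possible control rule -- shed back-half bands when
    # loss appears, restore them cautiously when the channel is clean.
    loss = 1.0 - received / max(sent, 1)
    if loss > 0.05:
        return max(0, current_bands - 1)
    if loss < 0.01:
        return current_bands + 1
    return current_bands
```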
As will be apparent to those of skill in the art, television variants, such as 29.97 fps, 30 fps, and 25 fps, can be downscaled to 24 frames per second by frame decimation (throwing away frames). 30 is another ideal framerate for storage, as it can easily be decimated to many lower framerates, and there is very little difference in perception to the average human eye.
Any continuous tone compression algorithm can be substituted for the DCT in JPEG. Suggested alternatives are Wavelet image compression or fractal image compression.
Any audio rate and multispeaker/stereo/mono/subwoofer combination can be used for the high quality and low quality audio signal.
There are two levels of audio and video in the example. This algorithm can be extended to three levels by having a front third, middle third, and back third. The server can send either the front third, the front two thirds, or the whole encoded frame, as desired.
Other variants are academic.
Any rectangular picture size is possible. In particular, 16x9 width to height picture ratios of theatrical releases can be captured using a square pixel or a squashed pixel.
Black band removal can be done either on a frame by frame basis, or across the whole video stream.
Any capture rate is possible.
Postbuffering can be done by the relay, so that the last n (FRAMEDATA) elements are stored. Any new client can have these (FRAMEDATA) or (SUBFRAMEDATA) burst across the communication channel at maximum rate to show something while the rest of the information is being prepared.
Other multimedia types are available such as text, force feedback cues, closed captioning, etc.
Multiple languages can be encoded and stored within this format, and selectively requested.
Even higher audio representations, such as Dolby Surround 5.1 channel AC3 format encoding, can be selectively requested as well if high enough bandwidth and audio processing facilities exist at the client end.
The client device can send multiple cues and requests. If the source is encoded appropriately, then multiple angle shots can be stored for either a client controlled pan, or as a client controlled position around a central action point.
There is a mechanism for selectively requesting computer generated video streams to be created and presented based on user preferences.
Essentially, the present invention consists of a method for multimedia transmission comprising the following steps:
A signal from client to server specifying the line conditions for multimedia rendering, so that the multimedia data that is supplied can be modified as conditions change.
The same signal specifying the method by which the full multimedia data is reduced into a fully presentable subset depending on line conditions, direct user control, and demographic positioning.
The methods by which direct user control of the requested multimedia data is exercised, so that audio modification via mixing, equalization, computer controlled text to voice additions, and language selection can be provided by the transmission server.
The same signal specifying a demographic of the audience.
The same signal containing encryption and authentication data so that the client is identified and is provided multimedia data in accordance with the exact request of the audience.
The same signal transmitted through an unpredictable and unreliable communication channel in such a way that acknowledgement is required based on time elapsed rather than on the amount of information received.
The same signal transmitted as full frames of video with sub-sampled redundant sets of audio and text information, in such a way that at any time the probability that a form of playable audio of some quality is available is maximized.
The same signal transmitted with a decimated picture header so that a simplified rendering device can be constructed.
A MULTIMEDIA compression and coding method comprising the following steps:
A video signal is captured and compressed using a discrete cosine transform based video compression algorithm similar to JPEG, whereby the information is ordered in the MULTIMEDIA DATA STREAM from top to bottom in sets of interleaved columns rather than left to right in sets of progressive rows.
The same MULTIMEDIA DATA STREAM that has sets of columns interleaved into sparse tiles in a way that allows for fast parsing at the transmitter.
The same MULTIMEDIA DATA STREAM is also stored using interleaved luminance and chrominance values in YUV4:2:2 format, in variably sized picture element processing sets (tiles) that are greater than 8 by 8 byte matrixes, in units that are powers of two, such as but not limited to 64 by 64 matrixes and 128 byte by 128 byte matrixes.
The same MULTIMEDIA DATA STREAM is also stored as a lower resolution decimated JPEG image as a header, with the information required to reconstruct a higher resolution image stored as a secondary and tertiary set of information in comment blocks of the JPEG, or as additional data elements that may or may not be transmitted at the same time as the header; both put together are termed, for this document, COMMENT TYPE INFORMATION.
The same MULTIMEDIA DATA STREAM with the COMMENT TYPE INFORMATION variably encrypted and authenticated in such a way that the origin of the source and the legitimacy of the video requester can be controlled and regulated.
The same MULTIMEDIA DATA STREAM with audio, text, and force feedback information encoded as COMMENT TYPE INFORMATION within the image file, so that standard picture editing software will parse the file, yet not store or extract the additional multimedia information.
The same MULTIMEDIA DATA STREAM with audio encoded with variable sampling rates and compression ratios, and then packaged as COMMENT TYPE INFORMATION, in such a way that a long time period of low quality audio and short periods of higher quality audio are redundantly transmitted.
The same MULTIMEDIA DATA STREAM with other types of multimedia information, such as but not limited to text and subtext, language and country specific cues, force feedback cues, control information, client side 3-d surface model rendering and texture information, program flow elements, and camera viewpoint information, encoded as COMMENT TYPE INFORMATION.
A MULTIMEDIA CONTENT CREATION apparatus comprising the following components:
Software or hardware that can take an industry standard interface for capturing audio, video, and other types of multimedia information, such as but not limited to text and subtext, language and country specific cues, force feedback cues, control information, client side 3-d surface model rendering and texture information, program flow elements, and camera viewpoint information, then compress and encode the information into a MULTIMEDIA DATA STREAM format as described above, and then store the data into a MULTIMEDIA DATA STORE.
A MULTIMEDIA TRANSMISSION apparatus comprising the following components:
A MULTIMEDIA DATA STORE that will, on an authenticated or unauthenticated request, transmit the previously described MULTIMEDIA DATA STREAM to another MULTIMEDIA TRANSMISSION apparatus in its entirety.
A MULTIMEDIA TRANSMISSION RELAY that will, on an authenticated or unauthenticated request, set up a network point that one or many MULTIMEDIA DATA STORES can transmit to, and from which one or many MULTIMEDIA RENDERING apparatus can request said MULTIMEDIA DATA.
The same apparatus that will, based on time specified acknowledgement information, modify the information that is presented by a process of parsing, merging, and filtering, in such a way that required information is always sent redundantly and less important information is removed first, based on selection criteria specified by the MULTIMEDIA RENDERING apparatus.
The same apparatus that will collect and store information based on the audience demographic, and may or may not modify the MULTIMEDIA DATA STREAM to accommodate visual cues and market based product placement.
The same apparatus that will post-buffer the information that has already been sent, so that at the request of the MULTIMEDIA RENDERING apparatus, the missing information can be retransmitted at faster than real time rates.
A MULTIMEDIA RENDERING apparatus comprising the following components:
A software program or hardware device that can receive, through some communication channel and in a timely manner from reception time, the previously mentioned MULTIMEDIA DATA STREAM, and will produce a video picture stream and audio stream that can be presented to an audience.
The same apparatus that can present all other types of multimedia information, such as but not limited to text and subtext, language and country specific cues, force feedback cues, control information, client side 3-d surface model rendering and texture information, program flow elements, and camera viewpoint information.
The same apparatus, which may be a stand alone application, a plug in for an existing application, a standalone piece of hardware, or a component of an existing piece of hardware that may or may not have been originally intended for use as a MULTIMEDIA RENDERING DEVICE but can be easily modified to be such a device.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.

Claims (3)

1. A method for multimedia compression, comprising the steps of, for each frame:
capturing image data for a frame;
capturing audio data for the frame;
compressing the image data;
encoding the audio data within the comment field of the compressed image data.
2. A method for multimedia compression, comprising the steps of:
capturing raw audio data for a frame;
converting the raw audio data to provide back half audio data and front half audio data;
capturing raw video data for a frame;
flipping the frame diagonally to provide flipped video data;
collecting every other pixel in the flipped video data;
encoding the remaining uncollected pixels to provide back half video data;
converting the collected pixels to YUV space to provide front half video data;
compressing and storing the back half data using a continuous tone compression algorithm.
3. A system for implementing the method according to claims 1 and 2.
CA 2312333 2000-06-21 2000-06-21 Multimedia compression, coding and transmission method and apparatus Abandoned CA2312333A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA 2312333 CA2312333A1 (en) 2000-06-21 2000-06-21 Multimedia compression, coding and transmission method and apparatus
PCT/CA2001/000893 WO2001099430A2 (en) 2000-06-21 2001-06-21 Audio/video coding and transmission method and system
AU67228/01A AU6722801A (en) 2000-06-21 2001-06-21 Multimedia compression, coding and transmission method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA 2312333 CA2312333A1 (en) 2000-06-21 2000-06-21 Multimedia compression, coding and transmission method and apparatus

Publications (1)

Publication Number Publication Date
CA2312333A1 true CA2312333A1 (en) 2001-12-21

Family

ID=4166574

Family Applications (1)

Application Number Title Priority Date Filing Date
CA 2312333 Abandoned CA2312333A1 (en) 2000-06-21 2000-06-21 Multimedia compression, coding and transmission method and apparatus

Country Status (3)

Country Link
AU (1) AU6722801A (en)
CA (1) CA2312333A1 (en)
WO (1) WO2001099430A2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI118830B (en) * 2001-02-08 2008-03-31 Nokia Corp Streaming playback
EP1634438A1 (en) * 2003-05-13 2006-03-15 Medical Insight A/S Method and system for remote and adaptive visualization of graphical image data
EP2472867A1 (en) * 2010-12-30 2012-07-04 Advanced Digital Broadcast S.A. Coding and decoding of multiview videos

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06284148A (en) * 1993-03-30 1994-10-07 Hitachi Ltd Moving picture communication control method and communication controller
US5671156A (en) * 1995-03-31 1997-09-23 Lucent Technologies Inc. Transmission method and system for JPEG images
WO1998037699A1 (en) * 1997-02-25 1998-08-27 Intervu, Inc. System and method for sending and receiving a video as a slide show over a computer network
KR100261253B1 (en) * 1997-04-02 2000-07-01 윤종용 Scalable audio encoder/decoder and audio encoding/decoding method
US6014694A (en) * 1997-06-26 2000-01-11 Citrix Systems, Inc. System for adaptive video/audio transport over a network
US6637031B1 (en) * 1998-12-04 2003-10-21 Microsoft Corporation Multimedia presentation latency minimization
GB9909606D0 (en) * 1999-04-26 1999-06-23 Telemedia Systems Ltd Networked delivery of profiled media files to clients

Also Published As

Publication number Publication date
WO2001099430A3 (en) 2003-02-13
WO2001099430A2 (en) 2001-12-27
AU6722801A (en) 2002-01-02

Similar Documents

Publication Publication Date Title
US7319720B2 (en) Stereoscopic video
US8374236B2 (en) Method and apparatus for improving the average image refresh rate in a compressed video bitstream
US8055974B2 (en) Content distribution method, encoding method, reception/reproduction method and apparatus, and program
US20180077385A1 (en) Data, multimedia & video transmission updating system
JP2005176352A (en) Wireless moving picture streaming file, method and system for moving picture streaming service of mobile communication terminal
KR20050122281A (en) Picture coding method
US20080212682A1 (en) Reduced resolution video transcoding with greatly reduced complexity
US20020186769A1 (en) System and method for transcoding
US20130156113A1 (en) Video signal processing
Aksay et al. End-to-end stereoscopic video streaming with content-adaptive rate and format control
CN112235606A (en) Multi-layer video processing method, system and readable storage medium
KR100312421B1 (en) A conversion method of the compressed moving video on the video communication system
EP2606644A1 (en) Video signal processing
US6337882B1 (en) Method and apparatus for generating unlimited selected image views from a larger image
Squibb Video transmission for telemedicine
CA2312333A1 (en) Multimedia compression, coding and transmission method and apparatus
Pehlivan et al. End-to-end stereoscopic video streaming system
JP4010270B2 (en) Image coding and transmission device
Cheng et al. The Analysis of MPEG-4 Core Profile and its system design
CN104702970A (en) Video data synchronization method, device and system
KR20020072478A (en) Streaming method by moving picture compression method using SPEG
Lee Scalable video
Yamaguchi et al. Superhigh‐definition digital cinema distribution system with 8‐million‐pixel resolution
Lim Improved encoding and control algorithms for the IVS videoconferencing software
Ruud Video Quality Measurement of Scalable Video Streams

Legal Events

Date Code Title Description
FZDE Dead