EP1512286A1 - Temporal, coded-resolution and digital-watermark layering for advanced television - Google Patents

Temporal, coded-resolution and digital-watermark layering for advanced television

Info

Publication number
EP1512286A1
Authority
EP
European Patent Office
Prior art keywords
unit
units
watermark
encrypted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP02747897A
Other languages
German (de)
English (en)
Other versions
EP1512286A4 (fr)
Inventor
Gary A. Demos
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corp filed Critical Dolby Laboratories Licensing Corp
Publication of EP1512286A1
Publication of EP1512286A4
Legal status: Withdrawn

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/16Analogue secrecy systems; Analogue subscription systems
    • H04N7/167Systems rendering the television signal unintelligible and subsequently intelligible
    • H04N7/1675Providing digital key or authorisation information for generation or regeneration of the scrambling sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking
    • G06T1/0085Time domain based watermarking, e.g. watermarks spread over several images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • H04N19/467Embedding additional information in the video signal during the compression process characterised by the embedded information being invisible, e.g. watermarking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2347Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving video stream encryption
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25808Management of client data
    • H04N21/25841Management of client data involving the geographical location of the client
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/26613Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel for generating or managing keys in general
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00General purpose image data processing
    • G06T2201/005Image watermarking
    • G06T2201/0052Embedding of the watermark in the frequency domain
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00General purpose image data processing
    • G06T2201/005Image watermarking
    • G06T2201/0053Embedding of the watermark in the coding stream, possibly without decoding; Embedding of the watermark in the compressed domain

Definitions

  • This invention relates to electronic communication systems, and more particularly to an advanced electronic television system having temporal and resolution layering of compressed image frames, and which provides encryption and watermarking capabilities.
  • One method specifically intended to provide for such scalability is the MPEG-2 standard.
  • the temporal and spatial scalability features specified within the MPEG-2 standard are not sufficiently efficient to accommodate the needs of advanced television for the U.S.
  • the proposal for advanced television for the U.S. is based upon the premise that temporal (frame rate) and spatial (resolution) layering are inefficient, and therefore discrete formats are necessary.
  • Movie distribution digitally to movie theaters is becoming feasible.
  • the high value copies of new movies have long been a target for theft or copying of today's film prints.
  • Digital media such as DVD have attempted crude encryption and authorization schemes (such as DIVX).
  • Analog cable scramblers have been in use from the beginning to enable charging for premium cable channels and pay-per-view events and movies. However these crude scramblers have been broadly compromised.
  • the present invention overcomes these and other problems of current digital content protection systems.
  • the present invention provides a method and apparatus for image compression which demonstrably achieves better than 1000-line resolution image compression at high frame rates with high quality. It also achieves both temporal and resolution scalability at this resolution at high frame rates within the available bandwidth of a conventional television broadcast channel.
  • the inventive technique efficiently achieves over twice the compression ratio being proposed for advanced television while providing for flexible encryption and watermarking techniques.
  • Image material is preferably captured at an initial or primary framing rate of 72 fps.
  • An MPEG-2 data stream is then generated, comprising: (1) a base layer, preferably encoded using only MPEG-2 P frames, comprising a low resolution (e.g., 1024x512 pixels), low frame rate (24 or 36 Hz) bitstream; (2) an optional base resolution temporal enhancement layer, encoded using only MPEG-2 B frames, comprising a low resolution (e.g., 1024x512 pixels), high frame rate (72 Hz) bitstream; (3) an optional base temporal high resolution enhancement layer, preferably encoded using only MPEG-2 P frames, comprising a high resolution (e.g., 2kx1k pixels), low frame rate (24 or 36 Hz) bitstream; and
  • (4) an optional high resolution temporal enhancement layer, encoded using only MPEG-2 B frames, comprising a high resolution (e.g., 2kx1k pixels), high frame rate (72 Hz) bitstream.
  • the invention provides a number of key technical attributes, allowing substantial improvement over current proposals, including: replacement of numerous resolutions and frame rates with a single layered resolution and frame rate; no need for interlace in order to achieve better than 1000 lines of resolution for 2 megapixel images at high frame rates (72 Hz) within a 6 MHz television channel; compatibility with computer displays through use of a primary framing rate of 72 fps; and greater robustness than the current unlayered format proposal for advanced television, since all available bits may be allocated to a lower resolution base layer when "stressful" image material is encountered.
  • the disclosed layered compression technology allows a form of modularized decomposition of an image.
  • This modularity has additional benefits beyond allowing scalable decoding and better stress resilience.
  • the modularity can be further tapped as a structure which supports flexible encryption and watermarking techniques.
  • the function of encryption is to restrict viewing, performance, copying, or other use of audio/video shows unless one or more proper keys are applied to an authorized decryption system.
  • the function of watermarking is to track lost or stolen copies back to a source, so that the nature of the method of theft can be determined to improve the security of the system, and so that those involved in the theft can be identified.
  • the base layer, and various internal components of the base layer, can be used to encrypt a compressed layered movie stream.
  • the entire picture stream can be made unrecognizable (unless decrypted) by encrypting only a small fraction of the bits of the entire picture stream.
  • a variety of encryption algorithms and strengths can be applied to various portions of the layered stream, including the enhancement layers (which can be seen as a premium quality service, and encrypted specially). Encryption algorithms or keys can be changed at each slice boundary as well, to provide greater intertwining of the encryption and the image stream.
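The selective-encryption idea above can be sketched as follows. This is an illustrative toy, not the patent's implementation: the unit names and key-derivation scheme are assumptions, and the SHA-256-based XOR keystream is a placeholder for a real cipher (e.g., AES). It shows how encrypting only the base-layer units, with a per-slice nonce so keys can change at slice boundaries, renders the stream unusable without the key.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream (placeholder for a real stream cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_unit(payload: bytes, key: bytes, slice_id: int) -> bytes:
    # A fresh nonce per slice lets the key change at each slice boundary,
    # intertwining the encryption with the image stream.
    ks = keystream(key, slice_id.to_bytes(4, "big"), len(payload))
    return bytes(a ^ b for a, b in zip(payload, ks))

# Hypothetical layer payloads; the enhancement layer can carry its own
# (stronger or differently keyed) encryption as a premium service.
layers = {"base": b"base-layer I/P frame data",
          "enhancement": b"high-res enhancement data"}
base_key, enh_key = b"base-key", b"premium-key"

cipher_base = encrypt_unit(layers["base"], base_key, slice_id=0)
cipher_enh = encrypt_unit(layers["enhancement"], enh_key, slice_id=0)

# XOR with the same keystream is symmetric, so the same call decrypts.
assert encrypt_unit(cipher_base, base_key, 0) == layers["base"]
```

Because every enhancement layer is differenced against the base layer, scrambling only the small base-layer payload is enough to make the entire picture stream unrecognizable.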
  • the inventive layered compression structure can also be used for watermarking.
  • the goal of watermarking is to be reliably identifiable upon detection, yet essentially invisible to the eye.
  • Low order bits in DC coefficients in I frames would be invisible to the eye, but yet could be used to uniquely identify a particular picture stream with a watermark.
  • Enhancement layers can also have their own unique identifying watermark structure.
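A minimal sketch of the low-order-bit DC-coefficient watermark described above. The coefficient values and bit layout here are made up for illustration; one identifying bit is hidden in the low-order bit of each I-frame block's DC coefficient, changing the block by at most one quantization step.

```python
def embed_watermark(dc_coeffs, watermark_bits):
    """Set the low-order bit of each DC coefficient to a watermark bit."""
    return [(dc & ~1) | bit for dc, bit in zip(dc_coeffs, watermark_bits)]

def extract_watermark(dc_coeffs):
    """Recover the watermark bits from the low-order bits."""
    return [dc & 1 for dc in dc_coeffs]

dc = [1024, 983, 1101, 874, 990, 1033]   # hypothetical DC terms of six I-frame blocks
mark = [1, 0, 1, 1, 0, 0]                # copy-specific identifier bits

marked = embed_watermark(dc, mark)
assert extract_watermark(marked) == mark
# Each coefficient moves by at most 1 -- invisible to the eye, but the
# pattern uniquely identifies this particular picture stream.
assert all(abs(a - b) <= 1 for a, b in zip(dc, marked))
```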
  • FIG. 1 is a timing diagram showing the pulldown rates for 24 fps and 36 fps material to be displayed at 60 Hz.
  • FIG. 2 is a first preferred MPEG-2 coding pattern.
  • FIG. 3 is a second preferred MPEG-2 coding pattern.
  • FIG. 4 is a block diagram showing temporal layer decoding in accordance with the preferred embodiment of the present invention.
  • FIG. 5 is a block diagram showing 60 Hz interlaced input to a converter that can output both 36 Hz and 72 Hz frames.
  • FIG. 6 is a diagram showing a "master template" for a base MPEG-2 layer at 24 or 36 Hz.
  • FIG. 7 is a diagram showing enhancement of a base resolution template using hierarchical resolution scalability utilizing MPEG-2.
  • FIG. 8 is a diagram showing the preferred layered resolution encoding process.
  • FIG. 9 is a diagram showing the preferred layered resolution decoding process.
  • FIG. 10 is a block diagram showing a combination of resolution and temporal scalable options for a decoder in accordance with the present invention.
  • FIG. 11 is a diagram showing the scope of encryption and watermarking as a function of unit dependency.
  • FIGS. 12A and 12B show diagrams of image frames with different types of watermarks.
  • FIG. 13 is a flowchart showing one method of applying the encryption techniques of the invention.
  • FIG. 14 is a flowchart showing one method of applying the watermarking techniques of the invention. Like reference symbols in the various drawings indicate like elements.
  • Optimal presentation on a 72 or 75 Hz display will occur if a camera or simulated image is created having a motion rate equal to the display rate (72 or 75 Hz, respectively), and vice versa.
  • optimal motion fidelity on a 60 Hz display will result from a 60 Hz camera or simulated image.
  • Use of 72 Hz or 75 Hz generation rates with 60 Hz displays results in a 12 Hz or 15 Hz beat frequency, respectively. This beat can be removed through motion analysis, but motion analysis is expensive and inexact, often leading to visible artifacts and temporal aliasing.
  • the beat frequency dominates the perceived display rate, making the 12 or 15 Hz beat appear to provide less accurate motion than even 24 Hz.
  • 24 Hz forms a natural temporal common denominator between 60 and 72 Hz.
  • Although 75 Hz has a slightly higher 15 Hz beat with 60 Hz, its motion is still not as smooth as 24 Hz, and there is no integral relationship between 75 Hz and 24 Hz unless the 24 Hz rate is increased to 25 Hz.
  • In European 50 Hz countries, movies are often played 4% fast at 25 Hz; the same can be done to make film presentable on 75 Hz displays.
  • Motion Blur. In order to further explore the issue of finding a common temporal rate higher than 24 Hz, it is useful to mention motion blur in the capture of moving images.
  • Camera sensors and motion picture film are open to sensing a moving image for a portion of the duration of each frame.
  • the duration of this exposure is adjustable.
  • Film cameras require a period of time to advance the film, and are usually limited to being open only about 210 out of 360 degrees, or a 58% duty cycle.
  • some portion of the frame time is often required to "read" the image from the sensor. This can vary from 10% to 50% of the frame time.
  • an electronic shutter must be used to blank the light during this readout time.
  • the "duty cycle" of CCD sensors usually varies from 50 to 90%, and is adjustable in some cameras. The light shutter can sometimes be adjusted to further reduce the duty cycle, if desired.
  • the most common sensor duty cycle duration is 50%.
  • Preferred Rate. With this issue in mind, one can consider the use of only some of the frames from an image sequence captured at 60, 72, or 75 Hz. Utilizing one frame in two, three, four, etc., the subrates shown in TABLE 1 can be derived.
  • the rate of 15 Hz is a unifying rate between 60 and 75 Hz.
  • the rate of 12 Hz is a unifying rate between 60 and 72 Hz.
  • 24 Hz is not common, but the use of 3-2 pulldown has come to be accepted by the industry for presentation on 60 Hz displays.
  • the only candidate rates are therefore 30, 36, and 37.5 Hz. Since 30 Hz has a 7.5 Hz beat with 75 Hz, and a 6 Hz beat with 72 Hz, it is not useful as a candidate.
  • the motion rates of 36 and 37.5 Hz become prime candidates for smoother motion than 24 Hz material when presented on 60 and 72/75 Hz displays. Both of these rates are about 50% faster and smoother than 24 Hz.
  • the rate of 37.5 Hz is not suitable for use with either 60 or 72 Hz, so it must be eliminated, leaving only 36 Hz as having the desired temporal rate characteristics. (The motion rate of 37.5 Hz could be used if the 60 Hz display rate for television could be moved 4% to 62.5 Hz. Given the interests behind 60 Hz, 62.5 Hz appears unlikely; there are even those who propose the very obsolete 59.94 Hz rate for new television systems.)
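The subrate reasoning behind TABLE 1 can be reproduced directly: take every 2nd, 3rd, 4th... frame of 60, 72, and 75 Hz capture and look for shared rates. This is a sketch of the arithmetic, not content from the patent's table itself.

```python
from fractions import Fraction

# Subrates of each candidate capture rate: base/n for n = 1..6.
subrates = {base: [Fraction(base, n) for n in range(1, 7)]
            for base in (60, 72, 75)}

# 15 Hz unifies 60 and 75 Hz (60/4, 75/5);
# 12 Hz unifies 60 and 72 Hz (60/5, 72/6).
assert Fraction(15) in subrates[60] and Fraction(15) in subrates[75]
assert Fraction(12) in subrates[60] and Fraction(12) in subrates[72]
# 36 Hz (72/2) is not a subrate of 60 Hz, hence the 2-1-2 pulldown below.
assert Fraction(36) in subrates[72] and Fraction(36) not in subrates[60]
```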
  • the 3-2 pulldown pattern for 24 Hz material repeats a first frame (or field) 3 times, then the next frame 2 times, then the next frame 3 times, then the next frame 2 times, etc.
  • for 36 Hz material, each frame optimally should be repeated in a 2-1-2 pattern. This can be seen in TABLE 2 and graphically in FIG. 1.
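The pulldown arithmetic can be checked with a small sketch: repeating source frames according to a cyclic pattern must yield exactly 60 fields per second of source material.

```python
from itertools import cycle, islice

def pulldown_fields(num_frames, pattern):
    """Number of 60 Hz fields produced when each source frame is repeated
    according to the cyclic pulldown pattern."""
    return sum(islice(cycle(pattern), num_frames))

# 3-2 pulldown: 24 frames (one second at 24 Hz) -> 60 fields.
assert pulldown_fields(24, [3, 2]) == 60
# 2-1-2 pulldown: 36 frames (one second at 36 Hz) -> 60 fields.
assert pulldown_fields(36, [2, 1, 2]) == 60
```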
  • 36 Hz is the optimum rate for a master, unifying motion capture and image distribution rate for use with 60 and 72 Hz displays, yielding smoother motion than 24 Hz material presented on such displays.
  • 36 Hz meets the goals set forth above, it is not the only suitable capture rate. Since 36 Hz cannot be simply extracted from 60 Hz, 60 Hz does not provide a suitable rate for capture. However, 72 Hz can be used for capture, with every other frame then used as the basis for 36 Hz distribution. The motion blur from using every other frame of 72 Hz material will be half of the motion blur at 36 Hz capture. Tests of motion blur appearance of every third frame from 72 Hz show that staccato strobing at 24 Hz is objectionable. However, utilizing every other frame from 72 Hz for 36 Hz display is not objectionable to the eye compared to 36 Hz native capture.
  • 36 Hz affords the opportunity to provide very smooth motion on 72 Hz displays by capturing at 72 Hz, while providing better motion on 60 Hz displays than 24 Hz material by using alternate frames of 72 Hz native capture material to achieve a 36 Hz distribution rate and then using 2-1-2 pulldown to derive a 60 Hz image.
  • TABLE 3 shows the preferred optimal temporal rates for capture and distribution in accordance with the present invention.
  • digital source material having the preferred temporal rate of 36 Hz should be compressed.
  • the preferred form of compression for the present invention is accomplished by using a novel variation of the MPEG-2 standard.
  • MPEG-2 Basics.
  • MPEG-2 is an international video compression standard defining a video syntax that provides an efficient way to represent image sequences in the form of more compact coded data.
  • the language of the coded bits is the "syntax." For example, a few tokens can represent an entire block of 64 samples.
  • MPEG also describes a decoding (reconstruction) process where the coded bits are mapped from the compact representation into the original, "raw" format of the image sequence. For example, a flag in the coded bitstream signals whether the following bits are to be decoded with a discrete cosine transform (DCT) algorithm or with a prediction algorithm.
  • the algorithms comprising the decoding process are regulated by the semantics defined by MPEG.
  • MPEG-2 defines a programming language as well as a data format.
  • An MPEG-2 decoder must be able to parse and decode an incoming data stream, but so long as the data stream complies with the MPEG-2 syntax, a wide variety of possible data structures and compression techniques can be used.
  • the present invention takes advantage of this flexibility by devising a novel means and method for temporal and resolution scaling using the MPEG-2 standard.
  • MPEG-2 uses an intraframe and an interframe method of compression. In most video scenes, the background remains relatively stable while action takes place in the foreground. The background may move, but a great deal of the scene is redundant.
  • MPEG-2 starts its compression by creating a reference frame called an I (for Intra) frame. I frames are compressed without reference to other frames and thus contain an entire frame of video information. I frames provide entry points into a data bitstream for random access, but can only be moderately compressed. Typically, the data representing I frames is placed in the bitstream every 10 to 15 frames. Thereafter, since only a small portion of the frames that fall between the reference I frames are different from the bracketing I frames, only the differences are captured, compressed and stored. Two types of frames are used for such differences: P (for Predicted) frames and B (for Bi-directional interpolated) frames.
  • P frames generally are encoded with reference to a past frame (either an I frame or a previous P frame), and, in general, will be used as a reference for future P frames.
  • P frames receive a fairly high amount of compression.
  • B frame pictures provide the highest amount of compression but generally require both a past and a future reference in order to be encoded.
  • Bi-directional frames are never used for reference frames.
  • Macroblocks within P frames may also be individually encoded using intra-frame coding.
  • Macroblocks within B frames may also be individually encoded using intra-frame coding, forward predicted coding, backward predicted coding, or both forward and backward, or bi-directionally interpolated, predicted coding.
  • a macroblock is a 16x16 pixel grouping of four 8x8 DCT blocks, together with one motion vector for P frames, and one or two motion vectors for B frames.
  • an MPEG data bitstream comprises a sequence of I, P, and B frames.
  • a sequence may consist of almost any pattern of I, P, and B frames (there are a few minor semantic restrictions on their placement). However, it is common in industrial practice to have a fixed pattern (e.g., IBBPBBPBBPBBPBB).
  • an MPEG-2 data stream comprising a base layer, at least one optional temporal enhancement layer, and an optional resolution enhancement layer.
  • the base layer is used to carry 36 Hz source material.
  • one of two MPEG-2 frame sequences can be used for the base layer: IBPBPBP or IPPPPPP.
  • 72 Hz Temporal Enhancement Layer. When using MPEG-2 compression, it is possible to embed a 36 Hz temporal enhancement layer as B frames within the MPEG-2 sequence for the 36 Hz base layer if the P frame distance is even. This allows a single data stream to support both 36 Hz display and 72 Hz display. For example, both layers could be decoded to generate a 72 Hz signal for computer monitors, while only the base layer might be decoded and converted to generate a 60 Hz signal for television.
  • the MPEG-2 coding patterns of IPBBBPBBBPBBBP or IPBPBPBPB both allow placing alternate frames in a separate stream containing only temporal enhancement B frames to take 36 Hz to 72 Hz. These coding patterns are shown in FIGS. 2 and 3, respectively.
  • the 2-Frame P spacing coding pattern of FIG. 3 has the added advantage that the 36 Hz decoder would only need to decode P frames, reducing the required memory bandwidth if 24 Hz movies were also decoded without B frames.
  • FIG. 4 is a block diagram showing that a 36 Hz base layer MPEG-2 decoder 50 simply decodes the P frames to generate 36 Hz output, which may then be readily converted to either 60 Hz or 72 Hz display.
  • An optional second decoder 52 simply decodes the B frames to generate a second 36 Hz output, which when combined with the 36 Hz output of the base layer decoder 50 results in a 72 Hz output (a method for combining is discussed below).
  • one fast MPEG-2 decoder 50 could decode both the P frames for the base layer and the B frames for the enhancement layer.
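The layer split described above can be illustrated with a toy frame-pattern partitioner (an illustrative sketch, not the patent's decoder logic): with a 2-frame P spacing, every other frame is a B frame, so the I/P frames form the 36 Hz base layer and the interleaved B frames form the temporal enhancement layer.

```python
def split_layers(coding_pattern: str):
    """Partition a frame coding pattern into base (I/P) and
    temporal-enhancement (B) layers, keeping display order."""
    base = [(i, f) for i, f in enumerate(coding_pattern) if f in "IP"]
    enhancement = [(i, f) for i, f in enumerate(coding_pattern) if f == "B"]
    return base, enhancement

# 2-frame P spacing pattern (FIG. 3): alternate frames are all B frames.
base, enh = split_layers("IBPBPBPB")
assert all(f in "IP" for _, f in base)   # 36 Hz base: decode I/P only
assert all(i % 2 == 1 for i, _ in enh)   # every other frame is a B frame
```

A 36 Hz decoder simply ignores the enhancement list (the B frames), while a 72 Hz decoder interleaves both lists back into display order.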
  • MPEG-2 also defines profiles for resolutions and frame rates. Although these profiles are strongly biased toward computer-incompatible format parameters such as 60 Hz, non-square pixels, and interlace, many chip manufacturers appear to be developing decoder chips which operate at the "main profile, main level". This profile is defined to be any horizontal resolution up to 720 pixels, any vertical resolution up to 576 lines at up to 25 Hz, or up to 480 lines at up to 30 Hz. A wide range of data rates from approximately 1.5 Mbits/second to about 10 Mbits/second is also specified. However, from a chip point of view, the main issue is the rate at which pixels are decoded. The main-level, main-profile pixel rate is about 10.5 MPixels/second.
  • Although there is variation among chip manufacturers, most MPEG-2 decoder chips will in fact operate at up to 13 MPixels/second, given fast support memory. Some decoder chips will go as fast as 20 MPixels/second or more. Given that CPU chips tend to gain 50% improvement or more each year at a given cost, one can expect some near-term flexibility in the pixel rate of MPEG-2 decoder chips.
  • TABLE 4 illustrates some desirable resolutions and frame rates, and their corresponding pixel rates.
  • All of these formats can be utilized with MPEG-2 decoder chips that can generate at least 12.6 MPixels/second.
  • the very desirable 640x480 at 36 Hz format can be achieved by nearly all current chips, since its rate is 11.1 MPixels/second.
  • a widescreen 1024x512 image can be squeezed into 680x512 using a 1.5:1 squeeze, and can be supported at 36 Hz if 12.5 MPixels/second can be handled.
  • the highly desirable square pixel widescreen template of 1024x512 can achieve 36 Hz when MPEG-2 decoder chips can process about 18.9 MPixels/second. This becomes more feasible if 24 Hz and 36 Hz material is coded only with P frames, such that B frames are only required in the 72 Hz temporal enhancement layer decoders. Decoders which use only P frames require less memory and memory bandwidth, making the goal of 19 MPixels/second more accessible.
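The pixel-rate figures quoted for TABLE 4 follow from simple arithmetic: a decoder must sustain width × height × frame rate pixels per second.

```python
def mpixels_per_second(width, height, hz):
    """Decoder throughput required for a given format, in MPixels/second."""
    return width * height * hz / 1e6

assert round(mpixels_per_second(640, 480, 36), 1) == 11.1   # nearly all current chips
assert round(mpixels_per_second(680, 512, 36), 1) == 12.5   # 1.5:1 squeezed widescreen
assert round(mpixels_per_second(1024, 512, 36), 1) == 18.9  # square-pixel widescreen
```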
  • the 1024x512 resolution template would most often be used with 2.35:1 and 1.85:1 aspect ratio films at 24 fps. This material only requires 11.8 MPixels/second, which should fit within the limits of most existing main level-main profile decoders. All of these formats are shown in FIG. 6 in a "master template" for a base layer at 24 or 36 Hz.
  • the present invention provides a unique way of accommodating a wide variety of aspect ratios and temporal resolution compared to the prior art. (Further discussion of a master template is set forth below).
  • the temporal enhancement layer of B frames to generate 72 Hz can be decoded using a chip with double the pixel rates specified above, or by using a second chip in parallel with additional access to the decoder memory.
  • merging can be done invisibly to the decoder chip using the MPEG-2 transport layer.
  • the MPEG-2 transport packets for two PIDs can be recognized as containing the base layer and enhancement layer, and their stream contents can both be simply passed on to a double-rate capable decoder chip, or to an appropriately configured pair of normal rate decoders.
  • the data partitioning feature allows the B frames to be marked as belonging to a different class within the MPEG-2 compressed data stream, and can therefore be flagged to be ignored by 36-Hz decoders which only support the temporal base layer rate.
  • Temporal scalability, as defined by MPEG-2 video compression, is not as optimal as the simple B frame partitioning of the present invention.
  • the MPEG-2 temporal scalability is only forward referenced from a previous P or B frame, and thus lacks the efficiency available in the B frame encoding proposed here, which is both forward and backward referenced.
  • the simple use of B frames as a temporal enhancement layer provides a simpler and more efficient temporal scalability than does the temporal scalability defined within MPEG-2.
  • this use of B frames as the mechanism for temporal scalability is fully compliant with MPEG-2.
  • the two methods of identifying these B frames as an enhancement layer, via data partitioning or alternate PIDs for the B frames, are also fully compliant.
  • 50/60 Hz Temporal Enhancement Layer.
  • a 60 Hz temporal enhancement layer (which encodes a 24 Hz signal) can be added in similar fashion to the 36 Hz base layer.
  • a 60 Hz temporal enhancement layer is particularly useful for encoding existing 60 Hz interlaced video material.
  • FIG. 5 is a block diagram showing 60 Hz interlaced input from cameras 60 or other sources (such as non-film video tape) 62 to a converter 64 that includes a de-interlacer function and a frame rate conversion function that can output a 36 Hz signal (36 Hz base layer only) and a 72 Hz signal (36 Hz base layer plus 36 Hz from the temporal enhancement layer).
  • this conversion process can be adapted to produce a second MPEG-2 24 Hz temporal enhancement layer on the 36 Hz base layer which would reproduce the original 60 Hz signal, although de-interlaced. If similar quantization is used for the 60 Hz temporal enhancement layer B frames, the data rate should be slightly less than the 72 Hz temporal enhancement layer, since there are fewer B frames.
  • this technique of providing a base and enhancement layer should appear similar to 72 Hz origination in terms of motion blur. Accordingly, few viewers will notice the difference, except possibly as a slight improvement, when interlaced 60 Hz NTSC material is processed into a 36 Hz base layer, plus 24 Hz from the temporal enhancement layer, and displayed at 60 Hz. However, those who buy new 72 Hz digital non-interlaced televisions will notice a small improvement when viewing NTSC, and a major improvement when viewing new material captured or originated at 72 Hz. Even the decoded 36 Hz base layer presented on 72 Hz displays will look as good as high quality digital
  • PAL video tapes are best slowed to 48 Hz prior to such conversion.
  • Live PAL requires conversion using the relatively unrelated rates of 50, 36, and 72 Hz.
  • Such converter units presently are only affordable at the source of broadcast signals, and are not presently practical at each receiving device in the home and office.
  • the process of resolution enhancement can be achieved by generating a resolution enhancement layer as an independent MPEG-2 stream and applying MPEG-2 compression to the enhancement layer. This technique differs from the "spatial scalability" defined within MPEG-2, which has proven to be highly inefficient.
  • MPEG-2 contains all of the tools to construct an effective layered resolution to provide spatial scalability.
  • the preferred layered resolution encoding process of the present invention is shown in FIG. 8.
  • the preferred decoding process of the present invention is shown in FIG. 9.
  • Resolution Layer Coding. In FIG. 8, an original 2kx1k image 80 is filtered in conventional fashion to 1/2 resolution in each dimension to create a 1024x512 base layer 81.
  • the base layer 81 is then compressed according to conventional MPEG-2 algorithms, generating an MPEG-2 base layer 82 suitable for transmission.
  • full MPEG-2 motion compensation can be used during this compression step. That same signal is then decompressed using conventional MPEG-2 algorithms back to a 1024x512 image 83.
  • the 1024x512 image 83 is expanded (for example, by pixel replication, or preferably by better filters such as spline interpolation) to a first 2kx1k enlargement 84. Meanwhile, as an optional step, the filtered 1024x512 base layer 81 is expanded to a second 2kx1k enlargement 85. This second 2kx1k enlargement 85 is subtracted from the original 2kx1k image 80 to generate an image that represents the top octave of resolution between the original high resolution image 80 and the original base layer image 81.
• the resulting image is optionally multiplied by a sharpness factor or weight, and then added to the difference between the original 2kx1k image 80 and the first 2kx1k enlargement 84 to generate a center-weighted 2kx1k enhancement layer source image 86.
  • This enhancement layer source image 86 is then compressed according to conventional MPEG-2 algorithms, generating a separate MPEG-2 resolution enhancement layer 87 suitable for transmission. Importantly, full MPEG-2 motion compensation can be used during this compression step.
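The FIG. 8 pipeline can be sketched in a few lines of Python on a tiny 1-D "image". The quantizer below is a hypothetical stand-in for the MPEG-2 compress/decompress round trip; all function names and sample values are illustrative only, not from any codec API.

```python
# Sketch of the FIG. 8 encoding pipeline on a tiny 1-D "image".

def downsample2(x):
    """Filter to 1/2 resolution with a simple 2-tap box filter."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def expand2(x):
    """Expand back to full resolution by pixel replication."""
    out = []
    for v in x:
        out += [v, v]
    return out

def codec_round_trip(x, step=5):
    """Stand-in for MPEG-2 encode + decode: coarse quantization loss."""
    return [round(v / step) * step for v in x]

original = [10, 14, 40, 44, 90, 86, 20, 24]   # plays the role of image 80
base = downsample2(original)                  # base layer 81 (half res)
decoded_base = codec_round_trip(base)         # images 82 -> 83
enlargement = expand2(decoded_base)           # first enlargement 84
# enhancement layer source 86: the detail the base layer failed to carry
enhancement_source = [o - e for o, e in zip(original, enlargement)]
```

Because the enhancement source is the difference against the *decoded* base layer (not the filtered original), the decoder can reconstruct the full-resolution image exactly up to the enhancement layer's own coding loss.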
• the base layer 82 is decompressed using conventional MPEG-2 algorithms back to a 1024x512 image 90.
• the 1024x512 image 90 is expanded to a first 2kx1k image 91.
• the resolution enhancement layer 87 is decompressed using conventional MPEG-2 algorithms back to a second 2kx1k image 92.
• the first 2kx1k image 91 and the second 2kx1k image 92 are then added to generate a high-resolution 2kx1k image 93.
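The FIG. 9 decoding side is then just an expand-and-add. A minimal sketch with toy values, where `expand2` is assumed to match the encoder's pixel-replication expansion:

```python
# Sketch of the FIG. 9 decoding side: expand the decoded base layer and
# add the decoded enhancement difference. Values are toy numbers.

def expand2(x):
    out = []
    for v in x:
        out += [v, v]
    return out

decoded_base = [10, 40, 90, 20]               # image 90 (half resolution)
enhancement = [0, 4, 0, 4, 0, -4, 0, 4]       # image 92 (difference layer)
expanded = expand2(decoded_base)              # image 91 (full resolution)
high_res = [b + e for b, e in zip(expanded, enhancement)]   # image 93
```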
• the enhancement layer is created by expanding the decoded base layer, taking the difference between the original image and the decoded base layer, and compressing that difference.
• a compressed resolution enhancement layer may optionally be decoded and added to the decoded base layer to create a higher resolution image in the decoder.
  • the inventive layered resolution encoding process differs from MPEG-2 spatial scalability in several ways:
  • the enhancement layer difference picture is compressed as its own MPEG-2 data stream, with I, B, and P frames. This difference represents the major reason that resolution scalability, as proposed here, is effective, where MPEG-2 spatial scalability is ineffective.
  • the spatial scalability defined within MPEG-2 allows an upper layer to be coded as the difference between the upper layer picture and the expanded base layer, or as a motion compensated MPEG-2 data stream of the actual picture, or a combination of both. However, neither of these encodings is efficient.
  • the difference from the base layer could be considered as an I frame of the difference, which is inefficient compared to a motion-compensated difference picture, as in the present invention.
  • the upper-layer encoding defined within MPEG-2 is also inefficient, since it is identical to a complete encoding of the upper layer.
• the motion compensated encoding of the difference picture, as in the present invention, is therefore substantially more efficient.
  • the MPEG-2 systems transport layer (or another similar mechanism) must be used to multiplex the base layer and enhancement layer.
• the expansion and resolution reduction filtering can be a Gaussian or spline function, which is better than the bilinear interpolation specified in MPEG-2 spatial scalability.
  • the image aspect ratio must match between the lower and higher layers in the preferred embodiment.
• MPEG-2 spatial scalability allows extensions to width and/or height. Such extensions are not allowed in the preferred embodiment due to efficiency requirements.
• the entire area of the enhancement layer is not coded.
  • the area excluded from enhancement will be the border area.
• the 2kx1k enhancement layer source image 86 in the preferred embodiment is center-weighted. In the preferred embodiment, a fading function (such as linear weighting) is used to "feather" the enhancement layer toward the center of the image and away from the border edge to avoid abrupt transitions in the image.
• any manual or automatic method of determining regions having detail which the eye will follow can be utilized to select regions which need detail, and to exclude regions where extra detail is not required. All of the image has detail to the level of the base layer, so all of the image is present. Only the areas of special interest benefit from the enhancement layer. In the absence of other criteria, the edges or borders of the frame can be excluded from enhancement, as in the center-weighted embodiment described above.
• "lower_layer_prediction_horizontal&vertical offset" parameters used as signed negative integers, combined with the "horizontal&vertical_subsampling_factor_m&n" values, can be used to specify the enhancement layer rectangle's overall size and placement within the expanded base layer.
• A sharpness factor is added to the enhancement layer to offset the loss of sharpness which occurs during quantization. Care must be taken to utilize this parameter only to restore the clarity and sharpness of the original picture, and not to enhance the image.
• the sharpness factor is the "high octave" of resolution between the original high resolution image 80 and the original base layer image 81 (after expansion).
• This high octave image will be quite noisy, in addition to containing the sharpness and detail of the high octave of resolution. Adding too much of this image can yield instability in the motion compensated encoding of the enhancement layer.
• the amount that should be added depends upon the level of the noise in the original image. A typical weighting value is 0.25. For noisy images, no sharpness should be added, and it even may be advisable to suppress the noise in the original for the enhancement layer before compressing using conventional noise suppression techniques which preserve detail.
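Combining the two difference terms, the enhancement layer source can be written as (original − expanded decoded base) plus the sharpness weight times the high octave (original − expanded filtered base). A sketch using the typical 0.25 weighting from the text; the pixel values are toy numbers:

```python
# Sketch of the sharpness-weighted enhancement layer source.

SHARPNESS = 0.25   # typical value from the text; use 0.0 for noisy sources

def enhancement_source(original, expanded_decoded, expanded_filtered,
                       w=SHARPNESS):
    """Base difference plus a weighted 'high octave' sharpness term."""
    return [(o - d) + w * (o - f)
            for o, d, f in zip(original, expanded_decoded, expanded_filtered)]

src = enhancement_source([100, 104], [96, 96], [102, 102])
```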
  • Temporal and resolution scalability are intermixed by utilizing B frames for temporal enhancement from 36 to 72 Hz in both the base and resolution enhancement layers. In this way, four possible levels of decoding performance are possible with two layers of resolution scalability, due to the options available with two levels of temporal scalability.
• Optional Non-MPEG-2 Coding of the Resolution Enhancement Layer. It is possible to utilize a different compression technique for the resolution enhancement layer than MPEG-2. Further, it is not necessary to utilize the same compression technology for the resolution enhancement layer as for the base layer. For example, motion-compensated block wavelets can be utilized to match and track details with great efficiency when the difference layer is coded. Even if the most efficient position for placement of wavelets jumps around on the screen due to changing amounts of differences, it would not be noticed in the low-amplitude enhancement layer. Further, it is not necessary to cover the entire image; it is only necessary to place the wavelets on details. The wavelets can have their placement guided by detail regions in the image. The placement can also be biased away from the edge.
• a 2kx1k template can efficiently support the common widescreen aspect ratios of 1.85:1 and 2.35:1.
• A 2kx1k template can also accommodate 1.33:1 and other aspect ratios.
• While integers, especially the factor of 2, and simple fractions (3/2 & 4/3) are the most efficient step sizes in resolution layering, it is also possible to use arbitrary ratios to achieve any required resolution layering.
  • using a 2048x1024 template, or something near it, provides not only a high quality digital master format, but also can provide many other convenient resolutions from a factor of two base layer (lkx512), including NTSC, the U.S. television standard.
• digital mastering formats should be created at the frame rate of the film if made from existing movies (i.e., at 24 frames per second).
  • the common use of both 3-2 pulldown and interlace would be inappropriate for digital film masters.
  • the digital image masters should be made at whatever frame rate the images are captured, whether at 72 Hz, 60 Hz, 36 Hz, 37.5 Hz, 75 Hz, 50 Hz, or other rates.
• the concept of a mastering format as a single digital source picture format for all electronic release formats differs from existing practices, where PAL, NTSC, letterbox, pan-and-scan, HDTV, and other masters are all generally independently made from a film original.
• a mastering format allows both film and digital/electronic shows to be mastered once, for release on a variety of resolutions and formats.
  • Temporal enhancement is provided by decoding B frames.
  • the resolution enhancement layer also has two temporal layers, and thus also contains B frames.
  • the most efficient and lowest cost decoders might use only P frames, thereby minimizing both memory and memory bandwidth, as well as simplifying the decoder by eliminating B frame decoding.
• decoding movies at 24 fps and decoding advanced television at 36 fps could utilize a decoder without B frame capability.
  • B frames can then be utilized between P frames to yield the higher temporal layer at 72 Hz, as shown in FIG. 3, which could be decoded by a second decoder. This second decoder could also be simplified, since it would only have to decode B frames.
  • the resolution enhancement layer can add the full temporal rate of 72 Hz at high resolution by adding B frame decoding within the resolution enhancement layer.
• The combined resolution and temporal scalable options for a decoder are illustrated in FIG. 10.
• This example also shows an allocation of the proportions of an approximately 18 mbits/second data stream to achieve the spatio-temporal layered Advanced Television of the present invention.
• a base layer MPEG-2 1024x512 pixel data stream (comprising only P frames in the preferred embodiment) is applied to a base resolution decoder 100. Approximately 5 mbits/sec of bandwidth is required for these P frames.
• the base resolution decoder 100 can decode at 24 or 36 fps.
• the output of the base resolution decoder 100 comprises low resolution, low frame rate images (1024x512 pixels at 24 or 36 Hz).
• the B frames from the same data stream are parsed out and applied to a base resolution temporal enhancement layer decoder 102. Approximately 3 mbits/sec of bandwidth is required for such B frames.
• the output of the base resolution decoder 100 is also coupled to the temporal enhancement layer decoder 102.
• the temporal enhancement layer decoder 102 can decode at 36 fps.
• the combined output of the temporal enhancement layer decoder 102 comprises low resolution, high frame rate images (1024x512 pixels at 72 Hz).
• a resolution enhancement layer MPEG-2 2kx1k pixel data stream (comprising only P frames in the preferred embodiment) is applied to a base temporal high resolution enhancement layer decoder 104. Approximately 6 mbits/sec of bandwidth is required for the P frames.
• the output of the base resolution decoder 100 is also coupled to the high resolution enhancement layer decoder 104.
• the high resolution enhancement layer decoder 104 can decode at 24 or 36 fps.
• the output of the high resolution enhancement layer decoder 104 comprises high resolution, low frame rate images (2kx1k pixels at 24 or 36 Hz).
  • the B frames from the same data stream are parsed out and applied to a high resolution temporal enhancement layer decoder 106.
• the output of the high resolution enhancement layer decoder 104 is coupled to the high resolution temporal enhancement layer decoder 106.
• the output of the temporal enhancement layer decoder 102 is also coupled to the high resolution temporal enhancement layer decoder 106.
• the high resolution temporal enhancement layer decoder 106 can decode at 36 fps.
• the combined output of the high resolution temporal enhancement layer decoder 106 comprises high resolution, high frame rate images (2kx1k pixels at 72 Hz).
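The four decoder stages can be tallied against the text's 18 (5+3+6+4) mbits/second split as a trivial sanity check; the per-stage labels are descriptive only:

```python
# mbits/sec per decoder stage, per the text's 18 (5+3+6+4) allocation.
allocation = {
    "base layer P frames (decoder 100)": 5,
    "base temporal B frames (decoder 102)": 3,
    "high-res enhancement P frames (decoder 104)": 6,
    "high-res temporal B frames (decoder 106)": 4,
}
total = sum(allocation.values())   # fits the ~18 mbits/sec terrestrial budget
```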
  • MPEG-2 encoding syntax also provides efficient motion representation through the use of motion-vectors in both the base and enhancement layers. Up to some threshold of high noise and rapid image change, MPEG-2 is also efficient at coding details instead of noise within an enhancement layer through motion compensation in conjunction with DCT quantization. Above this threshold, the data bandwidth is best allocated to the base layer.
  • the compression ratios in TABLE 5 are much higher.
  • One reason for this is the loss of some coherence due to interlace. Interlace negatively affects both the ability to predict subsequent frames and fields, as well as the correlation between vertically adjacent pixels. Thus, a major portion ofthe gain in compression efficiency described here is due to the absence of interlace.
• the large compression ratios achieved by the present invention can be considered from the perspective of the number of bits available to code each MPEG-2 macroblock.
• a macroblock is a 16x16 pixel grouping of four 8x8 DCT blocks, together with one motion vector for P frames, and one or two motion vectors for B frames. The bits available per macroblock for each layer are shown in TABLE 6.
• High Temporal: 18 mbits/sec (5+3+6+4); 123; 37 bits/macroblock overall, 30 for the enhancement layer with a border around the hi-res center
  • the available number of bits to code each macroblock is smaller in the enhancement layer than in the base layer. This is appropriate, since it is desirable for the base layer to have as much quality as possible.
  • the motion vector requires 8 bits or so, leaving 10 to 25 bits for the macroblock type codes and for the DC and AC coefficients for all four 8x8 DCT blocks. This leaves room for only a few "strategic" AC coefficients. Thus, statistically, most of the information available for each macroblock must come from the previous frame of an enhancement layer. It is easily seen why the MPEG-2 spatial scalability is ineffective at these compression ratios, since there is not sufficient data space available to code enough DC and AC coefficients to represent the high octave of detail represented by the enhancement difference image. The high octave is represented primarily in the fifth through eighth horizontal and vertical AC coefficients. These coefficients cannot be reached if there are only a few bits available per DCT block.
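The bits-per-macroblock arithmetic behind these figures is straightforward. The sketch below assumes the enhancement layer's roughly 6 mbits/second of P frames covers the full 2048x1024 area at 36 fps; actual figures vary with the coded (non-border) region:

```python
# Rough bits-per-macroblock arithmetic for a layered stream.

def bits_per_macroblock(mbits_per_sec, width, height, fps):
    """Average coded bits available per 16x16 macroblock."""
    macroblocks_per_sec = (width // 16) * (height // 16) * fps
    return mbits_per_sec * 1_000_000 / macroblocks_per_sec

enh = bits_per_macroblock(6, 2048, 1024, 36)   # ~20 bits per macroblock
# after ~8 bits for the motion vector, only a handful of bits remain
# for the macroblock type codes and the DCT coefficients
```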
  • the system described here gains its efficiency by utilizing motion compensated prediction from the previous enhancement difference frame. This is demonstrably effective in providing excellent results in temporal and resolution (spatial) layered encoding.
• the temporal scaling and resolution scaling techniques described here work well for normal-running material at 72 frames per second using a 2kx1k original source. These techniques also work well on film-based material which runs at 24 fps. At high frame rates, however, when a very noise-like image is coded, or when there are numerous shot cuts within an image stream, the enhancement layers may lose the coherence between frames which is necessary for effective coding. Such loss is easily detected, since the buffer-fullness/rate-control mechanism of a typical MPEG-2 encoder/decoder will attempt to set the quantizer to very coarse settings.
  • all of the bits normally used to encode the resolution enhancement layers can be allocated to the base layer, since the base layer will need as many bits as possible in order to code the stressful material. For example, at between about 0.5 and 0.33 MPixels per frame for the base layer, at 72 frames per second, the resultant pixel rate will be 24 to 36 MPixels/second. Applying all of the available bits to the base layer provides about 0.5 to 0.67 million additional bits per frame at 18.5 mbits/second, which should be sufficient to code very well, even on stressful material.
  • the adaptive quantization level is controlled by the output buffer fullness. At the high compression ratios involved in the resolution enhancement layer of the present invention, this mechanism may not function optimally.
  • Various techniques can be used to optimize the allocation of data to the most appropriate image regions. The conceptually simplest technique is to perform a pre-pass of encoding over the resolution enhancement layer to gather statistics and to search out details which should be preserved. The results from the pre-pass can be used to set the adaptive quantization to optimize the preservation of detail in the resolution enhancement layer.
• the settings can also be artificially biased to be non-uniform over the image, such that detail allocation is biased toward the main screen regions and away from the macroblocks at the extreme edges of the frame.
• 36 Hz as a new common ground temporal rate appears to be optimal. Demonstrations of the use of this frame rate indicate that it provides significant improvement over 24 Hz for both 60 Hz and 72 Hz displays. Images at 36 Hz can be created by utilizing every other frame from 72 Hz image capture. This allows combining a base layer at 36 Hz (preferably using P frames) and a temporal enhancement layer at 36 Hz (using B frames) to achieve a 72 Hz display.
  • the "future-looking" rate of 72 Hz is not compromised by the inventive approach, while providing transition for 60 Hz analog NTSC display.
  • the invention also allows a transition for other 60 Hz displays, if other passive-entertainment-only (computer incompatible) 60 Hz formats under consideration are accepted.
• Resolution scalability can be achieved through using a separate MPEG-2 image data stream for a resolution enhancement layer. Resolution scalability can take advantage of the B frame approach to provide temporal scalability in both the base resolution and enhancement resolution layers.
• the invention described here achieves many highly desirable features. It has been claimed by some involved in the U.S. advanced television process that neither resolution nor temporal scalability can be achieved at high definition resolutions within the approximately 18.5 mbits/second available in terrestrial broadcast. However, the present invention achieves both temporal and spatial-resolution scalability within this available data rate.
• the present invention is also very robust, particularly compared to the current proposal for advanced television. This is made possible by the allocation of most or all of the bits to the base layer when very stressful image material is encountered. Such stressful material is by its nature both noise-like and very rapidly changing. In these circumstances, the eye cannot see detail associated with the enhancement layer of resolution. Since the bits are applied to the base layer, the reproduced frames are substantially more accurate than the currently proposed advanced television system, which uses a single constant higher resolution.
  • the inventive system optimizes both perceptual and coding efficiency, while providing maximum visual impact.
  • This system provides a very clean image at a resolution and frame rate performance that had been considered by many to be impossible. It is believed that the inventive system is likely to outperform the advanced television formats being proposed at this time. In addition to this anticipated superior performance, the present invention also provides the highly valuable features of temporal and resolution layering.
  • Layered compression allows a form of modularized decomposition of an image that supports flexible encryption and watermarking techniques.
• the base layer and various internal components of the base layer can be used to encrypt and/or watermark a compressed layered movie data stream. Encrypting and watermarking the compressed data stream reduces the amount of required processing compared to a high resolution data stream, which must be processed at the rate of the original data. The amount of computing time required for encryption and watermarking depends on the amount of data that must be processed. For a particular level of computational resources, reducing the amount of data through layered compression can yield improved encryption strength, reduced encryption/decryption cost, or a combination of both.
• Encryption allows protection of the compressed image (and audio) data so that only users with keys can easily access the information.
  • Layered compression divides images into components: a temporal and spatial base layer, plus temporal and spatial enhancement layer components.
  • the base layer is the key to decoding a viewable picture.
• the enhancement layers, both temporal and spatial, are of no value without the decrypted and decompressed base layer. Accordingly, by using such a layered subset of the bits, the entire picture stream can be made unrecognizable by encrypting only a small fraction of the bits of the entire stream.
• a variety of encryption algorithms and strengths can be applied to various portions of the layered stream, including enhancement layers. Encryption algorithms or keys also can be changed as often as at each slice boundary (a data stream structure meant for signal error recovery), to provide greater intertwining of the encryption and the picture stream.
  • Watermarking invisibly (or nearly invisibly) marks copies of a work.
  • the concept originates with the practice of placing an identifiable symbol within paper to ensure that a document (e.g., money) is genuine.
  • Watermarking allows the tracking of copies which may be removed from the possession of an authorized owner or licensee.
• watermarking can help track lost or stolen copies back to a source, so that the nature of the method of theft can be determined and so that those involved in a theft can be identified.
• the concept of watermarking has been applied to images by attempting to place a faint image symbol or signature on top of the real image being presented.
• the most widely held concept of electronic watermarking is that it is a visible low-amplitude image, impressed on top of the visible high-amplitude image.
• this approach alters the quality of the original image slightly, similar to the process of impressing a network logo in the corner of the screen on television. Such alteration is undesirable because it reduces picture quality.
  • the DCT transformation operates in frequency transform space. Any alterations in this space, especially if corrected from frame to frame, may be much less visible (or completely invisible).
  • Watermarking preferably uses low order bits in certain coefficients in certain frames of a layered compression movie stream to provide reliable identification while being invisible or nearly invisible to the eye. Watermarking can be applied to the base layer of a compressed data stream. However, it is possible to protect enhancement layers to a much greater degree than the base layer, since the enhancement layers are very subtle in detail to begin with. Each enhancement layer can have its own unique identifying watermark structure.
  • Encryption preferably operates in such a fashion as to scramble, or at least visibly impair, as many frames as possible from the smallest possible units of encryption.
  • Compression systems such as the various types of MPEG and motion-compensated- wavelets utilize a hierarchy of units of information which must be processed in cascade in order to decode a range of frames (a "Group of Pictures,” or GOP). This characteristic affords opportunities early in the range of concatenated decoded units to encrypt in such a way as to scramble a large range of frames from a small number of parameters.
• watermarking has the goal of placing a symbol and/or serial-number-style identification marks on the image stream which are detectable by analysis, but which are invisible or nearly invisible in the image (i.e., yielding no significant visual impairment).
• watermarking preferably is applied in portions of the decoding unit chain which are near the end of the hierarchy of units, to yield a minimum impact on each frame within a group of frames.
• FIG. 11 shows a diagram of the scope of encryption and watermarking as a function of unit dependency with respect to I, P, and B frames. Encryption of any frame confounds all subsequent dependent frames.
• encryption of the first I frame confounds all P and B frames derived from that I frame.
• a watermark on that I frame generally would not carry over to subsequent frames, and thus it is better to watermark the larger number of B frames to provide greater prevalence of the watermark throughout the data stream.
• a compressed MPEG-type or motion-compensated-wavelets bitstream is normally parsed by extracting and processing various fundamental units of compressed video information. This is true of the most efficient compression systems such as MPEG-2, MPEG-4, and motion-compensated wavelets (considering wavelets to have I, P, and B frame equivalents).
• Such units may consist of multi-frame units (such as a GOP), single frame units (e.g., I, P, and B frame types and their motion-compensated-wavelet equivalents), sub-frame units (such as AC and DC coefficients, macroblocks, and motion vectors), and "distributed units" (described below).
  • each GOP can be encrypted with independent methods and/or keys.
  • each GOP can have the benefits of unique treatment and modularity, and can be decoded and/or decrypted in parallel or out-of-order with other GOPs in non-realtime or near-realtime (slightly delayed by a few seconds) applications (such as electronic cinema and broadcast).
  • the final frames need only be ordered for final presentation.
• encryption of certain units may confound proper decoding of other units that depend on information derived from the encrypted unit. That is, some information within a frame may be required for decoding the video information of subsequent frames; encrypting only the earlier frame confounds decoding of later frames that are not otherwise encrypted.
• Sub-units of frames may be encrypted and still have a confounding effect, while reducing encryption and decryption processing time.
• encryption of certain intra-frame units influences subsequent frames at various levels as set forth in TABLE 8:
• Delay can be applied in many applications (such as broadcast and digital cinema), allowing an aggregation of items from units of like types to be encrypted before transmission. This allows for a "distributed unit", where the bits comprising an encryption/decryption unit are physically allocated across a data stream in conventional units of the type described above, making decrypting without knowledge of the key even more difficult.
  • a sufficient number of conventional units would be aggregated (e.g., in a buffer) and decrypted as a group.
  • DC coefficients can be collected into groups for an entire frame or GOP.
• motion vectors are coded differentially and predicted from one macroblock to the next throughout the frame, and thus can be encrypted and decrypted in aggregations.
  • Variable-length-coding tables can also be aggregated into groups and form modular units between "start codes". Additional examples of units or subunits that can be aggregated, encrypted, and then have the encrypted bits separated or spread in the data stream include: motion vectors, DC coefficients, AC coefficients, and quantizer scale factors.
• one or more of the units described above may be selected for encryption, and each unit can be encrypted independently rather than as a combined stream (as with MPEG-1, MPEG-2, and MPEG-4). Encryption of each unit may use different keys of different strengths (e.g., number of bits per key) and may use different encryption algorithms.
  • Encryption can be applied uniquely to each distinct copy of a work (when physical media is used, such as DVD-RAM), so that each copy has its own key(s).
• an encryption algorithm can be applied on the assembled stream with critical portions of the stream removed from the data stream or altered before encryption (e.g., by setting all motion vectors for the left-hand macroblocks to zero), thus defining a bulk distribution copy.
  • the removed or altered portion can then be encrypted separately and uniquely for each display site, thereby defining a custom distribution copy that is sent separately to individual sites in a convenient manner (e.g., satellite transmission, modem, Internet, etc.).
• This technique is useful, for example, where the bulk of a work is distributed on a medium such as a DVD-ROM, while unique copies of the smaller critical compression units are separately sent, each with their own unique keys, to independent recipient destinations (e.g., by satellite, Internet, modem, express delivery, etc.). Only when the custom portion is decrypted and recombined with the decrypted bulk distribution copy will the entire work be decodable as a video signal. The larger the bandwidth (size capacity) of such custom information, the larger the portion of the image that can be custom encrypted. This technique can be used with watermarking as well.
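The bulk/custom split can be sketched as follows. The keyed-XOR "cipher" is a toy placeholder (a real system would use an algorithm such as DES or RSA), and the stream contents, split point, and key are all illustrative:

```python
import hashlib

# Sketch of the bulk/custom distribution split with a placeholder cipher.

def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Symmetric toy cipher: XOR with a SHA-256-derived keystream."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, out))

stream = b"motion-vectors|dc-coefficients|ac-coefficients"
critical, bulk = stream[:15], stream[15:]        # remove a critical unit
site_key = b"site-42-key"                        # unique per display site
custom_copy = keystream_xor(critical, site_key)  # sent separately per site
# a recipient holding the bulk copy, the custom copy, and the key can
# restore the full stream; either piece alone is not decodable
restored = keystream_xor(custom_copy, site_key) + bulk
```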
  • a variant of this approach is to encrypt a subset of units from a data stream as a custom distribution copy, and not encrypt the remaining units at all.
  • the remaining units may be distributed in bulk form, separately from the custom distribution copy. Only when the custom portion is decrypted and recombined with the unencrypted bulk distribution copy will the entire work be decodable as a video signal.
• One or more overall encryptions can be concatenated or combined with special customized encryptions for various of the crucial units of video decoding information. For example, the entire video data stream may be "lightly" encrypted (e.g., using a short key or simple algorithm) while certain key units of the data stream are more "heavily" encrypted (e.g., using a longer key or more complex algorithm).
  • the highest resolution and/or temporal layers may be more heavily encrypted to define a premium signal that provides the best appearing image when properly decrypted.
• Lower layers of the image would be unaffected by such encryption. This approach would allow different grades of signal service for end-users. If units are encrypted independently of each other, then decryption may be performed in parallel using one or more concurrently processed decryption methods on separate units within the compressed image stream.
• the DC coefficients can have extra bits (10 and 11 bits are allowed in MPEG-2, and up to 14 bits in MPEG-4).
  • the low order bit(s) can code a specific watermark identifier without degrading the image in any visible way. Further, these low order bits might only be present in I frames, since a clear watermark need not be present on every frame.
  • Such an imaged pattern would be detected by subtracting the decoded image from the unperturbed (unwatermarked) decompressed original (and also from the uncompressed original source work), and then greatly increasing the amplitude. A series of very large blurry letters or numbers would then appear.
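A minimal sketch of the low-order-bit scheme: write a serial-number-style code into the LSBs of selected I-frame DC coefficients, then read it back. The coefficient values and the code itself are hypothetical:

```python
# Sketch of a low-order-bit (LSB) watermark on DC coefficients.

def embed_watermark(dc_coeffs, bits):
    """Overwrite the LSB of each DC coefficient with a watermark bit."""
    return [(c & ~1) | b for c, b in zip(dc_coeffs, bits)]

def extract_watermark(dc_coeffs, n):
    return [c & 1 for c in dc_coeffs[:n]]

dc = [412, 397, 402, 388, 405]     # hypothetical DC values from an I frame
mark = [1, 0, 1, 1, 0]             # serial-number-style identifier
marked = embed_watermark(dc, mark)
# each coefficient changes by at most 1: an invisible perturbation
```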
  • FIGS. 12A and 12B show diagrams of image frames 1200 with different types of watermarks.
  • FIG. 12A shows a frame 1200 with a single symbol ("X") 1202 in one corner.
  • FIG. 12B shows a frame 1200 with a set of marks (dots, in this example) 1204 scattered around the frame 1200.
  • Such watermarks are detectable only by data comparison to yield the watermark signal.
  • a precise decoder can detect LSB variations between an original work and a watermarked work that are invisible to the eye, but which uniquely watermark the customized copy ofthe original work.
• methods of watermarking may be used that do not impose specific images or symbols, but do form unique patterns in the data streams. Certain decisions of coding are nearly invisible, and may be used to watermark a data stream. For example, minor rate control variations are invisible to the eye, but can be used to mark each copy such that each copy has a slightly different number of AC coefficients in some locations. Examples of other such decisions include:
  • second-best choices for motion vectors which are nearly as good as optimum motion vectors may be used to create a watermark code.
• a system can use second-best selections for exactly the same SADs (sum of absolute differences, a common motion vector match criterion) when and where they occur.
  • Other non-optimum (e.g., third and higher ranked) motion vector matches can also be used, if needed, with very little visual impairment.
  • Such second-choice (and higher) motion vectors need only be used occasionally (e.g., a few per frame) in a coherent pattern to form a watermark code.
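The second-best motion vector idea can be sketched as follows: when two candidate vectors tie on SAD, the choice between them carries one watermark bit with no change in match quality. The SAD values and vectors are made up for illustration:

```python
# Sketch of a motion-vector-choice watermark.

def pick_vector(candidates, watermark_bit):
    """candidates: list of (sad, motion_vector) pairs, lower SAD = better."""
    ranked = sorted(candidates, key=lambda c: c[0])
    first, second = ranked[0], ranked[1]
    # only encode a 1 bit where the second choice is exactly as good,
    # so there is no visual impairment
    if watermark_bit and second[0] == first[0]:
        return second[1]
    return first[1]

cands = [(120, (1, 0)), (120, (0, 1)), (145, (2, 2))]
```

A detector with access to the unwatermarked encode can compare the chosen vectors at the tie locations to recover the bits.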
  • Image variations are less visible near the periphery of the frame (i.e., near the top edge, bottom edge, right edge, and left edge). It is therefore better to apply image or symbol type watermarks to image edge regions if the selected watermark is possibly slightly visible.
  • Watermark methods of very low visibility can be used everywhere on the image.
  • Watermarking also can be coded as a unique serial-number-style code for each watermarked copy.
  • 1,000 copies of an original work would each be watermarked in a slightly different fashion using one or more techniques described above.
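A serial-number-style code of the kind described above can be expressed as a fixed-width bit pattern per copy, with the bits then spread across LSBs, motion-vector choices, and so on. The width and encoding below are hypothetical:

```python
def serial_to_watermark_bits(serial, width=16):
    """Express a copy's serial number as a fixed-width bit pattern
    (least significant bit first) for embedding via the techniques above."""
    return [(serial >> i) & 1 for i in range(width)]

# 1,000 copies -> 1,000 distinct patterns, each traceable to its copy
patterns = {n: tuple(serial_to_watermark_bits(n)) for n in range(1000)}
assert len(set(patterns.values())) == 1000
assert serial_to_watermark_bits(5, width=4) == [1, 0, 1, 0]  # 5 = 0b0101
```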
  • Watermark techniques which are vulnerable to being confounded by adding noise include use of LSBs in DC or AC coefficients.
  • watermarking preferably is used in conjunction with encryption of a suitable strength for the application.
  • FIG. 13 is a flowchart showing one method of applying the encryption techniques ofthe invention.
  • A unit to be encrypted is selected (STEP 1300). This may be any of the units described above (e.g., a distributed unit, a multi-frame unit, a single frame unit, or a sub-frame unit), or other units with similar properties.
  • An encryption algorithm is selected (STEP 1302).
  • This may be a single algorithm applied throughout an encryption session, or may be a selection per unit, as noted above. Suitable algorithms are well known, and include, for example, both private and public key algorithms, such as DES, Triple DES, RSA, Blowfish, etc.
  • one or more keys are generated (STEP 1304). This involves selection of both key length and key value. Again, this may be a single selection applied throughout an encryption session, or may be a selection per unit, as noted above.
  • the unit is encrypted using the selected algorithm and key(s) (STEP 1306). The process then repeats for a next unit. Of course, a number of the steps may be carried out in different orders, particularly steps 1300, 1302, and 1304. For decryption, the relevant key(s) would be applied to decrypt the data stream.
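The per-unit loop of FIG. 13 might be sketched as below. The SHA-256-based keystream is only a self-contained stand-in for a real cipher (DES, Triple DES, AES, etc.), and the byte-string "units" are placeholders for actual frame or slice data; selecting a fresh key per unit corresponds to changing keys at unit boundaries as described above.

```python
import hashlib
import os

def keystream(key, nonce, length):
    """Derive a keystream by hashing key+nonce+counter -- a stand-in for a
    real cipher such as DES, Triple DES, or AES."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_unit(unit, key, nonce):
    """XOR the unit with the keystream; the operation is its own inverse."""
    ks = keystream(key, nonce, len(unit))
    return bytes(b ^ k for b, k in zip(unit, ks))

units = [b"I-frame data", b"P-frame data", b"B-frame data"]  # placeholder units
encrypted = []
for i, unit in enumerate(units):        # STEP 1300: select the unit
    key = os.urandom(16)                # STEP 1302/1304: algorithm and key(s) per unit
    nonce = i.to_bytes(8, "big")
    encrypted.append((encrypt_unit(unit, key, nonce), key, nonce))  # STEP 1306

# Decryption applies the same per-unit key(s) to recover the stream
ct, key, nonce = encrypted[0]
assert encrypt_unit(ct, key, nonce) == b"I-frame data"
```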
  • the data stream would be decompressed and decoded, as described above, to generate a displayable image.
  • FIG. 14 is a flowchart showing one method of applying the watermarking techniques of the invention.
  • A unit to be watermarked is selected (STEP 1400). Again, this may be any of the units described above (e.g., a distributed unit, a multi-frame unit, a single frame unit, or a sub-frame unit), or other units with similar properties.
  • One or more watermarking techniques are then selected, such as a noise-tolerant method and a non-noise tolerant method (STEP 1402). This may be a single selection applied throughout a watermarking session, or may be a selection per unit (or class of units, where two or more watermarking techniques are applied to different types of units).
  • the selected unit is watermarked using the selected technique (STEP 1404).
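The FIG. 14 loop, with a technique selected per class of unit, might look like the following sketch. The unit classes, the one-coefficient LSB marker, and the vector lists are hypothetical stand-ins for real coded data.

```python
def lsb_mark(coeffs, bit):
    """Noise-vulnerable method: set the LSB of the first coefficient."""
    return [(coeffs[0] & ~1) | bit] + coeffs[1:]

def mv_mark(ranked_vectors, bit):
    """Noise-tolerant method: take the second-best motion vector for a 1 bit."""
    return ranked_vectors[1] if bit else ranked_vectors[0]

# STEP 1402: a selection per class of units (hypothetical classes)
TECHNIQUE_BY_UNIT = {"I": lsb_mark, "P": mv_mark}

# STEP 1400/1404: select a unit, then mark it with its class's technique
assert TECHNIQUE_BY_UNIT["I"]([8, 3, 5], 1) == [9, 3, 5]
assert TECHNIQUE_BY_UNIT["P"]([(0, 0), (1, 0)], 1) == (1, 0)
```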
  • Encryption and decryption keys may be tied to various items of information, in order to construct more secure or synchronized keys.
  • public or private encryption and decryption keys may be generated to include or be derived from any of the following components:
  • a serial number of a destination device (e.g., a theater projector having a secure serial number).
  • a date or time range (using a secure clock), such that the key only works during specific time periods (e.g., only on certain days of the week, or only for a relative period, such as one week).
  • an encryption system may plan for the use of a secure GPS (Global Positioning System) receiver in the decoder as a source for time. The decrypting processor would only need access to that secure time source to decrypt the image file or stream.
  • Location of the decryption processor. A GPS capability would allow fairly exact real-time location information to be incorporated into a key.
  • An internet protocol (IP) static address of a known destination could also be used.
  • a "PIN" (personal identification number) of a specific authorizing person (e.g., a theater manager).
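A key derived from such components could be sketched as a simple hash over them; the component names, values, and SHA-256 derivation below are all hypothetical, chosen only to show that changing any one component yields a different (and therefore invalid) key.

```python
import hashlib

def derive_key(serial, date_window, ip, pin):
    """Hypothetical derivation binding a key to a projector serial number,
    an authorized time window, a static IP address, and an authorizer's PIN."""
    material = f"{serial}|{date_window}|{ip}|{pin}".encode()
    return hashlib.sha256(material).hexdigest()

k1 = derive_key("PROJ-0042", "2002-06-10..2002-06-17", "192.0.2.7", "1234")
k2 = derive_key("PROJ-0042", "2002-06-10..2002-06-17", "192.0.2.7", "1234")
k3 = derive_key("PROJ-0043", "2002-06-10..2002-06-17", "192.0.2.7", "1234")

assert k1 == k2   # identical components reproduce the same key
assert k1 != k3   # any changed component (here, the serial) invalidates it
```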
  • Physical customized-encrypted movies can be used such that the possession of the encrypted movie itself by a key holder at the intended site is a form of key authorization for a subsequent movie.
  • playback of a portion of the movie and transmission of that portion to a remote key generation site can be part of the key authorization protocol.
  • the use of the encrypted movie data as a key element can be tied to a secure media erasure key when a distribution copy is stored on an erasable media, such as hard disk or DVD-
  • Keys can also be active for a specific number of showings or other natural units of use, requiring new keys subsequently.
  • Keys can be stored on a media (e.g., floppy disk, CDROM) and physically shipped to a destination via overnight shipping, or transmitted electronically or in text format (e.g., by facsimile, email, direct-connect data transmission, Internet transmission, etc.).
  • Public key methods can also be used with local unique keys, as well as authenticated third-party key verification.
  • Keys may themselves be encrypted and electronically transmitted (e.g., via direct-connect data transmission, Internet transmission, email, etc.), with predefined rules at each destination (e.g., theater) for how to decrypt and apply the keys.
  • Possession of a current key may be required as a condition of obtaining or utilizing new keys. The current key value may be transmitted to a key management site by any suitable means, as noted above; the new key can be returned by one of the means noted above.
  • Use of a decryption key may require a "key handshake" with a key management site that validates or authorizes application of the key for every instance of decryption.
  • a decryption key may need to be combined with additional symbols maintained by the key management site, where the specific symbols vary from use to use.
  • Key handshakes can be used for every showing, for every length of time of use, or for other natural value units.
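One plausible form of such a per-showing key handshake is an HMAC challenge-response, sketched below. The shared secret, the showing limit, and the message layout are all invented for the example; the patent does not prescribe a specific protocol.

```python
import hashlib
import hmac

SHARED = b"site-key"  # hypothetical secret shared with the key management site

def site_respond(challenge, showing_number):
    """The theater proves key possession for this specific showing."""
    msg = challenge + showing_number.to_bytes(4, "big")
    return hmac.new(SHARED, msg, hashlib.sha256).hexdigest()

def server_verify(challenge, showing_number, response, max_showings=5):
    """The key management site authorizes each showing up to the paid count."""
    if showing_number > max_showings:
        return False
    msg = challenge + showing_number.to_bytes(4, "big")
    expected = hmac.new(SHARED, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)

challenge = b"nonce-001"
resp = site_respond(challenge, 3)
assert server_verify(challenge, 3, resp)                              # authorized
assert not server_verify(challenge, 6, site_respond(challenge, 6))    # over limit
```

Since each authorization passes through the site, the same exchange naturally feeds the accounting and logging described next.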
  • key management can also be integrally tied to accounting systems which log uses or use durations, and apply appropriate charges to the key holder (e.g., rental charges per showing for a theater).
  • both key management and use logging can be tied to a key authorization server system which can simultaneously handle the accounting for each authorized showing or use duration.
  • Some keys may be pre-authorized keys versus keys which are authorized onsite. Pre-authorized keys generally would be issued one at a time by a key management site.
  • a key management site may issue a set of keys to a theater, thus allowing a local manager to authorize additional decryptions (and hence showings) of a movie which is more popular than originally projected, to accommodate audience demand.
  • the system is preferably designed to signal (e.g., by email or data record sent over the Internet or by modem) the key management site about the additional showings, for accounting purposes.
  • the invention may be implemented in hardware (e.g. , an integrated circuit) or software, or a combination of both.
  • the invention is implemented in computer programs executing on one or more programmable computers each comprising at least a processor, a data storage system (including volatile and non- volatile memory and/or storage elements), an input device, and an output device.
  • Program code is applied to input data to perform the functions described herein and generate output information.
  • the output information is applied to one or more output devices, in known fashion.
  • Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system.
  • the language may be a compiled or interpreted language.
  • Each such computer program is preferably stored on a storage media or device (e.g., ROM (read-only memory), CD-ROM (compact disc read-only memory), or magnetic or optical media) readable by a general or special purpose programmable computer system, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein.
  • The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
  • A number of embodiments of the present invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, while the preferred embodiment uses MPEG-2 coding and decoding, the invention will work with any comparable standard that provides equivalents of I, B, and P frames and layers. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiment, but only by the scope of the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Television Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A method and apparatus for image compression using temporal and resolution layering of compressed image frames, providing encryption and watermarking (1404) capabilities. More particularly, layered compression provides a modularized decomposition of an image that supports flexible encryption and watermarking (1404) techniques. Layered compression allows the base layer, and various internal components of the base layer, to be used to encrypt a layered compressed movie data stream. Using such a layered subset of bits makes the entire image stream unrecognizable by encrypting only a fraction of the bits of the whole stream. Multiple encryption algorithms and strengths can be applied to different portions of the layered stream, including enhancement layers. Encryption algorithms or keys can be changed at each slice boundary to provide greater interleaving of the encryption and the image stream. Watermarking (1404) traces lost or stolen copies back to the source, so that the nature of the theft can be determined and those involved in it identified. Watermarking (1404) preferably uses low order bits in particular coefficients and particular frames of a layered compressed movie stream, to provide reliable identification while remaining invisible, or nearly so, to the eye. An enhancement layer can also have its own uniquely identifying watermark (1404) structure.
EP02747897A 2002-06-13 2002-06-13 Structuration en couches temporelle, par resolution codee et en filigrane numerique des televisions de pointe Withdrawn EP1512286A4 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2002/018884 WO2004012455A1 (fr) 2002-06-13 2002-06-13 Structuration en couches temporelle, par resolution codee et en filigrane numerique des televisions de pointe

Publications (2)

Publication Number Publication Date
EP1512286A1 true EP1512286A1 (fr) 2005-03-09
EP1512286A4 EP1512286A4 (fr) 2009-05-13

Family

ID=31185972

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02747897A Withdrawn EP1512286A4 (fr) 2002-06-13 2002-06-13 Structuration en couches temporelle, par resolution codee et en filigrane numerique des televisions de pointe

Country Status (5)

Country Link
EP (1) EP1512286A4 (fr)
JP (1) JP2005530462A (fr)
AU (1) AU2002318344B2 (fr)
CA (1) CA2486448C (fr)
WO (1) WO2004012455A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2843517B1 (fr) * 2002-08-06 2005-02-11 Medialive Dispositif pour le brouillage de contenus multimedias et audiovisuels de type mpeg-4
EP1855436A1 (fr) * 2006-05-12 2007-11-14 Deutsche Thomson-Brandt Gmbh Procédé et dispositif de chiffrage d'un signal audio codé
JP4902274B2 (ja) * 2006-06-23 2012-03-21 日本放送協会 暗号化コンテンツ作成装置およびそのプログラム、ならびに、コンテンツ復号化装置およびそのプログラム
JP4932452B2 (ja) * 2006-11-24 2012-05-16 三菱電機株式会社 データ変換装置及びデータ変換方法及びプログラム
US9438849B2 (en) 2012-10-17 2016-09-06 Dolby Laboratories Licensing Corporation Systems and methods for transmitting video frames
JP6605789B2 (ja) * 2013-06-18 2019-11-13 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 送信方法、受信方法、送信装置、および、受信装置
EP2960854A1 (fr) 2014-06-27 2015-12-30 Thomson Licensing Procédé et dispositif pour déterminer un ensemble d'éléments modifiables dans un groupe d'images
US9906821B1 (en) 2016-08-23 2018-02-27 Cisco Technology, Inc. Packet reordering system
CN113630606B (zh) * 2020-05-07 2024-04-19 百度在线网络技术(北京)有限公司 视频水印处理方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996020563A1 (fr) * 1994-12-27 1996-07-04 Kabushiki Kaisha Toshiba Emetteur, recepteur, systeme de traitement de communications qui les integre, et systeme de telediffusion numerique
US5988863A (en) * 1996-01-30 1999-11-23 Demografx Temporal and resolution layering in advanced television
WO2000031964A1 (fr) * 1998-11-20 2000-06-02 Telefonaktiebolaget Lm Ericsson (Publ) Procede et dispositif de cryptage d'images
WO2000041357A1 (fr) * 1999-01-08 2000-07-13 Nortel Networks Limited Echange de donnees secretes sur reseau non fiable
EP1189432A2 (fr) * 2000-08-14 2002-03-20 Matsushita Electric Industrial Co., Ltd. Un schéma hiérarchique d'encryption pour la distribution sûre de contenu prédéterminé

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6069914A (en) * 1996-09-19 2000-05-30 Nec Research Institute, Inc. Watermarking of image data using MPEG/JPEG coefficients
US6332194B1 (en) * 1998-06-05 2001-12-18 Signafy, Inc. Method for data preparation and watermark insertion

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996020563A1 (fr) * 1994-12-27 1996-07-04 Kabushiki Kaisha Toshiba Emetteur, recepteur, systeme de traitement de communications qui les integre, et systeme de telediffusion numerique
US5988863A (en) * 1996-01-30 1999-11-23 Demografx Temporal and resolution layering in advanced television
WO2000031964A1 (fr) * 1998-11-20 2000-06-02 Telefonaktiebolaget Lm Ericsson (Publ) Procede et dispositif de cryptage d'images
WO2000041357A1 (fr) * 1999-01-08 2000-07-13 Nortel Networks Limited Echange de donnees secretes sur reseau non fiable
EP1189432A2 (fr) * 2000-08-14 2002-03-20 Matsushita Electric Industrial Co., Ltd. Un schéma hiérarchique d'encryption pour la distribution sûre de contenu prédéterminé

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KUNKELMANN T ET AL: "VIDEO ENCRYPTION BASED ON DATA PARTITIONING AND SCALABLE CODING - A COMPARISON" INTERACTIVE DISTRIBUTED MULTIMEDIA SYSTEMS AND TELECOMMUNICATIONSERVICES, XX, XX, 8 September 1998 (1998-09-08), pages 95-106, XP000997705 *
See also references of WO2004012455A1 *
TOSUN A S ET AL: "Efficient multi-layer coding and encryption of MPEG video streams" MULTIMEDIA AND EXPO, 2000. ICME 2000. 2000 IEEE INTERNATIONAL CONFEREN CE ON NEW YORK, NY, USA 30 JULY-2 AUG. 2000, PISCATAWAY, NJ, USA,IEEE, US, vol. 1, 30 July 2000 (2000-07-30), pages 119-122, XP010511416 ISBN: 978-0-7803-6536-0 *

Also Published As

Publication number Publication date
AU2002318344A1 (en) 2004-02-16
EP1512286A4 (fr) 2009-05-13
JP2005530462A (ja) 2005-10-06
CA2486448C (fr) 2012-01-24
WO2004012455A1 (fr) 2004-02-05
AU2002318344B2 (en) 2008-01-31
CA2486448A1 (fr) 2004-02-05

Similar Documents

Publication Publication Date Title
US7428639B2 (en) Encrypted and watermarked temporal and resolution layering in advanced television
CA2245172C (fr) Stratification spatio-temporelle en television acats
KR100205701B1 (ko) 송신 장치, 수신 장치 및 이들을 통합한 통신처리 시스템과, 디지탈 텔레비젼 방송 시스템
US6829301B1 (en) Enhanced MPEG information distribution apparatus and method
US7925097B2 (en) Image display method, image coding apparatus, and image decoding apparatus
US6671376B1 (en) Video scramble/descramble apparatus
US20050185795A1 (en) Apparatus and/or method for adaptively encoding and/or decoding scalable-encoded bitstream, and recording medium including computer readable code implementing the same
US20020186769A1 (en) System and method for transcoding
JP2003531514A (ja) アドバンスドテレビジョンの強化された時相及び解像度の階層化
CA2486448C (fr) Structuration en couches temporelle, par resolution codee et en filigrane numerique des televisions de pointe
JP2001258004A (ja) 画像符号化装置及び画像復号装置とその方法
JP2005516560A (ja) 高品質の音響映像作品を処理するための安全化装置
US20050243924A1 (en) Device for scrambling mpeg-4-type audio-visual and multimedia content
AU2008200152B2 (en) Encrypted and watermarked temporel and resolution layering in advanced television
JP2008048447A (ja) 次世代テレビジョンにおける暗号化および透かし処理を施される時間的および解像度レイヤ構造
Spinsante et al. Masking video information by partial encryption of H. 264/AVC coding parameters
JP2008035551A (ja) 次世代テレビジョンにおける暗号化および透かし処理を施される時間的および解像度レイヤ構造
Pazarci et al. Video scrambler for MPEG-based applications

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20041206

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1071828

Country of ref document: HK

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20090416

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20090630

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1071828

Country of ref document: HK