WO2000074385A2 - 3d wavelet based video codec with human perceptual model - Google Patents

3d wavelet based video codec with human perceptual model

Info

Publication number
WO2000074385A2
Authority
WO
WIPO (PCT)
Prior art keywords
slices
bitstreams
low frequency
subbands
generated
Application number
PCT/US2000/014552
Other languages
French (fr)
Other versions
WO2000074385A3 (en)
Inventor
Junfeng Gu
Yimin Jiang
John S. Baras
Original Assignee
University Of Maryland, College Park
Application filed by University Of Maryland, College Park filed Critical University Of Maryland, College Park
Priority to AU52942/00A (AU5294200A)
Priority to US09/979,930 (US7006568B1)
Publication of WO2000074385A2
Publication of WO2000074385A3

Classifications

All classifications fall under H04N (pictorial communication, e.g. television). Deduplicated leaf codes:

    • H04N19/124 Quantisation
    • H04N19/126 Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/1883 Adaptive coding where the coding unit relates to sub-band structure, e.g. hierarchical level, directional tree, e.g. low-high [LH], high-low [HL], high-high [HH]
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/503 Predictive coding involving temporal prediction
    • H04N19/62 Transform coding by frequency transforming in three dimensions
    • H04N19/63 Transform coding using sub-band based transform, e.g. wavelets
    • H04N19/65 Coding using error resilience
    • H04N19/70 Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/89 Detection of transmission errors at the decoder
    • H04N19/895 Detection of transmission errors at the decoder in combination with error concealment
    • H04N21/2347 Processing of video elementary streams involving video stream encryption
    • H04N21/4405 Processing of video elementary streams involving video stream decryption
    • H04N7/1675 Providing digital key or authorisation information for generation or regeneration of the scrambling sequence

Definitions

  • the present invention relates generally to the compression and encryption of data, and more particularly to compressing data with a human perceptual model.
  • JND: just-noticeable distortion
  • VDP: visible difference predictor
  • human visual models incorporating psycho-visual properties include JND, VDP, and the three-component image model; the JND model, for example, provides each pixel with a threshold of error visibility, below which reconstruction errors are rendered imperceptible.
  • a second problem is how to encode these signals with the lowest possible bit rate without exceeding the error visibility threshold.
  • conditional access is used to control which customers can receive particular program services. Specific programs are accessible only to customers who have satisfied prepayment requirements.
  • Conditional access is often implemented by way of a key-based system involving a combination of scrambling and encryption to prevent unauthorized reception. Encryption is the process of masking the secret keys that enable the descrambler to unscramble transmitted signals into viewable scenes. The integrity of the encryption routines presently in use is constantly being tested. As a result, vulnerabilities are identified and the push to develop improved systems continues. In view of the considerations set forth above, what is presently needed is a secure system that effectively meets the need to provide high quality images while making efficient use of limited bandwidths.
  • the invention described herein is a system and method for compressing and encrypting data.
  • video images are transformed into compressed and encrypted bitstreams.
  • the bitstreams are then transmitted over a satellite broadcast channel to a receiving station where they are decoded and presented.
  • the system has a human visual model based encoder for receiving frames of video data.
  • the human visual model based encoder is also responsible for transforming the video data into compressed and encrypted bitstreams.
  • the internal components of the human visual model based encoder include a frame counter for recording the number of video frames received.
  • a motion detector is included for analyzing the video frames for an indication that motion is occurring.
  • a 3-D wavelet analyzer module is used to generate low and high frequency subbands.
  • a just-noticeable distortion model generator is used to calculate a profile of the video image based upon local signal properties.
  • a perceptually tuned quantizer provides a quantization for the wavelet coefficients present in each of the high and low frequency subbands.
  • a differential pulse code modulator is used to provide prediction error calculations for the low frequency subbands.
  • a slicer segments the subbands into smaller bitstreams so the influence of error from one bitstream to another is reduced.
  • An entropy encoder is used to encode the high frequency subbands.
  • a conditional access system is included for encoding the low frequency subbands in cases where authorization is required to decode the video images. Although the conditional access system is described as being located within the human visual model based encoder, it is able to operate independently of the encoder.
  • the conditional access system has a random slicer which is used for segmenting the low frequency subbands and generating key values representing the starting positions for each of the segments.
  • An encryption module is used to encode the key values generated by the random slicer.
  • a subscriber authorization module provides authorization codes used to verify that the transmitted image signal is being decoded by an authorized party.
  • An entropy encoder is used to perform additional encoding of the low frequency subbands in preparation for transmission over a broadcast channel.
  • the video transmission system also includes an unequal error protection channel encoder to further encode the bitstreams prior to broadcasting them over a satellite channel.
  • an unequal error protection channel decoder is provided for decoding the bitstreams after they have been transmitted over the satellite channel.
  • a human visual model based decoder is then used to further decode the bitstreams and regenerate the frames of video data.
  • the internal components of human visual model based decoder include a decoder for performing arithmetic decoding of encoded video bitstreams.
  • the decoder also transforms the video bitstreams into slices of wavelet coefficients.
  • An error detector is responsible for recognizing corrupted video slices.
  • a concealment module is provided to discard any corrupted video slices identified by the error detector.
  • An inverse quantizer is used to reverse the quantization results of the video slices.
  • a 3-D wavelet synthesizer is provided to enable transformation of the slices into video frames.
  • the method for compressing video frames includes: arranging two successively received video frames into a set; decomposing said set into a plurality of high frequency and low frequency subbands; generating a human perceptual model for each of said plurality of subbands; encoding said low frequency subbands to produce encoded low frequency subbands; quantizing said high frequency subbands and said encoded low frequency subbands according to said generated human perceptual models to generate a bitstream for each of said high frequency subbands and a bitstream for each of said encoded low frequency subbands; redefining said generated bitstreams for each of said high frequency subbands to produce a plurality of redefined high frequency bitstreams; redefining said generated bitstreams for each of said encoded low frequency subbands to produce a plurality of redefined low frequency bitstreams; channel coding said plurality of redefined high frequency bitstreams and said plurality of redefined low frequency bitstreams; and repeating the above steps for a next set of video frames until each of said received video frames has been processed.
  • FIG. 1 is a block diagram illustration of a human perception model based video transmission system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustration of a human visual model based encoder according to an embodiment of the present invention.
  • FIG. 3 is a block diagram illustration of a conditional access system according to an embodiment of the present invention.
  • FIG. 4 is a block diagram illustration of a human visual model based decoder according to an embodiment of the present invention.
  • FIG. 5 illustrates a method of encoding and decoding data transmitted over a broadcast network according to an embodiment of the present invention.
  • FIG. 6 is a block diagram illustration of 3-D wavelet decomposition.
  • FIG. 7 illustrates a method of Just Noticeable Distortion model generation according to an embodiment of the present invention.
  • FIG. 8 illustrates a method of detecting a scene cut according to an embodiment of the present invention.
  • FIG. 9 illustrates a method of detecting drastic motion according to an embodiment of the present invention.
  • FIG. 10 illustrates a quantization method according to an embodiment of the present invention.
  • FIG. 11 illustrates a method of encoding low frequency subbands according to an embodiment of the present invention.
  • FIG. 12 illustrates a method of encoding high frequency subbands according to an embodiment of the present invention.
  • FIG. 13 illustrates a method of channel coding according to an embodiment of the present invention.
  • FIG. 14 illustrates a method of decoding a video transmission according to an embodiment of the present invention.
  • FIG. 15 is an example of subbands arranged into categories according to an embodiment of the present invention.
  • System 100 shows a human visual model based encoder 110 for receiving frames of video data.
  • Human visual model based encoder 110 is further responsible for transforming the video data into compressed and encrypted bitstreams.
  • An unequal error protection channel encoder 115 is used to further encode the bitstreams prior to broadcasting them over a broadcast channel 118.
  • An unequal error protection channel decoder 120 is provided for decoding the bitstreams after they have been transmitted over the satellite channel.
  • a human visual model based decoder 125 is provided to further decode the bitstreams and regenerate the frames of video data.
  • FIG. 2 shows the internal components of human visual model based encoder 110.
  • a frame counter 200 is provided for recording the number of video frames received.
  • a motion detector 205 analyzes the video frames for an indication that motion is occurring.
  • a 3-D wavelet analyzer module 210 is used to generate low and high frequency subbands from the received video frames.
  • a just-noticeable distortion model generator 215 is used to calculate a profile of the video image based upon local signal properties.
  • a perceptually tuned quantizer 220 provides a quantization for the wavelet coefficients present in each of the high and low frequency subbands.
  • a differential pulse code modulator 225 is used to provide prediction error calculations for the low frequency subbands.
  • a slicer segments the subbands into smaller bitstreams so the influence of error from one bitstream to another is reduced.
  • An entropy encoder 235 is used to encode the high frequency subbands.
  • a conditional access system 240 is included for further encoding of the low frequency subbands in cases where authorization is required to decode the video images.
  • Conditional access system 240 has a random slicer 300 which is used for segmenting the low frequency subbands and generating key values representing the starting positions for each of the segments.
  • Entropy encoder 305 is used to perform encoding of the low frequency subbands in preparation for transmission over a broadcast channel.
  • Encryption module 310 is used to encode the key values generated by random slicer 300.
  • Subscriber Authorization module 315 provides authorization codes used to verify that the transmitted image signal is being decoded by an authorized party.
  • conditional access system 240 is able to operate independently of System 100.
  • Human visual model based decoder 125 includes a decoder 400 for performing arithmetic decoding of encoded video bitstreams. Decoder 400 also transforms the video bitstreams into slices of wavelet coefficients. Error detector 405 is responsible for recognizing corrupted video slices. Concealment module 410 is used to discard any corrupted video slices identified by error detector 405. Inverse quantizer 415 is used to reverse the quantization results of the video slices. Finally, a 3-D wavelet synthesizer 420 is provided to enable transformation of the slices into video frames. The operation of System 100 is described with respect to FIGS. 5-15.
  • a biologically correct and complete model of the human perceptual system would incorporate descriptions of several physical phenomena including peripheral as well as higher level effects, feedback from higher to lower levels in perception, interactions between audio and visual channels, as well as elaborate descriptions of time-frequency processing and nonlinear behavior.
  • Some of the above effects are reflected in existing coder algorithms, either by design or by accident.
  • certain forms of adaptive quantization and prediction systems provide efficient performance although they exhibit slow response times.
  • the acceptability of the slow response time can often be attributed to temporal noise masking.
  • the basic time-frequency analyzers in the human perceptual chain are described as bandpass filters. Bandpass filters in perception are sometimes reflected in coder design and telecommunication practice in the form of "rules of thumb."
  • a particularly interesting aspect of the signal processing model of the human system is non-uniform frequency processing.
  • the critical bands in vision are nonuniform. It is necessary to use masking models with a non-uniform frequency support to incorporate this in coder design.
  • masking refers to the ability of one signal to hinder the perception of another within a certain time or frequency range. It is also necessary to recognize that high-frequency signals in visual information tend to have a short time or space support, while low- frequency signals tend to last longer.
  • An efficient perceptual coder therefore needs to not only exploit properties of distortion masking in time and frequency, but also have a time-frequency analysis module that is sufficiently flexible to incorporate the complex phenomena of distortion masking by non-stationary input signals. All of this is in contrast to the classical redundancy-removing coder, driven purely by considerations of minimum mean square error (MMSE), MMSE bit allocation, or MMSE noise shaping matched to the input spectrum.
  • MMSE minimum mean square error
  • Distortion sensitivity profiles of human perception are derived as functions of frequency, brightness, texture, and temporal parameters. Four kinds of sensitivity are considered for gray-scale video/images:
  • (1) Brightness sensitivity: visibility thresholds are defined as functions of the amplitude of the luminance edge, in which a perturbation is increased until it becomes just discernible.
  • (2) Texture sensitivity: the visibility threshold in this approach is associated with the masking function defined at each pixel as the maximum prediction error from the four neighboring pixels.
  • (3) Temporal sensitivity: the masking of temporally changing stimuli is extremely important in interframe coding. However, temporal masking is complicated by many factors, and its application to video coding is still in its infancy. Many researchers have attempted to evaluate the losses of spatial resolution and magnitude resolution as an object moves in a scene. If movement is drastic, such as a scene change, the perceived spatial and intensity resolution is significantly reduced immediately after the scene change. It has been found that the eye is noticeably more sensitive to flicker at high luminance than at low luminance.
  • (4) Frequency sensitivity: the human perception of distortion depends on its frequency distribution.
  • the response of the human visual system (HVS) to sine wave gratings of different frequencies has been experimentally measured as the so-called contrast sensitivity function (CSF).
  • CSF contrast sensitivity function
  • Many spatial-domain CSF models indicating general bandpass characteristics have been proposed.
  • the spatial-domain CSF has been widely used to improve the quality of the coded still images.
  • spatio-temporal CSF provides relative sensitivities of the HVS to different spatio-temporal frequencies, or relative tolerance of noises at different spatio-temporal frequencies. It can be used to allocate coding bits, or distortion, by adjusting the quantizer step size of the target signal as inversely proportional to the sensitivity of the corresponding frequency.
  • the inputs of the quantizing system are typically sequences of prediction errors or transform coefficients.
  • the idea is to quantize the prediction error, or the transform coefficients just finely enough to render the resulting distortion imperceptible, although not mathematically zero. If the available bit rate is not sufficient to realize this kind of perceptual transparency, the intent is to minimize the perceptibility of the distortion by shaping it advantageously in space or frequency, so that as many of its components as possible are masked by the input signal itself.
  • perceptual coding is used to signify the matching of the quantizer to the human visual system, with the goal of either minimizing perceived distortion, or driving it to zero where possible. These goals do not correspond to the maximization of signal-to-noise ratios or the minimization of mean square error.
  • Just-noticeable distortion (JND) provides each signal to be coded with a visibility threshold of distortion, below which reconstruction errors are rendered imperceptible.
  • the JND profile of a still image is a function of local signal properties such as the background luminance and the activity of luminance changes in the spatial domain.
  • the derivation of JND profiles must take both spatial and temporal masking effects into consideration.
  • the subject should not be able to discern the difference between a video sequence and its JND-contaminated version.
  • the method begins at step 500.
  • frame counter 200 is initialized by setting its value to zero.
  • the video frames to be encoded are received by the human visual model based encoder 110.
  • the frame count is increased in response to each received frame.
  • two frames received in successive order are associated as a set in preparation for 3-D wavelet analysis.
  • In a step 510 the 3-D wavelet analyzer module 210 performs decomposition of the video frame set into a multi-resolution subband representation. In step 512 control is passed to the JND model generator 215.
3-D Wavelet Analysis
  • Wavelet multi-resolution analysis techniques have been applied primarily to 1D and 2D signals. These techniques project the signal onto a chain of embedded approximation and detail spaces designed to represent the signal and its details at various levels of resolution. For practical purposes, the projection coefficients are obtained using a discrete subband transform that employs a quadrature mirror filter (QMF) pair related to the type of wavelet used in the analysis. In conventional 2D wavelet multi-resolution analysis, the separable 2D approximation spaces are formed from the tensor product of identical 1D approximation spaces. This restriction generates analyzing filters with homogeneous spectral characteristics in 2D frequency space.
  • the multi-resolution analysis is constructed from a separable 3D analyzing or "scaling" function formed from the product of three nonidentical 1D scaling functions, or two identical 1D spatial scaling functions and one different 1D temporal scaling function.
  • This brings a much richer set of orthonormal basis vectors with which to represent 3D signals, and it produces filters that can be easily tailored to more closely match the spatial and temporal frequency characteristics of the 3D signal.
  • An L²(ℝ) multi-resolution analysis consists of a chain of closed, linear "approximation" spaces V_j and a scaling function φ which satisfy the following properties for all f ∈ L²(ℝ).
  • V_{j+1} = V_j ⊕ W_j
  • W_j is typically referred to as the jth detail space, because it captures the difference in signal information between the approximation spaces V_{j+1} and V_j.
  • h_n and g_n are the coefficients of the QMF pair which is used to compute the approximation and detail projections associated with V_j and W_j from the approximation at the next higher scale V_{j+1}.
  • Approximation and detail signals are created by orthogonally projecting the input signal f onto the appropriate approximation or detail space. Since each space is spanned by an orthonormal basis set, the signal projection onto a given approximation or detail space at the jth resolution is equivalent to the sequence of projection coefficients obtained by the inner product operations.
  • FIG. 6 shows the block diagram of heterogeneous 3D wavelet decomposition.
  • each video frame set is divided into 11 subbands.
  • the inventors use the simple two-tap Haar filter to separate the signal into temporal low frequency (LFP) and high frequency (HFP) parts. The low frequency part is then decomposed for two levels with the spatial 2-D wavelet decomposition and the high frequency part is decomposed for one level.
  • the Antonini filter is used here.
  • the Haar and Antonini filters work together to achieve a satisfying balance between complexity and performance.
  • the coefficients for the Antonini wavelet filter are: for the LFP, the h_n are given by {0.0378284555, 0.0238494650, 0.1106244044, 0.3774028556, 0.852698679, 0.3774028556, 0.1106244044, 0.0238494650, 0.0378284555}.
  • the g_n are given by {0.0645388826, 0.0406894176, 0.4180922732, 0.788485616, 0.4180922732, 0.0406894176, 0.0645388826}.
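As an illustration of the decomposition just described, the following minimal Python sketch splits a two-frame set with a two-tap Haar filter, then applies two spatial levels to the LFP and one level to the HFP, yielding the 11 subbands of FIG. 6. PyWavelets' bior4.4 wavelet is used here as a stand-in for the Antonini 9/7 filter, and all function names are illustrative, not part of the disclosure.

    import numpy as np
    import pywt

    def decompose_frame_set(f0, f1, spatial_wavelet="bior4.4"):
        """Decompose a set of two frames into 11 spatio-temporal subbands."""
        # Temporal Haar split: low = scaled sum, high = scaled difference.
        lfp = (f0 + f1) / np.sqrt(2.0)   # temporal low-frequency part
        hfp = (f0 - f1) / np.sqrt(2.0)   # temporal high-frequency part

        subbands = {}
        # Two spatial levels on the temporal LFP: LL1 -> (LL2, LH2, HL2, HH2).
        ll1, (lh1, hl1, hh1) = pywt.dwt2(lfp, spatial_wavelet)
        ll2, (lh2, hl2, hh2) = pywt.dwt2(ll1, spatial_wavelet)
        subbands.update({"L-LLLL": ll2, "L-LLLH": lh2, "L-LLHL": hl2,
                         "L-LLHH": hh2, "L-LH": lh1, "L-HL": hl1, "L-HH": hh1})
        # One spatial level on the temporal HFP.
        hll, (hlh, hhl, hhh) = pywt.dwt2(hfp, spatial_wavelet)
        subbands.update({"H-LL": hll, "H-LH": hlh, "H-HL": hhl, "H-HH": hhh})
        return subbands  # 7 + 4 = 11 subbands

    frames = np.random.rand(2, 64, 64)
    bands = decompose_frame_set(frames[0], frames[1])
    print({k: v.shape for k, v in bands.items()})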
  • the JND model provides each signal a threshold of visible distortion, below which reconstruction errors are rendered imperceptible.
  • the spatial-temporal JND profile for each set of two frames in the video sequence and the JND profiles for the subbands are generated sequentially.
  • Since the generation of the spatial-temporal JND profile for each frame requires one previous frame as a reference, the encoder has to assign the first frame a reference frame when a video sequence is being coded. In an embodiment of the present invention, the first frame in the video sequence uses itself as a reference to generate each renewed JND profile.
  • the generation of the JND model consists of several steps described in FIG. 7.
  • First, in a step 700 the perceptual redundancy inherent in the spatial domain is quantitatively measured as a 2D profile by a perceptual model that incorporates the visibility thresholds due to average background luminance and texture masking.
  • JND_S(x, y) = max{f_1(bg(x, y), mg(x, y)), f_2(bg(x, y))}, for 0 ≤ x < W, 0 ≤ y < H (1)
  • H and W denote respectively the height and width of the image
  • mg(x, y) denotes the maximal weighted average of luminance gradients around the pixel at (x, y)
  • bg (x, y) is the average background luminance.
  • T_0, γ and λ are found to be 17, 3/128 and 1/2 through experiments.
  • the value of mg(x, y) across the pixel at (x, y) is determined by calculating the weighted average of luminance changes around the pixel in four directions.
  • Four operators, G_k(i, j), for k = 1, ..., 4 and i, j = 1, ..., 5, are employed to perform the calculation, where the weighting coefficient decreases as the distance away from the central pixel increases.
  • G_1 through G_4 are 5×5 matrices of weighting coefficients, one per direction, with entries (±1, ±3, ±8) whose magnitudes decrease away from the central pixel.
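A minimal sketch of the spatial JND profile of equation (1) follows. The exact forms of f_1 (texture masking) and f_2 (luminance masking) and the full G_k operators are not reproduced above, so simplified placeholder forms are assumed; bg() and mg() follow the surrounding definitions.

    import numpy as np

    def bg(img, x, y):
        """Average background luminance in a 5x5 neighborhood of (x, y)."""
        h, w = img.shape
        y0, y1 = max(0, y - 2), min(h, y + 3)
        x0, x1 = max(0, x - 2), min(w, x + 3)
        return img[y0:y1, x0:x1].mean()

    def mg(img, x, y):
        """Maximal weighted average of luminance gradients around (x, y).
        Simplified here to the largest absolute central difference in the
        four principal directions (the patent uses four 5x5 operators)."""
        h, w = img.shape
        if not (1 <= x < w - 1 and 1 <= y < h - 1):
            return 0.0
        return max(abs(img[y, x + 1] - img[y, x - 1]),          # horizontal
                   abs(img[y + 1, x] - img[y - 1, x]),          # vertical
                   abs(img[y + 1, x + 1] - img[y - 1, x - 1]),  # diagonal
                   abs(img[y + 1, x - 1] - img[y - 1, x + 1]))  # anti-diagonal

    def jnd_spatial(img, T0=17.0, gamma=3.0 / 128.0, lam=0.5):
        """JND_S(x,y) = max{f1(bg, mg), f2(bg)} with placeholder f1, f2."""
        h, w = img.shape
        out = np.empty((h, w))
        for y in range(h):
            for x in range(w):
                b, m = bg(img, x, y), mg(img, x, y)
                f1 = m * lam                                    # placeholder
                f2 = T0 * (1 - np.sqrt(b / 255.0)) + gamma * b  # placeholder
                out[y, x] = max(f1, f2)
        return out

    img = np.random.rand(16, 16) * 255
    print(jnd_spatial(img).round(1))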
  • the JND profile representing the error visibility threshold in the spatio-temporal domain is determined according to the following expression:
  • JND_{S-T}(x, y, n) = f_3(ild(x, y, n)) · JND_S(x, y, n) (6)
  • ild(x, y, n) denotes the average interframe luminance difference between the nth and (n-1)th frame.
  • f_3 represents the error visibility threshold due to motion.
  • the scale factor is switched to 0.8 when |ild(x, y, n)| ≤ 5.
  • the inventors note that the error visibility threshold is increased with the increasing interframe luminance difference. This confirms the research findings that after a rapid scene change or large temporal difference, the sensitivity of the HVS to spatial details is decreased. Moreover, the inventors also found that temporal masking due to high-to-low luminance changes is more prominent than that due to low-to-high luminance changes.
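A short sketch of the temporal scaling of equation (6), keeping only the two facts stated above (the threshold grows with the interframe luminance difference, and the scale factor is 0.8 when |ild| ≤ 5); the ramp above that floor is an assumption.

    import numpy as np

    def f3(ild):
        """Scale factor for the spatial JND; assumed monotone ramp above
        the stated small-difference floor of 0.8."""
        a = np.abs(ild)
        return np.where(a <= 5.0, 0.8, 0.8 + 0.05 * (a - 5.0))

    def jnd_spatio_temporal(jnd_s, frame_n, frame_prev):
        ild = frame_n - frame_prev   # interframe luminance difference
        return f3(ild) * jnd_s       # JND_{S-T} = f3(ild) * JND_S

    prev = np.random.rand(8, 8) * 255
    cur = np.random.rand(8, 8) * 255
    print(jnd_spatio_temporal(np.full((8, 8), 5.0), cur, prev).round(2))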
  • the JND profiles for each subband are set up with certain distortion allocation for different subbands in step 710.
  • the JND for a subband q is a function of spatio-temporal JND values at corresponding locations multiplied by a weight that indicates the perceptual importance of this subband.
  • the relationship between the full-band JND profile and the component JND profiles can be obtained by the following equations:
  • the weighting function for distributing the full-band JND energy to a subband can be derived as the relative sensitivity of the HVS to the frequency subband. For example, with 11 spatio-temporal subbands, the weighting function of the qth subband is obtained as
  • S_q represents the average sensitivity of the HVS to the qth subband.
  • S_q is obtained from the spatio-temporal CSF.
  • MND: minimally-noticeable distortion
  • The energy of JND can be understood as the minimum energy of quantization error that will cause perceivable loss to the human visual system. So, if the error in a small area is so small that the error energy is less than the JND energy, the compression will be perceptually lossless.
  • the energy of the MND of a small area indexed by (i, j) is defined as: MND²(i, j) = JND²(x, y) · λ(i, j) (11)
  • λ(i, j) is the distortion index for this block.
  • the inventors define a global human perceptual distortion measure (σ_G) based on evaluating the block distortion measures as follows:
  • σ(k, l) is the distortion measure of a medium block indexed by (k, l).
  • the whole image is decomposed into K by L non-overlapped medium blocks (R_kl).
  • Each medium block is divided into M by N small blocks (r_ij(k, l)), i.e.,
  • σ(k, l) is defined as:
  • the larger σ_G is, the larger the subjective perceptual distortion is.
  • like PSNR or MSE, σ_G has the convenience of describing the picture quality with one quantitative value.
  • σ_G takes the human visual model into account and therefore reflects subjective visual quality better than PSNR or MSE.
  • used alone, MSE or PSNR is not meaningful for subjective video/image evaluation; when two images are compressed, the comparison of their MSE or PSNR values cannot give a credible psychovisual conclusion.
  • the distortion σ_G can be explained as "this image is compressed at a scale of σ_G times the perceptually noticeable distortion."
  • the inventors use σ_G as the index of performance for the video compression system.
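The pooling formulas for σ(k, l) and σ_G are not reproduced above, so the following sketch assumes a simple energy-ratio definition of the small-block distortion index and mean pooling across blocks.

    import numpy as np

    def block_view(a, size):
        """Split a 2-D array into non-overlapping size x size blocks."""
        h, w = a.shape
        return (a[:h - h % size, :w - w % size]
                .reshape(h // size, size, -1, size).swapaxes(1, 2))

    def sigma_global(err, jnd, small=8, medium=32):
        # lambda(i,j): quantization error energy relative to JND energy.
        e = block_view(err ** 2, small).sum(axis=(2, 3))
        j = block_view(jnd ** 2, small).sum(axis=(2, 3))
        lam = e / np.maximum(j, 1e-12)
        # sigma(k,l): pool small-block indices inside each medium block.
        sig_kl = block_view(lam, medium // small).mean(axis=(2, 3))
        # sigma_G: pool medium blocks over the whole image (mean, assumed).
        return sig_kl.mean()

    err = np.random.randn(64, 64) * 2.0
    jnd = np.full((64, 64), 4.0)
    print(f"sigma_G = {sigma_global(err, jnd):.3f}")  # about 0.25 here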
  • in step 514 the frame counter 200 is used to count the number of frames being coded.
  • control is passed back to step 512 so that the JND model can be refreshed.
  • the inventors assume that the scenes depicted in the frames remain almost the same so the original JND model remains effective until the update. If, however, drastic movement or a scene cut happens, the scene changes greatly. If the scene changes, the original JND has to be refreshed right away to follow the change. This is why a motion detector is necessary.
  • the inventors adopt a simple motion detection scheme. It considers two factors related to changes in picture content: scene cuts and drastic movement.
  • a step 516 the detection of a scene cut is made in accordance with the process described in FIG. 8.
  • in a step 800 the energy in the spatial-LLLL temporal-L subband is calculated for each set of two frames and stored as LE_old.
  • in step 805 LE_new is calculated for the next set of frames.
  • in step 810 LE_old is then compared with the newly generated energy LE_new. If the difference exceeds a predefined threshold, it is assumed that a scene cut has occurred and a new JND model will be generated. The appropriate threshold level is determined by the results obtained from sample calculations.
  • in a step 518 the detection of drastic movement is determined according to the process described in FIG. 9.
  • the energy in the spatial-LL temporal-H subband is calculated for each set of two frames and stored as HE_old.
  • HE_new is calculated for the next set of frames.
  • HE_old is then compared with the newly generated energy HE_new. If the difference exceeds a predetermined threshold, it is assumed that drastic motion is occurring and a new JND model will be generated. The appropriate threshold level is determined by the results obtained from sample calculations.
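The two detectors of FIGS. 8 and 9 share the same structure; a compact sketch follows, with threshold values assumed (the text says only that they are tuned from sample calculations).

    import numpy as np

    def subband_energy(band):
        return float((band ** 2).sum())

    class MotionDetector:
        def __init__(self, cut_threshold=1e4, motion_threshold=1e4):
            self.le_old = None   # energy of spatial-LLLL temporal-L subband
            self.he_old = None   # energy of spatial-LL temporal-H subband
            self.cut_t, self.mot_t = cut_threshold, motion_threshold

        def refresh_needed(self, llll_l_band, ll_h_band):
            le_new = subband_energy(llll_l_band)
            he_new = subband_energy(ll_h_band)
            cut = (self.le_old is not None
                   and abs(le_new - self.le_old) > self.cut_t)
            motion = (self.he_old is not None
                      and abs(he_new - self.he_old) > self.mot_t)
            self.le_old, self.he_old = le_new, he_new
            return cut or motion   # True -> regenerate the JND model now

    det = MotionDetector()
    for _ in range(3):
        print(det.refresh_needed(np.random.rand(16, 16), np.random.rand(32, 32)))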
  • in step 520 the wavelet coefficients in the lowest frequency subbands (LFS) are encoded using differential pulse code modulation (DPCM).
  • DPCM is a well known method for providing pixel value prediction error calculations.
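A minimal DPCM sketch; the previous-coefficient predictor is an assumption, since the text states only that prediction errors are computed for the low frequency subbands.

    def dpcm_encode(row):
        pred, errors = 0.0, []
        for v in row:
            errors.append(v - pred)  # prediction error to be quantized/coded
            pred = v                 # predictor: previously coded value
        return errors

    def dpcm_decode(errors):
        pred, out = 0.0, []
        for e in errors:
            pred += e
            out.append(pred)
        return out

    row = [10.0, 12.0, 11.5, 15.0]
    enc = dpcm_encode(row)
    print(enc, dpcm_decode(enc))  # round-trips to the original row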
  • control is passed to the perceptually tuned quantizer 220.
  • inverse quantization is performed in a step 524.
  • Upon completion of step 524, subband processing continues with steps 526 and 528. These steps can be performed in serial order or in parallel. Next, in step 530 perceptual channel coding is performed. Finally, in step 535 the encoded bitstreams are ready for transmittal.
  • the perceptually tuned quantizer 220 is the core of the source encoder.
  • the most important task is to allocate the available bits to certain coefficients obtained from the wavelet decomposition as efficiently as possible.
  • the beauty of JND is that it provides a quantitative measure of the error sensitivity threshold with spatial and frequency localization.
  • the DCT coefficients are quantized using a quantization table that assigns more bits to the more important lower frequency coefficients in one 8 by 8 block.
  • although a quantization table is designed according to the HVS response based on psychovisual theory and experiments, its uniform use over the whole picture brings shortcomings because different parts of a picture can have different visual importance in different scenes.
  • using a quantization table based on JND that can adapt itself to local scenes results in less perceptual distortion.
  • the σ_G used here is the same symbol used to indicate the new JND based human perceptual distortion measure, since both have the same psychological meaning.
  • the powerful functionality to control the compressed video quality ahead of time distinguishes this scheme from conventional video coding systems.
  • the coding distortion is perceptually evenly distributed across the whole image.
  • FIG.10 illustrates a quantization method according to an embodiment of the present invention.
  • a global distortion index σ_G is given for the quantization procedure.
  • the global distortion index ranges from approximately 1.0 to 10.0, where 1.0 stands for just noticeable distortion.
  • each subband is partitioned into non-overlapped blocks (r_ij(k, l)). These blocks are set up with a size of 8x8 or 16x16.
  • in step 1010, for each block r_ij(k, l), the step size of the mid-rise uniform quantizer is maximized under the condition that the quantization error energy is less than or equal to the MND energy in the block that has the distortion index λ(i, j) equal to σ_G, i.e.,
  • w(x, y) is the wavelet coefficient
  • ŵ(x, y) is the quantized wavelet coefficient
  • Δ(k, l) is the quantization step size of block (k, l).
  • one quantization table that leads to the uniform error energy distribution over all subbands is set up for each subband.
  • the quantization index values from the quantization table are transmitted in the header of the bit stream for this subband.
  • a determination of the proportion of zero signals in a block is made to determine if after the quantization, the proportion of zero signals in the block is larger than a specified threshold, for example 7/8 or 15/16.
  • this block is assumed to be unimportant. Accordingly, its step size is recorded as 0. The coefficients in such an unimportant block need not be transmitted. As a result, in the decoder all values in this block are recovered as 0's.
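A sketch of the per-block step-size search of FIG. 10: the step size is pushed as large as possible while the quantization error energy stays at or below the block's MND energy. The grow-then-bisect search strategy is an implementation assumption.

    import numpy as np

    def quantize(block, step):
        # Mid-rise uniform quantizer with step size `step`.
        return (np.floor(block / step) + 0.5) * step

    def max_step(block, mnd_energy, tol=1e-3):
        lo = hi = tol
        # Grow the step until the MND-energy constraint breaks.
        while ((block - quantize(block, hi)) ** 2).sum() <= mnd_energy:
            lo, hi = hi, hi * 2.0
            if hi > 1e6:
                return lo
        # Bisect between the last feasible and first infeasible step.
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            err = ((block - quantize(block, mid)) ** 2).sum()
            lo, hi = (mid, hi) if err <= mnd_energy else (lo, mid)
        return lo

    block = np.random.randn(8, 8) * 10.0
    print(f"chosen step = {max_step(block, mnd_energy=64.0):.3f}")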
  • σ_G can be kept constant to maintain the video quality at the same level. If the bandwidth is fixed, i.e., the bit rate is limited for this source encoder, several σ_G values should be calculated and the corresponding bit rates compared with the available channel capacity. The σ_G value that provides the proper bit rate is then chosen. The procedure for refreshing the quantization table and the σ_G value choice is repeated when the frame counter reaches a certain number or a drastic movement happens, resulting in the need to update the JND model.
  • an optimum mean square error quantizer is used. This quantizer minimizes the mean square error for a given number of quantization levels. For a random variable u, the reconstruction levels a_k are calculated according to the expression in [31]
  • the distribution of the subband samples needs to be known. It has been suggested that the probability density function of the subband values and their prediction errors is Laplacian. Although a more accurate probability distribution of the subband values, i.e., the generalized Gaussian distribution, has been suggested and its shape parameter estimation method explored, the resulting overhead for the calculation and transmission of the parameters is too large. So the inventors use the Laplacian distribution as a reasonable approximation.
  • the first step selects a global objective distortion index σ_G.
  • the variance of the wavelet coefficients for each subband is calculated.
  • all the coefficients are unified.
  • each subband is partitioned into blocks (r_ij(k, l)) with size of 8 by 8 or 16 by 16.
  • the number of levels of the Lloyd-Max quantizer is minimized under the condition that the quantization error energy is less than or equal to the MND energy in this block that has the distortion index λ(i, j) equal to σ_G, i.e.,
  • the energy of the MND is defined as in equation (11)
  • w(x, y) is the wavelet coefficient
  • 2N+1 is the maximal number of quantization levels
  • m indicates the interval in which w(x, y) is located
  • ŵ^(n)(x, y) is the quantized wavelet coefficient for a quantizer with 2n+1 levels
  • the index of levels for the optimum quantizer is transmitted in the header of the bit stream of this subband.
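A sketch of the alternative Lloyd-Max design: reconstruction levels are iterated to the centroids of their decision intervals, with boundaries at the midpoints of adjacent levels. The patent computes the levels from the expression in [31]; here a standard fixed-point iteration over Laplacian samples, matching the distribution assumed above, is used instead.

    import numpy as np

    def lloyd_max(samples, n_levels, iters=50):
        # Initialize reconstruction levels from quantiles of the data.
        levels = np.quantile(samples, np.linspace(0.05, 0.95, n_levels))
        for _ in range(iters):
            bounds = 0.5 * (levels[:-1] + levels[1:])  # midpoint boundaries
            idx = np.searchsorted(bounds, samples)     # assign to intervals
            for k in range(n_levels):                  # centroid update
                members = samples[idx == k]
                if members.size:
                    levels[k] = members.mean()
        return levels

    rng = np.random.default_rng(0)
    coeffs = rng.laplace(scale=4.0, size=10000)  # Laplacian subband model
    print(lloyd_max(coeffs, n_levels=7).round(2))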
  • in step 524 the subbands go through inverse quantization to correct errors occurring during the quantization process.
  • control is then passed to the slicing and arithmetic coding modules.
  • the arithmetic coding scheme developed by Witten, Neal and Cleary is widely used in video codecs due to its superior performance. It easily accommodates adaptive models and is computationally very efficient.
  • the arithmetic coding scheme provides efficient compression, however the decoding result of one coefficient depends on the decoding result of the previous one because of the adaptive coding procedure employed.
  • arithmetic coding encodes a message as a number in the unit interval. Unlike most schemes, including Huffman coding, there is no specific code word defined for any specific symbol; how each symbol is encoded depends, in general, on the previous symbols of the string. For the source sequence x^n, let p(x^n) denote its probability mass function and F(x^n) its cumulative distribution function.
  • decoding is not unique unless the length of the message is known, since a single point may represent any interval of which it is a member. Without knowledge of the message length, decoding can proceed indefinitely. This may be overcome in practice either by sending a length indicator or by using an end-of-message symbol. Both of these add overhead to the encoding.
  • each symbol in the message is assigned a half-open subinterval of the unit interval of length equal to its probability. This is referred to as the coding interval for that symbol.
  • a nesting of subintervals is defined. Each successive subinterval is defined by reducing the previous subinterval in proportion to the length of the current symbol's coding interval. This process continues until all symbols have been encoded.
  • the source sequence x^n is represented by its cumulative distribution function F(x^n).
  • the encoding output of the current symbol depends on the previous one due to the tree-structured encoding procedure. So basically, it is a polyalphabetic cryptosystem.
  • the decoder restores the source symbol one by one with successive dependency. This dependency is observed in the following examples. In this implementation, a binary bit stream is obtained as the encoding output.
  • the source sequence for AC encoding is 1 1 1 2 0 2 0 1 0 1 0 1 2 2 1 0 0 2 1 0
  • a corresponding binary bit stream is obtained as the encoding output.
  • the frequency model doesn't match the distribution of the source sequence, but this only decreases the compression ratio and doesn't change the characteristics of AC. If the third symbol 1 is replaced by 2 or 0 respectively, the encoder gives out a correspondingly different binary bit stream.
  • the loss of a source symbol also diversifies the encoding of all following symbols, so a sequence with a missing source symbol yields a different encoding output.
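A toy arithmetic coder illustrating these properties: the code point is built by nesting subintervals, so changing or dropping one symbol alters the encoding of everything after it. A fixed three-symbol model is assumed for brevity (the codec above uses adaptive models).

    from fractions import Fraction

    PROBS = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

    def cum_intervals(probs):
        lo, out = Fraction(0), {}
        for s, p in sorted(probs.items()):
            out[s] = (lo, lo + p)   # half-open coding interval for symbol s
            lo += p
        return out

    def encode(symbols):
        lo, width = Fraction(0), Fraction(1)
        table = cum_intervals(PROBS)
        for s in symbols:
            a, b = table[s]
            lo, width = lo + width * a, width * (b - a)  # nest subinterval
        return lo + width / 2   # any point of the final interval works

    msg = [1, 1, 1, 2, 0, 2, 0, 1]
    print(encode(msg))
    print(encode([1, 1, 2, 2, 0, 2, 0, 1]))  # one changed symbol: new code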
  • the inventors also recognized the importance of knowing the precise location of the start bit of an AC encoded bit stream and thus, developed a conditional access sub-system based on this property.
  • step 526 is directed to the processing of the low frequency subbands in accordance with a conditional access scheme.
  • the conditional access process beginning at step 526 will now be discussed with respect to FIG. 11.
  • in step 1100 the data frame being processed for transmission is broken into slices {s_1, s_2, ..., s_M}.
  • the locations of breaking points are decided by a randomly generated vector v.
  • This vector v is updated after a short time interval.
  • each slice is entropy encoded respectively into bit streams {b_1, b_2, ..., b_M}.
  • arithmetic coding is relied upon.
  • in step 1110 these bit streams are concatenated into one bit stream b_total.
  • in step 1115 the function l(b) is assigned to represent the length of bit stream b, and the values l(b_i) are used to determine the start bit of each slice.
  • in step 1120 the values of l(b_i) (i.e. the scrambler key K) are encrypted into the ECM using any available encryption algorithm, inserted into the header of this data frame, and transmitted.
  • the number of slices (NoS) will also be encrypted and transmitted in the frame header.
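A sketch of the slicing and key handling of FIG. 11. Here zlib stands in for the per-slice arithmetic coder and a toy XOR stands in for the ECM encryption; both substitutions, and the helper names, are assumptions.

    import json, random, zlib

    def encode_frame(data: bytes, n_slices: int, ecm_key: bytes):
        # Random breaking points (the vector v), refreshed per time interval.
        cuts = sorted(random.sample(range(1, len(data)), n_slices - 1))
        slices = [data[a:b] for a, b in zip([0] + cuts, cuts + [len(data)])]
        streams = [zlib.compress(s) for s in slices]  # per-slice entropy code
        key = [len(b) for b in streams]               # l(b_i): start positions
        plain = json.dumps(key).encode()
        ecm = bytes(x ^ ecm_key[i % len(ecm_key)] for i, x in enumerate(plain))
        return b"".join(streams), ecm

    def decode_frame(b_total: bytes, ecm: bytes, ecm_key: bytes):
        plain = bytes(x ^ ecm_key[i % len(ecm_key)] for i, x in enumerate(ecm))
        pos, out = 0, b""
        for length in json.loads(plain):  # without the key, the start bits
            out += zlib.decompress(b_total[pos:pos + length])  # are unknown
            pos += length
        return out

    data = b"low-frequency subband payload " * 20
    b_total, ecm = encode_frame(data, n_slices=4, ecm_key=b"secret")
    assert decode_frame(b_total, ecm, b"secret") == data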
  • for the wavelet coefficients in the video coding system, the data range differs frame by frame, so an unauthorized user cannot know it a priori.
  • the wavelet coefficients in the lowest frequency subbands contain the most important information.
  • only the LFS subbands are processed through the conditional access system.
  • the high frequency subbands are simply sliced at fixed locations and arithmetically encoded.
  • the high frequency subband processing step 528 will now be discussed with respect to FIG. 12.
  • the idea here is to generate small bitstreams that carry the same amount of "distortion sensitivity".
  • the high frequency subband S is segmented into 7 short bit streams.
  • a new adaptive statistical model is set up for it.
  • the slices are entropy coded using arithmetic coding, for example.
  • header and trailer symbols are added to the slices so each slice is capable of being selected out from the received bit stream at the decoder side.
  • the encoded slices are merged into one bit stream. The slicing information is then transmitted along with the data stream.
  • Arithmetic coding is a powerful entropy coding scheme for data compression. It is a promising replacement for Huffman coding in video broadcasting systems because of its superior performance.
  • the combination of AC with the conditional access subsystem according to an embodiment of the present invention provides an encryption solution at low cost.
  • the next step is to perform perceptual channel coding, as shown at step 530.
  • compression serves to reduce the amount of bits used to represent bitstream data
  • channel coding works to expand, or increase, the length of the bitstream.
  • bitstream data can be lost, resulting in poor video quality.
  • the percentage of data lost can be calculated.
  • Channel coding seeks to protect important bits by adding non-important bits to the bitstream. The goal is for the non-important bits to make up the greater amount of bits represented in the lost data percentage, thus allowing the more important bits to be successfully transmitted.
  • rate compatible punctured convolutional (RCPC) codes are added.
  • RCPC codes are used to protect the data from spatial LLLL temporal L subband.
  • Cyclic redundancy check (CRC) codes are combined with RCPC for other less significant subbands to assure acceptable video quality even under bad channel conditions.
  • FIG. 15 shows the D_i values for a sample video sequence.
  • the subbands are arranged into groups based upon their calculated D_i values. Any classification algorithm known to one of ordinary skill in the art can be used to determine the groups.
  • the division of S could result in four categories {S_0}, {S_1, S_2, S_3, S_4, S_5, S_6, S_7}, {S_8, S_9}, {S_10}, as shown in FIG. 15.
  • the subbands are assigned different RCPC coding rates according to their levels of importance. The degree of importance is based upon the sensitivity to error associated with the subband in terms of the human visual model.
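A sketch of the rate assignment: subbands are grouped by their distortion-sensitivity values D_i and each group receives an RCPC code rate, with lower rates giving stronger protection. The example D_i values, the group boundaries, and the rate set {1/2, 2/3, 4/5, 8/9} are all assumptions.

    def assign_rcpc_rates(d_values, boundaries=(10.0, 3.0, 1.0),
                          rates=("1/2", "2/3", "4/5", "8/9")):
        """Threshold grouping: higher D_i -> lower (stronger) code rate."""
        assignment = {}
        for band, d in d_values.items():
            group = sum(d < b for b in boundaries)  # boundaries above d
            assignment[band] = rates[group]
        return assignment

    d = {"S0": 25.0, "S1": 6.0, "S2": 5.5, "S8": 2.0, "S9": 1.8, "S10": 0.4}
    for band, rate in assign_rcpc_rates(d).items():
        print(band, "-> RCPC rate", rate)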
  • the low and high frequency bitstreams are ready for transmission.
  • decoder 400 processes the encoded bitstreams to produce slices of wavelet coefficients. Any well known decoding process equivalent to the encoding process, for example, arithmetic decoding, can be used.
  • error detector 405 determines if any of the slices contain coding errors. Any well known error detecting method, for example CRC, can be used.
  • the error concealment module 410 discards those slices where errors are detected. At times no error is detected although there are errors in the received slice; however, decoder 400 can detect conflicts during decoding, thereby identifying previously undetected errors. In this case, coefficients in the corrupted slice are retrieved from its DPCM reference if it belongs to the spatio-LLLL temporal-L subband. If the slice belongs to subbands of higher frequency, the coefficients are set to zeros, since the error effect is trivial and will be confined within the slice. In the worst case, where errors are not detected, they will not spread to the whole subband, due to slicing.
  • inverse quantization is applied to the slices.
  • the slices are put through a wavelet synthesizing process resulting in the presentation of viewable video frames in step 535.

Abstract

A video encoding/decoding system based on 3D wavelet decomposition and the human perceptual model is implemented. JND is applied in quantizer design to improve the subjective quality of compressed video. The 3D wavelet decomposition helps to remove spatial and temporal redundancy and provides scalability of video quality. In order to conceal the errors that may occur under bad wireless channel conditions, a slicing method and a joint source-channel coding scenario, which combines RCPC with CRC and utilises the distortion information to allocate convolutional coding rates, are proposed. A new subjective quality index based on JND is presented and used to evaluate the overall system performance at different signal-to-noise ratios (SNR) and at different compression ratios. Due to the wide use of arithmetic coding (AC) in data compression, it is considered as a readily available unit in the video codec system for broadcasting. A new scheme for the conditional access (CA) sub-system is designed based on the cryptographic properties of arithmetic coding. Its performance is analyzed along with its application in a multi-resolution video compression system. This scheme simplifies the conditional access sub-system and provides satisfactory system reliability.

Description

3D Wavelet Based Video Codec with Human Perceptual Model
Statement as to Rights to Invention Made Under Federally-Sponsored Research and Development
Part of the work performed during development of this invention utilized U.S. Government funds. The U.S. Government may have certain rights in this invention.
Cross-Reference to Related Application
This application claims the benefit of U.S. Provisional Application No. 60/136,633, filed May 27, 1999 (incorporated in its entirety herein by reference).
Background of the Invention
Field of the Invention
The present invention relates generally to the compression and encryption of data, and more particularly to compressing data with a human perceptual model.
Related Art
Computer applications, audio devices, and video devices based on digital data technology continue to appear in the marketplace. In response, the need for presenting audio and video signals in digital form continues to increase. In the video broadcasting area, conventional video broadcast systems include cable television systems and satellite-based broadcast systems. Although the scale and power of such systems continue to increase, the overall bandwidth available for transmitting video and audio signals remains limited. Thus, the need for signal compression remains central to digital communication and signal-storage technology. Competing with the need to compress data is the need to maintain the subjective visual quality of an image, arguably the ultimate objective of video transmission systems. To this end, performance metrics that take the psychovisual properties of the human visual system (HVS) into account are needed. Visual sensitivity to frequency, brightness, texture, and color are examples of psychovisual properties. A few human visual models incorporating these psychovisual properties into their algorithms include: just-noticeable-distortion (JND), visible difference predictor (VDP) and the three-component image model. The JND model, for example, provides each pixel with a threshold of error visibility, below which reconstruction errors are rendered imperceptible. To balance these considerations, one problem to be solved for optimizing the delivery of digital data streams is how to locate perceptually important signals in each frequency subband. A second problem is how to encode these signals with the lowest possible bit rate without exceeding the error visibility threshold.
A third consideration in the video broadcasting industry is the need to maintain conditional access to the transmitted data. In a commercial setting, conditional access is used to control which customers can get particular program services. Specific programs are only accessible to customers who have satisfied prepayment requirements. Conditional access is often implemented by way of a key-based system involving a combination of scrambling and encryption to prevent unauthorized reception. Encryption is the process of masking the secret keys that enable the descrambler to unscramble transmitted signals into viewable scenes. The integrity of the various encryption routines presently in use is constantly being tested. As a result, vulnerabilities are identified and the push to develop improved systems continues. In view of the considerations set forth above, what is presently needed is a secure system that effectively meets the need to provide high quality images while making efficient use of limited bandwidths.
Summary of the Invention
The invention described herein is a system and method for compressing and encrypting data. In an embodiment, video images are transformed into compressed and encrypted bitstreams. The bitstreams are then transmitted over a satellite broadcast channel to a receiving station where they are decoded and presented. The system has a human visual model based encoder for receiving frames of video data. The human visual model based encoder is also responsible for transforming the video data into compressed and encrypted bitstreams. The internal components of the human visual model based encoder include a frame counter for recording the number of video frames received. A motion detector is included for analyzing the video frames for an indication that motion is occurring. A 3-D wavelet analyzer module is used to generate low and high frequency subbands. A just-noticeable distortion model generator is used to calculate a profile of the video image based upon local signal properties. A perceptually tuned quantizer provides a quantization for the wavelet coefficients present in each of the high and low frequency subbands. A differential pulse code modulator is used to provide prediction error calculations for the low frequency subbands. A slicer segments the subbands into smaller bitstreams so that the influence of error from one bitstream to another is reduced. An entropy encoder is used to encode the high frequency subbands. Finally, a conditional access system is included for encoding the low frequency subbands in cases where authorization is required to decode the video images. Although the conditional access system is described as being located within the human visual model based encoder, it is able to operate independently of the encoder. The conditional access system has a random slicer which is used for segmenting the low frequency subbands and generating key values representing the starting positions of each of the segments. An encryption module is used to encode the key values generated by the random slicer. A subscriber authorization module provides authorization codes used to verify that the transmitted image signal is being decoded by an authorized party. An entropy encoder is used to perform additional encoding of the low frequency subbands in preparation for transmission over a broadcast channel.
The video transmission system also includes an unequal error protection channel encoder to further encode the bitstreams prior to broadcasting them over a satellite channel. On the receiving end, an unequal error protection channel decoder is provided for decoding the bitstreams after they have been transmitted over the satellite channel. A human visual model based decoder is then used to further decode the bitstreams and regenerate the frames of video data. The internal components of human visual model based decoder include a decoder for performing arithmetic decoding of encoded video bitstreams. The decoder also transforms the video bitstreams into slices of wavelet coefficients. An error detector is responsible for recognizing corrupted video slices. A concealment module is provided to discard any corrupted video slices identified by the error detector. An inverse quantizer is used to reverse the quantization results of the video slices. Finally, a 3-D wavelet synthesizer is provided to enable transformation of the slices into video frames.
In one embodiment, the method for compressing video frames includes: arranging two successively received video frames into a set; decomposing said set into a plurality of high frequency and low frequency subbands; generating a human perceptual model for each of said plurality of subbands; encoding said low frequency subbands to produce encoded low frequency subbands; quantizing said high frequency subbands and said encoded low frequency subbands according to said generated human perceptual models to generate a bitstream for each of said high frequency subbands and a bitstream for each of said encoded low frequency subbands; redefining said generated bitstreams for each of said high frequency subbands to produce a plurality of redefined high frequency bitstreams; redefining said generated bitstreams for each of said encoded low frequency subbands to produce a plurality of redefined low frequency bitstreams; channel coding said plurality of redefined high frequency bitstreams and said plurality of redefined low frequency bitstreams; and repeating the above steps for a next set of video frames until each of said received video frames has been compressed.

Brief Description of the Figures
FIG. 1 is a block diagram illustration of a human perception model based video transmission system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustration of a human visual model based encoder according to an embodiment of the present invention.
FIG. 3 is a block diagram illustration of a conditional access system according to an embodiment of the present invention.
FIG. 4 is a block diagram illustration of a human visual model based decoder according to an embodiment of the present invention.
FIG. 5 illustrates a method of encoding and decoding data transmitted over a broadcast network according to an embodiment of the present invention.
FIG. 6 is a block diagram illustration of 3-D wavelet decomposition.
FIG. 7 illustrates a method of Just Noticeable Distortion model generation according to an embodiment of the present invention.
FIG. 8 illustrates a method of detecting a scene cut according to an embodiment of the present invention.
FIG. 9 illustrates a method of detecting drastic motion according to an embodiment of the present invention.
FIG. 10 illustrates a quantization method according to an embodiment of the present invention.

FIG. 11 illustrates a method of encoding low frequency subbands according to an embodiment of the present invention.
FIG. 12 illustrates a method of encoding high frequency subbands according to an embodiment of the present invention.
FIG. 13 illustrates a method of channel coding according to an embodiment of the present invention.
FIG. 14 illustrates a method of decoding a video transmission according to an embodiment of the present invention.
FIG. 15 is an example of subbands arranged into categories according to an embodiment of the present invention.
Detailed Description of the Preferred Embodiments
Physical Design for the System of the Present Invention
A system according to an embodiment of the present invention is illustrated in FIG. 1. System 100 shows a human visual model based encoder 110 for receiving frames of video data. Human visual model based encoder 110 is further responsible for transforming the video data into compressed and encrypted bitstreams. An unequal error protection channel encoder 115 is used to further encode the bitstreams prior to broadcasting them over a broadcast channel 118. An unequal error protection channel decoder 120 is provided for decoding the bitstreams after they have been transmitted over the satellite channel. A human visual model based decoder 125 is provided to further decode the bitstreams and regenerate the frames of video data.

FIG. 2 shows the internal components of human visual model based encoder 110. A frame counter 200 is provided for recording the number of video frames received. A motion detector 205 analyzes the video frames for an indication that motion is occurring. A 3-D wavelet analyzer module 210 is used to generate low and high frequency subbands from the received video frames. A just-noticeable distortion model generator 215 is used to calculate a profile of the video image based upon local signal properties. A perceptually tuned quantizer 220 provides a quantization for the wavelet coefficients present in each of the high and low frequency subbands. A differential pulse code modulator 225 is used to provide prediction error calculations for the low frequency subbands. A slicer 230 segments the subbands into smaller bitstreams so that the influence of error from one bitstream to another is reduced. An entropy encoder 235 is used to encode the high frequency subbands. A conditional access system 240 is included for further encoding of the low frequency subbands in cases where authorization is required to decode the video images.
Referring to FIG. 3, conditional access system 240 has a random slicer 300 which is used for segmenting the low frequency subbands and generating key values representing the starting positions of each of the segments. Entropy encoder 305 is used to perform encoding of the low frequency subbands in preparation for transmission over a broadcast channel. Encryption module 310 is used to encode the key values generated by random slicer 300. Subscriber authorization module 315 provides authorization codes used to verify that the transmitted image signal is being decoded by an authorized party. In an alternative embodiment, conditional access system 240 is able to operate independently of system 100.
The internal components of human visual model based decoder 125 are described with respect to FIG. 4. Human visual model based decoder 125 includes a decoder 400 for performing arithmetic decoding of encoded video bitstreams. Decoder 400 also transforms the video bitstreams into slices of wavelet coefficients. Error detector 405 is responsible for recognizing corrupted video slices. Concealment module 410 is used to discard any corrupted video slices identified by error detector 405. Inverse quantizer 415 is used to reverse the quantization results of the video slices. Finally, a 3-D wavelet synthesizer 420 is provided to enable transformation of the slices into video frames. The operation of system 100 is described with respect to FIGS. 5-15.
However, it is helpful to begin with a discussion of human perceptual modeling and just-noticeable distortion (JND) as they relate to the present invention.
The Human Perceptual Model of JND and Perceptual Coding
Human Perceptual Models
A biologically correct and complete model of the human perceptual system would incorporate descriptions of several physical phenomena, including peripheral as well as higher level effects, feedback from higher to lower levels in perception, interactions between audio and visual channels, as well as elaborate descriptions of time-frequency processing and nonlinear behavior. Some of the above effects are reflected in existing coder algorithms, either by design or by accident. For example, certain forms of adaptive quantization and prediction systems provide efficient performance although they present slow response times. The acceptability of the slow response time can often be attributed to temporal noise masking. The basic time-frequency analyzers in the human perceptual chain are described as bandpass filters. Bandpass filters in perception are sometimes reflected in coder design and telecommunication practice in the form of "rules of thumb."
A particularly interesting aspect of the signal processing model of the human system is non-uniform frequency processing. The critical bands in vision are nonuniform. It is necessary to use masking models with a non-uniform frequency support to incorporate this in coder design. Here, masking refers to the ability of one signal to hinder the perception of another within a certain time or frequency range. It is also necessary to recognize that high-frequency signals in visual information tend to have a short time or space support, while low-frequency signals tend to last longer. An efficient perceptual coder therefore needs not only to exploit properties of distortion masking in time and frequency, but also to have a time-frequency analysis module that is sufficiently flexible to incorporate the complex phenomena of distortion masking by non-stationary input signals. All of this is in contrast to the classical redundancy-removing coder, driven purely by considerations of minimum mean square error (MMSE), MMSE bit allocation, or MMSE noise shaping matched to the input spectrum.
Distortion sensitivity profiles of human perception are derived as functions of frequency, brightness, texture, and temporal parameters. These four kinds of sensitivity are considered for gray scale video/images:

(1) Brightness sensitivity:
It was found that human visual perception is sensitive to luminance contrast rather than to absolute luminance values. The ability of human eyes to detect the magnitude difference between an object and its background is dependent on the average value of the background luminance. According to Weber's Law, if the luminance of a test stimulus is just noticeable from the surrounding luminance, the ratio of the just noticeable luminance difference to the stimulus's luminance (the Weber fraction) is almost constant. However, due to the ambient illumination falling on the display, the noise in dark areas tends to be less perceptible than that occurring in regions of high luminance. In general, high visibility thresholds will occur in either very dark or very bright regions, and lower thresholds will occur in regions of gray levels close to the mid-gray luminance, which is 127 for 8-bit sampling.

(2) Texture sensitivity:
The reduction in the visibility of stimuli due to the increase in spatial nonuniformity of the background luminance is known as texture masking. Several efforts have been made to utilize some form of texture masking to improve coding efficiency. In many approaches, visibility thresholds are defined as functions of the amplitude of a luminance edge, in which a perturbation is increased until it becomes just discernible. The visibility threshold in this approach is associated with the masking function defined at each pixel as the maximum prediction error from the four neighboring pixels.
(3) Temporal sensitivity: The masking of temporally changing stimuli is extremely important in interframe coding. However, temporal masking is complicated by many factors, and its application to video coding is still in its infancy. Many researchers have attempted to evaluate the losses of spatial resolution and magnitude resolution as an object moves in a scene. If the movement is drastic, such as a scene change, the perceived spatial and intensity resolution is significantly reduced immediately after the scene change. It was found that the eye is noticeably more sensitive to flicker at high luminance than at low luminance.
(4) Frequency sensitivity:
Many psycho-visual studies have shown that the human perception of distortion depends on its frequency distribution. The response of the human visual system (HVS) to sine wave gratings of different frequencies has been experimentally measured as the so-called contrast sensitivity function (CSF). Many spatial-domain CSF models indicating general bandpass characteristics have been proposed. The spatial-domain CSF has been widely used to improve the quality of coded still images. However, only a few models of the spatio-temporal CSF have been reported. The spatio-temporal CSF provides the relative sensitivities of the HVS to different spatio-temporal frequencies, or the relative tolerance of noise at different spatio-temporal frequencies. It can be used to allocate coding bits, or distortion, by adjusting the quantizer step size of the target signal as inversely proportional to the sensitivity of the corresponding frequency.
Perceptual Coding
There are two intrinsic operations in signal coding: removal of redundancy and reduction of irrelevancy. The removal of redundancy is the effect of predictive coding or transform coding. Almost all sampled signals in coding are redundant because Nyquist sampling typically tends to preserve some degree of inter-sample correlation. This is reflected in the form of a non-flat power spectrum. Greater degrees of non-flatness, as resulting from a low-pass shape of the signal energy versus frequency, or from periodicities, lead to greater gains in redundancy removal. These gains are also referred to as prediction gains or transform coding gains, depending on whether the redundancy is processed in the spatial domain or in the frequency (or transform) domain.
The reduction of irrelevancy is the result of amplitude quantization. In a signal compression algorithm, the inputs of the quantizing system are typically sequences of prediction errors or transform coefficients. The idea is to quantize the prediction error, or the transform coefficients, just finely enough to render the resulting distortion imperceptible, although not mathematically zero. If the available bit rate is not sufficient to realize this kind of perceptual transparency, the intent is to minimize the perceptibility of the distortion by shaping it advantageously in space or frequency, so that as many of its components as possible are masked by the input signal itself. The term perceptual coding is used to signify the matching of the quantizer to the human visual system, with the goal of either minimizing perceived distortion, or driving it to zero where possible. These goals do not correspond to the maximization of signal-to-noise ratios or the minimization of mean square error.
Just-noticeable-distortion (JND) Profile
To remove the redundancy due to spatial and temporal correlation and the irrelevancy of perceptually insignificant components from video signals, the concept of a just-noticeable distortion has been successfully applied to perceptual coding of video and image. JND provides each signal to be coded with a visibility threshold of distortion, below which reconstruction errors are rendered imperceptible. The JND profile of a still image is a function of local signal properties such as the background luminance and the activity of luminance changes in the spatial domain. For video sequences, the derivation of JND profiles must take both spatial and temporal masking effects into consideration. For a successful estimation of JND profiles, the subject should not be able to discern the difference between a video sequence and its JND-contaminated version.
The following papers are related to video transmission systems using the human visual model and are incorporated herein by reference:
"A Perceptually Optimized Joint Source/Channel Coding Scheme for Video Transmission over Satellite Broadcast Channel", Yimin Jiang, Junfeng Gu, and John s. Baras.(l 999)
"A Video Transmission System Based on Human Visual Model for Satellite Channel", Junfeng Gu, Yimin Jiang, and John S. Baras.(1999)
Video Coding/Decoding System
A method according to an embodiment of the present invention will now be described with respect to FIG. 5.
The method begins at step 500. At a step 502, frame counter 200 is initialized by setting its value to zero. At a step 504, the video frames to be encoded are received by the human visual model based encoder 110. In a step 506, the frame count is increased in response to each received frame. In a step 508, two frames received in successive order are associated as a set in preparation for 3-D wavelet analysis. In a step 510, the 3-D wavelet analyzer module 210 performs decomposition of the video frame set into a multi-resolution subband representation. In step 512, control is passed to the JND model generator 215.

3-D Wavelet Analysis
Wavelet multi-resolution analysis techniques have been applied primarily to 1D and 2D signals. These techniques project the signal onto a chain of embedded approximation and detail spaces designed to represent the signal and its details at various levels of resolution. For practical purposes, the projection coefficients are obtained using a discrete subband transform that employs a quadrature mirror filter pair related to the type of wavelet used in the analysis. In conventional 2D wavelet multi-resolution analysis, the separable 2D approximation spaces are formed from the tensor product of identical 1D approximation spaces. This restriction generates analyzing filters with homogeneous spectral characteristics in 2D frequency space. When extended to three dimensions, the multi-resolution analysis is constructed from a separable 3D analyzing or "scaling" function formed from the product of three nonidentical 1D scaling functions, or two identical 1D spatial scaling functions and one different 1D temporal scaling function. This brings a much richer set of orthonormal basis vectors with which to represent 3D signals, and it produces filters that can be easily tailored to more closely match the spatial and temporal frequency characteristics of the 3D signal.
An $L^2(\mathbb{R})$ multi-resolution analysis consists of a chain of closed, linear "approximation" spaces $V_j$ and a scaling function $\varphi$ which satisfy the following properties for all $f \in L^2(\mathbb{R})$:

1) $\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$

2) $\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R}); \qquad \bigcap_{j \in \mathbb{Z}} V_j = \{0\}$

3) $f(x) \in V_j \iff f(2x) \in V_{j+1}, \quad j \in \mathbb{Z}$

4) The set of functions $\{2^{j/2}\,\varphi(2^j x - n)\}_{n \in \mathbb{Z}}$ forms an orthonormal basis for the approximation space $V_j$.

The purpose of multi-resolution analysis is to create a mathematical framework that facilitates the construction of a wavelet orthonormal basis for the space of all finite energy signals $L^2(\mathbb{R})$. To this end, the orthogonal complement of $V_j$ in $V_{j+1}$ is denoted by $W_j$, where

$V_{j+1} = V_j \oplus W_j$

and the symbol $\oplus$ indicates the direct sum. $W_j$ is typically referred to as the jth detail space, because it captures the difference in signal information between the approximation spaces $V_j$ and $V_{j+1}$.

It has been shown that one can create a mother wavelet $\psi(x)$ such that the set of functions $\{2^{j/2}\,\psi(2^j x - n)\}_{n \in \mathbb{Z}}$ forms an orthonormal basis for $W_j$. The spaces $W_j$, where $j \in \mathbb{Z}$, are mutually orthogonal; thus, by the denseness property of the multi-resolution analysis, the set of scaled and dilated wavelets $\{2^{j/2}\,\psi(2^j x - n)\}_{j \in \mathbb{Z},\, n \in \mathbb{Z}}$ forms an orthonormal basis for $L^2(\mathbb{R})$. The scaling function and the mother wavelet are related by the "two-scale" recursion relations

$\varphi(x) = \sum_n h_n\, \varphi(2x - n)$

$\psi(x) = \sum_n g_n\, \varphi(2x - n)$

where $h_n$ and $g_n$ are the coefficients of the QMF pair which is used to compute the approximation and detail projections associated with $V_j$ and $W_j$ from the approximation at the next higher scale $V_{j+1}$. Approximation and detail signals are created by orthogonally projecting the input signal $f$ onto the appropriate approximation or detail space. Since each space is spanned by an orthonormal basis set, the signal projection onto a given approximation or detail space at the jth resolution is equivalent to the sequence of projection coefficients obtained by the inner product operations

$a_{j,n} = \langle f, \varphi_{j,n} \rangle, \qquad d_{j,n} = \langle f, \psi_{j,n} \rangle$

where $a_{j,n}$ and $d_{j,n}$ represent the jth approximation and detail coefficients respectively.
Figure 6 shows the block diagram of the heterogeneous 3D wavelet decomposition. In this example, each video frame set is divided into 11 subbands. In one embodiment, the inventors use the simple two-tap Haar filter to separate the signal into temporal low frequency (LFP) and high frequency (HFP) parts. The low frequency part is then decomposed for two levels with the spatial 2-D wavelet decomposition and the high frequency part is decomposed for one level. The Antonini filter is used here. The Haar and Antonini filters work together to achieve a satisfying balance between complexity and performance. The coefficients for the Antonini wavelet filter are as follows. For the LFP, the $h_n$ are given by {0.0378284555, -0.0238494650, -0.1106244044, 0.3774028556, 0.8526986790, 0.3774028556, -0.1106244044, -0.0238494650, 0.0378284555}. For the HFP, the $g_n$ are given by {-0.0645388826, -0.0406894176, 0.4180922732, 0.7884856160, 0.4180922732, -0.0406894176, -0.0645388826}.
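For illustration, the decomposition just described can be sketched in a few lines of Python. This is a minimal sketch, not the patented implementation: the function names are invented here, boundary handling uses the simple zero padding of np.convolve rather than the symmetric extension a production filter bank would use, and the filter signs follow the standard Antonini 9/7 convention restored above.

```python
import numpy as np

# Antonini 9/7 analysis pair (signs per the standard convention)
H = np.array([0.0378284555, -0.0238494650, -0.1106244044, 0.3774028556,
              0.8526986790, 0.3774028556, -0.1106244044, -0.0238494650,
              0.0378284555])
G = np.array([-0.0645388826, -0.0406894176, 0.4180922732, 0.7884856160,
              0.4180922732, -0.0406894176, -0.0645388826])

def analyze_1d(x, f):
    """Filter a 1-D signal and keep every other sample (critical sampling)."""
    return np.convolve(x, f, mode="same")[::2]

def analyze_2d(frame):
    """One level of separable 2-D analysis producing LL, LH, HL and HH."""
    lo = np.apply_along_axis(analyze_1d, 1, frame, H)   # rows, lowpass
    hi = np.apply_along_axis(analyze_1d, 1, frame, G)   # rows, highpass
    return {"LL": np.apply_along_axis(analyze_1d, 0, lo, H),
            "LH": np.apply_along_axis(analyze_1d, 0, lo, G),
            "HL": np.apply_along_axis(analyze_1d, 0, hi, H),
            "HH": np.apply_along_axis(analyze_1d, 0, hi, G)}

def decompose_set(f0, f1):
    """Two-frame set -> 11 subbands: temporal Haar split, then two spatial
    levels on the low frequency part and one level on the high frequency part."""
    lfp = (f0 + f1) / np.sqrt(2.0)        # temporal low frequency part (LFP)
    hfp = (f0 - f1) / np.sqrt(2.0)        # temporal high frequency part (HFP)
    lfp_bands = analyze_2d(lfp)
    lfp_bands["LL2"] = analyze_2d(lfp_bands.pop("LL"))  # second level on LL
    return lfp_bands, analyze_2d(hfp)     # (3 + 4) + 4 = 11 subbands
```

Applying decompose_set to two 352x288 frames yields the 11 subbands of FIG. 6: seven from the two-level spatial analysis of the temporal low frequency part and four from the one-level analysis of the high frequency part.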
JND Model Generation

The JND model provides each signal with a threshold of visible distortion, below which reconstruction errors are rendered imperceptible. In this part, the spatio-temporal JND profile for each set of two frames in the video sequence and the JND profiles for the subbands are generated sequentially.
Since the generation of the spatio-temporal JND profile for each frame requires one previous frame as a reference, when a video sequence is being coded the encoder has to assign the first frame a reference frame. In an embodiment of the present invention, the first frame in the video sequence will use itself as a reference to generate each renewed JND profile.
The generation of the JND model consists of several steps, described in FIG. 7. First, in a step 700, the perceptual redundancy inherent in the spatial domain is quantitatively measured as a 2D profile by a perceptual model that incorporates the visibility thresholds due to average background luminance and texture masking. The following expression yields:

$JND_S(x, y) = \max\{f_1(bg(x, y), mg(x, y)),\ f_2(bg(x, y))\}$, for $0 \le x < W$, $0 \le y < H$   (1)

where $f_1$ represents the error visibility threshold due to texture masking and $f_2$ the visibility threshold due to average background luminance; $H$ and $W$ denote respectively the height and width of the image; $mg(x, y)$ denotes the maximal weighted average of luminance gradients around the pixel at $(x, y)$; and $bg(x, y)$ is the average background luminance. The two thresholds are given by

$f_1(bg(x, y), mg(x, y)) = mg(x, y)\,\alpha(bg(x, y)) + \beta(bg(x, y))$   (2)

$f_2(bg(x, y)) = \begin{cases} T_0\,(1 - \sqrt{bg(x, y)/127}) + 3, & bg(x, y) \le 127 \\ \gamma\,(bg(x, y) - 127) + 3, & bg(x, y) > 127 \end{cases}$   (3)

with

$\alpha(bg(x, y)) = bg(x, y) \cdot 0.0001 + 0.115$

$\beta(bg(x, y)) = \lambda - bg(x, y) \cdot 0.01$

where $T_0$, $\gamma$ and $\lambda$ are found to be 17, 3/128 and 1/2 through experiments.

The value of $mg(x, y)$ across the pixel at $(x, y)$ is determined by calculating the weighted average of luminance changes around the pixel in four directions. Four operators, $G_k(i, j)$, for $k = 1, \ldots, 4$ and $i, j = 1, \ldots, 5$, are employed to perform the calculation, where the weighting coefficient decreases as the distance away from the central pixel increases:

$mg(x, y) = \max_{k=1,\ldots,4} \{\,|grad_k(x, y)|\,\}$   (4)

$grad_k(x, y) = \frac{1}{16} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x - 3 + i,\ y - 3 + j)\, G_k(i, j)$, for $0 \le x < H$, $0 \le y < W$

where $p(x, y)$ denotes the pixel at $(x, y)$. The operators $G_k(i, j)$ are

G1 =
   0   0   0   0   0
   1   3   8   3   1
   0   0   0   0   0
  -1  -3  -8  -3  -1
   0   0   0   0   0

G2 =
   0   0   1   0   0
   0   8   3   0   0
   1   3   0  -3  -1
   0   0  -3  -8   0
   0   0  -1   0   0

G3 =
   0   0   1   0   0
   0   0   3   8   0
  -1  -3   0   3   1
   0  -8  -3   0   0
   0   0  -1   0   0

G4 =
   0   1   0  -1   0
   0   3   0  -3   0
   0   8   0  -8   0
   0   3   0  -3   0
   0   1   0  -1   0

The average background luminance, $bg(x, y)$, is calculated by a weighted lowpass operator $B(i, j)$, $i, j = 1, \ldots, 5$:

$bg(x, y) = \frac{1}{32} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x - 3 + i,\ y - 3 + j)\, B(i, j)$   (5)

where

B =
   1   1   1   1   1
   1   2   2   2   1
   1   2   0   2   1
   1   2   2   2   1
   1   1   1   1   1

In a step 705, the JND profile representing the error visibility threshold in the spatio-temporal domain is determined according to the following expression:

$JND_{S\text{-}T}(x, y, n) = f_3(ild(x, y, n)) \cdot JND_S(x, y, n)$   (6)

where $ild(x, y, n)$ denotes the average interframe luminance difference between the $n$th and $(n-1)$th frames:

$ild(x, y, n) = 0.5\,[\,p(x, y, n) - p(x, y, n-1) + bg(x, y, n) - bg(x, y, n-1)\,]$

Thus, to calculate the spatio-temporal JND profile for each frame in a video sequence, the spatial JND profiles of the frame itself and of its previous reference frame are required. $f_3$ represents the error visibility threshold due to motion. To purposely minimize the allowable distortion in the nonmoving area, the scale factor is switched to 0.8 when $|ild(x, y, n)| < 5$. The inventors note that the error visibility threshold is increased with increasing interframe luminance difference. This confirms the research findings that after a rapid scene change or a large temporal difference, the sensitivity of the HVS to spatial details is decreased. Moreover, the inventors also found that temporal masking due to high-to-low luminance changes is more prominent than that due to low-to-high luminance changes.
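To make the profile generation concrete, the following Python sketch evaluates equations (1)-(6) directly. It is illustrative only: the helper names are invented, border pixels are left at zero rather than specially handled, and since the text specifies f3 only for the nonmoving case, the sketch assumes a placeholder value of 1.0 elsewhere.

```python
import numpy as np

T0, GAMMA, LAM = 17.0, 3.0 / 128.0, 0.5   # constants from equations (2)-(3)

B  = np.array([[1,1,1,1,1],[1,2,2,2,1],[1,2,0,2,1],[1,2,2,2,1],[1,1,1,1,1]], float)
G1 = np.array([[0,0,0,0,0],[1,3,8,3,1],[0,0,0,0,0],[-1,-3,-8,-3,-1],[0,0,0,0,0]], float)
G2 = np.array([[0,0,1,0,0],[0,8,3,0,0],[1,3,0,-3,-1],[0,0,-3,-8,0],[0,0,-1,0,0]], float)
G3 = np.array([[0,0,1,0,0],[0,0,3,8,0],[-1,-3,0,3,1],[0,-8,-3,0,0],[0,0,-1,0,0]], float)
G4 = np.array([[0,1,0,-1,0],[0,3,0,-3,0],[0,8,0,-8,0],[0,3,0,-3,0],[0,1,0,-1,0]], float)

def window_sum(img, kernel, scale):
    """5x5 weighted sum around each pixel; border pixels are left at zero."""
    out = np.zeros_like(img, dtype=float)
    for x in range(2, img.shape[0] - 2):
        for y in range(2, img.shape[1] - 2):
            out[x, y] = scale * np.sum(img[x-2:x+3, y-2:y+3] * kernel)
    return out

def jnd_spatial(img):
    """JND_S of equation (1), with f1 and f2 from equations (2)-(5)."""
    bg = window_sum(img, B, 1.0 / 32.0)
    mg = np.max([np.abs(window_sum(img, Gk, 1.0 / 16.0))
                 for Gk in (G1, G2, G3, G4)], axis=0)
    f1 = mg * (bg * 0.0001 + 0.115) + (LAM - bg * 0.01)
    f2 = np.where(bg <= 127, T0 * (1 - np.sqrt(bg / 127.0)) + 3,
                  GAMMA * (bg - 127) + 3)
    return np.maximum(f1, f2)

def jnd_spatio_temporal(cur, prev):
    """Equation (6); f3 = 0.8 in nonmoving areas (|ild| < 5) as stated in the
    text, and an assumed placeholder value of 1.0 elsewhere."""
    ild = 0.5 * ((cur - prev) +
                 (window_sum(cur, B, 1/32.0) - window_sum(prev, B, 1/32.0)))
    return np.where(np.abs(ild) < 5, 0.8, 1.0) * jnd_spatial(cur)
```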
After the spatio-temporal JND profile is determined in step 705, the JND profiles for each subband are set up, with a certain distortion allocation for the different subbands, in step 710.
The JND for a subband q is a function of the spatio-temporal JND values at the corresponding locations, multiplied by a weight that indicates the perceptual importance of this subband. In the example provided in FIG. 6, the relationship between the full-band JND profile and the component JND profiles can be obtained by the following equations:

$JND_q(x, y) = \omega_q \cdot \frac{1}{16} \sum_{i=0}^{3} \sum_{j=0}^{3} JND_{S\text{-}T}(4x + i,\ 4y + j)$   (7)

for $0 \le q \le 3$, $0 \le x < W/4$, $0 \le y < H/4$, and

$JND_q(x, y) = \omega_q \cdot \frac{1}{4} \sum_{i=0}^{1} \sum_{j=0}^{1} JND_{S\text{-}T}(2x + i,\ 2y + j)$   (8)

for $4 \le q \le 10$, $0 \le x < W/2$, $0 \le y < H/2$.

The weighting function for distributing the full-band JND energy to a subband can be derived from the relative sensitivity of the HVS to the frequency subband. For example, with 11 spatio-temporal subbands, the weighting function of the qth subband is obtained as

$\omega_q = \frac{S_q^{-1}}{\sum_{i=0}^{10} S_i^{-1}}$   (9)

where $S_q$ represents the average sensitivity of the HVS to the qth subband. $S_q$ is obtained from the spatio-temporal CSF.
A Novel Human Perceptual Distortion Measure Based on JND
Based on the basic concept of the JND, the idea of minimally-noticeable-distortion (MND) is developed for situations where the bit-rate budget is tight; the distortion in the reconstructed image is then perceptually minimal at the available bit-rate and uniformly distributed over the whole image. The perceptual quality of the reconstructed image is accordingly expected to degrade evenly if the bit-rate is reduced. MND is expressed as:

$MND(x, y) \equiv JND(x, y) \cdot d(x, y)$   (10)

where $0 \le x < W$, $0 \le y < H$, W and H are the width and height of an image respectively, and $d(x, y)$ is the distortion index at point $(x, y)$.

The energy of the JND can be understood as the minimum energy of quantization error that will cause a perceivable loss to the human visual system. So, if the error in a small area is so small that the error energy is less than the JND energy, the compression will be perceptually lossless. The energy of the MND of a small area indexed by $(i, j)$ is defined as:

$E_{MND}(i, j) \equiv \sum_{(x, y) \in r_{ij}} JND^2(x, y) \cdot \delta(i, j)$   (11)

where $r_{ij}$ is a small block (typically 8 by 8 pixels) and $\delta(i, j)$ is the distortion index for this block. The inventors define a global human perceptual distortion measure $\Delta_G$ based on evaluating $\delta(i, j)$ as follows:

$\Delta_G \equiv \frac{1}{KL} \sum_{k=1}^{K} \sum_{l=1}^{L} \varepsilon(k, l)$   (12)

where $\varepsilon(k, l)$ is the distortion measure of a medium block indexed by $(k, l)$. The whole image is decomposed into K by L non-overlapped medium blocks $R_{kl}$. Each medium block is divided into M by N small blocks $r_{ij}(k, l)$, i.e.,

$R_{kl} = \bigcup_{i,j} r_{ij}(k, l)$

$\varepsilon(k, l)$ is defined as:

$\varepsilon(k, l) \equiv \max\{\delta(i, j) : 0 \le i < M,\ 0 \le j < N\}$   (13)

The larger $\Delta_G$ is, the larger the subjective perceptual distortion is. Compared with PSNR or MSE, $\Delta_G$ has the same convenience of describing the picture quality with one quantitative value. However, $\Delta_G$ takes the human visual model into account and therefore reflects subjective visual quality better than PSNR or MSE. It is well accepted that the value of MSE or PSNR is of little meaning for video/image subjective evaluation: when two images are compressed, the comparison of their MSE or PSNR values cannot give a creditable psychovisual conclusion. On the other hand, the distortion $\Delta_G$ can be explained as "this image is compressed at a scale of $\Delta_G$ times the perceptually noticeable distortion." Generally speaking, if one image is coded with a $\Delta_G$ larger than another one's, the former's subjective quality is lower. Due to these considerations, the inventors use $\Delta_G$ as the index of performance for the video compression system.
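One way to compute this index in practice is sketched below. The per-block distortion index delta(i, j) is recovered from equation (11) as the ratio of error energy to JND energy; note that the aggregation across blocks (maximum within a medium block, mean across the image) is a reconstruction of equations (12)-(13) and should be treated as an assumption, as should the function and parameter names.

```python
import numpy as np

def delta_g(sq_err, jnd, small=8, n=4):
    """Global perceptual distortion index. delta(i, j) follows from equation
    (11) as error energy over JND energy per small block; eps(k, l) takes the
    worst small block inside each medium block of n-by-n small blocks, and the
    result averages eps over the image (assumed aggregation for (12)-(13))."""
    h, w = sq_err.shape[0] // small, sq_err.shape[1] // small
    delta = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            r = (slice(i * small, (i + 1) * small),
                 slice(j * small, (j + 1) * small))
            delta[i, j] = sq_err[r].sum() / max((jnd[r] ** 2).sum(), 1e-12)
    eps = delta[:h - h % n, :w - w % n].reshape(h // n, n, w // n, n)
    return float(eps.max(axis=(1, 3)).mean())
```

Here sq_err is the squared-error map between the original and reconstructed frames and jnd the full-band JND profile; a result near 1.0 corresponds to just noticeable distortion.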
Frame Counting & Motion Detection
The computation of the JND model is resource consuming for a real-time video system. Even if it is implemented with dedicated hardware, the reconstruction of the JND profile for each pair of incoming video frames is expensive and unnecessary. Thus, frame counter 200 and motion detector 205 are designed to control the JND renewal process.

Referring back to FIG. 5, in step 514 the frame counter 200 is used to count the number of frames being coded. After a certain number of frames have been coded with the current JND model, for example 10 or 20, control is passed back to step 512 so that the JND model can be refreshed. The inventors assume that the scenes depicted in the frames remain almost the same, so the original JND model remains effective until the update. If, however, drastic movement or a scene cut happens, the scene changes greatly, and the original JND model has to be refreshed right away to follow the change. This is why a motion detector is necessary. In the present embodiment, the inventors adopt a simple motion detection scheme. It considers two factors related to changes in picture content: a scene cut and drastic movement.
In a step 516 the detection of a scene cut is made in accordance with the process described in FIG. 8. In a step 800 the energy in the spatial-LLLL temporal-L subband is calculated for each set of two frames and stored as LEold.
In step 805, LEnew is calculated for the next set of frames. In step 810, LEold is then compared with the newly generated energy LEnew. If the difference exceeds a predefined threshold, it is assumed that a scene cut has occurred and a new JND model will be generated. The appropriate threshold level is determined from the results obtained from sample calculations.

In a step 518 the detection of drastic movement is determined according to the process described in FIG. 9. In a step 900 the energy in the spatial-LL temporal-H subband is calculated and stored as HEold. In a step 905 HEnew is calculated for the next set of frames. In step 910 HEold is then compared with the newly generated energy HEnew. If the difference exceeds a predetermined threshold, it is assumed that drastic motion is occurring and a new JND model will be generated. The appropriate threshold level is determined from the results obtained from sample calculations.
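A minimal detector combining both tests might look as follows. It is a sketch: the relative thresholds are placeholders (the patent leaves the levels to be determined from sample calculations), and the class and parameter names are invented.

```python
import numpy as np

SCENE_CUT_T = 0.25   # placeholder relative thresholds; the patent leaves the
MOTION_T = 0.25      # levels to be determined from sample calculations

class MotionDetector:
    """Tracks the subband energies used in FIGS. 8 and 9 across frame sets."""
    def __init__(self):
        self.le_old = None   # energy of the spatial-LLLL temporal-L subband
        self.he_old = None   # energy of the spatial-LL temporal-H subband

    def needs_new_jnd(self, llll_temporal_l, ll_temporal_h):
        le_new = float(np.sum(llll_temporal_l ** 2))
        he_new = float(np.sum(ll_temporal_h ** 2))
        refresh = False
        if self.le_old is not None:
            scene_cut = abs(le_new - self.le_old) > SCENE_CUT_T * self.le_old
            motion = abs(he_new - self.he_old) > MOTION_T * max(self.he_old, 1e-12)
            refresh = scene_cut or motion     # either event refreshes the JND
        self.le_old, self.he_old = le_new, he_new
        return refresh
```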
At a step 520 the wavelet coefficients in the lowest frequency subbands (LFS) are encoded using differential pulse code modulation (DPCM). DPCM is a well-known method for providing pixel value prediction error calculations. At a step 522 control is passed to the perceptually tuned quantizer 220. Following the completion of step 522, inverse quantization is performed in a step 524. Upon completion of step 524, subband processing continues with steps 526 and 528. These steps can be performed in serial order or in parallel. Next, in step 530, perceptual channel coding is performed. Finally, in step 535, the encoded bitstreams are ready for transmittal. The above-mentioned steps will now be discussed in greater detail with reference to additional figures where appropriate.
Perceptually Tuned Quantization
The perceptually tuned quantizer 220 is the core of the source encoder. With the JND profiles for each subband at hand, the most important task is to allocate the available bits to the coefficients obtained from the wavelet decomposition as efficiently as possible. The beauty of JND is that it provides a quantitative measure of the error sensitivity threshold with spatial and frequency localization. In schemes using the DCT, like MPEG-2, the DCT coefficients are quantized using a quantization table that assigns more bits to the more important lower frequency coefficients in one 8 by 8 block. Although such a table is designed according to the HVS response based on psychovisual theory and experiments, its uniform use over the whole picture brings shortcomings, because different parts of a picture can have different visual importance in different scenes. In comparison, using a quantization table based on JND that can adapt itself to local scenes results in less perceptual distortion.

The symbol $\Delta_G$ used here is the same symbol used to indicate the new JND based human perceptual distortion measure, since both have the same psychological meaning. Ideally, the coding procedure controlled by the object global distortion index $\Delta_G = d$ will produce a compressed video sequence whose average perceptual distortion index $\Delta_G$ value is approximately d. The powerful functionality of controlling the compressed video quality ahead of time distinguishes this scheme from conventional video coding systems. In addition, the coding distortion is perceptually evenly distributed across the whole image.
Uniform Quantization
In one embodiment, a mid-rising uniform quantizer is used due to its simplicity of error analysis and its sound performance under certain conditions for optimality. FIG. 10 illustrates a quantization method according to an embodiment of the present invention. In a step 1000 a global distortion index $\Delta_G$ is given for the quantization procedure. The global distortion index ranges from approximately 1.0 to 10.0, where 1.0 stands for just noticeable distortion. In step 1005 each subband is partitioned into non-overlapped blocks $r_{ij}(k, l)$. These blocks are set up with a size of 8x8 or 16x16. In step 1010, for each block $r_{ij}(k, l)$, the step size of the mid-rising uniform quantizer is maximized under the condition that the quantization error energy is less than or equal to the MND energy in the block that has the distortion index $\delta(i, j)$ equal to $\Delta_G$, i.e.,

$\sum_{(x, y) \in r_{ij}(k, l)} (w(x, y) - \hat{w}(x, y))^2 \le E_{MND}(i, j)\big|_{\delta(i, j) = \Delta_G}$
where the energy of the MND is defined as in equation (11), $w(x, y)$ is the wavelet coefficient, $\hat{w}(x, y)$ is the quantized wavelet coefficient, and $\tau_{ij}(k, l)$ is the quantization step size of block $r_{ij}(k, l)$. In a step 1015, one quantization table that leads to a uniform error energy distribution over all subbands is set up for each subband. In a step 1020, the quantization index values from the quantization table are transmitted in the header of the bit stream for this subband. In a step 1025 a determination of the proportion of zero signals in a block is made, to decide whether after quantization the proportion of zero signals in the block is larger than a specified threshold, for example 7/8 or 15/16. If so, this block is assumed to be unimportant and its step size is recorded as 0. The coefficients in such an unimportant block need not be transmitted; as a result, in the decoder all values in this block are recovered as 0's. When the bandwidth is dynamically assigned to this source encoder, $\Delta_G$ can be kept constant to maintain the video quality at the same level. If the bandwidth is fixed, i.e., the bit rate is limited for this source encoder, several $\Delta_G$ values should be calculated and the corresponding bit rates compared with the available channel capacity. One $\Delta_G$ value that provides a proper bit rate is finally chosen. The procedure for refreshing the quantization table and the $\Delta_G$ value choice is repeated when the frame counter reaches a certain number or a drastic movement happens, resulting in the need to update the JND model.
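The step-size search of steps 1010-1025 can be sketched as below. This is illustrative only: the candidate step-size grid is an assumption, the early break assumes the error energy grows with the step size, and the "proportion of zero signals" is interpreted here as coefficients falling into the two mid-rising bins nearest zero.

```python
import numpy as np

def tuned_step_size(block, mnd_energy, zero_thresh=7.0/8.0,
                    candidates=np.arange(0.25, 64.0, 0.25)):
    """Maximize the mid-rising uniform step size subject to the MND energy
    budget, then mark the block unimportant (step size 0) if quantization
    zeroes out most coefficients (step 1025)."""
    best = candidates[0]                             # smallest step as fallback
    for q in candidates:
        recon = (np.floor(block / q) + 0.5) * q      # mid-rising reconstruction
        if np.sum((block - recon) ** 2) <= mnd_energy:
            best = q                                 # still within the budget
        else:
            break                                    # assume error grows with q
    indices = np.floor(block / best).astype(int)
    if np.mean((indices == 0) | (indices == -1)) > zero_thresh:
        return 0.0                                   # drop the block entirely
    return float(best)
```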
Lloyd-Max Quantizer

In an alternative embodiment, an optimum mean square error quantizer is used. This quantizer minimizes the mean square error for a given number of quantization levels. For a random variable u, the reconstruction levels $a_k$ are calculated according to the expression [31]

$a_k = \frac{\int_{t_k}^{t_{k+1}} u\, p_u(u)\, du}{\int_{t_k}^{t_{k+1}} p_u(u)\, du}$   (17)

where $p_u(u)$ is the continuous probability density function of the random variable u, and $t_k$ is the transition level

$t_k = \frac{a_{k-1} + a_k}{2}$   (18)
To design an appropriate quantizer, the distribution of the subband samples needs to be known. It has been suggested that the probability density function of the subband values and their prediction errors is Laplacian. Although a more accurate probability distribution of the subband values, i.e., the generalized Gaussian distribution, has been suggested and methods for estimating its shape parameters have been explored, the resulting overhead for the calculation and transmission of the parameters is too large. So the inventors use the Laplacian distribution as a reasonable approximation.
According to the quantization procedure of an alternative embodiment, the first step selects a global object distortion index $\Delta_G$. Second, the variance of the wavelet coefficients for each subband is calculated. Third, all the coefficients are unified. Fourth, each subband is partitioned into blocks $r_{ij}(k, l)$ with a size of 8 by 8 or 16 by 16. For each block $r_{ij}(k, l)$, the number of levels of the Lloyd-Max quantizer is minimized under the condition that the quantization error energy is less than or equal to the MND energy in this block that has the distortion index $\delta(i, j)$ equal to $\Delta_G$, i.e.,

$\sum_{(x, y) \in r_{ij}(k, l)} (w(x, y) - \hat{w}^{(n)}(x, y))^2 \le E_{MND}(i, j)\big|_{\delta(i, j) = \Delta_G}$

where the energy of the MND is defined as in equation (11), $w(x, y)$ is the wavelet coefficient, n is the number of quantizer levels with $n = 3, 5, \ldots, 2N+1$, $2N+1$ is the maximal number of quantization levels, m indicates the interval in which $w(x, y)$ is located, and $\hat{w}^{(n)}(x, y) = a_m^{(n)}$ is the quantized wavelet coefficient for a quantizer with n levels. Here a look-up table is set up for the $t_m^{(n)}$ and $a_m^{(n)}$ since the wavelet coefficients for quantization have been unified. The index of levels for the optimum quantizer is transmitted in the header of the bit stream of this subband.
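The fixed-point form of equations (17)-(18) is easy to state in code. The sketch below iterates on empirical samples rather than integrating the Laplacian density analytically, which is a simplification of the table-based design described in the text; the function and parameter names are invented.

```python
import numpy as np

def lloyd_max(samples, n_levels, iters=50):
    """Fixed-point iteration of equations (17)-(18) on empirical data."""
    a = np.linspace(samples.min(), samples.max(), n_levels)  # initial levels
    for _ in range(iters):
        t = 0.5 * (a[:-1] + a[1:])            # (18): transition levels
        bins = np.digitize(samples, t)
        for k in range(n_levels):             # (17): centroid of each cell
            cell = samples[bins == k]
            if cell.size:
                a[k] = cell.mean()
    return a

# usage: design a 5-level quantizer for unified (unit-variance) coefficients
rng = np.random.default_rng(0)
coeffs = rng.laplace(scale=1.0 / np.sqrt(2.0), size=100_000)  # unit variance
levels = lloyd_max(coeffs, 5)
```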
Referring back to FIG. 5, in a step 524 the subbands go through inverse quantization to correct errors occurring during the quantization process. At the conclusion of step 524, control is then passed to the slicing and arithmetic coding modules.
Arithmetic Coding and Slicing
The arithmetic coding scheme developed by Witten, Neal and Cleary is widely used in video codecs due to its superior performance. It easily accommodates adaptive models and is computationally very efficient. The arithmetic coding scheme provides efficient compression; however, the decoding result of one coefficient depends on the decoding result of the previous one because of the adaptive coding procedure employed. Theoretically, arithmetic coding encodes a message as a number in the unit interval. Unlike most schemes, including Huffman coding, there is no specific code word defined for any specific symbol; how each symbol is encoded depends, in general, on the previous symbols of the string. For the source sequence $x^n$, let $p(x^n)$ denote its probability mass function and $F(x^n)$ its cumulative distribution function. A number in the interval $(F(x^n) - p(x^n),\ F(x^n))$ can be used as the code for $x^n$. For example, expressing $F(x^n)$ to an accuracy of $\lceil -\log p(x^n) \rceil$ bits will yield a uniquely decodable code for the source. However, this is equal to the entropy of the message encoded, so that by Shannon's theory the theoretical bound has been achieved. There are two reasons why the theoretical bound cannot usually be achieved in practice:
(1) For a message of unbounded length, arithmetic of unbounded precision is required for maintaining the value of the current interval. In practice this is overcome by scaling procedures which add to the average encoded word length by decreasing the size of the actual interval calculated.
(2) The decoding is not unique unless the length of the message is known, since a single point may represent any interval of which it is a member. Without knowledge of the message length, decoding can proceed indefinitely. This may be overcome in practice either by sending a length indicator or by using an end-of-message symbol. Both of these add overhead to the encoding.
In the sequential encoding procedure, each symbol in the message is assigned a half-open subinterval of the unit interval of length equal to its probability. This is referred to as the coding interval for that symbol. As encoding proceeds a nesting of subintervals is defined. Each successive subinterval is defined by reducing the previous subinterval in proportion to the length of the current symbol's coding interval. This process continues until all symbols have been encoded.
In some cases a precise probability distribution for the source is unavailable. In these cases, a dynamical procedure of updating the symbol frequency model, which renders removal of the redundancy quite efficiently, will be used to adapt to the source. This is the basic idea of adaptive arithmetic coding.
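The interval-narrowing view of arithmetic coding can be demonstrated compactly. The sketch below uses exact rational arithmetic to sidestep the precision and termination issues listed above, and a fixed (non-adaptive) frequency model; it is a toy illustration, not the Witten-Neal-Cleary coder itself.

```python
from fractions import Fraction

def ac_interval(symbols, freq):
    """Map a message to its coding subinterval of [0, 1): each symbol rescales
    the current interval in proportion to its probability, so the final
    interval has width p(x^n) and any number inside identifies the message."""
    cum, run, total = {}, 0, sum(freq.values())
    for s in sorted(freq):                       # cumulative frequency table
        cum[s] = (Fraction(run, total), Fraction(run + freq[s], total))
        run += freq[s]
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        lo_s, hi_s = cum[s]
        low, width = low + width * lo_s, width * (hi_s - lo_s)
    return low, low + width                      # final coding interval

# the model of Example 1 below: Prob(0)=0.3, Prob(1)=0.3, Prob(2)=0.4
lo, hi = ac_interval([1, 1, 1, 2, 0, 2], {0: 3, 1: 3, 2: 4})
```

Because every symbol rescales the interval computed so far, changing or dropping an early symbol moves the entire final interval, which is exactly the dependency exploited in the examples below.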
Dependency of Arithmetic Coding
In arithmetic coding, the source sequence $x^n$ is represented by its cumulative distribution function $F(x^n)$. The encoding output of the current symbol depends on the previous one due to the tree-structured encoding procedure. So basically, it is a polyalphabetic cryptosystem. With the knowledge of a source symbol frequency model, the decoder restores the source symbols one by one with successive dependency. This dependency is observed in the following examples. In this implementation, a binary bit stream is obtained as the encoding output.
Example 1
Assume the source symbol frequency model is

symbol:      0    1    2
probability: 0.3  0.3  0.4

which indicates that Prob(symbol=0) is 0.3, Prob(symbol=1) is 0.3, and Prob(symbol=2) is 0.4. The source sequence for AC encoding is

1 1 1 2 0 2 0 1 0 1 0 1 2 2 1 0 0 2 1 0
The encoded binary bit stream is
10010000 10010001 01000110 11100100 11 (34 bits)
Here the frequency model doesn't match the distribution of the source sequence, but that only decreases the compression ratio and doesn't change the characteristics of AC. If the third symbol 1 is replaced by 2 or by 0 respectively, the encoder will give out the corresponding binary bit streams

10001000 00011000 10011011 01000101 1 (33 bits) and

10010111 01111010 11010100 01100011 00 (34 bits)

Only a few bits at the front are the same in these three cases. The numbers of output bits are also different, i.e., a change in a previous symbol changes the encoding of the following symbols completely.
The loss of a source symbol also diversifies the encoding of the following symbols. If the sequence of source symbols is

1 1 2 0 2 0 1 0 1 0 1 2 2 1 0 0 2 1 0 (with the first symbol lost),

it is encoded to

10001100 10010000 10000101 01101111 (32 bits)
Example 2
Assume the same symbol frequency model as that in the previous case.
The sequence of symbols for encoding is
1 2 2 0 2 0 1 0 1 0 1 2 2 1 0 0 2 1 0 1
The encoding output is
0111000001010011101100011111001001 (34 bits)

Suppose there is a one-bit error (in the 7th bit) in the bit stream, which turns it into
0111001001010011101100011111001001.
Then the decoded symbol sequence is
1 2 2 0 0 1 0 0 2 2 1 0 2 0 2 2 2 0 2 0
The symbols after the 3rd one are totally different from the original symbols. If the decoder erroneously locates the start bit position in the bit stream, the following symbols cannot be decoded correctly since in both cases the decoding path in the tree structure is misled.
Input symbol sequence:

1 2 2 0 2 0 1 0 1 0 1 2 2 1 0 0 2 1 0 1

Output bit stream after encoding:

0111000001010011101100011111001001 (34 bits)

1-bit left shift:

111000001010011101100011111001001

Output symbol sequence after decoding:

0 1 1 0 2 0 1 2 2 2 2 1 1 2 0 1 2 1 2 1

1-bit right shift:

00111000001010011101100011111001001

Output symbol sequence after decoding:

2 1 1 2 0 2 0 2 1 2 0 0 1 0 2 1 2 1 1 2
From these examples we can see that, even with a fixed frequency model, the correctness of the encoding/decoding of the previous symbol dominantly decides the correctness of the following one. In the environment of a noisy channel, the possibility of decoding errors is increased. In order to prevent decoding errors from spreading, the inventors derived a slicing algorithm to truncate the whole subband into short bit streams before arithmetic coding.
The inventors also recognized the importance of knowing the precise location of the start bit of an AC encoded bit stream and thus developed a conditional access sub-system based on this property.
Conditional Access with Arithmetic Coding
Referring to FIG. 5, step 526 is directed to the processing of the low frequency subbands in accordance with a conditional access scheme. The conditional access process beginning at step 526 will now be discussed with respect to FIG. 11. First, in step 1100 the data frame being processed for transmission is broken into slices $\{s_1, s_2, \ldots, s_M\}$. The locations of the breaking points are decided by a randomly generated vector v. This vector v is updated after a short time interval. Next, in step 1105 each slice is entropy encoded respectively into bit streams $\{b_1, b_2, \ldots, b_M\}$. In an embodiment of the present invention, arithmetic coding is relied upon. In step 1110, these bit streams are concatenated into one bit stream $b_{total}$. In step 1115, the function $\ell(b)$ is assigned to represent the length of bit stream b, and the partial sums $\sum_{i=1}^{m} \ell(b_i)$ are used to determine the start bit positions of the $b_i$, $i = 1, 2, \ldots, M$, in $b_{total}$. In step 1120, the values of $\ell(b_i)$ (i.e., the scrambler key K) are encrypted into an ECM using any available encryption algorithm, inserted into the header of this data frame, and transmitted.
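In code, steps 1100-1120 might be organized as follows. This is a sketch: encode_stub stands in for the per-slice adaptive arithmetic coder, the seeded random generator stands in for the randomly generated vector v, and the actual encryption of the key into the ECM is omitted.

```python
import random
import numpy as np

def encode_stub(slice_symbols):
    """Placeholder for the per-slice adaptive arithmetic coder."""
    return bytes(int(s) & 0xFF for s in slice_symbols)

def random_slice(symbols, n_slices, seed):
    """Break the frame's symbols at random points, encode each slice,
    concatenate, and keep the lengths l(b_i) as the scrambler key K."""
    rng = random.Random(seed)                        # stands in for vector v
    cuts = sorted(rng.sample(range(1, len(symbols)), n_slices - 1))
    slices = np.split(np.asarray(symbols), cuts)
    bitstreams = [encode_stub(s) for s in slices]
    key = [len(b) for b in bitstreams]               # l(b_i); partial sums give
    return b"".join(bitstreams), key                 # the slice start positions
```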
For the illegal user, cryptanalysis of such a bit stream will be almost computationally impossible in the broadcasting case, even if the only things unknown to him are the $\ell(b_i)$ values. Assume that the frame size is 352x288. If each frame is wavelet decomposed in 2 levels, the most important subband LLLL (LFS), which actually needs encryption, has the size of 88x72. These coefficients are quantized and organized into 8x8 non-overlapped blocks, which yields 11x9 blocks in total. Suppose these 99 blocks are broken into 11 slices for random AC: there will be $\binom{98}{10} \approx 1.4 \times 10^{13}$ slicing schemes to choose from. Only if the illegal user determines these slice lengths in the procedure of decoding can he know when he has got enough symbols for the current slice; at this point he should stop the current AC decoding process and start a new procedure for the next slice. But the large number of possibilities in deciding the slice locations eliminates the feasibility of the method of "making mistakes and trying", i.e., the key space is large enough to stop the real-time decryption of the data in commercial broadcasting. In an alternative embodiment, an adaptive source symbol frequency model is used: no fixed source symbol frequency model is made available, and the frequency model is dynamically updated along with the coding procedure. Thus, the AC encoding and decoding of a symbol depend more tightly on the previous one. Even if the illegal user gets the correct start position of $b_i$, $i = 1, 2, \ldots, M$, he cannot decode it without knowing the NoS (number of symbols), which is vital for setting up the adaptive model and varies between slices. The NoS will also be encrypted and transmitted in the frame header. For the wavelet coefficients in the video coding system, the data range differs frame by frame, so the illegal user cannot know it a priori. As previously explained, the wavelet coefficients in the lowest frequency subbands contain the most important information. Thus, in one embodiment, only the LFS subbands are processed through the conditional access system. The high frequency subbands are simply sliced at fixed locations and arithmetically encoded.
High Frequency Subband Slicing

Referring again to FIG. 5, the high frequency subband processing step 528 will now be discussed with respect to FIG. 12. The idea here is to generate small bitstreams that carry the same amount of "distortion sensitivity". In a step 1200 the high frequency subband $S_i$ is segmented into T short bit streams. Here, the inventors define T sets $G_l$ of points $(x, y)$, such that

$S_i = \bigcup_{l=1}^{T} G_l$ and $\sum_{(x,y) \in G_l} JND^2(x, y) \approx \frac{1}{T} \sum_{(x,y) \in S_i} JND^2(x, y)$ for each $l$,

so every time such a short bit stream is being encoded, a new adaptive statistical model is set up for it. In a step 1205 the slices are entropy coded, using arithmetic coding for example. In a step 1210, header and trailer symbols are added to the slices so that each slice is capable of being selected out from the received bit stream at the decoder side. In a step 1215 the encoded slices are merged into one bit stream. The slicing information is then transmitted along with the data stream.
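One simple way to form sets carrying roughly equal JND energy is a greedy scan, sketched below. The function name and the greedy strategy are assumptions; any partition meeting the equal-energy condition would do.

```python
import numpy as np

def slice_by_jnd_energy(coords, jnd_sq, n_slices):
    """Greedily partition a subband's points into sets G_l carrying roughly
    equal JND energy, so each slice has the same distortion sensitivity.
    coords: iterable of (x, y); jnd_sq: map of squared JND values."""
    energies = np.array([jnd_sq[x, y] for (x, y) in coords])
    target = energies.sum() / n_slices
    sets, current, acc = [], [], 0.0
    for pt, e in zip(coords, energies):
        current.append(pt)
        acc += e
        if acc >= target and len(sets) < n_slices - 1:
            sets.append(current)                # close this set G_l
            current, acc = [], 0.0
    sets.append(current)                        # last set takes the remainder
    return sets
```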
Arithmetic coding is a powerful entropy coding scheme for data compression. It is a promising replacement for Huffman coding in video broadcasting systems because of its superior performance. The combination of AC with the conditional access subsystem according to an embodiment of the present invention provides an encryption solution at low cost. The illegal user who tries to break the AC based conditional access system faces these problems: (1) The positions of the slices $\{b_1, b_2, \ldots, b_M\}$ in the concatenated bit stream are unknown, so they cannot be picked out for decoding. (2) The number of symbols in the adaptive source symbol frequency model is unknown, thus creating ambiguity in the decoding. (3) Certain $b_i$, $i = 1, 2, \ldots, M$, in the concatenated bit stream are encrypted with probability p < 1. Such a mixture of cipherdata and plaindata brings more difficulties to cryptanalysis. Once slicing and arithmetic coding have been applied to the bitstreams, control is passed to channel coder 115 for further processing.
Perceptual Channel Coding
Referring again to FIG. 5, the next step is to perform perceptual channel coding, as shown at step 530. While compression serves to reduce the number of bits used to represent bitstream data, channel coding works to expand, or increase, the length of the bitstream. In the video arena, under bad transmission conditions, such as noisy lines, bitstream data can be lost, resulting in poor video quality. The percentage of data lost can be calculated. Channel coding seeks to protect important bits by adding non-important bits to the bitstream. The goal is for the non-important bits to make up the greater part of the bits represented in the lost data percentage, thus allowing the more important bits to be successfully transmitted. In one embodiment of the present invention, rate compatible punctured convolutional (RCPC) codes are added. The advantage of using RCPC codes is that the high rate codes are embedded into the lower rate codes of the family, and the same Viterbi decoder can be used for all codes of a family. A Reed-Solomon code and a Ramsey interleaver plus RCPC are used to protect the data from the spatial-LLLL temporal-L subband. Cyclic redundancy check (CRC) codes are combined with RCPC for the other, less significant subbands to assure acceptable video quality even under bad channel conditions. In order to optimize the overall subjective video quality at a reasonable coding cost, a rate allocation scheme based on JND distortion will now be described with reference to FIG. 13.
In a step 1300, the average JND distortion of subband l (for example, l = 0, ..., 10) is determined as follows:

$D_l = \frac{1}{H_l W_l} \sum_{(x, y) \in S_l} JND_l(x, y)$

where $S_l$ is the set of pixels of subband l, and $H_l$ and $W_l$ are its height and width respectively. $D_l$ is an indication of the robustness of $S_l$ to errors. The larger $D_l$ is, the more robust it is to errors, and thus a higher coding rate is chosen. FIG. 15 shows the $D_l$ values for a sample video sequence. In a step 1305, the subbands are arranged into groups based upon their calculated $D_l$ values. Any classification algorithm known to one of ordinary skill in the art can be used to determine the groups. With 11 subbands, for example, the division of $S_l$ could result in the four categories $\{S_0\}$, $\{S_1, S_2, S_3, S_4, S_5, S_6, S_7\}$, $\{S_8, S_9\}$, $\{S_{10}\}$, as shown in FIG. 15. Finally, in a step 1310, the subbands are assigned different RCPC coding rates according to their levels of importance. The degree of importance is based upon the sensitivity to error associated with the subband in terms of the human visual model. Upon completion of step 1310 the low and high frequency bitstreams are ready for transmission.
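Steps 1300-1310 can be prototyped as follows. The quantile-based grouping and the specific RCPC rate family are placeholders chosen for illustration; the text fixes only the principle that more error-robust (larger D_l) subbands receive higher, i.e., weaker-protection, coding rates.

```python
import numpy as np

# assumed rate family; the most sensitive group gets the strongest code (1/2)
GROUP_RATES = [1/2, 2/3, 3/4, 8/9]

def allocate_rates(jnd_subbands):
    """D_l as the mean JND of each subband, quantile grouping, rate lookup."""
    d = np.array([band.mean() for band in jnd_subbands])
    groups = np.searchsorted(np.quantile(d, [0.25, 0.5, 0.75]), d)
    return [GROUP_RATES[g] for g in groups]
```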
Video Decoder and Error Concealment
The functionality of the modules in the video decoder is described with respect to FIG. 14. In a step 1400, decoder 400 processes the encoded bitstreams to produce slices of wavelet coefficients. Any well known decoding process equivalent to the encoding process, for example arithmetic decoding, can be used.
In a step 1405 error detector 405 determines if any of the slices contain coding errors. Any well known error detecting method, for example CRC, can be used.
In a step 1410, the error concealment module 410 discards those slices where errors are detected. At times no error is detected although there are some errors in the received slice; however, decoder 400 can detect conflicts during decoding, thereby identifying previously undetected errors. In either case, coefficients in a corrupted slice are retrieved from the DPCM reference if the slice belongs to the spatio-LLLL temporal-L subband. If the slice belongs to subbands of higher frequency, the coefficients are set to zeros, since the error effect is trivial and will be confined within the slice. In the worst case, where errors are not detected, they will not spread to the whole subband, due to slicing.
In the MPEG system, decoding error is also confined within slices, but corrupted data destroys the affected slice entirely. In the wavelet-based system of the present invention, even if slices in one subband are corrupted, slices in other subbands still contribute to the same area of the frame. In a step 1415, inverse quantization is applied to the slices. In a final step 1420, the slices are put through a wavelet synthesizing process, resulting in the presentation of viewable video frames in step 535.
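For steps 1415 and 1420, a simplified sketch of inverse quantization followed by wavelet synthesis for one two-frame set is given below; it assumes uniform quantization, a single spatial decomposition level, and the Haar filter (via the PyWavelets package), whereas the codec itself uses a deeper spatio-temporal decomposition (e.g. the spatio-LLLL subband).

import numpy as np
import pywt  # PyWavelets

def dequantize(indices, step):
    # Step 1415: uniform inverse quantization of a slice's coefficients.
    return np.asarray(indices, dtype=np.float64) * step

def synthesize_set(bands, steps):
    # bands[(t, s)] holds quantizer indices for temporal band t in {"L", "H"}
    # and spatial subband s in {"LL", "LH", "HL", "HH"}; steps holds the
    # matching quantizer step sizes.
    frames_t = {}
    for t in ("L", "H"):
        ll, lh, hl, hh = (dequantize(bands[(t, s)], steps[(t, s)])
                          for s in ("LL", "LH", "HL", "HH"))
        # Step 1420a: 2-D spatial wavelet synthesis within the temporal band.
        frames_t[t] = pywt.idwt2((ll, (lh, hl, hh)), "haar")
    # Step 1420b: 2-tap Haar temporal synthesis recovers the two frames.
    inv_sqrt2 = 1.0 / np.sqrt(2.0)
    frame1 = (frames_t["L"] + frames_t["H"]) * inv_sqrt2
    frame2 = (frames_t["L"] - frames_t["H"]) * inv_sqrt2
    return frame1, frame2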
Conclusion
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Thus the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

What is claimed is:
1. A method for compressing video frames comprising the steps of:
(a) arranging two successively received video frames into a set; (b) decomposing said set into a plurality of high frequency and low frequency subbands;
(c) generating a human perceptual model for each of said plurality of subbands;
(d) encoding said low frequency subbands to produce encoded low frequency subbands;
(e) quantizing said high frequency subbands and said encoded low frequency subbands according to said generated human perceptual models to generate a bitstream for each of said high frequency subbands and a bitstream for each of said encoded low frequency subbands;
(f) redefining said generated bitstreams for each of said high frequency subbands to produce a plurality of redefined high frequency bitstreams; (g) redefining said generated bitstreams for each of said encoded low frequency subbands to produce a plurality of redefined low frequency bitstreams;
(h) channel coding said plurality of redefined high frequency bitstreams and said plurality of redefined low frequency bitstreams;
(i) repeating steps (a)-(h) for a next set until each of said received video frames has been compressed.
2. The method of claim 1 wherein a frame counter is advanced as each one of said plurality of video frames is received.
3. The method of claim 2 wherein before said quantizing step (e) a scene cut determination is made according to the following steps: (i) calculating a temporal-L energy level for said set and storing said result as LEold;
(ii) calculating a temporal-L energy level for a next set and storing said result as LEnew;
(iii) calculating a difference between said LEnew and LEold values and if said difference is above a determined threshold then generating a new human perceptual model; and
(iv) saving said LEnew value as LEold.
4. The method of claim 3 wherein before said quantizing step (e) a drastic motion determination is made according to the following steps:
(i) calculating a temporal-H energy level for said set and storing said result as HEold; (ii) calculating a temporal-H energy level for a next set and storing said result as HEnew;
(iii) calculating a difference between said HEnew and HEold values and if said difference is above a determined threshold then generating a new human perceptual model; and (iv) saving said HEnew value as HEold.
5. The method of claim 4 wherein a new human perceptual model is created each time said frame counter advances a determined number of times.
6. The method of claim 5 wherein said set is decomposed using 3-D wavelet analysis.
7. The method of claim 6 wherein said human perceptual model is generated based upon a just-noticeable distortion profile.
8. The method of claim 7 wherein said just-noticeable distortion profile based human perceptual model is used to generate a minimally-noticeable distortion profile.
9. The method of claim 8 wherein said low frequency subbands are encoded using differential pulse code modulation.
10. The method of claim 9 wherein said quantizing step (e) includes the following steps:
(i) selecting a global object distortion index;
(ii) calculating a wavelet coefficient value for each of said subbands;
(iii) unifying said calculated wavelet coefficient values;
(iv) partitioning each of said subbands into non-overlapping blocks;
(v) for each block, minimizing a quantizer level such that a quantization error energy value is less than or equal to a determined energy value for said minimally-noticeable distortion profile of said block having a distortion index equal to said global object distortion index; (vi) generating a quantization table based on said minimizing results; and
(vii) copying an index range corresponding to said quantization table into each of said generated bitstreams.
11. The method of claim 9 wherein said quantizing step (e) includes the following steps:
(i) selecting a global object distortion index;
(ii) partitioning each of said subbands into non-overlapping blocks;
(iii) for each block, maximizing a quantizer level such that a quantization error energy value is less than or equal to a determined energy value for said minimally-noticeable distortion profile of said block having a distortion index equal to said global object distortion index;
(iv) copying said maximized quantizer levels into a header portion of said bitstream corresponding to each subband; and
(v) determining if a proportion of zero signals in said blocks exceeds a determined threshold.
12. The method of claim 11, wherein for each of said generated bitstreams corresponding to said high frequency subbands, said plurality of redefined high frequency bitstreams are produced according to the following steps:
(i) dividing said generated bitstream according to said corresponding generated human perceptual model to produce a plurality of high frequency slices, (ii) entropy coding said plurality of high frequency slices, (iii) adding, to each of said plurality of high frequency slices, a header symbol indicating a starting point for each of said high frequency slices and a trailer symbol indicating an ending point for each of said high frequency slices, and (iv) arranging said plurality of high frequency slices to produce a redefined high frequency bitstream for each subband.
13. The method of claim 12, wherein for each of said generated bitstreams corresponding to said encoded low frequency subbands, said plurality of redefined encoded low frequency bitstreams are produced according to the following steps:
(i) dividing said generated bitstream at randomly selected points to produce a plurality of low frequency slices, (ii) entropy coding said plurality of low frequency slices, (iii) arranging said entropy coded low frequency slices to produce a redefined low frequency bitstream for each subband, (iv) determining a bit start position value for each of said entropy coded low frequency slices included in said redefined low frequency bitstream, (v) encrypting each of said bit start position values, and (vi) inserting said encrypted bit start position values into a header of said redefined low frequency bitstream.
14. The method of claim 13 wherein said plurality of low frequency bitstreams and said plurality of high frequency bitstreams are entropy coded using arithmetic coding.
15. The method of claim 14 wherein said channel encoding is performed according to the following steps: (i) calculating an average just-noticeable distortion value for each of said plurality of subbands;
(ii) associating each of said plurality of subbands having approximately the same average just-noticeable distortion value into a plurality of subband groups;
(iii) assigning rate compatible punctured convolutional codes to each of said encoded bitstreams for each subband within said subband groups according to a determined significance for each of said plurality of subbands.
16. A method for encrypting a data stream being transferred over a network, the method comprising the steps of:
(a) dividing the data stream into a plurality of slices;
(b) entropy coding said plurality of slices; (c) arranging said plurality of entropy coded slices into a combined bitstream; (d) determining a bit start position value for each of said plurality of entropy coded slices included in said combined bitstream; (e) encrypting each of said bit start position values;
(f) inserting said encrypted bit start position values into a header of said combined bitstream.
17. The method of claim 16 wherein said plurality of slices are encoded by arithmetic coding.
18. A method for decoding bitstreams, the method comprising:
(a) receiving human visual model based encoded bitstreams;
(b) decoding said bitstreams to produce a plurality of wavelet coefficient slices; (c) determining if said slices were encoded with an error;
(d) discarding said slices having an encoding error;
(e) performing inverse quantizing on said slices;
(f) performing wavelet synthesis to transform said slices into a viewable video frame.
19. A system for encoding and decoding video frames comprising: (a) a human visual model based encoder for receiving the video frames and generating a plurality of compressed video bitstreams;
(b) a channel encoder for further encoding of said generated bitstreams;
(c) a channel decoder for decoding said compressed bitstreams;
(d) a human visual model based decoder for further decoding said generated bitstreams to produce restored video frames.
20. The system of claim 19 wherein said human visual model based encoder comprises:
(a) a 3-D wavelet analyzer module for generating wavelet coefficient high and low frequency subbands;
(b) a frame counter for tracking a number of received video frames;
(c) a motion detector for detecting motion within said received video frames; (d) a just-noticeable spatial-temporal model generator for measuring the perceptual redundancy inherent in said received video frames;
(e) a just-noticeable subband profile generator for generating a just-noticeable profile based on said just-noticeable spatial-temporal model generated output;
(f) a perceptually tuned quantizer for quantizing said high and low frequency subbands according to a profile generated by said just-noticeable subband profile generator; (g) a differential pulse code modulator for encoding low frequency subbands generated by said 3-D wavelet analyzer module;
(h) a slicer for segmenting said high and low frequency subbands; (i) an encoder for arithmetic coding of said high and low frequency subbands.
21. The system of claim 19 wherein said human visual model based decoder comprises:
(a) a decoder for arithmetic decoding of said generated bitstreams and transforming said generated bitstreams into slices of wavelet coefficients;
(b) an error detector for locating errors in said slices;
(c) an error concealment module for discarding slices having errors;
(d) an inverse quantizer for reversing quantization of said slices;
(e) a 3-D wavelet synthesizer for reconstitution of the slices into video frames.
