WO2005020584A1 - Method and system for pre-processing of video sequences to achieve better compression - Google Patents

Info

Publication number
WO2005020584A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
video sequence
frame
filtering
pixel location
Prior art date
Application number
PCT/US2004/017415
Other languages
French (fr)
Inventor
Adriana Dumitras
James Oliver Normile
Ryan R. Salsbury
Original Assignee
Apple Computer, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/640,734 (US7403568B2)
Priority claimed from US10/640,944 (US7430335B2)
Application filed by Apple Computer, Inc.
Publication of WO2005020584A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/804 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
    • H04N9/8042 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components involving data reduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/14 Picture signal circuitry for video frequency region
    • H04N5/21 Circuitry for suppressing or minimising disturbance, e.g. moiré or halo

Definitions

  • the invention addresses pre-processing for data reduction of video sequences and bit rate reduction of compressed video sequences.
  • Video is currently being transitioned from an analog medium to a digital medium.
  • the old analog NTSC television broadcasting standard is slowly being replaced by the digital ATSC television broadcasting standard.
  • analog video cassette tapes are increasingly being replaced by digital versatile discs (DVDs).
  • An ideal digital video encoding system will provide a very high picture quality with the minimum number of bits.
  • the pre-processing of video sequences can be an important part of digital video encoding systems.
  • a good video pre-processing system can achieve a bit rate reduction in the final compressed digital video streams.
  • the visual quality of the decoded sequences is often higher when a good pre-processing system has been applied as compared to that obtained without pre-processing.
  • Embodiments of the present invention provide methods for pre-processing of video sequences prior to compression to provide data reduction of the video sequence.
  • the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the video sequence after compression but without pre-processing.
  • Some embodiments of the present invention provide a temporal filtering method for preprocessing of video frames of a video sequence.
  • pixel values such as luminance and chrominance values
  • the high and low threshold values are determined adaptively depending on the illumination level of a video frame to reduce noise in the video sequence.
  • filtering strength is increased (i.e., a larger number of pixels are filtered) to reduce the greater amount of noise found in video frames having lower illumination levels.
  • filtering strength is decreased (i.e., a smaller number of pixels are filtered) to reduce the lesser amount of noise found in video frames having higher illumination levels.
  • the method provides adaptive threshold values to provide variability of filtering strength depending on the illumination levels of a video frame.
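The relationship between illumination and filtering strength described above can be sketched as follows. This is only an illustration: the actual threshold functions are the ones graphed in Figures 4a and 4b, which are not reproduced in this text, so the linear shape and every constant in `adaptive_thresholds` (`low`, `high_dark`, `high_bright`) are hypothetical. The sketch assumes a frame is a list of rows of 8-bit luminance values and that a wider low-to-high band means more pixels get filtered.

```python
def mean_luminance(y_frame):
    """Mean of all luminance values in a frame given as a list of rows."""
    total = sum(sum(row) for row in y_frame)
    count = sum(len(row) for row in y_frame)
    return total / count


def adaptive_thresholds(mu, low=1.0, high_dark=12.0, high_bright=4.0, max_luma=255.0):
    """Hypothetical threshold functions (not the ones in Figures 4a/4b).

    The high threshold shrinks linearly as the mean luminance mu grows, so
    darker frames get a wider band and therefore more filtered pixels.
    """
    t = min(max(mu / max_luma, 0.0), 1.0)  # normalise mu to [0, 1]
    high = high_dark + t * (high_bright - high_dark)
    return low, high


dark = [[20, 30], [25, 25]]
bright = [[200, 220], [210, 210]]
dark_low, dark_high = adaptive_thresholds(mean_luminance(dark))
bright_low, bright_high = adaptive_thresholds(mean_luminance(bright))
```

With these made-up constants the dark frame receives a larger high threshold than the bright frame, which is the qualitative behaviour the text describes.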
  • Some embodiments use conventional spatial anisotropic diffusion filters such as a Perona-Malik anisotropic diffusion filter or a Fallah-Ford diffusion filter which are not traditionally applied for bit rate reduction.
  • Other embodiments use an omni-directional spatial filtering method that extends the traditional Perona-Malik diffusion filter (that normally performs diffusion only in horizontal or vertical directions) so that diffusion is also performed in at least one diagonal direction.
  • the omni-directional filtering method provides diffusion filtering in eight directions (north, south, east, west, north-east, south-east, south-west, and north-west).
  • the present invention also includes a foreground/background differentiation pre-processing method that performs filtering differently on a foreground region of a video frame in a video sequence than on a background region of the video frame.
  • the method includes identifying pixel locations in the video frame having pixel values that match characteristics of human skin.
  • a bounding shape is then determined for each contiguous grouping of matching pixel locations (i.e., regions-of-interest), the bounding shape enclosing all or a portion of the contiguous grouping of matching pixel locations.
  • the totality of all pixel locations of the video frame contained in a bounding shape is referred to as a foreground region. Any pixel locations in the video frame not contained within the foreground region comprise a background region.
  • the method then filters pixel locations in the foreground region differently than pixel locations in the background region. Performing different types of filtering on different regions of the video frame allows greater data reduction in unimportant regions of the video frame while preserving sharp edges in regions-of-interest.
  • the present invention provides automatic detection of regions-of-interest (e.g., a person's face) and implements bounding shapes instead of exact segmentation of a region-of-interest.
  • the spatial filtering methods of the present invention may be used independently or in conjunction with temporal filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence.
  • the foreground/background differentiation methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the spatial filtering methods of the present invention to pre-process a video sequence.
  • the temporal filtering method of the present invention may be used independently or in conjunction with the spatial filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence.
  • Figure 1 illustrates a coding system with pre-processing and post-processing components.
  • Figure 2 illustrates a pre-processing component with separate temporal pre-filtering and spatial pre-filtering components.
  • Figure 3 illustrates a flowchart for a temporal pre-filtering method in accordance with the present invention.
  • Figure 4a illustrates a graph of an exemplary high luminance threshold function that determines a high luminance threshold value.
  • Figure 4b illustrates a graph of an exemplary low luminance threshold function that determines a low luminance threshold value.
  • Figure 5 illustrates a flowchart depicting a method for pre-processing a video sequence using Fallah-Ford spatial anisotropic diffusion filtering for data reduction.
  • Figure 6 illustrates a flowchart depicting a method for pre-processing a video sequence using Perona-Malik spatial anisotropic diffusion filtering for data reduction.
  • Figure 7 illustrates a conceptual diagram of a diffusion pattern of a conventional Perona-Malik anisotropic diffusion filter.
  • Figure 8 illustrates a conceptual diagram of a diffusion pattern of an omni-directional anisotropic diffusion filter in accordance with the present invention.
  • Figure 9 illustrates a flowchart depicting a method for pre-processing a video sequence using omni-directional spatial anisotropic diffusion filtering for data reduction.
  • Figure 10 illustrates a flowchart depicting a foreground/background differentiation method in accordance with the present invention.
  • Figure 11a illustrates an example of a video frame having two regions-of-interest.
  • Figure 11b illustrates an example of a video frame having two regions-of-interest, each region-of-interest being enclosed by a bounding shape.
  • Figure 11c illustrates a video frame after a foreground binary mask M_fg has been applied.
  • Figure 11d illustrates a video frame after a background binary mask M_bg has been applied.
  • Figure 12 is a flowchart of a method for using omni-directional spatial filtering in conjunction with the foreground/background differentiation method of Figure 10.
  • Video Pre-processing
  • As set forth in the background, a good video pre-processing system can achieve a bit rate reduction in the final compressed digital video streams. Furthermore, a good video pre-processing system may also improve the visual quality of the decoded sequences. Typically, a video pre-processing system may employ filtering, down-sampling, brightness/contrast correction, and/or other image processing techniques. The pre-processing step of filtering is referred to as pre-filtering. Pre-filtering can be accomplished using temporal, spatial, or spatial-temporal filters, all of which achieve partial noise reduction and/or frame rate reduction.
  • Temporal filtering is a pre-processing step used for smoothing motion fields, frame rate reduction, and tracking and noise reduction between sequential frames of a video sequence.
  • Temporal filtering operations in one dimension are applied to two or more frames to make use of the temporal redundancy in a video sequence.
  • the main difficulty in designing and applying temporal filters stems from temporal effects, such as motion jaggedness, ghosting, etc., that are sometimes caused by temporal pre-filtering. Such artifacts are particularly visible and difficult for viewers to tolerate. These artifacts are partly due to the fact that conventional temporal filters are not adaptive to the content or illumination levels of frames in a video sequence.
  • Spatial filtering is a pre-processing step used for anti-aliasing and smoothing (by removing details of a video frame that are unimportant for the perceived visual quality) and segmentation.
  • Spatial filter design aims at achieving a tradeoff between noise/detail reduction within the frame and the amount of blurring/smoothing that is being introduced.
  • a balance between the bit rate reduction as a result of pre-filtering and the subjective quality of the filtered sequences is difficult to achieve.
  • noticeable distortion is often introduced in the filtered video sequences (and consequently in the decoded sequences that have been pre-filtered before encoding).
  • the distortions may take the form of excessive smoothing of flat areas, blurring of edges (for spatial filters), ghosting, and/or other temporal effects (for temporal filters).
  • Such artifacts are particularly disturbing when they affect regions-of-interest (ROIs) such as a person's face in videoconferencing applications.
  • Video Pre-processing in the Present Invention
  • Embodiments of the present invention provide methods for pre-processing of video sequences prior to compression to provide data reduction of the video sequence.
  • the bit rate of the pre-processed and compressed video sequence may be lower than the bit rate of the video sequence after compression but without pre-processing.
  • Some embodiments of the present invention provide a temporal filtering method for preprocessing of video frames of a video sequence.
  • pixel values such as luminance and chrominance values
  • the high and low threshold values are determined adaptively depending on the illumination level of a video frame to provide variability of filtering strength depending on the illumination levels of a video frame.
  • the method provides for data reduction of the video sequence and bit rate reduction of the compressed video sequence.
  • Some embodiments of the present invention provide a spatial filtering method for pre-processing a video sequence using spatial anisotropic diffusion filtering.
  • Some embodiments use conventional spatial anisotropic diffusion filters such as a Perona-Malik anisotropic diffusion filter or a Fallah-Ford diffusion filter.
  • Other embodiments use an omni-directional spatial filtering method that extends the traditional Perona-Malik diffusion filter (that performs diffusion in four horizontal or vertical directions) so that diffusion is also performed in at least one diagonal direction.
  • the omni-directional filtering method provides diffusion filtering in eight directions (north, south, east, west, north-east, south-east, south-west, and north-west).
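As a rough illustration of diffusion in eight directions, the sketch below performs one explicit Perona-Malik style update in which each pixel exchanges intensity with its north, south, east, west and four diagonal neighbours. It is not the patent's formulation: the exponential edge-stopping function is one of the standard Perona-Malik choices, while the diagonal weight of 0.5, the edge parameter `k` and the step size `lam` are assumptions made for this example.

```python
import math

# (row offset, column offset, weight). The 0.5 weight on diagonals
# (1/d^2 with d = sqrt(2)) is an assumption of this sketch.
DIRECTIONS = [
    (-1, 0, 1.0), (1, 0, 1.0), (0, 1, 1.0), (0, -1, 1.0),    # N, S, E, W
    (-1, 1, 0.5), (1, 1, 0.5), (1, -1, 0.5), (-1, -1, 0.5),  # NE, SE, SW, NW
]


def conductance(gradient, k):
    """Perona-Malik edge-stopping function g(x) = exp(-(x / k)^2)."""
    return math.exp(-(gradient / k) ** 2)


def diffuse_once(frame, k=10.0, lam=0.125):
    """One diffusion step over a frame given as a list of rows of floats."""
    rows, cols = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for i in range(rows):
        for j in range(cols):
            flow = 0.0
            for di, dj, weight in DIRECTIONS:
                ni, nj = i + di, j + dj
                # Neighbours outside the frame are skipped (zero-flux border).
                if 0 <= ni < rows and 0 <= nj < cols:
                    grad = frame[ni][nj] - frame[i][j]
                    flow += weight * conductance(grad, k) * grad
            out[i][j] = frame[i][j] + lam * flow
    return out


speckle = [[10.0, 10.0, 10.0], [10.0, 14.0, 10.0], [10.0, 10.0, 10.0]]
smoothed = diffuse_once(speckle)
edge = diffuse_once([[0.0, 0.0, 200.0, 200.0]])
```

Small differences (the isolated speckle) are pulled toward their neighbours, while a large step (the 0 to 200 edge) yields a conductance near zero and is left essentially untouched, which is the edge-preserving behaviour that makes anisotropic diffusion attractive for detail reduction.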
  • the present invention also includes a foreground/background differentiation preprocessing method that performs filtering differently on a foreground region of a video frame in a video sequence than on a background region of the video frame.
  • the method includes identifying pixel locations in the video frame having pixel values that match characteristics of human skin.
  • the method includes identifying pixel locations in the video frame having pixel values that match other characteristics, such as a predetermined color or brightness.
  • a bounding shape is then determined for each contiguous grouping of matching pixel locations (i.e., regions-of-interest), the bounding shape enclosing all or a portion of the contiguous grouping of matching pixel locations.
  • the totality of all pixel locations of the video frame contained in a bounding shape is referred to as a foreground region. Any pixel locations in the video frame not contained within the foreground region comprise a background region.
  • the method filters pixel locations in the foreground region differently than pixel locations in the background region.
  • the method provides automatic detection of regions-of-interest (e.g., a person's face) and implements bounding shapes instead of exact segmentation of a region-of-interest. This allows for a simple and fast filtering method that is viable in real-time applications (such as videoconferencing) and bit rate reduction of the compressed video sequence.
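The foreground/background steps above (match pixels, group contiguous matches, enclose each group in a bounding shape, derive the two regions) can be sketched as follows. The sketch makes several assumptions that are not in the text: the bounding shape is an axis-aligned rectangle, contiguity means 4-connectivity, and the `is_skin` chroma ranges are illustrative placeholder values rather than the patent's skin test.

```python
from collections import deque


def is_skin(u, v):
    """Hypothetical chroma test; the ranges are illustrative placeholders."""
    return 77 <= u <= 127 and 133 <= v <= 173


def bounding_boxes(match):
    """Return (top, left, bottom, right) for each 4-connected group of True cells."""
    rows, cols = len(match), len(match[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for i in range(rows):
        for j in range(cols):
            if not match[i][j] or seen[i][j]:
                continue
            top = bottom = i
            left = right = j
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:  # breadth-first flood fill of one region-of-interest
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and match[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes


def foreground_mask(shape, boxes):
    """Binary mask that is 1 inside any bounding box and 0 elsewhere."""
    rows, cols = shape
    mask = [[0] * cols for _ in range(rows)]
    for top, left, bottom, right in boxes:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                mask[r][c] = 1
    return mask


T, F = True, False
example_match = [
    [T, F, F, F, F],
    [T, T, F, F, F],
    [F, F, F, F, T],
    [F, F, F, T, T],
]
boxes = bounding_boxes(example_match)
fg_mask = foreground_mask((4, 5), boxes)
bg_mask = [[1 - value for value in row] for row in fg_mask]
```

Note that the foreground mask covers every pixel inside a box, including pixels that did not match, which is the trade-off the text describes: bounding shapes are cheaper than exact segmentation. A filter could then be applied with different strengths wherever `fg_mask` or `bg_mask` is 1.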
  • the temporal filtering method of the present invention may be used independently or in conjunction with the spatial filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence.
  • the spatial filtering methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence.
  • the foreground/background differentiation methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the spatial filtering methods of the present invention to pre-process a video sequence.
  • Some embodiments described below relate to video frames in YUV format.
  • One of ordinary skill in the art, however, will realize that these embodiments may also relate to a variety of formats other than YUV.
  • other video frame formats, such as RGB, can easily be transformed into the YUV format.
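For reference, one common way to perform the RGB to YUV transformation is the BT.601 weighting shown below. The text does not say which conversion the embodiments use, so treat this as an assumed, conventional choice; the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB triple to YUV using BT.601 luma weights.

    U and V are signed colour differences centred on zero.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


white = rgb_to_yuv(255, 255, 255)
red = rgb_to_yuv(255, 0, 0)
```

Neutral colours such as white map to U and V near zero, and a saturated red produces a positive V. Digital storage formats typically add an offset to U and V so that they fit an unsigned range; that step is omitted here.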
  • some embodiments are described with reference to a videoconferencing application.
  • One of ordinary skill in the art, however, will realize that the teachings of the present invention may also relate to other video encoding applications (e.g., DVD, digital storage media, television broadcasting, internet streaming, communication, etc.) in real-time or post-time.
  • Embodiments of the present invention may also be used with video sequences having different coding standards such as H.263 and H.264 (also known as MPEG-
  • embodiments of the present invention provide methods for preprocessing of video sequences prior to compression to provide data reduction.
  • data reduction of a video sequence refers to a reduced amount of details and/or noise in a pre-processed video sequence before compression in comparison to the same video sequence before compression but without pre-processing.
  • data reduction of a video sequence refers to a comparison of the details and/or noise in a pre-processed and uncompressed video sequence, and an uncompressed-only video sequence, and does not refer to the reduction in frame size or frame rate.
  • embodiments of the present invention provide that after compression of the pre-processed video sequence, the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the compressed video sequence made without any pre-processing.
  • reduction or lowering of the bit rate of a compressed video sequence refers to a reduced or lowered bit rate of a pre-processed video sequence after compression in comparison to the same video sequence after compression but without pre-processing.
  • reduction or lowering of the bit rate of a compressed video sequence refers to a comparison of the bit rates of a pre-processed and compressed video sequence and a compressed-only video sequence and does not refer to the reduction or lowering of the bit rate of a video sequence caused by compression (i.e., encoding).
  • the various embodiments described below provide a method for pre-processing/pre-filtering of video sequences for data reduction of the video sequences and bit rate reduction of the compressed video sequences.
  • Embodiments relating to temporal pre-filtering are described in Section I.
  • Embodiments relating to spatial pre-filtering are described in Section II.
  • Embodiments relating to filtering foreground and background regions of a video frame differently are described in Section III.
  • FIG. 1 illustrates a coding system 100 with pre-processing and post-processing components.
  • a typical coding system includes an encoder component 110 preceded by a preprocessing component 105 and a decoder component 115 followed by a post-processing component 120.
  • Pre-filtering of a video sequence is performed by the pre-processing component 105, although in other embodiments, the pre-filtering is performed by the encoding component 110.
  • an original video sequence is received by the pre-processing component 105, the original video sequence being comprised of multiple video frames and having an associated original data amount.
  • the pre-processing component 105 pre-filters the original video sequence to remove noise and details and produces a pre-processed (i.e., pre-filtered) video sequence having an associated pre-processed data amount that is less than the original data amount associated with the original video sequence.
  • the data amount of a video sequence reflects an amount of data used to represent the video sequence.
  • the encoding component 110 then receives the pre-processed video sequence and encodes (i.e., compresses) the pre-processed video sequence to produce a pre-processed and compressed video sequence.
  • Pre-filtering methods performed by the pre-processing component 105 allow removal of noise and details from the original video sequence, thus allowing for greater compression of the pre-processed video sequence by the encoding component 110.
  • the bit rate of the pre-processed and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-processing) with an identical compression method using the encoding component 110.
  • the bit rate of a video sequence reflects an amount of binary coded data required to represent the video sequence over a given period of time and is typically measured in kilobits per second.
  • the compressed video sequence is received by the decoder component 115 which processes the compressed video sequence to produce a decoded video sequence. In some systems, the decoded video sequence may be further post-processed by the post processing component 120.
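To make the bit rate comparison concrete, the short sketch below computes a bit rate in kilobits per second from a compressed size and a duration, then compares a compressed-only sequence with a pre-processed and compressed one. The byte counts and the 60-second duration are invented for illustration and are not measurements from the source.

```python
def bit_rate_kbps(total_bytes, duration_seconds):
    """Bit rate in kilobits per second (1 kilobit = 1000 bits)."""
    return total_bytes * 8 / 1000 / duration_seconds


def reduction_percent(baseline_kbps, preprocessed_kbps):
    """Relative bit rate saving of the pre-processed stream, in percent."""
    return 100 * (baseline_kbps - preprocessed_kbps) / baseline_kbps


# Hypothetical sizes for the same 60-second clip encoded with the same settings.
compressed_only = bit_rate_kbps(1_500_000, 60)
preprocessed = bit_rate_kbps(1_200_000, 60)
saving = reduction_percent(compressed_only, preprocessed)
```

With these invented numbers the compressed-only stream runs at 200 kbps, the pre-processed stream at 160 kbps, and the saving is 20 percent. Both streams must use an identical compression method for the comparison to match the definition given above.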
  • Figure 2 illustrates a block diagram of video pre-processing component 105 with separate temporal pre-filtering and spatial pre-filtering components 205 and 210, respectively.
  • the video pre-processing component 105 receives an original video sequence comprised of multiple video frames and produces a pre-processed video sequence.
  • the temporal pre-filtering component 205 performs pre-processing operations on the received video sequence and sends the video sequence to the spatial pre-filtering component 210 for further preprocessing.
  • the spatial pre-filtering component 210 performs preprocessing operations on the received video sequence and sends the video sequence to the temporal pre-filtering component 205 for further pre-processing.
  • pre-processing is performed only by the temporal pre-filtering component 205 or only by the spatial pre-filtering component 210.
  • the temporal pre-filtering component 205 and the spatial pre-filtering component 210 are configured to perform particular functions through instructions of a computer program product having a computer readable medium. Data reduction of the video frames of the original video sequence is achieved by the temporal pre-filtering component 205 and/or the spatial pre-filtering component 210.
  • the temporal pre-filtering component 205 performs temporal pre-filtering methods of the present invention (as described in Section I) while the spatial pre-filtering component 210 performs spatial pre-filtering methods of the present invention (as described in Sections II and III).
  • the spatial pre-filtering component 210 may use spatial anisotropic diffusion filtering for data reduction in a video sequence.
  • Temporal Pre-Filtering
  • Figure 3 illustrates a flowchart for a temporal pre-filtering method 300 in accordance with the present invention.
  • the method 300 may be performed, for example, by the temporal pre-filtering component 205 or the encoder component 110.
  • the temporal pre-filtering method 300 commences by receiving an original video sequence in YUV format (at 305).
  • the original video sequence comprises a plurality of video frames and has an associated data amount.
  • a video sequence in another format is received.
  • the method sets (at 310) a first video frame in the video sequence as a current frame (i.e., frame f) and a second video frame in the video sequence as a next frame (i.e., frame f + 1).
  • the current frame is comprised of a current luminance (Y) frame and current chrominance (U and V) frames.
  • the next frame is comprised of a next luminance (Y) frame and next chrominance (U and V) frames.
  • the current and next frames are each comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values from the luminance and chrominance frames, respectively).
  • Pixels and pixel locations are identified by discrete row (e.g., i) and column (e.g., j) indices (i.e., coordinates) such that 1 ≤ i ≤ M and 1 ≤ j ≤ N, where M × N is the size of the current and next frame in pixel units.
  • the method determines (at 315) the mean of the luminance values in the current luminance frame. Using the mean luminance (abbreviated as mean (Y) or mu), the method
  • the method then sets (at 325) row (i) and column (j) values for initial current pixel location coordinates. For example, the initial current pixel location coordinates may be set to equal (0, 0).
  • the method 300 then computes (at 330) a difference between a luminance value at the current pixel location coordinates in the next luminance frame and a luminance value at the current pixel location coordinates in the current luminance frame.
  • the method 300 determines (at 335) if the luminance difference (difY_{i,j}) at the current pixel location coordinates is within the high and low luminance threshold values (θ_luma^H and θ_luma^L, respectively). If not, the method proceeds directly to step 345.
  • If, however, the method determines (at 335 - Yes) that the luminance difference (difY_{i,j}) is within the high and low luminance threshold values, the luminance values at the current pixel location coordinates in the current and next luminance frames are filtered (at 340).
  • the luminance value at the current pixel location coordinates in the next luminance frame is set to equal the average of the luminance values at the current pixel location coordinates in the current luminance frame and the next luminance frame.
  • other filtering methods are used.
  • the method 300 then computes (at 345) differences in chrominance values of the next chrominance (U and V) frames and current chrominance (U and V) frames at the current pixel location coordinates.
  • the method 300 determines (at 350) if the U chrominance difference (difU_{i,j}) at the current pixel location coordinates is within the high and low U chrominance threshold values (θ_chroma^H and θ_chroma^L, respectively). If not, the method proceeds directly to step 360.
  • If, however, the method determines (at 350 - Yes) that the U chrominance difference (difU_{i,j}) is within the high and low U chrominance threshold values, then the U chrominance values at the current pixel location coordinates in the current and next U chrominance frames are filtered (at 355).
  • the value at the current pixel location coordinates in the next U chrominance frame is set (at 355) to equal the average of the values at the current pixel location coordinates in the current U chrominance frame and the next U chrominance frame.
  • the method 300 determines (at 360) if the V chrominance difference ($difV_{i,j}$) at the current pixel location coordinates is within the high and low V chrominance threshold values ($\theta^{H}_{chroma}$ and $\theta^{L}_{chroma}$, respectively). If not, the method proceeds directly to step 370. If, however, the method determines (at 360 - Yes) that the V chrominance difference ($difV_{i,j}$) is within the high and low V chrominance threshold values, then the V chrominance values at the current pixel location coordinates in the current and next V chrominance frames are filtered (at 365).
  • the method 300 determines (at 370) if the current pixel location coordinates are last pixel location coordinates of the current frame. For example, the method may determine whether the current row (i) coordinate is equal to M and the current column (j) coordinate is equal to N where M x N is the size of the current frame in pixel units. If not, the method sets (at 375) next pixel location coordinates in the current frame as the current pixel location coordinates. The method then continues at step 330.
  • the method 300 determines (at 380) if the next frame is a last frame of the video sequence (received at 305). If not, the method sets (at 385) the next frame as the current frame (i.e., frame f) and a frame in the video sequence subsequent to the next frame as the next frame (i.e., frame f + 1). For example, if the current frame is a first frame and the next frame is a second frame of the video sequence, the second frame is set (at 385) as the current frame and a third frame of the video sequence is set as the next frame. The method then continues at step 315.
  • If the method 300 determines (at 380 - Yes) that the next frame is the last frame of the video sequence, the method outputs (at 390) a pre-filtered video sequence comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 305).
  • the pre-filtered video sequence may be received, for example, by the spatial pre-filtering component 210 for further pre-processing or the encoder component 110 for encoding (i.e., compression). After compression by the encoder component 110, the bit rate of the pre-filtered and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-filtering) using the same compression method.
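  • The frame-level loop (steps 380 and 385) slides a current/next window over the sequence, so each filtered frame serves as the reference for the frame after it. The sketch below illustrates that propagation on a single plane per frame. It assumes fixed thresholds for brevity, whereas the text derives them from each frame's mean luminance; the function name and the values are hypothetical.

```python
def temporal_prefilter(frames, low, high):
    """Walk consecutive (current, next) frame pairs, filtering `next` in place."""
    for f in range(len(frames) - 1):
        cur, nxt = frames[f], frames[f + 1]
        for i in range(len(cur)):
            for j in range(len(cur[i])):
                if low <= nxt[i][j] - cur[i][j] <= high:
                    nxt[i][j] = (cur[i][j] + nxt[i][j]) // 2
        # On the next pass, the frame just filtered becomes the current frame.
    return frames


# Three 1x1 frames: the second is pulled toward the first, and the third
# is then pulled toward the already-filtered second.
frames = [[[100]], [[104]], [[104]]]
print(temporal_prefilter(frames, low=-8, high=8))  # [[[100]], [[102]], [[103]]]
```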
  • Figure 4a illustrates a graph of an exemplary high luminance threshold function 405. The high luminance threshold function 405 is a piecewise linear function of the mean luminance (mean(Y)) of a video frame, the mean luminance being equal to mu, as expressed by the following equations:
  • Figure 4b illustrates a graph of an exemplary low luminance threshold function 415. The low luminance threshold function 415 is a piecewise linear function of the high luminance threshold value, as expressed by the following equations:
  • $H_1$, $L_1$, $u_2$, $u_3$, $H_2$, and $H_3$ are predetermined values.
  • the value of $H_1$ determines the saturation level of the high luminance threshold function 405 and the value of $L_1$ determines the saturation level of the low luminance threshold function 415.
  • the values $u_2$ and $u_3$ determine cutoff points for the linear variation of the high luminance threshold function 405 and the values $H_2$ and $H_3$ determine cutoff points for the linear variation of the low luminance threshold function 415.
  • Correct specification of the values $u_2$, $u_3$, $H_2$, and $H_3$ is required to prevent temporal artifacts such as ghosting or trailing from appearing in a temporally filtered video sequence.
  • the high chrominance threshold value ($\theta^{H}_{chroma}$) is based on the high luminance threshold value ($\theta^{H}_{luma}$) and the low chrominance threshold value ($\theta^{L}_{chroma}$) is based on the low luminance threshold value ($\theta^{L}_{luma}$).
  • the high luminance threshold ($\theta^{H}_{luma}$) is a function of the mean luminance of a video frame.
  • the low luminance threshold ($\theta^{L}_{luma}$) is a function of the high luminance threshold ($\theta^{H}_{luma}$).
  • the high chrominance threshold ($\theta^{H}_{chroma}$) is based on the high luminance threshold ($\theta^{H}_{luma}$).
  • the low chrominance threshold ($\theta^{L}_{chroma}$) is based on the low luminance threshold ($\theta^{L}_{luma}$).
  • the high and low luminance and chrominance threshold values are therefore all based on the mean luminance of a video frame and thus provide variability of filtering strength depending on the illumination levels of the frame to provide noise and data reduction.
  • Section II Spatial Pre-Filtering
  • Some embodiments of the present invention provide a method for pre-processing a video sequence using spatial anisotropic diffusion filtering to provide data reduction of the video sequence.
  • the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the video sequence after compression but without pre-processing.
  • Some embodiments use conventional spatial anisotropic diffusion filters such as a Fallah-Ford diffusion filter (as described with reference to Figure 5) or a Perona-Malik anisotropic diffusion filter (as described with reference to Figure 6).
  • Other embodiments use an omni-directional spatial filtering method that extends the traditional Perona-Malik diffusion filter to perform diffusion in at least one diagonal direction (as described with reference to Figures 8 and 9).
  • In some embodiments, the mean curvature diffusion (MCD) Fallah-Ford spatial anisotropic diffusion filter is used.
  • the MCD Fallah-Ford filter makes use of a surface diffusion model as opposed to a plane diffusion model employed by the Perona-Malik anisotropic diffusion filter discussed below.
  • an image is a function of two spatial location coordinates (x, y) and a third (gray level) z coordinate.
  • the MCD diffusion is modeled by the MCD diffusion equation: where the function h is given by the equation:
  • FIG. 5 is a flowchart showing a method 500 for pre-processing a video sequence using Fallah-Ford spatial anisotropic diffusion filtering to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence.
  • the method 500 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
  • the method 500 starts when an original video sequence comprised of multiple video frames is received (at 505), the original video sequence having an associated data amount.
  • the method sets (at 510) a first video frame in the video sequence as a current frame.
  • the current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values).
  • the Y luminance values (gray level values) of the current frame are filtered.
  • the U chrominance values or the V chrominance values of the current frame are filtered. Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the current frame in pixel units.
  • the method then sets (at 515) row (x) and column (y) values for an initial current pixel location.
  • the method also sets (at 520) the number of iterations (no_iterations), i.e., time steps (t), to be performed for each pixel location (x, y). The number of iterations can be determined depending on the amount of details to be removed.
  • the method estimates (at 525) components and a magnitude of the image gradient using an edge detector.
  • In some embodiments, the Sobel edge detector is used since it makes use of a difference of averages operator and has a good response to diagonal edges. However, other edge detectors may be used.
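  • As an illustration of step 525, the following sketch estimates the gradient components and magnitude at one interior pixel with the standard 3x3 Sobel kernels. The function name, the 0-based indexing, and the restriction to interior pixels are assumptions made for brevity; border handling is not covered.

```python
import math

# Standard 3x3 Sobel kernels, indexed [row offset + 1][column offset + 1].
SOBEL_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # change across columns
SOBEL_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # change across rows


def sobel_gradient(img, x, y):
    """Return (g_h, g_v, magnitude) at interior pixel (x, y) of a 2-D list.

    `img[x][y]` is indexed row (x), column (y), following the text.
    """
    g_h = g_v = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            v = img[x + dx][y + dy]
            g_h += SOBEL_H[dx + 1][dy + 1] * v
            g_v += SOBEL_V[dx + 1][dy + 1] * v
    return g_h, g_v, math.hypot(g_h, g_v)


# A vertical step edge: the last column is bright, the rest is dark.
img = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
print(sobel_gradient(img, 1, 1))  # (40, 0, 40.0)
```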
  • the method then computes (at 530) a change in surface area $\Delta A$ using the following equation:
  • $\Delta A(t + 1) = \iint \lVert \nabla h(x, y, t + 1) \rVert \, dx \, dy$
  • the method computes (at 535) diffusion coefficient $c(x, y, t)$ as the inverse of the surface gradient magnitude using the equation:
  • $c(x, y, t) = \dfrac{1}{\lVert \nabla h(x, y, t) \rVert}$
  • the method then convolves (at 545) the 3x3 filter kernel with an image neighborhood of the pixel at the current pixel location (x, y).
  • the method decrements (at 550) no_iterations by one and determines (at 555) if no_iterations is equal to 0. If not, the method continues at step 525. If so, the method determines (at 560) if the current pixel location is a last pixel location of the current frame. If not, the method sets (at 565) a next pixel location in the current frame as the current pixel location. The method then continues at step 520.
  • If the method 500 determines (at 560 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 570) if the current frame is a last frame of the video sequence (received at 505). If not, the method sets (at 575) a next frame in the video sequence as the current frame. The method then continues at step 515. If the method determines (at 570 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 580) a pre-filtered video sequence comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 505).
  • the pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression).
  • the bit rate of the pre-filtered and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-filtering) using the same compression method.
  • In some embodiments, a traditional Perona-Malik anisotropic diffusion filtering method is used for pre-processing a video frame to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence.
  • $g(\cdot)$ is an edge stopping function that satisfies the condition $g(\nabla I) \to 0$ when $\nabla I \to \infty$ such that the diffusion operation is stopped across the edges of the video frame.
  • $I(x, y, t + 1) = I(x, y, t) + \lambda \left[ c_N(x, y, t) \nabla I_N(x, y, t) + c_S(x, y, t) \nabla I_S(x, y, t) + c_E(x, y, t) \nabla I_E(x, y, t) + c_W(x, y, t) \nabla I_W(x, y, t) \right]$
  • subscripts (N, S, E, W) correspond to four horizontal or vertical directions of diffusion (north, south, east, and west) with respect to a pixel location (x, y); and scalar constant $\lambda$ is less than or equal to $1/|\eta(x, y)|$, where $|\eta(x, y)|$ is the number of neighboring pixels, which is equal to four (except at the video frame boundaries where it is less than four), so that $\lambda \le 1/4$. Notations $c_N$, $c_S$, $c_E$, and $c_W$ are diffusion coefficients.
  • $\nabla I_p(x, y, t) = I_p(x, y, t) - I(x, y, t), \quad p \in \eta(x, y)$
  • for example, in the north (N) direction, the gradient can be computed as the difference
  • $\nabla I_N(x, y) = I(x, y + 1, t) - I(x, y, t)$
  • Figure 6 is a flowchart showing a method 600 for pre-processing a video sequence using Perona-Malik spatial anisotropic diffusion filtering to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence.
  • the method 600 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
  • the method 600 starts when an original video sequence comprised of multiple video frames is received (at 605), the original video sequence having an associated data amount.
  • the method sets (at 610) a first video frame in the video sequence as a current frame.
  • the current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values).
  • in some embodiments, the luminance values (i.e., the luminance plane) of the current frame are filtered.
  • in other embodiments, the chrominance (U) values (i.e., the chrominance (U) plane) of the current frame are filtered.
  • in other embodiments, the chrominance (V) values (i.e., the chrominance (V) plane) of the current frame are filtered.
  • Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the current frame in pixel units.
  • the method sets (at 615) row (x) and column (y) values for an initial current pixel location.
  • the method also sets (at 620) the number of iterations (no_iterations), i.e., time steps (t), to be performed for each pixel location (x, y). The number of iterations can be determined depending on the amount of details to be removed.
  • the method selects (at 625) an edge-stopping function g(x) and values of parameters (such as $\lambda$ and $k$).
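  • The text does not say here which edge-stopping function is selected at step 625. For illustration only, the sketch below shows the two functions commonly associated with Perona-Malik diffusion, an exponential and a rational form; both tend to 0 as the gradient magnitude grows and equal 1 at zero gradient. The function names and the sample value of k are assumptions.

```python
import math


def g_exp(x, k):
    """Exponential edge-stopping function: exp(-(x / k)^2)."""
    return math.exp(-((x / k) ** 2))


def g_rational(x, k):
    """Rational edge-stopping function: 1 / (1 + (x / k)^2)."""
    return 1.0 / (1.0 + (x / k) ** 2)


# Small gradients diffuse almost freely; large gradients (edges) barely diffuse.
for grad in (0, 10, 100):
    print(grad, g_exp(grad, k=10), g_rational(grad, k=10))
```

  • The parameter k sets the gradient magnitude at which diffusion starts to be suppressed, so a larger k smooths stronger edges.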
  • the method then computes (at 645) a new pixel value for the current pixel location using
  • $I(x, y, t + 1) = I(x, y, t) + \lambda \left[ c_N(x, y, t) \nabla I_N(x, y, t) + c_S(x, y, t) \nabla I_S(x, y, t) + c_E(x, y, t) \nabla I_E(x, y, t) + c_W(x, y, t) \nabla I_W(x, y, t) \right]$
  • $I(x, y) = I(x, y) + \lambda (c_N \nabla I_N + c_S \nabla I_S + c_E \nabla I_E + c_W \nabla I_W)$
  • where I(x, y) is the luminance (Y) plane.
  • in other embodiments, I(x, y) is the chrominance (U) plane or the chrominance (V) plane.
  • the method then decrements (at 650) no_iterations by one and determines (at 655) if no_iterations is equal to 0. If not, the method continues at step 630. If so, the method determines (at 660) if the current pixel location is a last pixel location of the current frame. If not, the method sets (at 665) a next pixel location in the current frame as the current pixel location. The method then continues at step 630.
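  • A single four-direction diffusion update can be sketched as below. This is an illustrative variant, not the flowchart of Figure 6: it updates the whole plane once per call (the usual formulation) rather than iterating no_iterations times per pixel, it uses 0-based indices, and it takes the N, S, E, W neighbors as (x, y+1), (x, y-1), (x+1, y), (x-1, y). The function name and the stand-in edge-stopping functions are hypothetical.

```python
def perona_malik_step(img, lam, g):
    """One four-neighbor diffusion update over a 2-D list; returns a new image.

    lam: diffusion rate (the text requires lam <= 1/4 for four neighbors).
    g:   edge-stopping function applied to the gradient magnitude.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for x in range(rows):
        for y in range(cols):
            total = 0.0
            for nx, ny in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
                if 0 <= nx < rows and 0 <= ny < cols:  # fewer neighbors at borders
                    grad = img[nx][ny] - img[x][y]     # directional difference
                    total += g(abs(grad)) * grad       # coefficient times gradient
            out[x][y] = img[x][y] + lam * total
    return out


spike = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
# With a constant coefficient the spike spreads evenly to its four neighbors.
print(perona_malik_step(spike, 0.25, lambda m: 1.0))
# [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

  • With an edge-stopping function that returns 0 for large gradients, a sharp step edge passes through the update unchanged, which is the edge-preserving behavior the text describes.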
  • If the method 600 determines (at 660 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 670) if the current frame is a last frame of the video sequence (received at 605). If not, the method sets (at 675) a next frame in the video sequence as the current frame. The method then continues at step 615. If the method determines (at 670 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 680) a pre-filtered video sequence comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 605).
  • the pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression).
  • the bit rate of the pre-filtered and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-filtering) using the same compression method.
  • Figure 7 illustrates a conceptual diagram of a diffusion pattern of a conventional Perona-Malik anisotropic diffusion filter.
  • a conventional Perona-Malik anisotropic filter performs diffusion on a pixel 705 in only horizontal and vertical directions (north, south, east and west) with respect to the pixel's location (x, y).
  • for example, for a pixel location of (2, 2), a conventional anisotropic diffusion filter will perform diffusion filtering in the horizontal or vertical directions from the pixel location (2, 2) towards the horizontal or vertical neighboring pixel locations (2, 3), (2, 1), (3, 2), and (1, 2).
  • spatial diffusion filtering is performed on a pixel in at least one diagonal direction (north-east, north-west, south-east, or south-west) with respect to the pixel's location (x, y).
  • for example, the method of the present invention performs diffusion filtering in at least one diagonal direction from the pixel location (2, 2) towards a diagonal neighboring pixel location (3, 3), (1, 3), (3, 1), and/or (1, 1).
  • diffusion filtering is performed in four diagonal directions (north-east, north-west, south-east, and south-west) with respect to a pixel location (x, y).
  • Figure 8 illustrates a conceptual diagram of a diffusion pattern of an omni-directional anisotropic diffusion filter in accordance with the present invention.
  • the omni-directional anisotropic diffusion filter performs diffusion in four horizontal or vertical directions (north, south, east and west) and four diagonal directions (north-east, north-west, south-east, and south-west) with respect to a pixel 805 at pixel location (x, y).
  • the omni-directional anisotropic diffusion filter will perform diffusion filtering in four horizontal or vertical directions from the pixel location (2, 2) towards the horizontal or vertical neighboring pixel locations (2, 3), (2, 1), (3, 2), and (1, 2) and in four diagonal directions from the pixel location (2, 2) towards the diagonal neighboring pixel locations (3, 3), (1, 3), (3, 1) and (1, 1).
  • a video frame is pre-processed using omni-directional diffusion filtering in four horizontal or vertical directions and four diagonal directions as expressed by the following omni-directional spatial filtering equation (shown in two dimensional form):
  • $\nabla I(x, y, t)$ is the image gradient; (x, y) specifies a pixel location in a discrete, two dimensional grid covering the video frame; t denotes discrete time steps (i.e., iterations); scalar constant $\lambda$ determines the rate of diffusion, $\lambda$ being a positive real number that is less than or equal to $1/|\eta(x, y)|$, where $|\eta(x, y)|$ is the number of neighboring pixels.
  • g(x) satisfies the condition $g(x) \to 0$ when $x \to \infty$ such that the diffusion operation is stopped across the edges of the video frame. Because the distance between a pixel location (x, y) and any of its diagonal pixel neighbors is larger than the distance between the pixel location and its horizontal or vertical pixel neighbors, the diagonal pixel differences are scaled by a factor $\beta$, which is a function of the frame dimensions M, N. Also employed is the approximation of the image gradient $\nabla I(x, y, t)$ in a selected direction:
  • $\nabla I_p(x, y, t) = I_p(x, y, t) - I(x, y, t), \quad p \in \eta(x, y)$
  • the image gradient $\nabla I(x, y, t)$ can be computed in each direction as the corresponding pixel difference.
  • Figure 9 is a flowchart showing a method 900 for pre-processing a video sequence using omni-directional spatial anisotropic diffusion filtering to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence.
  • the method 900 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
  • the method 900 starts when an original video sequence comprised of multiple video frames is received (at 905), the original video sequence having an associated data amount.
  • the method sets (at 910) a first video frame in the video sequence as a current frame.
  • the current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values).
  • in some embodiments, the luminance values (i.e., the luminance plane) of the current frame are filtered.
  • in other embodiments, the chrominance (U) values (i.e., the chrominance (U) plane) of the current frame are filtered.
  • in other embodiments, the chrominance (V) values (i.e., the chrominance (V) plane) of the current frame are filtered.
  • Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the current frame in pixel units.
  • the method sets (at 915) row (x) and column (y) values for an initial current pixel location.
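  • the method selects (at 925) an edge-stopping function g(x) and values of parameters (such as $\lambda$ and $k$).
  • the method then computes (at 930) approximations of the image gradient in the north, south, east, west, north-east, north-west, south-east, and south-west directions.
  • the method then computes (at 940) diffusion coefficients in the north, south, east, west, north-east, north-west, south-east, and south-west directions ($c_N$, $c_S$, $c_E$, $c_W$, $c_{NE}$, $c_{NW}$, $c_{SE}$, and $c_{SW}$, respectively).
  • $I(x, y) = I(x, y) + \lambda \left[ (c_N \nabla I_N + c_S \nabla I_S + c_E \nabla I_E + c_W \nabla I_W) + \beta (c_{NE} \nabla I_{NE} + c_{NW} \nabla I_{NW} + c_{SE} \nabla I_{SE} + c_{SW} \nabla I_{SW}) \right]$
  • where I(x, y) is the luminance (Y) plane.
  • in other embodiments, I(x, y) is the chrominance (U) plane or the chrominance (V) plane.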
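  • The omni-directional update adds four diagonal terms, scaled by a separate factor, to the four horizontal and vertical terms. The sketch below is a minimal illustration under stated assumptions: the diagonal scale `beta` is passed in as a plain number (the text says only that it depends on the frame dimensions M, N), the update is applied to the whole plane once per call, indices are 0-based, and the function names are hypothetical.

```python
AXIAL = ((0, 1), (0, -1), (1, 0), (-1, 0))        # N, S, E, W offsets
DIAGONAL = ((1, 1), (-1, 1), (1, -1), (-1, -1))   # NE, NW, SE, SW offsets


def _flux(img, x, y, offsets, g):
    """Sum of coefficient * pixel difference over the in-frame neighbors."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for dx, dy in offsets:
        nx, ny = x + dx, y + dy
        if 0 <= nx < rows and 0 <= ny < cols:
            grad = img[nx][ny] - img[x][y]
            total += g(abs(grad)) * grad
    return total


def omni_diffusion_step(img, lam, beta, g):
    """One eight-direction diffusion update; diagonal terms are scaled by beta."""
    return [[img[x][y] + lam * (_flux(img, x, y, AXIAL, g)
                                + beta * _flux(img, x, y, DIAGONAL, g))
             for y in range(len(img[0]))]
            for x in range(len(img))]


spike = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
out = omni_diffusion_step(spike, lam=0.125, beta=0.5, g=lambda m: 1.0)
print(out)  # [[0.5, 1.0, 0.5], [1.0, 2.0, 1.0], [0.5, 1.0, 0.5]]
```

  • In the example, the diagonal neighbors receive half as much as the horizontal and vertical neighbors because beta is 0.5, and the total intensity is preserved.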
  • the method sets (at 965) a next pixel location in the current frame as the current pixel location. The method then continues at step 930.
  • If the method 900 determines (at 960 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 970) if the current frame is a last frame of the video sequence (received at 905). If not, the method sets (at 975) a next frame in the video sequence as the current frame. The method then continues at step 915. If the method determines (at 970 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 980) a pre-filtered video sequence comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 905).
  • the pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression).
  • the bit rate of the pre-filtered video sequence after compression using the encoder component 110 is lower than the bit rate of the original video sequence (without pre-filtering) after compression using the same compression method.
  • In some embodiments, foreground/background differentiation methods are used to pre-filter a video sequence so that filtering is performed differently on a foreground region of a video frame of the video sequence than on a background region of the video frame.
  • Performing different filtering on different regions of the video frame allows a system to provide greater data reduction in unimportant background regions of the video frame while preserving sharp edges in regions-of-interest in the foreground region.
  • the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the compressed video sequence made without pre-processing.
  • This foreground/background differentiation method is especially beneficial in videoconferencing applications but can be used in other applications as well.
  • the foreground/background differentiation method of the present invention includes five general steps: 1) identifying pixel locations in a video frame having pixel values that match color characteristics of human skin and identifying contiguous groupings of matching pixel locations (i.e., regions-of-interest); 2) determining a bounding shape for each region-of-interest, the totality of all pixel locations contained in a bounding shape comprising a foreground region and all other pixel locations in the frame comprising a background region; 3) creating a binary mask $M_{fg}$ for the foreground region and a binary mask $M_{bg}$ for the background region; 4) filtering the foreground and background regions using different filtering methods or parameters using the binary masks; and 5) combining the filtered foreground and background regions into a single filtered frame.
  • Figure 10 illustrates a flowchart depicting a foreground/background differentiation method 1000 in accordance with the present invention.
  • the foreground/background differentiation method 1000 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
  • the foreground/background differentiation method 1000 commences by receiving an original video sequence in YUV format (at 1005).
  • the video sequence comprises a plurality of video frames and has an associated data amount.
  • In other embodiments, a video sequence in another format is received.
  • the method sets (at 1010) a first video frame in the video sequence as a current frame.
  • the current frame is comprised of a current luminance (Y) frame and current chrominance (U and V) frames.
  • the current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values from the luminance and chrominance frames, respectively). Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the current frame in pixel units.
  • the method sets (at 1015) row (x) and column (y) values for an initial current pixel location.
  • the initial current pixel location may be set to equal (0, 0).
  • the method determines (at 1020) if the current pixel location in the current frame contains one or more pixel values that fall within predetermined low and high threshold values.
  • the method determines if the current pixel location has pixel values that satisfy the condition $U_{low} \le U(x, y) \le U_{high}$ and $V_{low} \le V(x, y) \le V_{high}$, where U and V are chrominance values of the current pixel location (x, y) and threshold values $U_{low}$, $U_{high}$, $V_{low}$, and $V_{high}$ are predetermined chrominance values that reflect the range of color characteristics (i.e., chrominance values U, V) of human skin.
  • the present invention makes use of the fact that, for all human races, the chrominance ranges of the human face/skin are consistently the same.
  • In other embodiments, the method includes identifying pixel locations in the video frame having pixel values that match other characteristics, such as a predetermined color or brightness.
  • If the method determines (at 1020 - Yes) that the current pixel location contains pixel values that fall within the predetermined low and high threshold values, the current pixel location is referred to as a matching pixel location and is added (at 1025) to a set of matching pixel locations. Otherwise, the method proceeds directly to step 1030.
  • the foreground/background differentiation method 1000 determines (at 1030) if the current pixel location is a last pixel location of the current frame. For example, the method may determine whether the row (x) coordinate of the current pixel location is equal to M and the column (y) coordinate of the current pixel location is equal to N where M x N is the size of the current frame in pixel units.
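  • The pixel scan of steps 1020 through 1030 amounts to a range test on the two chrominance values at each location. The sketch below is illustrative only: the function name is hypothetical, the threshold ranges in the example are placeholders rather than calibrated skin-tone values, the bounds are taken as inclusive, and the U and V planes are assumed to have the same dimensions as the pixel grid (no chroma subsampling).

```python
def matching_pixel_locations(u_plane, v_plane, u_range, v_range):
    """Return the set of (x, y) locations whose U and V values are both in range."""
    u_low, u_high = u_range
    v_low, v_high = v_range
    matches = set()
    for x in range(len(u_plane)):
        for y in range(len(u_plane[x])):
            if (u_low <= u_plane[x][y] <= u_high
                    and v_low <= v_plane[x][y] <= v_high):
                matches.add((x, y))   # step 1025: add to the matching set
    return matches


u = [[110, 90], [120, 125]]
v = [[150, 150], [200, 140]]
# Placeholder ranges chosen only to make the example concrete.
print(sorted(matching_pixel_locations(u, v, (100, 130), (135, 170))))
# [(0, 0), (1, 1)]
```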
  • steps 1020 through 1035 compose a human skin identifying system that identifies pixel locations in a video frame having pixel values that match characteristics of human skin.
  • Other human skin identifying systems well known in the art, however, may be used in place of the human skin identifying system described herein without departing from the scope of the invention. If the method 1000 determines (at 1030 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 1040) contiguous groupings of matching pixel locations in the set of matching pixel locations.
  • a region-of-interest can be defined, for example, by spatial proximity wherein all matching pixel locations within a specified distance are grouped in the same region-of-interest.
  • An ROI is typically a distinct entity represented in the current frame, such as a person's face or an object (e.g., cup) having chrominance values similar to that of human skin.
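  • Grouping matching pixel locations by spatial proximity (step 1040) can be sketched as a breadth-first search that chains together locations lying within a given distance of each other. The function name, the use of Chebyshev distance, and the default distance of 1 are assumptions; the text specifies only "a specified distance". The quadratic neighbor search is kept for clarity rather than speed.

```python
from collections import deque


def group_by_proximity(locations, max_dist=1):
    """Split (x, y) locations into regions of mutually reachable neighbors."""
    remaining = set(locations)
    regions = []
    while remaining:
        seed = remaining.pop()
        region, queue = {seed}, deque([seed])
        while queue:
            x, y = queue.popleft()
            # Unassigned locations within max_dist in both coordinates.
            near = {p for p in remaining
                    if abs(p[0] - x) <= max_dist and abs(p[1] - y) <= max_dist}
            remaining -= near
            region |= near
            queue.extend(near)
        regions.append(region)
    return regions


matches = {(0, 0), (0, 1), (1, 1), (5, 5), (5, 6)}
print(sorted(sorted(r) for r in group_by_proximity(matches)))
# [[(0, 0), (0, 1), (1, 1)], [(5, 5), (5, 6)]]
```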
  • Figure 11a illustrates an example of a video frame 1100 having two ROIs.
  • the first ROI represents a person's face 1105 and the second ROI represents a cup 1115 having chrominance values similar to that of human skin (i.e., having chrominance values that fall within the predetermined chrominance threshold values). Also shown in Figure 11a are representations of a person's clothed body 1110, a carton 1120, and a book 1125, none of which have chrominance values similar to that of human skin.
  • a bounding shape is then determined (at 1045) for each ROI, the bounding shape enclosing all or a portion of the ROI (i.e., the bounding shape encloses all or some of the matching pixel locations in the ROI).
  • the bounding shape may be of various geometric forms, such as a four-sided, three-sided, or circular form.
  • the bounding shape is in the form of a box where a first side of the bounding shape is determined by the lowest x coordinate, a second side of the bounding shape is determined by the highest x coordinate, a third side of the bounding shape is determined by the lowest y coordinate, and a fourth side of the bounding shape is determined by the highest y coordinate of the matching pixel locations in the ROI.
  • In other embodiments, the bounding shape does not enclose the entire ROI and encloses over 1/2 or 3/4 of the matching pixel locations in the ROI.
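  • For the box-shaped embodiment, the four sides follow directly from the extreme coordinates of the matching pixel locations in the ROI. A minimal sketch, with a hypothetical function name and return order:

```python
def bounding_box(region):
    """Return (x_min, x_max, y_min, y_max) for a collection of (x, y) locations."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), max(xs), min(ys), max(ys)


print(bounding_box({(2, 3), (4, 1), (3, 7)}))  # (2, 4, 1, 7)
```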
  • Figure 11b illustrates an example of a video frame 1100 having two ROIs, each ROI being enclosed by a bounding shape.
  • the first ROI (the person's face 1105) is enclosed by a first bounding shape 1130 and the second ROI (the cup 1115) is enclosed by a second bounding shape 1135.
  • Use of a bounding shape for each ROI gives a fast and simple approximation of an ROI in the video frame 1100.
  • a bounding shape will typically enclose a number of non-matching pixel locations along with the matching pixel locations of the ROI.
  • the method determines (at 1050) foreground and background regions of the current frame.
  • the foreground region is comprised of a totality of regions in the current frame enclosed within a bounding shape.
  • the foreground region is comprised of a set of foreground pixel locations (matching or non-matching) of the current frame enclosed within a bounding shape.
  • the foreground region is comprised of the totality of the regions or pixel locations enclosed by the first bounding shape 1130 and the second bounding shape 1135.
  • the background region is comprised of a totality of regions in the current frame not enclosed within a bounding shape.
  • the background region is comprised of a set of background pixel locations not included in the foreground region.
  • the background region is comprised of the regions or pixel locations not enclosed by the first bounding shape 1130 and the second bounding shape 1135.
  • the method then constructs (at 1055) a binary mask $M_{fg}$ for the foreground region and a binary mask $M_{bg}$ for the background region.
  • the foreground binary mask $M_{fg}$ is defined to contain values equal to 1 at pixel locations in the foreground region and to contain values equal to 0 at pixel locations in the background region.
  • Figure 11c illustrates the video frame 1100 after a foreground binary mask $M_{fg}$ has been applied.
  • the foreground binary mask $M_{fg}$ removes the background region so that only the set of foreground pixel locations or the foreground region (i.e., the regions enclosed by the first bounding shape 1130 and the second bounding shape 1135) of the frame remains.
  • the background binary mask $M_{bg}$ is defined as the complement of the foreground binary mask $M_{fg}$ so that it contains values equal to 0 at pixel locations in the foreground region and contains values equal to 1 at pixel locations in the background region.
  • Figure 11d illustrates the video frame 1100 after a background binary mask $M_{bg}$ has been applied.
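  • Constructing the two masks (step 1055) can be sketched as filling the bounding boxes with ones and taking the complement. The function name, the box tuple layout (x_min, x_max, y_min, y_max), and the 0-based indices are assumptions made for the example.

```python
def build_masks(rows, cols, boxes):
    """Return (m_fg, m_bg) for a rows x cols frame and a list of bounding boxes."""
    m_fg = [[0] * cols for _ in range(rows)]
    for x_min, x_max, y_min, y_max in boxes:
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                m_fg[x][y] = 1                     # inside a bounding shape
    m_bg = [[1 - v for v in row] for row in m_fg]  # complement of m_fg
    return m_fg, m_bg


m_fg, m_bg = build_masks(3, 4, [(0, 1, 1, 2)])
print(m_fg)  # [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(m_bg)  # [[1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
```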
  • the method then performs (at 1060) different filtering of the foreground and background regions (i.e., the set of foreground pixel locations and the set of background pixel locations are filtered differently).
  • foreground and background regions are filtered using anisotropic diffusion where different edge stopping functions and/or parameter values are used for the foreground and background regions.
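  • One simple way to realize different filtering per region, followed by recombination into a single frame, is sketched below. Note that this is a substituted variant rather than the flow described in the text: it filters the whole frame twice and then selects per pixel with the mask, whereas the described method applies the masks first and filters each region separately, so results can differ near region boundaries. The function name and the stand-in filters are hypothetical.

```python
def filter_and_combine(frame, m_fg, fg_filter, bg_filter):
    """Blend two filtered versions of `frame` using the foreground mask."""
    fg = fg_filter(frame)   # e.g. light, edge-preserving diffusion
    bg = bg_filter(frame)   # e.g. stronger diffusion for more data reduction
    rows, cols = len(frame), len(frame[0])
    # The background mask is the complement of the foreground mask.
    return [[m_fg[x][y] * fg[x][y] + (1 - m_fg[x][y]) * bg[x][y]
             for y in range(cols)]
            for x in range(rows)]


frame = [[5, 6], [7, 8]]
mask = [[1, 0], [0, 1]]
keep = lambda f: [row[:] for row in f]          # stand-in foreground filter
blank = lambda f: [[0] * len(row) for row in f]  # stand-in background filter
print(filter_and_combine(frame, mask, keep, blank))  # [[5, 0], [0, 8]]
```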
  • the foreground/background differentiation method 1000 determines (at 1070) if the current frame is a last frame of the video sequence (received at 1005). If not, the method sets (at 1075) a next frame in the video sequence as the current frame. The method then continues at step 1015. If the method 1000 determines (at 1070 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 1080) a pre-filtered video sequence comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 1005).
  • the pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression).
  • the bit rate of the pre-filtered video sequence after compression using the encoder component 110 is lower than the bit rate of the video sequence without pre-filtering after compression using the same compression method.
  • the foreground and background regions may be filtered using different filtering methods or different filtering parameters.
  • among spatial filtering methods, diffusion filtering has the important property of generating a scale space via a partial differential equation. In the scale space, analysis of object boundaries and other information at the correct resolution where they are most visible can be performed.
  • Figure 12 illustrates a flowchart of a method 1200 for using the omni-directional spatial filtering method (described with reference to Figure 9) in conjunction with the foreground/background differentiation method 1000 (described with reference to Figure 10).
  • the method 1200 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
  • the method 1200 begins when it receives (at 1205) a video frame (i.e., the current frame being processed by the method 1000).
  • the current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values. Pixel locations are identified by discrete row (x) and column (y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the frame in pixel units.
  • the method 1200 also receives (at 1210) a foreground binary mask M_fg and a background binary mask M_bg (constructed at step 1055 of Figure 10). The method 1200 then applies (at 1215) the foreground binary mask M_fg to the current frame to produce a set of foreground pixel locations that comprise the foreground region (as shown, for example, in Figure 11c).
  • the method then sets (at 1220) row (x) and column (y) values for an initial current pixel location to equal the coordinates of one of the foreground pixel locations.
  • the initial current pixel location may be set to equal the coordinates of a foreground pixel location having the lowest row (x) or the lowest column (y) coordinate in the set of foreground pixel locations.
  • the method 1200 then applies (at 1225) omni-directional diffusion filtering to the current pixel location using a foreground edge stopping function g_fg(x) and a set of foreground parameter values P_fg (that includes parameter values k_fg and λ_fg).
  • the omni-directional diffusion filtering is expressed by the omni-directional spatial filtering equation:
  • Parameter value λ_fg is a foreground parameter value that determines the rate of diffusion in the omni-directional spatial filtering in the foreground region.
  • the foreground edge stopping function g_fg(x) is expressed by the following equation:
  • parameter value k_fg is a foreground parameter value that controls diffusion as a function of the gradient. If the value of the parameter is low, diffusion stops across the edges. If the value of the parameter is high, intensity gradients have a reduced influence on diffusion.
  • the method 1200 determines (at 1230) if the current pixel location is a last pixel location of the set of foreground pixel locations. If not, the method sets (at 1235) a next pixel location in the set of foreground pixel locations as the current pixel location. The method then continues at step 1225. If the method 1200 determines (at 1230 - Yes) that the current pixel location is the last pixel location of the set of foreground pixel locations, the method continues at step 1240.
  • the method 1200 applies (at 1240) the background binary mask M_bg to the current frame to produce a set of background pixel locations that comprise the background region (as shown, for example, in Figure 11d).
  • the method sets (at 1245) row (x) and column (y) values for an initial current pixel location to equal the coordinates of one of the background pixel locations.
  • the initial current pixel location may be set to equal the coordinates of a background pixel location having the lowest row (x) or the lowest column (y) coordinate in the set of background pixel locations.
  • the method 1200 then applies (at 1250) omni-directional diffusion filtering to the current pixel location using a background edge stopping function g_bg(x) and a set of background parameter values P_bg (that includes parameter values k_bg and λ_bg).
  • the omni-directional diffusion filtering is expressed by the omni-directional spatial filtering equation given above.
  • at least one background parameter value in the set of background parameters P_bg is not equal to a corresponding foreground parameter value in the set of foreground parameters P_fg.
  • Parameter value λ_bg is a background parameter value that determines the rate of diffusion in the omni-directional spatial filtering in the background region.
  • the background parameter value λ_bg is not equal to the foreground parameter value λ_fg.
  • the background edge stopping function g_bg(x) is different than the foreground edge stopping function g_fg(x) and is expressed by the following equation:
  • parameter value k_bg is a background parameter value that controls diffusion as a function of the gradient. If the value of this parameter is low, diffusion stops across the edges. If the value of this parameter is high, intensity gradients have a reduced influence on diffusion. In some embodiments, the background parameter value k_bg is not equal to the foreground parameter value k_fg.
  • the method 1200 determines (at 1255) if the current pixel location is a last pixel location of the set of background pixel locations. If not, the method sets (at 1260) a next pixel location in the set of background pixel locations as the current pixel location. The method then continues at step 1250. If the method 1200 determines (at 1255 - Yes) that the current pixel location is the last pixel location of the set of background pixel locations, the method ends.
  • Different embodiments of the present invention as described above may be used independently to pre-process a video sequence or may be used in any combination with any other embodiment of the present invention and in any sequence.
  • the temporal filtering method of the present invention may be used independently or in conjunction with the spatial filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence.
  • the spatial filtering methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence.
  • the foreground/background differentiation method of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the spatial filtering methods of the present invention to pre-process a video sequence.
  • Embodiments of the present invention may also relate to a variety of formats other than YUV.
  • other video frame formats (such as RGB) can easily be transformed into the YUV format.
  • Some embodiments described above relate to a videoconferencing application.
  • One of ordinary skill in the art, however, will realize that these embodiments may also relate to other applications (e.g., DVD, digital storage media, television broadcasting, internet streaming, communication, etc.) in real-time or post-time.
  • Embodiments of the present invention may also be used with video sequences having different coding standards such as H.263 and H.264 (also known as MPEG-4/Part 10).
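By way of illustration only, the per-region filtering of method 1200 described above can be sketched in Python. The omni-directional filtering equation and the two edge stopping functions are not reproduced in the text above, so this sketch assumes a Perona-Malik-style update summed over the eight neighbours of a pixel, I(x, y, t+1) = I(x, y, t) + λ Σ_d g(|∇_d I|) ∇_d I, with an assumed foreground function g_fg(x) = 1 / (1 + (x / k_fg)^2) and an assumed background function g_bg(x) = exp(-(x / k_bg)^2). Rectangular bounding shapes, 0-based indices, reading all gradients from the unfiltered frame, and the names `build_masks`, `diffuse_region` and `filter_frame` are likewise assumptions of the sketch rather than features of the described method.

```python
import math

# Eight neighbour offsets (row, column): N, S, E, W and the four diagonals.
DIRECTIONS = [(-1, 0), (1, 0), (0, 1), (0, -1),
              (-1, 1), (1, 1), (1, -1), (-1, -1)]


def g_fg(grad, k):
    """Assumed foreground edge stopping function: 1 / (1 + (grad / k)^2)."""
    return 1.0 / (1.0 + (grad / k) ** 2)


def g_bg(grad, k):
    """Assumed background edge stopping function: exp(-(grad / k)^2)."""
    return math.exp(-((grad / k) ** 2))


def build_masks(rows, cols, boxes):
    """Return (M_fg, M_bg); each box is (top, left, bottom, right), inclusive."""
    m_fg = [[0] * cols for _ in range(rows)]
    for top, left, bottom, right in boxes:
        for x in range(top, bottom + 1):
            for y in range(left, right + 1):
                m_fg[x][y] = 1
    m_bg = [[1 - v for v in row] for row in m_fg]  # complement of M_fg
    return m_fg, m_bg


def diffuse_region(frame, mask, g, k, lam):
    """One diffusion iteration, applied only at locations where mask == 1."""
    rows, cols = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for x in range(rows):
        for y in range(cols):
            if not mask[x][y]:
                continue
            total = 0.0
            for dx, dy in DIRECTIONS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < rows and 0 <= ny < cols:
                    grad = frame[nx][ny] - frame[x][y]
                    total += g(abs(grad), k) * grad
            out[x][y] = frame[x][y] + lam * total
    return out


def filter_frame(frame, boxes, p_fg, p_bg):
    """Filter foreground and background with different functions/parameters."""
    rows, cols = len(frame), len(frame[0])
    m_fg, m_bg = build_masks(rows, cols, boxes)
    fg = diffuse_region(frame, m_fg, g_fg, p_fg["k"], p_fg["lam"])
    bg = diffuse_region(frame, m_bg, g_bg, p_bg["k"], p_bg["lam"])
    # Take foreground locations from fg and all other locations from bg.
    return [[fg[x][y] if m_fg[x][y] else bg[x][y] for y in range(cols)]
            for x in range(rows)]
```

With hypothetical parameter sets such as `{"k": 1.0, "lam": 0.1}` for the foreground and `{"k": 100.0, "lam": 0.1}` for the background, an isolated bright pixel inside a bounding box is left almost unchanged, while the same pixel in the background is pulled strongly toward its neighbours. This mirrors the stated role of a low versus a high k value.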

Abstract

Methods for pre-processing video sequences prior to compression to provide data reduction of the video sequence. After compression of the pre-processed video sequence, the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the video sequence after compression but without pre-processing. Pre-processing may include temporal filtering where pixel values of successive frames are filtered when the difference in the pixel values between the successive frames is within high and low threshold values. The high and low threshold values are determined adaptively depending on the illumination level of a frame. Pre-processing may include spatial anisotropic diffusion filtering such as Perona-Malik filtering, Fallah-Ford filtering, or omni-directional filtering that extends Perona-Malik filtering to perform filtering in at least one diagonal direction. Pre-processing may also include performing filtering differently on a foreground region than on a background region of a frame.

Description

METHOD AND SYSTEM FOR PRE-PROCESSING OF VIDEO SEQUENCES TO ACHIEVE BETTER COMPRESSION
RELATED APPLICATIONS
This patent application claims the benefit of U.S. Patent Application entitled "Pre-processing Method and System for Data Reduction of Video Sequences and Bit Rate Reduction of Compressed Video Sequences Using Temporal Filtering," Attorney Docket No. APLE.P0056, Serial Number 10/640,734, filed 8/13/2003, and of U.S. Patent Application entitled "Pre-processing Method and System for Data Reduction of Video Sequences and Bit Rate Reduction of Compressed Video Sequences Using Spatial Filtering," Attorney Docket No. APLE.P0059, Serial Number 10/640,944, filed 8/13/2003.
FIELD OF THE INVENTION
The invention addresses pre-processing for data reduction of video sequences and bit rate reduction of compressed video sequences.
BACKGROUND OF THE INVENTION
Video is currently being transitioned from an analog medium to a digital medium. For example, the old analog NTSC television broadcasting standard is slowly being replaced by the digital ATSC television broadcasting standard. Similarly, analog video cassette tapes are increasingly being replaced by digital versatile discs (DVDs). Thus, it is important to identify efficient methods of digitally encoding video information. An ideal digital video encoding system will provide a very high picture quality with the minimum number of bits.
The pre-processing of video sequences can be an important part of digital video encoding systems. A good video pre-processing system can achieve a bit rate reduction in the final compressed digital video streams. Furthermore, the visual quality of the decoded sequences is often higher when a good pre-processing system has been applied as compared to that obtained without pre-processing. Thus, it would be beneficial to design video pre-processing systems that will alter a video sequence in a manner that will improve the compression of the video sequence by a digital video encoder.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide methods for pre-processing of video sequences prior to compression to provide data reduction of the video sequence. In addition, after compression of the pre-processed video sequence, the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the video sequence after compression but without pre-processing.
Some embodiments of the present invention provide a temporal filtering method for pre-processing of video frames of a video sequence. In the method, pixel values (such as luminance and chrominance values) of successive frames are filtered when the difference in the pixel values between the successive frames is within a specified range as defined by high and low threshold values. The high and low threshold values are determined adaptively depending on the illumination level of a video frame to reduce noise in the video sequence. In some embodiments, at low illumination levels, filtering strength is increased (i.e., a larger number of pixels are filtered) to reduce the greater amount of noise found in video frames having lower illumination levels. At high illumination levels, filtering strength is decreased (i.e., a smaller number of pixels are filtered) to reduce the lesser amount of noise found in video frames having higher illumination levels. As such, the method provides adaptive threshold values to provide variability of filtering strength depending on the illumination levels of a video frame.
Some embodiments of the present invention provide a spatial method for pre-processing video frames of a video sequence using spatial anisotropic diffusion filtering. Some embodiments use conventional spatial anisotropic diffusion filters such as a Perona-Malik anisotropic diffusion filter or a Fallah-Ford diffusion filter which are not traditionally applied for bit rate reduction.
Other embodiments use an omni-directional spatial filtering method that extends the traditional Perona-Malik diffusion filter (that normally performs diffusion only in horizontal or vertical directions) so that diffusion is also performed in at least one diagonal direction. In some embodiments, the omni-directional filtering method provides diffusion filtering in eight directions (north, south, east, west, north-east, south-east, south-west, and north-west). By extending the spatial filter to perform omni-directional diffusion, the effectiveness of the pre-filtering stage is significantly improved such that less smoothing and/or blurring of edges is produced in the final decoded frames.
The present invention also includes a foreground/background differentiation pre-processing method that performs filtering differently on a foreground region of a video frame in a video sequence than on a background region of the video frame. The method includes identifying pixel locations in the video frame having pixel values that match characteristics of human skin. A bounding shape is then determined for each contiguous grouping of matching pixel locations (i.e., regions-of-interest), the bounding shape enclosing all or a portion of the contiguous grouping of matching pixel locations. The totality of all pixel locations of the video frame contained in a bounding shape is referred to as a foreground region. Any pixel locations in the video frame not contained within the foreground region comprise a background region. The method then filters pixel locations in the foreground region differently than pixel locations in the background region. Performing different types of filtering on different regions of the video frame allows greater data reduction in unimportant regions of the video frame while preserving sharp edges in regions-of-interest.
The present invention provides automatic detection of regions-of-interest (e.g., a person's face) and implements bounding shapes instead of exact segmentation of a region-of-interest. This allows for a simple and fast filtering method that is viable in real-time applications (such as videoconferencing) and bit rate reduction of the compressed video sequence. Different embodiments of the present invention may be used independently to pre-process a video sequence or may be used in any combination with any other embodiment of the present invention and in any sequence. As such, the spatial filtering methods of the present invention may be used independently or in conjunction with temporal filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence. Furthermore, the foreground/background differentiation methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the spatial filtering methods of the present invention to pre-process a video sequence. In addition, the temporal filtering method of the present invention may be used independently or in conjunction with the spatial filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence.
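For illustration, the detection of regions-of-interest and their bounding shapes summarized above can be sketched as follows. This is a minimal sketch under stated assumptions: skin matching is reduced to a range test on the chrominance (U, V) values, the numeric ranges `U_RANGE` and `V_RANGE` are hypothetical placeholders that do not come from this document, bounding shapes are taken to be axis-aligned rectangles, and "contiguous" is interpreted as 4-connectivity. The function names `skin_mask` and `bounding_boxes` are also illustrative.

```python
from collections import deque

# Hypothetical chrominance ranges for skin tones (placeholders only; the
# text gives no numeric values).
U_RANGE = (77, 127)
V_RANGE = (133, 173)


def skin_mask(u_frame, v_frame):
    """Mark pixel locations whose (U, V) values fall in the assumed ranges."""
    return [[U_RANGE[0] <= u <= U_RANGE[1] and V_RANGE[0] <= v <= V_RANGE[1]
             for u, v in zip(u_row, v_row)]
            for u_row, v_row in zip(u_frame, v_frame)]


def bounding_boxes(mask):
    """Return one (top, left, bottom, right) box per contiguous grouping."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for x in range(rows):
        for y in range(cols):
            if not mask[x][y] or seen[x][y]:
                continue
            # Breadth-first search over the 4-connected matching locations.
            top, left, bottom, right = x, y, x, y
            queue = deque([(x, y)])
            seen[x][y] = True
            while queue:
                cx, cy = queue.popleft()
                top, bottom = min(top, cx), max(bottom, cx)
                left, right = min(left, cy), max(right, cy)
                for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nx, ny = cx + dx, cy + dy
                    if (0 <= nx < rows and 0 <= ny < cols
                            and mask[nx][ny] and not seen[nx][ny]):
                        seen[nx][ny] = True
                        queue.append((nx, ny))
            boxes.append((top, left, bottom, right))
    return boxes
```

Every pixel location inside any returned box would belong to the foreground region, and all remaining locations to the background region. Using a rectangle rather than an exact segmentation keeps the per-frame cost to a single pass over the mask, which is in line with the stated goal of a simple and fast method.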
BRIEF DESCRIPTION OF THE DRAWINGS
The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.
Figure 1 illustrates a coding system with pre-processing and post-processing components.
Figure 2 illustrates a pre-processing component with separate temporal pre-filtering and spatial pre-filtering components.
Figure 3 illustrates a flowchart for a temporal pre-filtering method in accordance with the present invention.
Figure 4a illustrates a graph of an exemplary high luminance threshold function that determines a high luminance threshold value.
Figure 4b illustrates a graph of an exemplary low luminance threshold function that determines a low luminance threshold value.
Figure 5 illustrates a flowchart depicting a method for pre-processing a video sequence using Fallah-Ford spatial anisotropic diffusion filtering for data reduction.
Figure 6 illustrates a flowchart depicting a method for pre-processing a video sequence using Perona-Malik spatial anisotropic diffusion filtering for data reduction.
Figure 7 illustrates a conceptual diagram of a diffusion pattern of a conventional Perona-Malik anisotropic diffusion filter.
Figure 8 illustrates a conceptual diagram of a diffusion pattern of an omni-directional anisotropic diffusion filter in accordance with the present invention.
Figure 9 illustrates a flowchart depicting a method for pre-processing a video sequence using omni-directional spatial anisotropic diffusion filtering for data reduction.
Figure 10 illustrates a flowchart depicting a foreground/background differentiation method in accordance with the present invention.
Figure 11a illustrates an example of a video frame having two regions-of-interest.
Figure 11b illustrates an example of a video frame having two regions-of-interest, each region-of-interest being enclosed by a bounding shape.
Figure 11c illustrates a video frame after a foreground binary mask M_fg has been applied.
Figure 11d illustrates a video frame after a background binary mask M_bg has been applied.
Figure 12 is a flowchart of a method for using omni-directional spatial filtering in conjunction with the foreground/background differentiation method of Figure 10.
DETAILED DESCRIPTION OF THE INVENTION
The disclosure of U.S. Patent Application entitled "Pre-processing Method and System for Data Reduction of Video Sequences and Bit Rate Reduction of Compressed Video Sequences Using Temporal Filtering," Attorney Docket No. APLE.P0056, Serial Number 10/640,734, filed 8/13/2003, is expressly incorporated herein by reference. The disclosure of U.S. Patent Application entitled "Pre-processing Method and System for Data Reduction of Video Sequences and Bit Rate Reduction of Compressed Video Sequences Using Spatial Filtering," Attorney Docket No. APLE.P0059, Serial Number 10/640,944, filed 8/13/2003, is expressly incorporated herein by reference.
In the following description, numerous details are set forth for purpose of explanation.
However, one of ordinary skill in the art will realize that the invention may be practiced without the use of these specific details. In other instances, well-known structures and devices are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail.
Video Pre-processing
As set forth in the background, a good video pre-processing system can achieve a bit rate reduction in the final compressed digital video streams. Furthermore, a good video pre-processing system may also improve the visual quality of the decoded sequences. Typically, a video pre-processing system may employ filtering, down-sampling, brightness/contrast correction, and/or other image processing techniques. The pre-processing step of filtering is referred to as pre-filtering. Pre-filtering can be accomplished using temporal, spatial, or spatial-temporal filters, all of which achieve partial noise reduction and/or frame rate reduction.
Temporal filtering is a pre-processing step used for smoothing motion fields, frame rate reduction, and tracking and noise reduction between sequential frames of a video sequence. Temporal filtering operations in one dimension (i.e., time dimension) are applied to two or more frames to make use of the temporal redundancy in a video sequence. The main difficulty in designing and applying temporal filters stems from temporal effects, such as motion jaggedness, ghosting, etc., that are sometimes caused by temporal pre-filtering. Such artifacts are particularly visible and difficult for viewers to tolerate. These artifacts are partly due to the fact that conventional temporal filters are not adaptive to the content or illumination levels of frames in a video sequence.
Spatial filtering is a pre-processing step used for anti-aliasing and smoothing (by removing details of a video frame that are unimportant for the perceived visual quality) and segmentation. Spatial filter design aims at achieving a tradeoff between noise/detail reduction within the frame and the amount of blurring/smoothing that is being introduced.
For video coding applications, a balance between the bit rate reduction as a result of pre-filtering and the subjective quality of the filtered sequences is difficult to achieve.
For reasonable bit rate reductions, noticeable distortion is often introduced in the filtered video sequences (and consequently in the decoded sequences that have been pre-filtered before encoding). The distortions may take the form of excessive smoothing of flat areas, blurring of edges (for spatial filters), ghosting, and/or other temporal effects (for temporal filters). Such artifacts are particularly disturbing when they affect regions-of-interest (ROIs) such as a person's face in videoconferencing applications. Even more importantly, even if both the bit rate reduction of the compressed stream and the picture quality of the filtered video sequence prior to encoding are acceptable, there is no guarantee that the subjective quality of the decoded sequence is better than that of the decoded sequence without pre-filtering. Finally, to be viable in real-time applications such as videoconferencing, the filtering methods need to be simple and fast while addressing the limitations mentioned above.
Video Pre-processing in the Present Invention
Embodiments of the present invention provide methods for pre-processing of video sequences prior to compression to provide data reduction of the video sequence. In addition, after compression of the pre-processed video sequence, the bit rate of the pre-processed and compressed video sequence may be lower than the bit rate of the video sequence after compression but without pre-processing.
Some embodiments of the present invention provide a temporal filtering method for pre-processing of video frames of a video sequence. In the temporal filtering method, pixel values (such as luminance and chrominance values) of successive frames are filtered when the difference in the pixel values between the successive frames is within a specified range as defined by high and low threshold values.
The high and low threshold values are determined adaptively depending on the illumination level of a video frame to provide variability of filtering strength depending on the illumination levels of a video frame. As a result, the method provides for data reduction of the video sequence and bit rate reduction of the compressed video sequence.
Some embodiments of the present invention provide a spatial filtering method for pre-processing a video sequence using spatial anisotropic diffusion filtering. Some embodiments use conventional spatial anisotropic diffusion filters such as a Perona-Malik anisotropic diffusion filter or a Fallah-Ford diffusion filter. Other embodiments use an omni-directional spatial filtering method that extends the traditional Perona-Malik diffusion filter (that performs diffusion in four horizontal or vertical directions) so that diffusion is also performed in at least one diagonal direction. In some embodiments, the omni-directional filtering method provides diffusion filtering in eight directions (north, south, east, west, north-east, south-east, south-west, and north-west).
The present invention also includes a foreground/background differentiation pre-processing method that performs filtering differently on a foreground region of a video frame in a video sequence than on a background region of the video frame. The method includes identifying pixel locations in the video frame having pixel values that match characteristics of human skin. In other embodiments, the method includes identifying pixel locations in the video frame having pixel values that match other characteristics, such as a predetermined color or brightness. A bounding shape is then determined for each contiguous grouping of matching pixel locations (i.e., regions-of-interest), the bounding shape enclosing all or a portion of the contiguous grouping of matching pixel locations.
The totality of all pixel locations of the video frame contained in a bounding shape is referred to as a foreground region. Any pixel locations in the video frame not contained within the foreground region comprise a background region. The method then filters pixel locations in the foreground region differently than pixel locations in the background region. The method provides automatic detection of regions-of-interest (e.g., a person's face) and implements bounding shapes instead of exact segmentation of a region-of-interest. This allows for a simple and fast filtering method that is viable in real-time applications (such as videoconferencing) and bit rate reduction of the compressed video sequence.
Different embodiments of the present invention may be used independently to pre-process a video sequence or may be used in any combination with any other embodiment of the present invention and in any sequence. As such, the temporal filtering method of the present invention may be used independently or in conjunction with the spatial filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence. In addition, the spatial filtering methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence. Furthermore, the foreground/background differentiation methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the spatial filtering methods of the present invention to pre-process a video sequence.
Some embodiments described below relate to video frames in YUV format. One of ordinary skill in the art, however, will realize that these embodiments may also relate to a variety of formats other than YUV.
In addition, other video frame formats (such as RGB) can easily be transformed into the YUV format. Furthermore, some embodiments are described with reference to a videoconferencing application. One of ordinary skill in the art, however, will realize that the teachings of the present invention may also relate to other video encoding applications (e.g., DVD, digital storage media, television broadcasting, internet streaming, communication, etc.) in real-time or post-time. Embodiments of the present invention may also be used with video sequences having different coding standards such as H.263 and H.264 (also known as MPEG-4/Part 10).
As stated above, embodiments of the present invention provide methods for pre-processing of video sequences prior to compression to provide data reduction. As used herein, data reduction of a video sequence refers to a reduced amount of details and/or noise in a pre-processed video sequence before compression in comparison to the same video sequence before compression but without pre-processing. As such, data reduction of a video sequence refers to a comparison of the details and/or noise in a pre-processed and uncompressed video sequence, and an uncompressed-only video sequence, and does not refer to the reduction in frame size or frame rate.
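As a brief illustration of the earlier remark that RGB frames can easily be transformed into the YUV format, a per-pixel conversion can be sketched as follows. The document does not specify which YUV definition is intended, so the sketch assumes the common BT.601-style luma weights with scaled colour differences. The function name `rgb_to_yuv` is illustrative.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) using assumed BT.601-style weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = 0.492 * (b - y)                    # scaled blue colour difference
    v = 0.877 * (r - y)                    # scaled red colour difference
    return y, u, v
```

Under this definition a neutral gray pixel has U and V of approximately zero, so a luminance-only operation such as computing the mean luminance of a frame needs only the Y component.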
In addition, embodiments of the present invention provide that after compression of the pre-processed video sequence, the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the compressed video sequence produced without any pre-processing. As used herein, reduction or lowering of the bit rate of a compressed video sequence refers to a reduced or lowered bit rate of a pre-processed video sequence after compression in comparison to the same video sequence after compression but without pre-processing. As such, reduction or lowering of the bit rate of a compressed video sequence refers to a comparison of the bit rates of a pre-processed and compressed video sequence and a compressed-only video sequence and does not refer to the reduction or lowering of the bit rate of a video sequence caused by compression (i.e., encoding).
The various embodiments described below provide a method for pre-processing/pre-filtering of video sequences for data reduction of the video sequences and bit rate reduction of the compressed video sequences. Embodiments relating to temporal pre-filtering are described in Section I. Embodiments relating to spatial pre-filtering are described in Section II. Embodiments relating to filtering foreground and background regions of a video frame differently are described in Section III.
Figure 1 illustrates a coding system 100 with pre-processing and post-processing components. A typical coding system includes an encoder component 110 preceded by a pre-processing component 105 and a decoder component 115 followed by a post-processing component 120. Pre-filtering of a video sequence is performed by the pre-processing component 105, although in other embodiments, the pre-filtering is performed by the encoder component 110.
As illustrated in Figure 1, an original video sequence is received by the pre-processing component 105, the original video sequence being comprised of multiple video frames and having an associated original data amount. In some embodiments, the pre-processing component 105 pre-filters the original video sequence to remove noise and details and produces a pre-processed (i.e., pre-filtered) video sequence having an associated pre-processed data amount that is less than the original data amount associated with the original video sequence. The data amount of a video sequence reflects an amount of data used to represent the video sequence. The encoder component 110 then receives the pre-processed video sequence and encodes (i.e., compresses) the pre-processed video sequence to produce a pre-processed and compressed video sequence. Pre-filtering methods performed by the pre-processing component 105 allow removal of noise and details from the original video sequence, thus allowing for greater compression of the pre-processed video sequence by the encoder component 110. As such, the bit rate of the pre-processed and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-processing) with an identical compression method using the encoder component 110. The bit rate of a video sequence reflects an amount of binary coded data required to represent the video sequence over a given period of time and is typically measured in kilobits per second. The compressed video sequence is received by the decoder component 115 which processes the compressed video sequence to produce a decoded video sequence. In some systems, the decoded video sequence may be further post-processed by the post-processing component 120.
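The bit rate comparison defined above amounts to simple arithmetic, sketched here for concreteness. The sketch assumes that one kilobit equals 1000 bits and that both compressed sequences cover the same duration. The function names and the example figures in the usage note are hypothetical and are not measurements from this document.

```python
def bit_rate_kbps(total_bits, duration_seconds):
    """Bit rate in kilobits per second (1 kilobit taken as 1000 bits)."""
    return total_bits / duration_seconds / 1000.0


def bit_rate_reduction(compressed_only_bits, preprocessed_compressed_bits,
                       duration_seconds):
    """Relative bit rate reduction of the pre-processed and compressed
    sequence versus the compressed-only sequence (same encoder assumed)."""
    base = bit_rate_kbps(compressed_only_bits, duration_seconds)
    pre = bit_rate_kbps(preprocessed_compressed_bits, duration_seconds)
    return (base - pre) / base
```

For example, if a 10-second sequence hypothetically compressed to 3,000,000 bits without pre-filtering and to 2,400,000 bits with pre-filtering, the bit rates would be 300 and 240 kilobits per second, a relative reduction of 20 percent. The comparison is between two compressed sequences, not between a compressed sequence and its uncompressed source.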
Figure 2 illustrates a block diagram of the video pre-processing component 105 with separate temporal pre-filtering and spatial pre-filtering components 205 and 210, respectively. The video pre-processing component 105 receives an original video sequence comprised of multiple video frames and produces a pre-processed video sequence. In some embodiments, the temporal pre-filtering component 205 performs pre-processing operations on the received video sequence and sends the video sequence to the spatial pre-filtering component 210 for further pre-processing. In other embodiments, the spatial pre-filtering component 210 performs pre-processing operations on the received video sequence and sends the video sequence to the temporal pre-filtering component 205 for further pre-processing. In further embodiments, pre-processing is performed only by the temporal pre-filtering component 205 or only by the spatial pre-filtering component 210. In some embodiments, the temporal pre-filtering component 205 and the spatial pre-filtering component 210 are configured to perform particular functions through instructions of a computer program product having a computer readable medium.
Data reduction of the video frames of the original video sequence is achieved by the temporal pre-filtering component 205 and/or the spatial pre-filtering component 210. The temporal pre-filtering component 205 performs temporal pre-filtering methods of the present invention (as described in Section I) while the spatial pre-filtering component 210 performs spatial pre-filtering methods of the present invention (as described in Sections II and III). In particular, the spatial pre-filtering component 210 may use spatial anisotropic diffusion filtering for data reduction in a video sequence.
Section I: Temporal Pre-Filtering
Figure 3 illustrates a flowchart for a temporal pre-filtering method 300 in accordance with the present invention. The method 300 may be performed, for example, by the temporal pre-filtering component 205 or the encoder component 110.
The temporal pre-filtering method 300 commences by receiving an original video sequence in YUV format (at 305). The original video sequence comprises a plurality of video frames and has an associated data amount. In other embodiments, a video sequence in another format is received. The method then sets (at 310) a first video frame in the video sequence as a current frame (i.e., frame f) and a second video frame in the video sequence as a next frame (i.e., frame f + 1). The current frame is comprised of a current luminance (Y) frame and current chrominance (U and V) frames. Similarly, the next frame is comprised of a next luminance (Y) frame and next chrominance (U and V) frames. As such, the current and next frames are each comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values from the luminance and chrominance frames, respectively). Pixels and pixel locations are identified by discrete row (e.g., i) and column (e.g., j) indices (i.e., coordinates) such that $1 \le i \le M$ and $1 \le j \le N$ where M x N is the size of the current and next frame in pixel units.
The method then determines (at 315) the mean of the luminance values in the current luminance frame. Using the mean luminance (abbreviated as mean (Y) or mu), the method
determines (at 320) high and low luminance threshold values ($\theta^{H}_{luma}$ and $\theta^{L}_{luma}$, respectively) and high and low chrominance threshold values ($\theta^{H}_{chroma}$ and $\theta^{L}_{chroma}$, respectively), as discussed below
with reference to Figures 4a and 4b. The method then sets (at 325) row (i) and column (j) values for initial current pixel location coordinates. For example, the initial current pixel location coordinates may be set to equal (0, 0).
The method 300 then computes (at 330) a difference between a luminance value at the current pixel location coordinates in the next luminance frame and a luminance value at the current pixel location coordinates in the current luminance frame. This luminance difference ($difY_{i,j}$) can be expressed mathematically as:
$$difY_{i,j} = x_{i,j}(Y_{f+1}) - x_{i,j}(Y_f)$$
where i and j are coordinates for the rows and columns, respectively, f indicates the current frame, and f+1 indicates the next frame.
The method 300 then determines (at 335) if the luminance difference ($difY_{i,j}$) at the current pixel location coordinates is within the high and low luminance threshold values ($\theta^{H}_{luma}$ and $\theta^{L}_{luma}$, respectively). If not, the method proceeds directly to step 345. If, however, the method determines (at 335 - Yes) that the luminance difference ($difY_{i,j}$) is within the high and low luminance threshold values, the luminance values at the current pixel location coordinates in the current and next luminance frames are filtered (at 340). In some embodiments, the luminance value at the current pixel location coordinates in the next luminance frame is set to equal the average of the luminance values at the current pixel location coordinates in the current luminance frame and the next luminance frame. This operation can be expressed mathematically as:
$$x_{i,j}(Y_{f+1}) = \frac{x_{i,j}(Y_f) + x_{i,j}(Y_{f+1})}{2}$$
In other embodiments, other filtering methods are used.
The method 300 then computes (at 345) differences in chrominance values of the next chrominance (U and V) frames and current chrominance (U and V) frames at the current pixel location coordinates. These chrominance differences ($difU_{i,j}$ and $difV_{i,j}$) can be expressed mathematically as:
$$difU_{i,j} = x_{i,j}(U_{f+1}) - x_{i,j}(U_f)$$
$$difV_{i,j} = x_{i,j}(V_{f+1}) - x_{i,j}(V_f)$$
The method 300 then determines (at 350) if the U chrominance difference ($difU_{i,j}$) at the current pixel location coordinates is within the high and low U chrominance threshold values ($\theta^{H}_{chroma}$ and $\theta^{L}_{chroma}$, respectively). If not, the method proceeds directly to step 360. If, however, the method determines (at 350 - Yes) that the U chrominance difference ($difU_{i,j}$) is within the high and low U chrominance threshold values, then the U chrominance values at the current pixel location coordinates in the current and next U chrominance frames are filtered (at 355). In some embodiments, the value at the current pixel location coordinates in the next U chrominance frame is set (at 355) to equal the average of the values at the current pixel location coordinates in the current U chrominance frame and the next U chrominance frame. This operation can be expressed mathematically as:
$$x_{i,j}(U_{f+1}) = \frac{x_{i,j}(U_f) + x_{i,j}(U_{f+1})}{2}$$
In other embodiments, other filtering methods are used.
The method 300 then determines (at 360) if the V chrominance difference ($difV_{i,j}$) at the current pixel location coordinates is within the high and low V chrominance threshold values ($\theta^{H}_{chroma}$ and $\theta^{L}_{chroma}$, respectively). If not, the method proceeds directly to step 370. If, however, the method determines (at 360 - Yes) that the V chrominance difference ($difV_{i,j}$) is within the high and low V chrominance threshold values, then the V chrominance values at the current pixel location coordinates in the current and next V chrominance frames are filtered (at 365). In some embodiments, the value at the current pixel location coordinates in the next V chrominance frame is set to equal the average of the values at the current pixel location coordinates in the current V chrominance frame and the next V chrominance frame. This operation can be expressed mathematically as:
$$x_{i,j}(V_{f+1}) = \frac{x_{i,j}(V_f) + x_{i,j}(V_{f+1})}{2}$$
In other embodiments, other filtering methods are used.
The method 300 then determines (at 370) if the current pixel location coordinates are last pixel location coordinates of the current frame. For example, the method may determine whether the current row (i) coordinate is equal to M and the current column (j) coordinate is equal to N where M x N is the size of the current frame in pixel units. If not, the method sets (at 375) next pixel location coordinates in the current frame as the current pixel location coordinates. The method then continues at step 330.
If the method 300 determines (at 370 - Yes) that the current pixel location coordinates are the last pixel location coordinates of the current frame, the method 300 then determines (at 380) if the next frame is a last frame of the video sequence (received at 305). If not, the method sets (at 385) the next frame as the current frame (i.e., frame f) and a frame in the video sequence subsequent to the next frame as the next frame (i.e., frame f + 1). For example, if the current frame is a first frame and the next frame is a second frame of the video sequence, the second frame is set (at 385) as the current frame and a third frame of the video sequence is set as the next frame. The method then continues at step 315.
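The per-pixel loop of steps 315 through 385 can be sketched for the luminance plane as follows. This is a minimal illustration, not the claimed implementation: the function names are hypothetical, the thresholds are supplied by a caller-provided `threshold_fn`, and the sketch assumes that "within the high and low threshold values" means that the absolute difference lies between the low and high thresholds inclusive. The same pair filter could be applied to the U and V planes with the chrominance thresholds.

```python
from typing import Callable, List, Sequence, Tuple

Plane = List[List[float]]


def mean_value(plane: Plane) -> float:
    """Mean of all pixel values in a plane (step 315 for the Y plane)."""
    total = sum(sum(row) for row in plane)
    count = sum(len(row) for row in plane)
    return total / count


def filter_plane_pair(cur: Plane, nxt: Plane, low: float, high: float) -> None:
    """Average into `nxt` every pixel whose temporal difference is within range.

    Assumption: "within" is read as low <= |difference| <= high.
    """
    for i in range(len(cur)):
        for j in range(len(cur[i])):
            dif = nxt[i][j] - cur[i][j]
            if low <= abs(dif) <= high:
                nxt[i][j] = (cur[i][j] + nxt[i][j]) / 2


def temporal_prefilter_luma(
    y_planes: Sequence[Plane],
    threshold_fn: Callable[[float], Tuple[float, float]],
) -> List[Plane]:
    """Run the frame-pair loop; the filtered next frame becomes the current frame."""
    out = [[row[:] for row in plane] for plane in y_planes]  # leave input intact
    for f in range(len(out) - 1):
        low, high = threshold_fn(mean_value(out[f]))  # steps 315 and 320
        filter_plane_pair(out[f], out[f + 1], low, high)
    return out
```

Because each filtered next frame is reused as the current frame of the following pair, small temporal fluctuations are smoothed progressively along the sequence, while large differences (likely true motion) are left untouched.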
If the method 300 determines (at 380 - Yes) that the next frame is the last frame of the video sequence, the method outputs (at 390) a pre-filtered video sequence being comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 305). The pre-filtered video sequence may be received, for example, by the spatial pre-filtering component 210 for further pre-processing or the encoder component 110 for encoding (i.e., compression). After compression by the encoder component 110, the bit rate of the pre-filtered and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-filtering) using the same compression method. Figure 4a illustrates a graph of an exemplary high luminance threshold function 405 that
determines a high luminance threshold value ($\theta^{H}_{luma}$). In the example shown in Figure 4a, the high luminance threshold function 405 is a piecewise linear function of the mean luminance (mean (Y)) of a video frame, the mean luminance being equal to $\mu$, as expressed by the following equations:
$$\theta^{H}_{luma} = \begin{cases} 2H_1, & \text{if } \mu < \mu_2 \\ -a\mu + b, & \text{if } \mu_2 \le \mu \le \mu_3 \\ H_1, & \text{if } \mu > \mu_3 \end{cases}$$
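The piecewise function of Figure 4a can be sketched as below. The text does not give values for a and b, so this sketch assumes they are chosen to make the function continuous at the two cutoff points, which gives $a = H_1/(\mu_3 - \mu_2)$ and $b = 2H_1 + a\mu_2$. The function name and the sample numbers in the usage lines are hypothetical.

```python
def high_luma_threshold(mu: float, h1: float, mu2: float, mu3: float) -> float:
    """High luminance threshold as a piecewise linear function of mean luminance.

    Assumption: the slope a and intercept b of the middle segment are derived
    from continuity at mu2 (value 2*h1) and mu3 (value h1).
    """
    if mu3 <= mu2:
        raise ValueError("mu3 must be greater than mu2")
    if mu < mu2:
        return 2.0 * h1
    if mu > mu3:
        return float(h1)
    a = h1 / (mu3 - mu2)
    b = 2.0 * h1 + a * mu2
    return -a * mu + b


# Hypothetical parameters: H1 = 10, mu2 = 60, mu3 = 160.
dark = high_luma_threshold(50.0, 10.0, 60.0, 160.0)
mid = high_luma_threshold(110.0, 10.0, 60.0, 160.0)
bright = high_luma_threshold(200.0, 10.0, 60.0, 160.0)
```

With these sample parameters, darker frames receive a larger threshold than brighter frames, and the threshold falls linearly between the two cutoff points.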
Figure 4b illustrates a graph of an exemplary low luminance threshold function 415 that
determines a low luminance threshold value ($\theta^{L}_{luma}$). In the example shown in Figure 4b, the low luminance threshold function 415 is a piecewise linear function of the high luminance threshold value as expressed by the following equations:
[Equations shown in Figure imgf000022_0001]
In Figures 4a and 4b, $H_1$, $L_1$, $\mu_2$, $\mu_3$, $H_2$, and $H_3$ are predetermined values. The value of $H_1$ determines the saturation level of the high luminance threshold function 405 and the value of $L_1$ determines the saturation level of the low luminance threshold function 415. The values $\mu_2$ and $\mu_3$ determine cutoff points for the linear variation of the high luminance threshold function 405 and the values $H_2$ and $H_3$ determine cutoff points for the linear variation of the low luminance threshold function 415. Correct specification of the values $\mu_2$, $\mu_3$, $H_2$, and $H_3$ is required to prevent temporal artifacts such as ghosting or trailing from appearing in a temporally filtered video sequence.
In some embodiments, the high chrominance threshold value ($\theta^{H}_{chroma}$) is based on the high luminance threshold value ($\theta^{H}_{luma}$) and the low chrominance threshold value ($\theta^{L}_{chroma}$) is based on the low luminance threshold value ($\theta^{L}_{luma}$). For example, in some embodiments, the values for the high and low chrominance threshold values ($\theta^{H}_{chroma}$ and $\theta^{L}_{chroma}$, respectively) can be determined by the following equations:
[Equations shown in Figure imgf000022_0003]
As described above, the high luminance threshold ($\theta^{H}_{luma}$) is a function of the mean luminance of a video frame, the low luminance threshold ($\theta^{L}_{luma}$) is a function of the high luminance threshold ($\theta^{H}_{luma}$), the high chrominance threshold ($\theta^{H}_{chroma}$) is based on the high luminance threshold ($\theta^{H}_{luma}$), and the low chrominance threshold ($\theta^{L}_{chroma}$) is based on the low luminance threshold ($\theta^{L}_{luma}$). As such, the high and low luminance and chrominance threshold values are based on the mean luminance of a video frame and thus provide variability of filtering strength depending on the illumination levels of the frame to provide noise and data reduction.
Section II: Spatial Pre-Filtering
Some embodiments of the present invention provide a method for pre-processing a video sequence using spatial anisotropic diffusion filtering to provide data reduction of the video sequence. In addition, after compression of the pre-processed video sequence, the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the video sequence after compression but without pre-processing.
Some embodiments use conventional spatial anisotropic diffusion filters such as a Fallah-Ford diffusion filter (as described with reference to Figure 5) or a Perona-Malik anisotropic diffusion filter (as described with reference to Figure 6). Other embodiments use an omni-directional spatial filtering method that extends the traditional Perona-Malik diffusion filter to perform diffusion in at least one diagonal direction (as described with reference to Figure 8 and Figure 9).
Fallah-Ford Spatial Filtering
In some embodiments, the mean curvature diffusion (MCD) Fallah-Ford spatial anisotropic diffusion filter is used. The MCD Fallah-Ford filter makes use of a surface diffusion model as opposed to a plane diffusion model employed by the Perona-Malik anisotropic diffusion filter discussed below. In the MCD model, an image is a function of two spatial location coordinates (x, y) and a third (gray level) z coordinate. For each pixel located at the pixel location coordinates (x, y) in the image I, the MCD diffusion is modeled by the MCD diffusion equation:
$$\frac{\partial h(x, y, z, t)}{\partial t} = \operatorname{div}(c \nabla h)$$
where the function h is given by the equation:
$$h(x, y, z) = z - I(x, y, t)$$
and the diffusion coefficient c(x, y, t) is computed as the inverse of the surface gradient magnitude, i.e.:
$$c(x, y, t) = \frac{1}{\|\nabla h\|} = \frac{1}{\sqrt{\|\nabla I\|^2 + 1}}$$
It can be shown that the MCD theory holds if the image is linearly scaled and the implicit surface function is redefined as:
$$h(x, y, z) = z - m\,I(x, y, t) - n$$
where m and n are real constants. The diffusion coefficient of MCD becomes
$$c(x, y, t) = \frac{1}{\|\nabla h\|} = \frac{1}{\sqrt{m^2 \|\nabla I\|^2 + 1}}$$
The edges satisfying the condition $\|\nabla I\| \gg 1/m$ are preserved. The smaller the value of m, the greater the diffusion in each iteration and the faster the surface evolves. From iteration t to t+1, the total absolute change in the image surface area is given by the equation:
$$\Delta A(t + 1) = \iint \Big|\, \|\nabla h(x, y, t + 1)\| - \|\nabla h(x, y, t)\| \,\Big| \, dx\, dy$$
Note that if the mean curvature is defined as the average value of the normal curvature in any two orthogonal directions, then selecting the diffusion coefficient to be equal to the inverse of the surface gradient magnitude results in the diffusion of the surface at a rate equal to twice the mean curvature, and hence the name of the algorithm.
Figure 5 is a flowchart showing a method 500 for pre-processing a video sequence using Fallah-Ford spatial anisotropic diffusion filtering to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence. The method 500 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
The method 500 starts when an original video sequence comprised of multiple video frames is received (at 505), the original video sequence having an associated data amount. The method sets (at 510) a first video frame in the video sequence as a current frame. The current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values). In some embodiments, the Y luminance values (gray level values) of the current frame are filtered. In other embodiments, the U chrominance values or the V chrominance values of the current frame are filtered. Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that $1 \le x \le M$ and $1 \le y \le N$ where M x N is the size of the current frame in pixel units. The method then sets (at 515) row (x) and column (y) values for an initial current pixel location. The method also sets (at 520) the number of iterations (no_iterations), i.e., time steps (t), to be performed for each pixel location (x, y). The number of iterations can be determined depending on the amount of details to be removed.
The method then estimates (at 525) components and a magnitude of the image gradient $\|\nabla I\|$ using an edge detector. In one embodiment, the Sobel edge detector is used since the Sobel edge detector makes use of a difference of averages operator and has a good response to diagonal edges. However, other edge detectors may be used. The method then computes (at 530) a change in surface area $\Delta A$ using the following equation:
$$\Delta A(t + 1) = \iint \Big|\, \|\nabla h(x, y, t + 1)\| - \|\nabla h(x, y, t)\| \,\Big| \, dx\, dy$$
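On a discrete pixel grid, the surface area change of step 530 can be approximated by a sum over pixels. The sketch below is illustrative only: the function names are hypothetical, it assumes unit pixel spacing so that the double integral becomes a plain sum, and it takes the surface gradient magnitude of the scaled surface $h = z - mI - n$ to be $\sqrt{m^2\|\nabla I\|^2 + 1}$.

```python
import math
from typing import List

Grid = List[List[float]]


def surface_gradient_magnitude(grad_i_mag: float, m: float = 1.0) -> float:
    """Magnitude of the surface gradient for h = z - m*I - n."""
    return math.sqrt(m * m * grad_i_mag * grad_i_mag + 1.0)


def surface_area_change(grad_h_prev: Grid, grad_h_next: Grid) -> float:
    """Discrete approximation of the total absolute surface area change.

    Assumption: unit pixel spacing, so the integral reduces to a sum of
    per-pixel absolute differences of the surface gradient magnitudes.
    """
    return sum(
        abs(after - before)
        for row_prev, row_next in zip(grad_h_prev, grad_h_next)
        for before, after in zip(row_prev, row_next)
    )
```

A small value of the returned sum indicates that the surface has nearly stopped evolving between two iterations.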
The method computes (at 535) diffusion coefficient c(x, y, t) as the inverse of the surface gradient magnitude using the equation:
$$c(x, y, t) = \frac{1}{\|\nabla h\|} = \frac{1}{\sqrt{m^2 \|\nabla I\|^2 + 1}}$$
where the scaling parameter m is selected to be equal to the inverse of the percentage change of $\Delta A$. The MCD diffusion equation given by:
$$\frac{\partial h(x, y, z, t)}{\partial t} = \operatorname{div}(c \nabla h)$$
can then be approximated in a discrete form using first order spatial differences. The method then computes (at 540) components of a 3x3 filter kernel using the following equations:
$$w_1 = \frac{1}{8\,\|\nabla h(x - 1, y - 1)\|}$$
[Equations shown in Figure imgf000026_0001]
$$w_4 = \frac{1}{8\,\|\nabla h(x - 1, y)\|}, \qquad w(x, y) = 1 - \sum_{i=1}^{8} w_i, \qquad w_5 = \frac{1}{8\,\|\nabla h(x + 1, y)\|}$$
[Equations shown in Figure imgf000026_0002]
The method then convolves (at 545) the 3x3 filter kernel with an image neighborhood of the pixel at the current pixel location (x, y). The method decrements (at 550) no_iterations by one and determines (at 555) if no_iterations is equal to 0. If not, the method continues at step 525. If so, the method determines (at 560) if the current pixel location is a last pixel location of the current frame. If not, the method sets (at 565) a next pixel location in the current frame as the current pixel location. The method then continues at step 520.
If the method 500 determines (at 560 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 570) if the current frame is a last frame of the video sequence (received at 505). If not, the method sets (at 575) a next frame in the video sequence as the current frame. The method then continues at step 515. If the method determines (at 570 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 580) a pre-filtered video sequence being comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 505).
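Steps 540 and 545 can be sketched for a single pixel as follows. This is a hedged illustration rather than the specified implementation: the function names are hypothetical, each of the eight neighbor weights is assumed to be $1/(8\|\nabla h\|)$ evaluated at that neighbor with $\|\nabla h\| = \sqrt{m^2\|\nabla I\|^2 + 1}$, and the image gradient magnitudes are assumed to have been estimated beforehand (for example with a Sobel operator, which is not shown).

```python
import math
from typing import List

Grid3 = List[List[float]]


def mcd_kernel(grad_i_mag: Grid3, m: float = 1.0) -> Grid3:
    """Build a 3x3 MCD kernel from image-gradient magnitudes around a pixel.

    Assumption: neighbor weight = 1 / (8 * |grad h|) at that neighbor, and
    the center weight is one minus the sum of the eight neighbor weights.
    """
    kernel = [[0.0] * 3 for _ in range(3)]
    total = 0.0
    for r in range(3):
        for c in range(3):
            if r == 1 and c == 1:
                continue  # center is filled in after the neighbors
            grad_h = math.sqrt(m * m * grad_i_mag[r][c] ** 2 + 1.0)
            kernel[r][c] = 1.0 / (8.0 * grad_h)
            total += kernel[r][c]
    kernel[1][1] = 1.0 - total
    return kernel


def apply_kernel(neighborhood: Grid3, kernel: Grid3) -> float:
    """Weighted sum of a 3x3 neighborhood, i.e. the convolution at one pixel."""
    return sum(neighborhood[r][c] * kernel[r][c]
               for r in range(3) for c in range(3))
```

In a flat region every neighbor weight is 1/8 and the pixel is replaced by the average of its neighbors, whereas near strong edges the neighbor weights shrink and the center weight approaches one, so the pixel is largely preserved.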
The pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression). After compression by the encoder component 110, the bit rate of the pre-filtered and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-filtering) using the same compression method.
Traditional Perona-Malik Spatial Filtering
In some embodiments, a traditional Perona-Malik anisotropic diffusion filtering method is used for pre-processing a video frame to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence. Conventional Perona-Malik anisotropic diffusion is expressed in discrete form by the following equation:
$$I(x, y, t + 1) = I(x, y, t) + \lambda \sum_{p \in \eta(x, y)} g(\nabla I_p(x, y, t))\, \nabla I_p(x, y, t)$$
where:
• I(x, y, t) is a discrete image;
• $\nabla I(x, y, t)$ is the image gradient;
• (x, y) specifies a pixel location in a discrete, two dimensional grid covering the video frame;
• t denotes discrete time steps (i.e., iterations);
• scalar constant λ determines the rate of diffusion, λ being a positive real number;
• $\eta(x, y)$ represents the spatial neighborhood of the pixel having location (x, y); and
• g( ) is an edge stopping function that satisfies the condition $g(\nabla I) \to 0$ when $\nabla I \to \infty$ such that the diffusion operation is stopped across the edges of the video frame.
In two dimensions, the equation becomes:
$$I(x, y, t + 1) = I(x, y, t) + \lambda \left[ c_N(x, y, t) \nabla I_N(x, y, t) + c_S(x, y, t) \nabla I_S(x, y, t) + c_E(x, y, t) \nabla I_E(x, y, t) + c_W(x, y, t) \nabla I_W(x, y, t) \right]$$
where:
• subscripts (N, S, E, W) correspond to four horizontal or vertical directions of diffusion (north, south, east, and west) with respect to a pixel location (x, y); and
• scalar constant λ is less than or equal to $1/|\eta(x, y)|$, where $|\eta(x, y)|$ is the number of neighboring pixels, which is equal to four (except at the video frame boundaries where it is less than four), so that $\lambda \le 1/4$.
Notations $c_N$, $c_S$, $c_E$, and $c_W$ are diffusion coefficients, each being referred to as an edge stopping function g(x) of $\nabla I(x, y, t)$ in a corresponding direction as expressed in the following equations:
$$c_N(x, y, t) = g(\nabla I_N(x, y, t))$$
$$c_S(x, y, t) = g(\nabla I_S(x, y, t))$$
$$c_E(x, y, t) = g(\nabla I_E(x, y, t))$$
$$c_W(x, y, t) = g(\nabla I_W(x, y, t))$$
The approximation of the image gradient in a selected direction is employed using the equation:
$$\nabla I_p(x, y, t) = I_p(x, y, t) - I(x, y, t), \quad p \in \eta(x, y)$$
For instance, in the "northern" direction the gradient can be computed as the difference given by:
$$\nabla I_N(x, y) = I(x, y + 1, t) - I(x, y, t)$$
Various edge-stopping functions g(x) may be used such as:
$$g(x, y, t) = \exp\left[ -\left( \frac{\nabla I(x, y, t)}{k} \right)^{2} \right]$$
and
[Equation shown in Figure imgf000030_0001]
where:
• notations k and K denote parameters with constant values during the diffusion process; and
• e > 0 and 0 < < 1.
Figure 6 is a flowchart showing a method 600 for pre-processing a video sequence using Perona-Malik spatial anisotropic diffusion filtering to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence. The method 600 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
The method 600 starts when an original video sequence comprised of multiple video frames is received (at 605), the original video sequence having an associated data amount. The method sets (at 610) a first video frame in the video sequence as a current frame. The current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values). In some embodiments, the luminance values (i.e., the luminance plane) of the current frame are filtered. In other embodiments, the chrominance (U) values (i.e., the chrominance (U) plane) or the chrominance (V) values (i.e., the chrominance (V) plane) of the current frame are filtered. Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that $1 \le x \le M$ and $1 \le y \le N$ where M x N is the size of the current frame in pixel units. The method then sets (at 615) row (x) and column (y) values for an initial current pixel location. The method also sets (at 620) the number of iterations (no_iterations), i.e., time steps (t), to be performed for each pixel location (x, y). The number of iterations can be determined depending on the amount of details to be removed.
The method then selects (at 625) an edge-stopping function g(x) and values of parameters (such as λ and k). The method then computes (at 630) approximations of the image gradient in the north, south, east, and west directions ($\delta_N$, $\delta_S$, $\delta_E$, and $\delta_W$, respectively), using the equations:
$$c_N(x, y, t) = g(\nabla I_N(x, y, t))$$
$$c_S(x, y, t) = g(\nabla I_S(x, y, t))$$
$$c_E(x, y, t) = g(\nabla I_E(x, y, t))$$
$$c_W(x, y, t) = g(\nabla I_W(x, y, t))$$
where $\delta_p$ denotes the gradient approximation $\nabla I_p(x, y, t)$ in direction p.
The method then computes (at 640) diffusion coefficients in the north, south, east, and west directions ($c_N$, $c_S$, $c_E$, and $c_W$, respectively) where:
$$c_N = g(\delta_N), \quad c_S = g(\delta_S), \quad c_E = g(\delta_E), \quad c_W = g(\delta_W)$$
The method then computes (at 645) a new pixel value for the current pixel location using the equation:
$$I(x, y, t + 1) = I(x, y, t) + \lambda \left[ c_N(x, y, t) \nabla I_N(x, y, t) + c_S(x, y, t) \nabla I_S(x, y, t) + c_E(x, y, t) \nabla I_E(x, y, t) + c_W(x, y, t) \nabla I_W(x, y, t) \right]$$
i.e., $I(x, y) = I(x, y) + \lambda (c_N \delta_N + c_S \delta_S + c_E \delta_E + c_W \delta_W)$ where I(x, y) is the luminance (Y) plane. In other embodiments, I(x, y) is the chrominance (U) plane or the chrominance (V) plane.
The method then decrements (at 650) no_iterations by one and determines (at 655) if no_iterations is equal to 0. If not, the method continues at step 630. If so, the method determines (at 660) if the current pixel location is a last pixel location of the current frame. If not, the method sets (at 665) a next pixel location in the current frame as the current pixel location. The method then continues at step 630.
If the method 600 determines (at 660 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 670) if the current frame is a last frame of the video sequence (received at 605). If not, the method sets (at 675) a next frame in the video sequence as the current frame. The method then continues at step 615. If the method determines (at 670 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 680) a pre-filtered video sequence being comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 605).
The pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression). After compression by the encoder component 110, the bit rate of the pre-filtered and compressed video sequence is lower than the bit rate that would be obtained by compressing the original video sequence (without pre-filtering) using the same compression method.
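The four-direction update of method 600 can be sketched on a single plane as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the edge-stopping function is assumed to be the exponential form $\exp[-(\delta/k)^2]$, and each iteration updates the whole plane from the previous iterate instead of iterating pixel by pixel as in the flowchart. Pixels at the frame boundary simply use the neighbors that exist.

```python
import math
from typing import List

Plane = List[List[float]]

# Offsets (row, column) of the four horizontal or vertical neighbors.
AXIAL_OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def g_exp(delta: float, k: float) -> float:
    """Assumed edge-stopping function: tends to 0 as |delta| grows."""
    return math.exp(-((delta / k) ** 2))


def perona_malik(image: Plane, iterations: int,
                 lam: float = 0.25, k: float = 10.0) -> Plane:
    """Apply four-direction anisotropic diffusion; lam should not exceed 1/4."""
    rows, cols = len(image), len(image[0])
    cur = [[float(v) for v in row] for row in image]
    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for x in range(rows):
            for y in range(cols):
                flow = 0.0
                for dx, dy in AXIAL_OFFSETS:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < rows and 0 <= ny < cols:
                        delta = cur[nx][ny] - cur[x][y]   # gradient approximation
                        flow += g_exp(delta, k) * delta   # coefficient * gradient
                nxt[x][y] = cur[x][y] + lam * flow
        cur = nxt
    return cur
```

Small differences between neighbors (noise and fine detail) are smoothed, while differences much larger than k drive the coefficient toward zero so that strong edges are left in place.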
Non-Traditional Perona-Malik Spatial Filtering
Figure 7 illustrates a conceptual diagram of a diffusion pattern of a conventional Perona-Malik anisotropic diffusion filter. As shown in Figure 7, a conventional Perona-Malik anisotropic filter performs diffusion on a pixel 705 in only horizontal and vertical directions (north, south, east and west) with respect to the pixel's location (x, y). For example, for a pixel location of (2, 2), a conventional anisotropic diffusion filter will perform diffusion filtering in the horizontal or vertical directions from the pixel location (2, 2) towards the horizontal or vertical neighboring pixel locations (2, 3), (2, 1), (3, 2), and (1, 2).
In some embodiments, spatial diffusion filtering is performed on a pixel in at least one diagonal direction (north-east, north-west, south-east, or south-west) with respect to the pixel's location (x, y). For example, for a pixel location of (2, 2), the method of the present invention performs diffusion filtering in at least one diagonal direction from the pixel location (2, 2) towards the direction of a diagonal neighboring pixel location (3, 3), (1, 3), (3, 1) and/or (1, 1). In other embodiments, diffusion filtering is performed in four diagonal directions (north-east, north-west, south-east, and south-west) with respect to a pixel location (x, y). The various embodiments of spatial diffusion filtering may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
Figure 8 illustrates a conceptual diagram of a diffusion pattern of an omni-directional anisotropic diffusion filter in accordance with the present invention. As shown in Figure 8, the omni-directional anisotropic diffusion filter performs diffusion in four horizontal or vertical directions (north, south, east and west) and four diagonal directions (north-east, north-west, south-east, and south-west) with respect to a pixel 805 at pixel location (x, y).
For example, for a pixel location of (2, 2), the omni-directional anisotropic diffusion filter will perform diffusion filtering in four horizontal or vertical directions from the pixel location (2, 2) towards the horizontal or vertical neighboring pixel locations (2, 3), (2, 1), (3, 2), and (1, 2) and in four diagonal directions from the pixel location (2, 2) towards the diagonal neighboring pixel locations (3, 3), (1, 3), (3, 1) and (1, 1). In some embodiments, a video frame is pre-processed using omni-directional diffusion filtering in four horizontal or vertical directions and four diagonal directions as expressed by the following omni-directional spatial filtering equation (shown in two dimensional form):
$$I(x, y, t + 1) = I(x, y, t) + \lambda \Big[ \sum_{m \in \{N, S, E, W\}} c_m(x, y, t)\, \nabla I_m(x, y, t) + \beta \sum_{n \in \{NE, SE, SW, NW\}} c_n(x, y, t)\, \nabla I_n(x, y, t) \Big]$$
where:
• I(x, y, t) is a discrete image;
• $\nabla I(x, y, t)$ is the image gradient;
• (x, y) specifies a pixel location in a discrete, two dimensional grid covering the video frame;
• t denotes discrete time steps (i.e., iterations);
• scalar constant λ determines the rate of diffusion, λ being a positive real number that is less than or equal to $1/|\eta(x, y)|$, where $|\eta(x, y)|$ is the number of neighboring pixels, which is equal to eight (except at the video frame boundaries where it is less than eight), so that $\lambda \le 1/8$; and
• subscripts m and n correspond to the eight directions of diffusion with respect to the pixel location (x, y), where m is a horizontal or vertical direction (N, S, E, W) and n is a diagonal direction (NE, SE, SW, NW).
Notations $c_m$ and $c_n$ are diffusion coefficients where horizontal or vertical directions (N, S, E, W) are indexed by subscript m and diagonal directions (NE, SE, SW, NW) are indexed by subscript n. Each diffusion coefficient is referred to as an edge stopping function g(x) of $\nabla I(x, y, t)$ in the corresponding direction as expressed in the following equations:
$$c_m(x, y, t) = g(\nabla I_m(x, y, t))$$
$$c_n(x, y, t) = g(\nabla I_n(x, y, t))$$
where g(x) satisfies the condition $g(x) \to 0$ when $x \to \infty$ such that the diffusion operation is stopped across the edges of the video frame. Because the distance between a pixel location (x, y) and any of its diagonal pixel neighbors is larger than the distance between the pixel location and its horizontal or vertical pixel neighbors, the diagonal pixel differences are scaled by a factor β, which is a function of the frame dimensions M, N.
Also employed is the approximation of the image gradient $\nabla I(x, y, t)$ in a selected direction as given by the equation:
$$\nabla I_p(x, y, t) = I_p(x, y, t) - I(x, y, t), \quad p \in \eta(x, y)$$
For example, in the northern (N) direction, the image gradient $\nabla I(x, y, t)$ can be computed as a difference given by the equation:
$$\nabla I_N(x, y) = I(x, y + 1, t) - I(x, y, t)$$
Various edge-stopping functions g(x) may be used such as:
Figure imgf000036_0001
or
Figure 9 is a flowchart showing a method 900 for pre-processing a video sequence using omni-directional spatial anisotropic diffusion filtering to reduce the data amount of the video sequence and to reduce the bit rate of the compressed video sequence. The method 900 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110.
The method 900 starts when an original video sequence comprised of multiple video frames is received (at 905), the original video sequence having an associated data amount. The method sets (at 910) a first video frame in the video sequence as a current frame. The current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values). In some embodiments, the luminance values (i.e., the luminance plane) of the current frame are filtered. In other embodiments, the chrominance (U) values (i.e., the chrominance (U) plane) or the chrominance (V) values (i.e., the chrominance (V) plane) of the current frame are filtered. Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the current frame in pixel units. The method then sets (at 915) row (x) and column (y) values for an initial current pixel location. The method also sets (at 920) the number of iterations (no_iterations), i.e., time steps (t), to be performed for each pixel location (x, y). The number of iterations can be determined depending on the amount of detail to be removed.
The method then selects (at 925) an edge-stopping function g(x) and values of parameters (such as λ and k). The method then computes (at 930) approximations of the image gradient in the north, south, east, west, north-east, north-west, south-east, and south-west directions ($\delta_N$, $\delta_S$, $\delta_E$, $\delta_W$, $\delta_{NE}$, $\delta_{NW}$, $\delta_{SE}$, and $\delta_{SW}$, respectively) using the equations:
$c_N(x, y, t) = g(\nabla I_N(x, y, t))$
$c_S(x, y, t) = g(\nabla I_S(x, y, t))$
$c_E(x, y, t) = g(\nabla I_E(x, y, t))$
$c_W(x, y, t) = g(\nabla I_W(x, y, t))$
$c_{NE}(x, y, t) = g(\nabla I_{NE}(x, y, t))$
$c_{NW}(x, y, t) = g(\nabla I_{NW}(x, y, t))$
$c_{SE}(x, y, t) = g(\nabla I_{SE}(x, y, t))$
[Equation image imgf000037_0001 is not reproduced in this text.]
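The directional differences computed at step 930 can be illustrated with a minimal sketch. It follows the northern example given earlier, where the north neighbor is I(x, y + 1); the offsets for the other seven directions, the 0-based indexing, the border rule (an out-of-frame neighbor contributes a difference of 0), and the names `OFFSETS` and `directional_differences` are assumptions made for illustration and are not taken from the specification.

```python
# Offsets (dx, dy) for the eight diffusion directions. "N" follows the text's
# northern example I(x, y + 1); the other offsets are an assumed extension.
OFFSETS = {
    "N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
    "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1),
}


def directional_differences(frame, x, y):
    """Return {direction: I_p - I} for the pixel at frame[x][y].

    Neighbors outside the frame yield a difference of 0 (assumed border rule).
    """
    rows, cols = len(frame), len(frame[0])
    centre = frame[x][y]
    deltas = {}
    for name, (dx, dy) in OFFSETS.items():
        nx, ny = x + dx, y + dy
        inside = 0 <= nx < rows and 0 <= ny < cols
        deltas[name] = (frame[nx][ny] - centre) if inside else 0
    return deltas


# Usage: differences around the centre pixel of a small 3 x 3 plane.
plane = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
centre_deltas = directional_differences(plane, 1, 1)
```

Each returned difference plays the role of one of the eight δ values that are passed to the edge-stopping function in the next step.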
The method then computes (at 940) diffusion coefficients in the north, south, east, west, north-east, north-west, south-east, and south-west directions ($c_N$, $c_S$, $c_E$, $c_W$, $c_{NE}$, $c_{NW}$, $c_{SE}$, and $c_{SW}$, respectively) where:
$c_N = g(\delta_N)$, $c_S = g(\delta_S)$, $c_E = g(\delta_E)$, $c_W = g(\delta_W)$, $c_{NE} = g(\delta_{NE})$, $c_{NW} = g(\delta_{NW})$, $c_{SE} = g(\delta_{SE})$, $c_{SW} = g(\delta_{SW})$.
The method then computes (at 945) a new pixel value for the current pixel location using the equation:
$I(x, y, t + 1) = I(x, y, t) + \lambda \left[ \sum_{m \in \{N, S, E, W\}} c_m(x, y, t) \, \nabla I_m(x, y, t) + \beta \sum_{n \in \{NE, SE, SW, NW\}} c_n(x, y, t) \, \nabla I_n(x, y, t) \right]$
i.e.,
$I(x, y) = I(x, y) + \lambda \left[ (c_N \delta_N + c_S \delta_S + c_E \delta_E + c_W \delta_W) + \beta (c_{NE} \delta_{NE} + c_{NW} \delta_{NW} + c_{SE} \delta_{SE} + c_{SW} \delta_{SW}) \right]$
where I(x, y) is the luminance (Y) plane. In other embodiments, I(x, y) is the chrominance (U) plane or the chrominance (V) plane.
The method then decrements (at 950) no_iterations by one and determines (at 955) if no_iterations is equal to 0. If not, the method continues at step 930. If so, the method determines (at 960) if the current pixel location is a last pixel location of the current frame. If not, the method sets (at 965) a next pixel location in the current frame as the current pixel location. The method then continues at step 930.
If the method 900 determines (at 960 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 970) if the current frame is a last frame of the video sequence (received at 905). If not, the method sets (at 975) a next frame in the video sequence as the current frame. The method then continues at step 915. If the method determines (at 970 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 980) a pre-filtered video sequence being comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 905).
The pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression). The bit rate of the pre-filtered video sequence after compression using the encoder component 110 is lower than the bit rate of the original video sequence (without pre-filtering) after compression using the same compression method.
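As an illustration only, the loop structure of method 900 (pixel location outermost, time steps innermost, then the next frame) might be sketched as below. Several points are assumptions rather than statements of the specification: the exponential edge-stopping function `g_exp`, the default value of `beta` (the text only says β is a function of M and N), reading neighbor values from the unmodified input plane, skipping neighbors that fall outside the frame, 0-based indexing, and all function names.

```python
import math

CARDINAL = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # N, S, E, W (assumed offsets)
DIAGONAL = [(1, 1), (-1, 1), (1, -1), (-1, -1)]  # NE, NW, SE, SW


def g_exp(delta, k):
    """Assumed edge-stopping function; tends to 0 as |delta| grows."""
    return math.exp(-((delta / k) ** 2))


def flow(plane, x, y, value, offsets, k, g):
    """Sum of c_p * delta_p over the given neighbor offsets."""
    rows, cols = len(plane), len(plane[0])
    total = 0.0
    for dx, dy in offsets:
        nx, ny = x + dx, y + dy
        if 0 <= nx < rows and 0 <= ny < cols:    # skip neighbors off the frame
            delta = plane[nx][ny] - value        # step 930: gradient estimate
            total += g(delta, k) * delta         # step 940: coefficient c_p
    return total


def diffuse_plane(plane, no_iterations=5, lam=0.1, k=10.0, beta=0.5, g=g_exp):
    """Filter one plane (Y, U, or V); returns a new plane of floats."""
    out = [[float(v) for v in row] for row in plane]
    for x in range(len(plane)):
        for y in range(len(plane[0])):
            value = out[x][y]
            for _ in range(no_iterations):       # steps 930 to 955
                value += lam * (flow(plane, x, y, value, CARDINAL, k, g)
                                + beta * flow(plane, x, y, value, DIAGONAL, k, g))
            out[x][y] = value                    # step 945: new pixel value
    return out


def prefilter_sequence(frames, **params):
    """Steps 970 to 980: filter every frame and return the filtered sequence."""
    return [diffuse_plane(frame, **params) for frame in frames]
```

With a small `k`, a large difference across an edge yields a coefficient near zero, so the edge is left intact; with a large `k`, the same difference is smoothed. A whole-frame sweep per time step is a common alternative ordering and would be an equally reasonable reading.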
Section III: Foreground/Background Differentiation Method
In some embodiments, foreground/background differentiation methods are used to pre-filter a video sequence so that filtering is performed differently on a foreground region of a video frame of the video sequence than on a background region of the video frame. Performing different filtering on different regions of the video frame allows a system to provide greater data reduction in unimportant background regions of the video frame while preserving sharp edges in regions-of-interest in the foreground region. In addition, after compression of the pre-processed video sequence, the bit rate of the pre-processed and compressed video sequence will be lower than the bit rate of the compressed video sequence made without pre-processing. This foreground/background differentiation method is especially beneficial in videoconferencing applications but can be used in other applications as well.
The foreground/background differentiation method of the present invention includes five general steps: 1) identifying pixel locations in a video frame having pixel values that match color characteristics of human skin and identifying contiguous groupings of matching pixel locations (i.e., regions-of-interest); 2) determining a bounding shape for each region-of-interest, the totality of all pixel locations contained in a bounding shape comprising a foreground region and all other pixel locations in the frame comprising a background region; 3) creating a binary mask $M_{fg}$ for the foreground region and a binary mask $M_{bg}$ for the background region; 4) filtering the foreground and background regions using different filtering methods or parameters using the binary masks; and 5) combining the filtered foreground and background regions into a single filtered frame. These steps are discussed with reference to Figures 10 and 11a through 11d.
Figure 10 illustrates a flowchart depicting a foreground/background differentiation method 1000 in accordance with the present invention. The foreground/background differentiation method 1000 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110. The foreground/background differentiation method 1000 commences by receiving an original video sequence in YUV format (at 1005). The video sequence comprises a plurality of video frames and has an associated data amount. In other embodiments, a video sequence in another format is received. The method then sets (at 1010) a first video frame in the video sequence as a current frame. The current frame is comprised of a current luminance (Y) frame and current chrominance (U and V) frames. As such, the current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values (such as luminance and chrominance values from the luminance and chrominance frames, respectively). Pixels and pixel locations are identified by discrete row (e.g., x) and column (e.g., y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the current frame in pixel units. The method then sets (at 1015) row (x) and column (y) values for an initial current pixel location. For example, the initial current pixel location may be set to equal (0, 0).
The method then determines (at 1020) if the current pixel location in the current frame contains one or more pixel values that fall within predetermined low and high threshold values. In some embodiments, the method determines if the current pixel location has pixel values that satisfy the condition $U_{low} \le U(x, y) \le U_{high}$ and $V_{low} \le V(x, y) \le V_{high}$ where U and V are chrominance values of the current pixel location (x, y) and threshold values $U_{low}$, $U_{high}$, $V_{low}$, and $V_{high}$ are predetermined chrominance values that reflect the range of color characteristics (i.e., chrominance values U, V) of human skin.
As such, the present invention makes use of the fact that, for all human races, the chrominance ranges of the human face/skin are consistently the same. In some embodiments, the following predetermined threshold values are used: $U_{low}$ = 75, $U_{high}$ = 130, $V_{low}$ = 130, and $V_{high}$ = 160. In other embodiments, the method includes identifying pixel locations in the video frame having pixel values that match other characteristics, such as a predetermined color or brightness. If the method determines (at 1020 - Yes) that the current pixel location contains pixel values that fall within the predetermined low and high threshold values, the current pixel location is referred to as a matching pixel location and is added (at 1025) to a set of matching pixel locations. Otherwise, the method proceeds directly to step 1030.
The foreground/background differentiation method 1000 determines (at 1030) if the current pixel location is a last pixel location of the current frame. For example, the method may determine whether the row (x) coordinate of the current pixel location is equal to M and the column (y) coordinate of the current pixel location is equal to N where M x N is the size of the current frame in pixel units. If not, the method sets (at 1035) a next pixel location in the current frame as the current pixel location. The method then continues at step 1020. As described above, steps 1020 through 1035 compose a human skin identifying system that identifies pixel locations in a video frame having pixel values that match characteristics of human skin. Other human skin identifying systems well known in the art, however, may be used in place of the human skin identifying system described herein without departing from the scope of the invention.
If the method 1000 determines (at 1030 - Yes) that the current pixel location is the last pixel location of the current frame, the method then determines (at 1040) contiguous groupings of matching pixel locations in the set of matching pixel locations.
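Steps 1020 through 1035 can be sketched as a scan over the chrominance planes, using the threshold values quoted above. The inclusive comparison, the 0-based coordinates, the representation of each plane as a list of rows, and the name `matching_pixel_locations` are assumptions made for this illustration.

```python
# Threshold values quoted in the text for some embodiments.
U_LOW, U_HIGH = 75, 130
V_LOW, V_HIGH = 130, 160


def matching_pixel_locations(u_plane, v_plane):
    """Return the set of (x, y) locations whose U and V values both fall
    within the skin chrominance thresholds (bounds assumed inclusive)."""
    matches = set()
    for x, (u_row, v_row) in enumerate(zip(u_plane, v_plane)):
        for y, (u, v) in enumerate(zip(u_row, v_row)):
            if U_LOW <= u <= U_HIGH and V_LOW <= v <= V_HIGH:
                matches.add((x, y))          # step 1025: add to matching set
    return matches


# Usage: a 2 x 2 frame in which two locations fall inside both ranges.
u = [[100, 60], [130, 75]]
v = [[140, 140], [160, 129]]
skin = matching_pixel_locations(u, v)
```

Only the chrominance planes are consulted, so the test is independent of luminance.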
Each contiguous grouping of matching pixel locations is referred to as a region-of-interest (ROI). A region-of-interest can be defined, for example, by spatial proximity wherein all matching pixel locations within a specified distance are grouped in the same region-of-interest. An ROI is typically a distinct entity represented in the current frame, such as a person's face or an object (e.g., cup) having chrominance values similar to that of human skin. Figure 11a illustrates an example of a video frame 1100 having two ROIs. The first ROI represents a person's face 1105 and the second ROI represents a cup 1115 having chrominance values similar to that of human skin (i.e., having chrominance values that fall within the predetermined chrominance threshold values). Also shown in Figure 11a are representations of a person's clothed body 1110, a carton 1120, and a book 1125, none of which have chrominance values similar to that of human skin.
A bounding shape is then determined (at 1045) for each ROI, the bounding shape enclosing all or a portion of the ROI (i.e., the bounding shape encloses all or some of the matching pixel locations in the ROI). The bounding shape may be of various geometric forms, such as a four-sided, three-sided, or circular form. In some embodiments, the bounding shape is in the form of a box where a first side of the bounding shape is determined by the lowest x coordinate, a second side of the bounding shape is determined by the highest x coordinate, a third side of the bounding shape is determined by the lowest y coordinate, and a fourth side of the bounding shape is determined by the highest y coordinate of the matching pixel locations in the ROI. In other embodiments, the bounding shape does not enclose the entire ROI and encloses over 1/2 or 3/4 of the matching pixel locations in the ROI. Figure 11b illustrates an example of a video frame 1100 having two ROIs, each ROI being enclosed by a bounding shape.
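Steps 1040 and 1045 can be sketched as follows. The grouping rule is one assumed reading of "within a specified distance": two matching locations belong to the same ROI when they are linked by a chain of matching locations whose row and column coordinates each differ by at most `max_gap`. The box form follows the lowest/highest x and y description above. The function names and `max_gap` are hypothetical.

```python
from collections import deque


def regions_of_interest(matches, max_gap=1):
    """Group matching (x, y) locations into ROIs by spatial proximity."""
    remaining = set(matches)
    regions = []
    while remaining:
        seed = remaining.pop()
        region, queue = {seed}, deque([seed])
        while queue:
            x, y = queue.popleft()
            # Matching locations close enough to (x, y) join the same ROI.
            near = {p for p in remaining
                    if abs(p[0] - x) <= max_gap and abs(p[1] - y) <= max_gap}
            remaining -= near
            region |= near
            queue.extend(near)
        regions.append(region)
    return regions


def bounding_box(region):
    """Return (x_min, x_max, y_min, y_max) enclosing every location in an ROI."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), max(xs), min(ys), max(ys)


# Usage: two separate clusters give two ROIs and two boxes.
clusters = {(0, 0), (0, 1), (1, 1), (5, 5), (5, 6)}
boxes = [bounding_box(r) for r in regions_of_interest(clusters)]
```

The pairwise search is quadratic in the number of matching locations, which keeps the sketch short; a labeling pass over the frame would scale better.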
The first ROI (the person's face 1105) is enclosed by a first bounding shape 1130 and the second ROI (the cup 1115) is enclosed by a second bounding shape 1135. Use of a bounding shape for each ROI gives a fast and simple approximation of an ROI in the video frame 1100. Being an approximation of an ROI, a bounding shape will typically enclose a number of non-matching pixel locations along with the matching pixel locations of the ROI.
The method then determines (at 1050) foreground and background regions of the current frame. The foreground region is comprised of a totality of regions in the current frame enclosed within a bounding shape. In other words, the foreground region is comprised of a set of foreground pixel locations (matching or non-matching) of the current frame enclosed within a bounding shape. In the example shown in Figure 11b, the foreground region is comprised of the totality of the regions or pixel locations enclosed by the first bounding shape 1130 and the second bounding shape 1135. The background region is comprised of a totality of regions in the current frame not enclosed within a bounding shape. In other words, the background region is comprised of a set of background pixel locations not included in the foreground region. In the example shown in Figure 11b, the background region is comprised of the regions or pixel locations not enclosed by the first bounding shape 1130 and the second bounding shape 1135.
The method then constructs (at 1055) a binary mask $M_{fg}$ for the foreground region and a binary mask $M_{bg}$ for the background region. In some embodiments, the foreground binary mask $M_{fg}$ is defined to contain values equal to 1 at pixel locations in the foreground region and to contain values equal to 0 at pixel locations in the background region. Figure 11c illustrates the video frame 1100 after a foreground binary mask $M_{fg}$ has been applied. As shown in Figure 11c, application of the foreground binary mask $M_{fg}$ removes the background region so that only the set of foreground pixel locations or the foreground region (i.e., the regions enclosed by the first bounding shape 1130 and the second bounding shape 1135) of the frame remains.
The background binary mask $M_{bg}$ is defined as the complement of the foreground binary mask $M_{fg}$ so that it contains values equal to 0 at pixel locations in the foreground region and contains values equal to 1 at pixel locations in the background region. Figure 11d illustrates the video frame 1100 after a background binary mask $M_{bg}$ has been applied. As shown in Figure 11d, application of the background binary mask $M_{bg}$ removes the foreground region so that only the set of background pixel locations or the background region (i.e., the regions not enclosed by the first bounding shape 1130 and the second bounding shape 1135) of the frame remains.
Using the binary masks $M_{fg}$ and $M_{bg}$, the method then performs (at 1060) different filtering of the foreground and background regions (i.e., the set of foreground pixel locations and the set of background pixel locations are filtered differently). In some embodiments, foreground and background regions are filtered using anisotropic diffusion where different edge stopping functions and/or parameter values are used for the foreground and background regions. Conventional anisotropic diffusion methods may be used, or an improved omni-directional anisotropic diffusion method (as described with reference to Figures 8 and 12) may be used to filter the foreground and background regions. In other embodiments, other filtering methods are used and applied differently to the foreground and background regions. The filtered foreground and background regions are then combined (at 1065) to form a current filtered frame.
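Mask construction (step 1055) and recombination (step 1065) can be sketched as below, assuming box-shaped bounding shapes given as (x_min, x_max, y_min, y_max) tuples with inclusive, 0-based coordinates. The names `build_masks` and `combine` and the per-pixel weighted sum used for combining are illustrative assumptions.

```python
def build_masks(rows, cols, boxes):
    """Return (m_fg, m_bg): m_fg is 1 inside any bounding box and 0 elsewhere;
    m_bg is its complement."""
    m_fg = [[0] * cols for _ in range(rows)]
    for x_min, x_max, y_min, y_max in boxes:
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                m_fg[x][y] = 1
    m_bg = [[1 - value for value in row] for row in m_fg]
    return m_fg, m_bg


def combine(filtered_fg, filtered_bg, m_fg, m_bg):
    """Take foreground-filtered values where m_fg is 1 and
    background-filtered values where m_bg is 1."""
    rows, cols = len(m_fg), len(m_fg[0])
    return [[m_fg[x][y] * filtered_fg[x][y] + m_bg[x][y] * filtered_bg[x][y]
             for y in range(cols)]
            for x in range(rows)]


# Usage: one box covering row 0, columns 1 to 2 of a 2 x 3 frame.
m_fg, m_bg = build_masks(2, 3, [(0, 0, 1, 2)])
```

Because the two masks are complementary, every pixel location in the combined frame takes its value from exactly one of the two filtered results.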
The foreground/background differentiation method 1000 then determines (at 1070) if the current frame is a last frame of the video sequence (received at 1005). If not, the method sets (at 1075) a next frame in the video sequence as the current frame. The method then continues at step 1015. If the method 1000 determines (at 1070 - Yes) that the current frame is the last frame of the video sequence, the method outputs (at 1080) a pre-filtered video sequence being comprised of multiple pre-filtered video frames and having an associated data amount that is less than the data amount associated with the original video sequence (received at 1005).
The pre-filtered video sequence may be received, for example, by the temporal pre-filtering component 205 for further pre-processing or the encoder component 110 for encoding (i.e., compression). The bit rate of the pre-filtered video sequence after compression using the encoder component 110 is lower than the bit rate of the video sequence without pre-filtering after compression using the same compression method.
The foreground and background regions may be filtered using different filtering methods or different filtering parameters. Among spatial filtering methods, diffusion filtering has the important property of generating a scale space via a partial differential equation. In the scale space, analysis of object boundaries and other information at the correct resolution where they are most visible can be performed. Anisotropic diffusion methods have been shown to be particularly effective because of their ability to reduce details in images without impairing the subjective quality. In other embodiments, other filtering methods are used to filter the foreground and background regions differently.
Figure 12 illustrates a flowchart of a method 1200 for using the omni-directional spatial filtering method (described with reference to Figure 9) in conjunction with the foreground/background differentiation method 1000 (described with reference to Figure 10). The method 1200 may be performed, for example, by the spatial pre-filtering component 210 or the encoder component 110. The method 1200 begins when it receives (at 1205) a video frame (i.e., the current frame being processed by the method 1000). The current frame is comprised of a plurality of pixels at pixel locations where each pixel location contains one or more pixel values. Pixel locations are identified by discrete row (x) and column (y) coordinates such that 1 ≤ x ≤ M and 1 ≤ y ≤ N where M x N is the size of the frame in pixel units.
The method 1200 also receives (at 1210) a foreground binary mask $M_{fg}$ and a background binary mask $M_{bg}$ (constructed at step 1055 of Figure 10). The method 1200 then applies (at 1215) the foreground binary mask $M_{fg}$ to the current frame to produce a set of foreground pixel locations that comprise the foreground region (as shown, for example, in Figure 11c). The method then sets (at 1220) row (x) and column (y) values for an initial current pixel location to equal the coordinates of one of the foreground pixel locations. For example, the initial current pixel location may be set to equal the coordinates of a foreground pixel location having the lowest row (x) or the lowest column (y) coordinate in the set of foreground pixel locations.
The method 1200 then applies (at 1225) omni-directional diffusion filtering to the current pixel location using a foreground edge stopping function $g_{fg}(x)$ and a set of foreground parameter values $P_{fg}$ (that includes parameter values $k_{fg}$ and $\lambda_{fg}$). The omni-directional diffusion filtering is expressed by the omni-directional spatial filtering equation:
$I(x, y, t + 1) = I(x, y, t) + \lambda \left[ \sum_{m \in \{N, S, E, W\}} c_m(x, y, t) \, \nabla I_m(x, y, t) + \beta \sum_{n \in \{NE, SE, SW, NW\}} c_n(x, y, t) \, \nabla I_n(x, y, t) \right]$
In the foreground region, λ takes the value $\lambda_{fg}$, a foreground parameter value that determines the rate of diffusion in the omni-directional spatial filtering. In some embodiments, the foreground edge stopping function $g_{fg}(x)$ is expressed by the following equation:
$g_{fg}(x, y, t) = \exp\left( -\left( \frac{\nabla I(x, y, t)}{k_{fg}} \right)^2 \right)$
where parameter value $k_{fg}$ is a foreground parameter value that controls diffusion as a function of the gradient. If the value of the parameter is low, diffusion stops across the edges. If the value of the parameter is high, intensity gradients have a reduced influence on diffusion.
The method 1200 then determines (at 1230) if the current pixel location is a last pixel location of the set of foreground pixel locations. If not, the method sets (at 1235) a next pixel location in the set of foreground pixel locations as the current pixel location. The method then continues at step 1225. If the method 1200 determines (at 1230 - Yes) that the current pixel location is the last pixel location of the set of foreground pixel locations, the method continues at step 1240.
The method 1200 applies (at 1240) the background binary mask $M_{bg}$ to the current frame to produce a set of background pixel locations that comprise the background region (as shown, for example, in Figure 11d). The method then sets (at 1245) row (x) and column (y) values for an initial current pixel location to equal the coordinates of one of the background pixel locations. For example, the initial current pixel location may be set to equal the coordinates of a background pixel location having the lowest row (x) or the lowest column (y) coordinate in the set of background pixel locations.
The method 1200 then applies (at 1250) omni-directional diffusion filtering to the current pixel location using a background edge stopping function $g_{bg}(x)$ and a set of background parameter values $P_{bg}$ (that includes parameter values $k_{bg}$ and $\lambda_{bg}$). The omni-directional diffusion filtering is expressed by the omni-directional spatial filtering equation given above. In some embodiments, at least one background parameter value in the set of background parameters $P_{bg}$ is not equal to a corresponding foreground parameter value in the set of foreground parameters $P_{fg}$. Parameter value $\lambda_{bg}$ is a background parameter value that determines the rate of diffusion in the omni-directional spatial filtering in the background region. In some embodiments, the background parameter value $\lambda_{bg}$ is not equal to the foreground parameter value $\lambda_{fg}$. In some embodiments, the background edge stopping function $g_{bg}(x)$ is different than the foreground edge stopping function $g_{fg}(x)$ and is expressed by the following equation:
[Equation image imgf000048_0001 (background edge stopping function $g_{bg}(x)$) is not reproduced in this text.]
where parameter value $k_{bg}$ is a background parameter value that controls diffusion as a function of the gradient. If the value of this parameter is low, diffusion stops across the edges. If the value of this parameter is high, intensity gradients have a reduced influence on diffusion. In some embodiments, the background parameter value $k_{bg}$ is not equal to the foreground parameter value $k_{fg}$.
The method 1200 then determines (at 1255) if the current pixel location is a last pixel location of the set of background pixel locations. If not, the method sets (at 1260) a next pixel location in the set of background pixel locations as the current pixel location. The method then continues at step 1250. If the method 1200 determines (at 1255 - Yes) that the current pixel location is the last pixel location of the set of background pixel locations, the method ends.
Different embodiments of the present invention as described above may be used independently to pre-process a video sequence or may be used in any combination with any other embodiment of the present invention and in any sequence. As such, the temporal filtering method of the present invention may be used independently or in conjunction with the spatial filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence. In addition, the spatial filtering methods of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the foreground/background differentiation methods of the present invention to pre-process a video sequence. Furthermore, the foreground/background differentiation method of the present invention may be used independently or in conjunction with the temporal filtering methods and/or the spatial filtering methods of the present invention to pre-process a video sequence.
Some embodiments described above relate to video frames in YUV format. One of ordinary skill in the art, however, will realize that these embodiments may also relate to a variety of formats other than YUV. In addition, other video frame formats (such as RGB) can easily be changed into YUV format. Some embodiments described above relate to a videoconferencing application.
One of ordinary skill in the art, however, will realize that these embodiments may also relate to other applications (e.g., DVD, digital storage media, television broadcasting, internet streaming, communication, etc.) in real-time or post-time. Embodiments of the present invention may also be used with video sequences having different coding standards such as H.263 and H.264 (also known as MPEG-4/Part 10). While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
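As a closing illustration, the region-dependent filtering of method 1200 (Figure 12), described earlier in this section, might be sketched as a single time step in which each pixel location uses the foreground parameters where the foreground mask is 1 and the background parameters elsewhere. This is a sketch under stated assumptions: both regions use the same exponential edge-stopping form and differ only in (λ, k), whereas the specification also allows different functions; the value of `beta`, the neighbor offsets, the border handling, and the names `update_pixel` and `filter_with_masks` are hypothetical.

```python
import math

CARDINAL = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # N, S, E, W (assumed offsets)
DIAGONAL = [(1, 1), (-1, 1), (1, -1), (-1, -1)]  # NE, NW, SE, SW


def update_pixel(plane, x, y, lam, k, beta):
    """One omni-directional diffusion step for the pixel at plane[x][y]."""
    rows, cols = len(plane), len(plane[0])
    centre = plane[x][y]

    def flow(offsets):
        total = 0.0
        for dx, dy in offsets:
            nx, ny = x + dx, y + dy
            if 0 <= nx < rows and 0 <= ny < cols:
                delta = plane[nx][ny] - centre
                # Assumed exponential edge-stopping function with parameter k.
                total += math.exp(-((delta / k) ** 2)) * delta
        return total

    return centre + lam * (flow(CARDINAL) + beta * flow(DIAGONAL))


def filter_with_masks(plane, m_fg, params_fg, params_bg, beta=0.5):
    """Apply (lam, k) from params_fg where m_fg is 1, params_bg elsewhere."""
    out = []
    for x, row in enumerate(plane):
        out_row = []
        for y in range(len(row)):
            lam, k = params_fg if m_fg[x][y] == 1 else params_bg
            out_row.append(update_pixel(plane, x, y, lam, k, beta))
        out.append(out_row)
    return out


# Usage: a small k keeps the foreground edge; a large k smooths the background.
plane = [[0, 0, 100, 100], [0, 0, 100, 100]]
m_fg = [[1, 1, 0, 0], [1, 1, 0, 0]]
result = filter_with_masks(plane, m_fg, params_fg=(0.1, 1.0), params_bg=(0.1, 1000.0))
```

Choosing a low k for the foreground and a high k for the background matches the stated aim of preserving sharp edges in regions-of-interest while reducing more detail in the background.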

Claims

CLAIMS We claim:
1. A method of pre-filtering an original video sequence, the method comprising: a) receiving the original video sequence; b) pre-filtering the original video sequence using anisotropic diffusion filtering; c) outputting a pre-filtered video sequence; and d) compressing the pre-filtered video sequence using a first compression method to produce a pre-filtered and compressed video sequence.
2. The method of claim 1 wherein the anisotropic diffusion filtering is Fallah-Ford diffusion filtering.
3. The method of claim 1 wherein the anisotropic diffusion filtering is Perona-Malik diffusion filtering.
4. The method of claim 1 wherein the anisotropic diffusion filtering performs diffusion filtering in at least one diagonal direction with respect to a pixel location of a video frame of the original video sequence.
5. A method of pre-processing a video frame having a plurality of pixels at pixel locations where each pixel location contains one or more pixel values, the method comprising: performing anisotropic diffusion filtering on a pixel in the plurality of pixels, the pixel having a pixel location wherein the diffusion filtering is performed in at least one diagonal direction with respect to the pixel location.
6. The method of claim 5 wherein the diffusion filtering is performed in four horizontal or vertical directions (north, south, east and west) and four diagonal directions (north-east, north-west, south-east, and south-west) with respect to the pixel location.
7. The method of claim 6 wherein the diffusion filtering is expressed by the equation:
$I(x, y, t + 1) = I(x, y, t) + \lambda \left[ \sum_{m \in \{N, S, E, W\}} c_m(x, y, t) \, \nabla I_m(x, y, t) + \beta \sum_{n \in \{NE, SE, SW, NW\}} c_n(x, y, t) \, \nabla I_n(x, y, t) \right]$
where:
• $I(x, y, t)$ is a discrete image;
• $\nabla I(x, y, t)$ is the image gradient;
• (x, y) specifies the pixel location in a discrete, two dimensional grid covering the video frame;
• t denotes discrete time steps;
• scalar constant λ determines a rate of diffusion;
• subscripts m and n correspond to eight directions of diffusion with respect to the pixel location (x, y), where m is a horizontal or vertical direction (north, south, east and west) and n is a diagonal direction (north-east, north-west, south-east, and south-west); and
• notations $c_m$ and $c_n$ are diffusion coefficients where horizontal or vertical directions (north, south, east and west) are indexed by subscript m and diagonal directions (north-east, north-west, south-east, and south-west) are indexed by subscript n, each diffusion coefficient being referred to as an edge stopping function g(x) of $\nabla I(x, y, t)$ in the corresponding direction as expressed in the following equations:
$c_m(x, y, t) = g(\nabla I_m(x, y, t))$
$c_n(x, y, t) = g(\nabla I_n(x, y, t))$
where g(x) satisfies the condition g(x) → 0 when x → ∞ such that the diffusion filtering is stopped across edges that are present in the video frame.
8. A method of pre-filtering an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value, the method comprising: a) setting a current frame of the original video sequence; b) identifying a region-of-interest in the current frame; c) determining a bounding shape that encloses all or a portion of the region-of-interest; and d) filtering pixel locations in the bounding shape differently than other pixel locations in the current frame.
9. The method of claim 8 wherein the bounding shape has a four-sided, three-sided, or circular form.
10. The method of claim 8 wherein each pixel location in the region-of-interest has a chrominance value within a predetermined low chrominance threshold value and a predetermined high chrominance threshold value.
11. The method of claim 8 wherein the bounding shape encloses over 1/2 or 3/4 of the pixel locations in the region-of-interest.
12. The method of claim 8 wherein: the region of the current frame within the bounding shape is referred to as a foreground region; and the region of the current frame not within the bounding shape is referred to as a background region.
13. The method of claim 12 wherein the filtering comprises applying anisotropic diffusion to the pixel locations in the foreground and background regions where different parameter values are used for the foreground and background regions.
14. The method of claim 12 wherein the filtering comprises applying anisotropic diffusion to the pixel locations in the foreground and background regions where different edge stopping functions are used for the foreground and background regions.
15. The method of claim 12 wherein the filtering comprises applying Fallah-Ford diffusion filtering to the pixel locations in the foreground region differently than to the pixel locations in the background region.
16. The method of claim 12 wherein the filtering comprises applying Perona-Malik diffusion filtering to the pixel locations in the foreground region differently than to the pixel locations in the background region.
17. The method of claim 12 wherein the filtering comprises applying anisotropic diffusion to the pixel locations in the foreground region differently than to the pixel locations in the background region, the anisotropic diffusion performing filtering on a pixel location in at least one diagonal direction with respect to the pixel location.
18. The method of claim 8 further comprising: e) setting a next frame in the original video sequence as the current frame; f) repeating steps b), c), and d) for each frame of the plurality of frames in the original video sequence; g) outputting a pre-filtered video sequence comprised of a plurality of pre-filtered video frames; and h) compressing the pre-filtered video sequence using a first compression method to produce a pre-filtered and compressed video sequence, wherein a bit rate associated with the pre-filtered and compressed video sequence is lower than a bit rate that would result from compressing the original video sequence using the first compression method without performing steps a) through f).
19. A method of pre-filtering a video frame having a plurality of pixel locations where each pixel location contains a pixel value and is identifiable by pixel location coordinates, the method comprising: identifying matching pixel locations in the video frame containing a chrominance (U) value within a predetermined low chrominance (U) threshold value and a predetermined high chrominance (U) threshold value and a chrominance (V) value within a predetermined low chrominance (V) threshold value and a predetermined high chrominance (V) threshold value, the matching pixel locations comprising a set of matching pixel locations; identifying at least one contiguous grouping of matching pixel locations in the set of matching pixel locations, the contiguous grouping of matching pixel locations being referred to as a region-of-interest; determining a bounding shape for each region-of-interest, the bounding shape enclosing all or a portion of the region-of-interest; determining a foreground region of the video frame, the foreground region being comprised of a totality of pixel locations in the video frame enclosed within a bounding shape; determining a background region of the video frame, the background region being comprised of a totality of pixel locations in the video frame not enclosed within a bounding shape; and filtering the pixel locations in the foreground region differently than the pixel locations in the background region.
20. The method of claim 19 wherein the low and high chrominance (U) threshold values and the low and high chrominance (V) threshold values reflect the chrominance (U, V) ranges of human skin.
21. The method of claim 19 wherein the filtering comprises applying anisotropic diffusion to the pixel locations in the foreground region differently than to the pixel locations in the background region.
22. A computer program product having a computer readable medium having computer program instructions recorded thereon, the computer program product comprising: instructions for receiving the original video sequence; instructions for pre-filtering the original video sequence using anisotropic diffusion filtering; instructions for outputting a pre-filtered video sequence; and instructions for compressing the pre-filtered video sequence.
23. The computer program product of claim 22 wherein the anisotropic diffusion filtering is Fallah-Ford diffusion filtering.
24. The computer program product of claim 22 wherein the anisotropic diffusion filtering is Perona-Malik diffusion filtering.
25. The computer program product of claim 22 wherein the anisotropic diffusion filtering performs diffusion filtering in at least one diagonal direction with respect to a pixel location of a video frame of the original video sequence.
26. A computer program product having a computer readable medium having computer program instructions recorded thereon for pre-processing a video frame having a plurality of pixels at pixel locations where each pixel location contains one or more pixel values, the computer program product comprising: instructions for performing anisotropic diffusion filtering on a pixel in the plurality of pixels, the pixel having a pixel location wherein the diffusion filtering is performed in at least one diagonal direction with respect to the pixel location.
27. The computer program product of claim 26 wherein the diffusion filtering is performed in four horizontal or vertical directions (north, south, east and west) and four diagonal directions (north-east, north-west, south-east, and south-west) with respect to the pixel location.
28. A computer program product having a computer readable medium having computer program instructions recorded thereon for pre-filtering an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value, the computer program product comprising: instructions for setting a current frame of the original video sequence; instructions for identifying a region-of-interest in the current frame; instructions for determining a bounding shape that encloses all or a portion of the region-of-interest; and instructions for filtering pixel locations in the bounding shape differently than other pixel locations in the current frame.
29. The computer program product of claim 28 wherein each pixel location in the region-of-interest has a chrominance value within a predetermined low chrominance threshold value and a predetermined high chrominance threshold value.
30. The computer program product of claim 28 wherein the filtering comprises applying anisotropic diffusion to the pixel locations in the bounding shape differently than other pixel locations in the current frame.
31. A system for processing an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value, the system comprising: a pre-processing component that: receives the original video sequence; pre-filters the original video sequence using anisotropic diffusion filtering; and outputs a pre-filtered video sequence; and an encoder component coupled to the pre-processing component, wherein the encoder component compresses the pre-filtered video sequence.
32. The system of claim 31 wherein the anisotropic diffusion filtering is Fallah-Ford diffusion filtering.
33. The system of claim 31 wherein the anisotropic diffusion filtering is Perona-Malik diffusion filtering.
34. The system of claim 31 wherein the anisotropic diffusion filtering performs diffusion filtering in at least one diagonal direction with respect to a pixel location of a video frame of the original video sequence.
35. A system for pre-processing a video frame having a plurality of pixels at pixel locations where each pixel location contains one or more pixel values, the system comprising: a pre-processing component that performs anisotropic diffusion filtering on a pixel in the plurality of pixels, the pixel having a pixel location wherein the diffusion filtering is performed in at least one diagonal direction with respect to the pixel location.
36. The system of claim 35 wherein the diffusion filtering is performed in four horizontal or vertical directions (north, south, east and west) and four diagonal directions (north-east, north-west, south-east, and south-west) with respect to the pixel location.
37. A system for pre-filtering an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value, the system comprising: a pre-processing component that: sets a current frame of the original video sequence; identifies a region-of-interest in the current frame; determines a bounding shape that encloses all or a portion of the region-of-interest; and filters pixel locations in the bounding shape differently than other pixel locations in the current frame.
38. The system of claim 37 wherein each pixel location in the region-of-interest has a chrominance value within a predetermined low chrominance threshold value and a predetermined high chrominance threshold value.
39. The system of claim 37 wherein the filtering comprises applying anisotropic diffusion to the pixel locations in the bounding shape differently than other pixel locations in the current frame.
40. A system for pre-processing a video frame having a plurality of pixels at pixel locations where each pixel location contains one or more pixel values, the system comprising: means for performing anisotropic diffusion filtering on a pixel in the plurality of pixels, the pixel having a pixel location wherein the diffusion filtering is performed in at least one diagonal direction with respect to the pixel location.
41. The system of claim 40 wherein the diffusion filtering is performed in four horizontal or vertical directions (north, south, east and west) and four diagonal directions (north-east, north-west, south-east, and south-west) with respect to the pixel location.
42. A system for pre-filtering an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value, the system comprising: means for setting a current frame of the original video sequence; means for identifying a region-of-interest in the current frame; means for determining a bounding shape that encloses all or a portion of the region-of-interest; and means for filtering pixel locations in the bounding shape differently than other pixel locations in the current frame.
43. The system of claim 42 wherein the filtering comprises applying anisotropic diffusion to the pixel locations in the bounding shape differently than other pixel locations in the current frame.
44. A method of pre-filtering an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value and is identifiable by pixel location coordinates, the method comprising: a) setting a current frame and a next frame of the original video sequence; b) computing a mean luminance of the current frame; c) determining a pixel value difference between a pixel value at pixel location coordinates in the next frame and a pixel value at the pixel location coordinates in the current frame; and d) filtering the pixel values at the pixel location coordinates in the current frame and the next frame if the pixel value difference is within a low threshold value and a high threshold value, the low and high threshold values being based on the mean luminance.
45. The method of claim 44 wherein the pixel value difference is the difference between a luminance (Y) value at the pixel location coordinates in the current frame and a luminance (Y) value at the pixel location coordinates in the next frame, the low threshold value is a low luminance threshold value, and the high threshold value is a high luminance threshold value.
46. The method of claim 44 wherein the pixel value difference is the difference between a chrominance value at the pixel location coordinates in the current frame and a chrominance value at the pixel location coordinates in the next frame, the low threshold value is a low chrominance threshold value, and the high threshold value is a high chrominance threshold value.
47. The method of claim 44 wherein a high threshold function ($\theta^{H}_{\mathrm{luma}}$) that determines the high threshold value is a piecewise linear function of the luminance mean.
48. The method of claim 44 wherein a low threshold function ($\theta^{L}_{\mathrm{luma}}$) that determines the low threshold value is a piecewise linear function of the high threshold value.
49. The method of claim 44 further comprising: e) repeating steps c) and d) for each pixel location coordinate of the current frame; f) setting the next frame as the current frame and a frame in the original video sequence subsequent to the next frame as the next frame; g) repeating steps b), c), d), e), and f) for each frame of the plurality of frames in the original video sequence; and h) outputting a pre-filtered video sequence comprised of a plurality of pre-filtered video frames, wherein the pre-filtered video sequence has an associated data amount that is less than a data amount associated with the original video sequence.
50. The method of claim 49 further comprising: i) compressing the pre-filtered video sequence using a first compression method to produce a pre-filtered and compressed video sequence, wherein a bit rate associated with the pre-filtered and compressed video sequence is lower than a bit rate that would result from compressing the original video sequence using the first compression method without performing steps a) through h).
51. The method of claim 49 further comprising: i) filtering the pre-filtered video sequence using spatial anisotropic diffusion filtering.
52. The method of claim 49 further comprising: i) filtering the pre-filtered video sequence using Fallah-Ford diffusion filtering.
53. The method of claim 49 further comprising: i) filtering the pre-filtered video sequence using Perona-Malik diffusion filtering.
54. The method of claim 49 further comprising: i) filtering the pre-filtered video sequence by performing diffusion in at least one diagonal direction with respect to a pixel location of a pre-filtered video frame in the pre-filtered video sequence.
55. The method of claim 49 further comprising: i) identifying a region-of-interest in a pre-filtered video frame of the pre-filtered video sequence; j) determining a bounding shape that encloses all or a portion of the region-of-interest; and k) filtering the pixel locations in the bounding shape differently than the other pixel locations in the pre-filtered video frame.
56. The method of claim 55 wherein each pixel location in the region-of-interest has a chrominance value within a predetermined low chrominance threshold value and a predetermined high chrominance threshold value.
57. The method of claim 55 wherein step k) comprises spatial anisotropic diffusion filtering.
58. The method of claim 55 wherein step k) comprises Fallah-Ford diffusion filtering.
59. The method of claim 55 wherein step k) comprises Perona-Malik diffusion filtering.
60. The method of claim 55 wherein step k) comprises performing diffusion in at least one diagonal direction with respect to a pixel location of a pre-filtered video frame in the pre-filtered video sequence.
61. A method of pre-filtering a video sequence, the video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value and is identifiable by pixel location coordinates, the method comprising: a) receiving a current frame and a next frame of the video sequence; b) computing a luminance mean of the current frame; c) determining a luminance (Y) value difference between a luminance (Y) value at pixel location coordinates in the current frame and a luminance (Y) value at the pixel location coordinates in the next frame; d) filtering the luminance (Y) values at the pixel location coordinates in the current frame and the next frame if the luminance (Y) value difference is within a low luminance threshold value and a high luminance threshold value, the low and high luminance threshold values being based on the luminance mean; e) determining a chrominance (U) value difference between a chrominance (U) value at the pixel location coordinates in the current frame and a chrominance (U) value at the pixel location coordinates in the next frame; and f) filtering the chrominance (U) values at the pixel location coordinates in the current frame and the next frame if the chrominance (U) value difference is within a low chrominance threshold value and a high chrominance threshold value.
62. The method of claim 61 further comprising: g) determining a chrominance (V) value difference between a chrominance (V) value at the pixel location coordinates in the current frame and a chrominance (V) value at the pixel location coordinates in the next frame; and h) filtering the chrominance (V) values at the pixel location coordinates in the current frame and the next frame if the chrominance (V) value difference is within the low chrominance threshold value and the high chrominance threshold value.
63. The method of claim 61 wherein the low and high chrominance threshold values are based on the luminance mean.
64. A computer program product having a computer readable medium having computer program instructions recorded thereon, the computer program product comprising: instructions for receiving an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value and is identifiable by pixel location coordinates; instructions for setting a current frame and a next frame of the original video sequence; instructions for computing a mean luminance of the current frame; instructions for determining a pixel value difference between a pixel value at pixel location coordinates in the next frame and a pixel value at the pixel location coordinates in the current frame; and instructions for filtering the pixel values at the pixel location coordinates in the current frame and the next frame if the pixel value difference is within a low threshold value and a high threshold value, the low and high threshold values being based on the mean luminance.
65. The computer program product of claim 64 wherein the pixel value difference is the difference between a luminance (Y) value at the pixel location coordinates in the current frame and a luminance (Y) value at the pixel location coordinates in the next frame, the low threshold value is a low luminance threshold value, and the high threshold value is a high luminance threshold value.
66. The computer program product of claim 64 wherein the pixel value difference is the difference between a chrominance value at the pixel location coordinates in the current frame and a chrominance value at the pixel location coordinates in the next frame, the low threshold value is a low chrominance threshold value, and the high threshold value is a high chrominance threshold value.
67. The computer program product of claim 64 wherein the computer program product further comprises: instructions for processing each frame of the original video sequence; and instructions for outputting a pre-filtered video sequence comprised of a plurality of pre-filtered video frames, wherein the pre-filtered video sequence has an associated data amount that is less than a data amount associated with the original video sequence.
68. The computer program product of claim 67 wherein the computer program product further comprises: instructions for receiving the pre-filtered video sequence; and instructions for filtering the pre-filtered video sequence using spatial anisotropic diffusion filtering.
69. The computer program product of claim 67 wherein the computer program product further comprises: instructions for receiving the pre-filtered video sequence; and instructions for filtering the pre-filtered video sequence by performing diffusion in at least one diagonal direction with respect to a pixel location of a pre-filtered video frame in the pre-filtered video sequence.
70. The computer program product of claim 67 wherein the computer program product further comprises: instructions for receiving the pre-filtered video sequence; instructions for identifying a region-of-interest in a pre-filtered video frame of the pre-filtered video sequence; instructions for determining a bounding shape that encloses all or a portion of the region-of-interest; and instructions for filtering the pixel locations in the bounding shape differently than the other pixel locations in the pre-filtered video frame.
71. A system for pre-filtering an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value and is identifiable by pixel location coordinates, the system comprising: a pre-processing component that: receives the original video sequence; sets a current frame and a next frame of the original video sequence; computes a mean luminance of the current frame; determines a pixel value difference between a pixel value at pixel location coordinates in the next frame and a pixel value at the pixel location coordinates in the current frame; and filters the pixel values at the pixel location coordinates in the current frame and the next frame if the pixel value difference is within a low threshold value and a high threshold value, the low and high threshold values being based on the mean luminance.
72. The system of claim 71 wherein the pixel value difference is the difference between a luminance (Y) value at the pixel location coordinates in the current frame and a luminance (Y) value at the pixel location coordinates in the next frame, the low threshold value is a low luminance threshold value, and the high threshold value is a high luminance threshold value.
73. The system of claim 71 wherein the pixel value difference is the difference between a chrominance value at the pixel location coordinates in the current frame and a chrominance value at the pixel location coordinates in the next frame, the low threshold value is a low chrominance threshold value, and the high threshold value is a high chrominance threshold value.
74. The system of claim 71 wherein the pre-processing component: processes each frame of the original video sequence; and outputs a pre-filtered video sequence comprised of a plurality of pre-filtered video frames, wherein the pre-filtered video sequence has an associated data amount that is less than a data amount associated with the original video sequence.
75. The system of claim 74 wherein the pre-processing component includes a spatial-pre-filtering component that: receives the pre-filtered video sequence; and filters the pre-filtered video sequence using spatial anisotropic diffusion filtering.
76. The system of claim 74 wherein the pre-processing component includes a spatial-pre-filtering component that: receives the pre-filtered video sequence; and filters the pre-filtered video sequence by performing diffusion in at least one diagonal direction with respect to a pixel location of a pre-filtered video frame in the pre-filtered video sequence.
77. The system of claim 74 wherein the pre-processing component includes a spatial-pre-filtering component that: receives the pre-filtered video sequence; identifies a region-of-interest in a pre-filtered video frame of the pre-filtered video sequence; determines a bounding shape that encloses all or a portion of the region-of-interest; and filters the pixel locations in the bounding shape differently than the other pixel locations in the pre-filtered video frame.
78. A system for pre-filtering an original video sequence, the original video sequence being comprised of a plurality of frames, each frame having a plurality of pixel locations where each pixel location contains a pixel value and is identifiable by pixel location coordinates, the system comprising: means for receiving an original video sequence; means for setting a current frame and a next frame of the original video sequence; means for computing a mean luminance of the current frame; means for determining a pixel value difference between a pixel value at pixel location coordinates in the next frame and a pixel value at the pixel location coordinates in the current frame; and means for filtering the pixel values at the pixel location coordinates in the current frame and the next frame if the pixel value difference is within a low threshold value and a high threshold value, the low and high threshold values being based on the mean luminance.
79. The system of claim 78 wherein the system further comprises: means for processing each frame of the original video sequence; and means for outputting a pre-filtered video sequence comprised of a plurality of pre-filtered video frames, wherein the pre-filtered video sequence has an associated data amount that is less than a data amount associated with the original video sequence.
80. The system of claim 79 wherein the system further comprises: means for receiving the pre-filtered video sequence; and means for filtering the pre-filtered video sequence using spatial anisotropic diffusion filtering.
81. The system of claim 79 wherein the system further comprises: means for receiving the pre-filtered video sequence; means for identifying a region-of-interest in a pre-filtered video frame of the pre-filtered video sequence; means for determining a bounding shape that encloses all or a portion of the region-of-interest; and means for filtering the pixel locations in the bounding shape differently than the other pixel locations in the pre-filtered video frame.
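
Illustrative sketch (not part of the claims). The temporal pre-filtering recited in claims 44 to 48 and the eight-direction diffusion recited in claims 26 and 27 can be pictured with the short Python sketch below. It is a minimal illustration under stated assumptions, not the applicants' implementation. The claims give no numeric values, so the breakpoints and slopes in high_threshold and low_threshold, the use of an absolute difference, the choice of averaging the two co-located pixel values as the "filtering" step, the Perona-Malik style conductance exp(-(g/k)^2), the half weight on diagonal neighbours, the border clamping, and the parameters k and lam are all hypothetical choices made only for this example. Frames are assumed to be lists of rows of luminance (Y) values.

```python
import math


def mean_luminance(frame):
    """Mean of all luminance values in a frame (list of rows)."""
    return sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))


def high_threshold(mean_luma):
    """Hypothetical piecewise-linear high threshold of the mean luminance."""
    if mean_luma < 64:
        return 4.0 + mean_luma / 16.0
    if mean_luma < 192:
        return 8.0 + (mean_luma - 64) / 32.0
    return 12.0


def low_threshold(high):
    """Hypothetical piecewise-linear low threshold of the high threshold."""
    return 0.25 * high if high < 8.0 else 2.0 + 0.5 * (high - 8.0)


def temporal_prefilter(current, nxt):
    """Filter co-located pixels whose difference lies within [low, high].

    Assumption: "filtering" replaces both values with their average.
    Returns new (current, next) frames; the inputs are not modified.
    """
    high = high_threshold(mean_luminance(current))
    low = low_threshold(high)
    out_cur = [row[:] for row in current]
    out_nxt = [row[:] for row in nxt]
    for y, row in enumerate(current):
        for x, value in enumerate(row):
            diff = abs(nxt[y][x] - value)
            if low <= diff <= high:
                avg = (value + nxt[y][x]) / 2.0
                out_cur[y][x] = avg
                out_nxt[y][x] = avg
    return out_cur, out_nxt


# North, south, east, west, then the four diagonal neighbours.
OFFSETS = [(-1, 0), (1, 0), (0, 1), (0, -1),
           (-1, 1), (-1, -1), (1, 1), (1, -1)]


def diffuse8_step(frame, k=10.0, lam=0.125):
    """One explicit anisotropic diffusion step over eight directions.

    Assumptions: Perona-Malik style conductance, diagonal terms weighted
    by 0.5, and neighbours outside the frame clamped to the border.
    """
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy, dx in OFFSETS:
                ny = min(max(y + dy, 0), h - 1)
                nx = min(max(x + dx, 0), w - 1)
                grad = frame[ny][nx] - frame[y][x]
                weight = 0.5 if dy and dx else 1.0
                acc += weight * math.exp(-(grad / k) ** 2) * grad
            out[y][x] = frame[y][x] + lam * acc
    return out
```

In this sketch, small frame-to-frame differences below the low threshold are left alone, differences above the high threshold are treated as genuine motion and also left alone, and only the band in between is smoothed. The diffusion step smooths small local variations while leaving large gradients (likely edges) almost untouched, because the conductance falls towards zero as the gradient grows.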
PCT/US2004/017415 2003-08-13 2004-06-02 Method and system for pre-processing of video sequences to achieve better compression WO2005020584A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10/640,734 US7403568B2 (en) 2003-08-13 2003-08-13 Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using temporal filtering
US10/640,944 2003-08-13
US10/640,944 US7430335B2 (en) 2003-08-13 2003-08-13 Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
US10/640,734 2003-08-13

Publications (1)

Publication Number Publication Date
WO2005020584A1 (en) 2005-03-03

Family

ID=34221836

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/017415 WO2005020584A1 (en) 2003-08-13 2004-06-02 Method and system for pre-processing of video sequences to achieve better compression

Country Status (1)

Country Link
WO (1) WO2005020584A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1819172A1 (en) 2006-02-13 2007-08-15 Snell and Wilcox Limited Region-based image coding
WO2007144640A1 (en) * 2006-06-16 2007-12-21 The Robert Gordon University Credo Method of and apparatus for processing image data
US7403568B2 (en) 2003-08-13 2008-07-22 Apple Inc. Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using temporal filtering
US7430335B2 (en) 2003-08-13 2008-09-30 Apple Inc Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
WO2008117963A1 (en) * 2007-03-23 2008-10-02 Lg Electronics Inc. A method and an apparatus for decoding/encoding a video signal
US8280171B2 (en) 2008-05-28 2012-10-02 Apple Inc. Tools for selecting a section of interest within an image
US8452105B2 (en) 2008-05-28 2013-05-28 Apple Inc. Selecting a section of interest within an image
US8582834B2 (en) 2010-08-30 2013-11-12 Apple Inc. Multi-image face-based image processing
US8760464B2 (en) 2011-02-16 2014-06-24 Apple Inc. Shape masks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0853436A2 (en) * 1997-01-09 1998-07-15 Sun Microsystems, Inc. Digital video signal filtering and encoding method and apparatus
EP0863671A1 (en) * 1996-12-18 1998-09-09 Lucent Technologies Inc. Object-oriented adaptive prefilter for low bit-rate video systems
US6281942B1 (en) * 1997-08-11 2001-08-28 Microsoft Corporation Spatial and temporal filtering mechanism for digital motion video signals
WO2002028087A1 (en) * 2000-09-29 2002-04-04 Hewlett-Packard Company Method for enhancing compressibility and visual quality of scanned document images

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALGAZI V R ET AL: "PREPROCESSING FOR IMPROVED PERFORMANCE IN IMAGE AND VIDEO CODING", PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 2564, 1995, pages 22 - 31, XP000852506, ISSN: 0277-786X *
EUNCHEOL CHOI ET AL: "Deblocking algorithm for dct-based compressed images using anisotropic diffusion", 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP). HONG KONG, APRIL 6 - 10, 2003, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, NY : IEEE, US, vol. VOL. 1 OF 6, 6 April 2003 (2003-04-06), pages III717 - III720, XP010639173, ISBN: 0-7803-7663-3 *
FISCHL B ET AL: "ADAPTIVE NONLOCAL FILTERING: A FAST ALTERNATIVE TO ANISOTROPIC DIFFUSION FOR IMAGE ENHANCEMENT", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE INC. NEW YORK, US, vol. 21, no. 1, January 1999 (1999-01-01), pages 42 - 48, XP000803251, ISSN: 0162-8828 *
TSUJI H ET AL: "A nonlinear spatio-temporal diffusion and its application to prefiltering in MPEG-4video coding", PROCEEDINGS 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. ICIP 2002. ROCHESTER, NY, SEPT. 22 - 25, 2002, INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, NEW YORK, NY : IEEE, US, vol. VOL. 2 OF 3, 22 September 2002 (2002-09-22), pages 85 - 88, XP010607266, ISBN: 0-7803-7622-6 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8208565B2 (en) 2003-08-13 2012-06-26 Apple Inc. Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using temporal filtering
US7403568B2 (en) 2003-08-13 2008-07-22 Apple Inc. Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using temporal filtering
US7430335B2 (en) 2003-08-13 2008-09-30 Apple Inc Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
US8615042B2 (en) 2003-08-13 2013-12-24 Apple Inc. Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
US7809207B2 (en) 2003-08-13 2010-10-05 Apple Inc. Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
EP1819172A1 (en) 2006-02-13 2007-08-15 Snell and Wilcox Limited Region-based image coding
US8290042B2 (en) 2006-02-13 2012-10-16 Snell & Wilcox Limited Sport action coding
WO2007144640A1 (en) * 2006-06-16 2007-12-21 The Robert Gordon University Credo Method of and apparatus for processing image data
WO2008117963A1 (en) * 2007-03-23 2008-10-02 Lg Electronics Inc. A method and an apparatus for decoding/encoding a video signal
US8494046B2 (en) 2007-03-23 2013-07-23 Lg Electronics Inc. Method and an apparatus for decoding/encoding a video signal by performing illumination compensation
US8452105B2 (en) 2008-05-28 2013-05-28 Apple Inc. Selecting a section of interest within an image
US8280171B2 (en) 2008-05-28 2012-10-02 Apple Inc. Tools for selecting a section of interest within an image
US8582834B2 (en) 2010-08-30 2013-11-12 Apple Inc. Multi-image face-based image processing
US8760464B2 (en) 2011-02-16 2014-06-24 Apple Inc. Shape masks
US8891864B2 (en) 2011-02-16 2014-11-18 Apple Inc. User-aided image segmentation

Similar Documents

Publication Publication Date Title
US8615042B2 (en) Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
US8208565B2 (en) Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using temporal filtering
EP2230856B1 (en) Method for up-sampling images
EP2230640B1 (en) Method for filtering depth images
Maggioni et al. Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms
US7203234B1 (en) Method of directional filtering for post-processing compressed video
US7742652B2 (en) Methods and systems for image noise processing
US7822286B2 (en) Filtering artifacts in images with 3D spatio-temporal fuzzy filters
US7729426B2 (en) Video deblocking filter
US7865014B2 (en) Video auto enhancing algorithm
EP2230855A2 (en) Synthesizing virtual images from texture and depth images
US7957467B2 (en) Content-adaptive block artifact removal in spatial domain
US20100118977A1 (en) Detection of artifacts resulting from image signal decompression
KR101112139B1 (en) Apparatus and method for estimating scale ratio and noise strength of coded image
JPH08186714A (en) Noise removal of picture data and its device
JP2005166021A (en) Method for classifying pixel in image
CN110796615A (en) Image denoising method and device and storage medium
WO2005020584A1 (en) Method and system for pre-processing of video sequences to achieve better compression
JP2006148878A (en) Method for classifying pixels in image
JP4065287B2 (en) Method and apparatus for removing noise from image data
EP2599311A1 (en) Block compression artifact detection in digital video signals
JP2009095004A (en) Method of filtering pixels in sequence of images
JP2006140999A (en) Method for filtering pixel in image
JPH09116900A (en) Encoding noise removal device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase