WO2004080081A1 - Video encoding - Google Patents

Video encoding

Info

Publication number
WO2004080081A1
WO2004080081A1 (PCT/IB2004/050145)
Authority
WO
WIPO (PCT)
Prior art keywords
block size
spatial frequency
encoding
encoding block
frequency characteristic
Prior art date
Application number
PCT/IB2004/050145
Other languages
French (fr)
Inventor
Dzevdet Burazerovic
Gerardus J. M. Vervoort
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Priority to EP04714399A (EP1602239A1)
Priority to JP2006506639A (JP2006519565A)
Priority to US10/547,324 (US20060165163A1)
Publication of WO2004080081A1

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N 19/10: using adaptive coding
              • H04N 19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
                • H04N 19/124: Quantisation
                • H04N 19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
              • H04N 19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
                • H04N 19/136: Incoming video signal characteristics or properties
              • H04N 19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
                • H04N 19/17: the unit being an image region, e.g. an object
                  • H04N 19/176: the region being a block, e.g. a macroblock
            • H04N 19/50: using predictive coding
              • H04N 19/503: involving temporal prediction
                • H04N 19/51: Motion estimation or motion compensation
            • H04N 19/90: using coding techniques not provided for in groups H04N 19/10 - H04N 19/85, e.g. fractals
              • H04N 19/96: Tree coding, e.g. quad-tree coding

Definitions

  • the invention relates to a video encoder and a method of video encoding therefor, and in particular, but not exclusively, to video encoding in accordance with the H.264 video encoding standard.
  • video encoding standards have played a key role in facilitating the adoption of digital video in many professional and consumer applications.
  • the most influential standards are traditionally developed by either the International Telecommunication Union (ITU-T) or the MPEG (Moving Picture Experts Group) committee of the ISO/IEC (the International Organization for Standardization/the International Electrotechnical Commission).
  • the ITU-T standards, known as recommendations, are typically aimed at real-time communications (e.g. videoconferencing), while most MPEG standards are optimized for storage (e.g. for the Digital Versatile Disc (DVD)) and broadcast (e.g. for the Digital Video Broadcasting (DVB) standard).
  • currently, one of the most widely used video compression techniques is known as the MPEG-2 (Moving Picture Experts Group) standard.
  • MPEG-2 is a block based compression scheme wherein a frame is divided into a plurality of blocks each comprising eight vertical and eight horizontal pixels.
  • each block is individually compressed using a Discrete Cosine Transform (DCT) followed by quantization, which reduces a significant number of the transformed data values to zero.
  • for compression of chrominance data, the amount of chrominance data is usually first reduced by down-sampling, such that for each four luminance blocks two chrominance blocks are obtained (4:2:0 format); these are similarly compressed using the DCT and quantization.
  • Frames based only on intra-frame compression are known as Intra Frames (I-Frames).
  • MPEG-2 uses inter-frame compression to further reduce the data rate.
  • Inter-frame compression includes generation of predicted frames (P-frames) based on previous I-frames.
  • I and P frames are typically interposed by Bidirectional predicted frames (B-frames), wherein compression is achieved by only transmitting the differences between the B-frame and surrounding I- and P-frames.
  • MPEG-2 uses motion estimation, wherein macroblocks of one frame that are found at different positions in subsequent frames are communicated simply by use of a motion vector.
  • video signals of standard TV studio broadcast quality level can be transmitted at data rates of around 2-4 Mbps.
  • recently, a new ITU-T standard, known as H.26L, has emerged.
  • H.26L is becoming broadly recognized for its superior coding efficiency in comparison with the existing standards such as MPEG-2.
  • this potential has been recognized through formation of the Joint Video Team (JVT) forum, which is responsible for finalizing H.26L as a new joint ITU-T/MPEG standard.
  • the new standard is known as H.264 or MPEG-4 AVC (Advanced Video Coding).
  • H.264-based solutions are being considered in other standardization bodies, such as the DVB and DVD Forums.
  • the H.264 standard employs the same principles of block-based motion-compensated hybrid transform coding that are known from the established standards such as MPEG-2.
  • the H.264 syntax is, therefore, organized as the usual hierarchy of headers, such as picture-, slice- and macro-block headers, and data, such as motion-vectors, block-transform coefficients, quantizer scale, etc.
  • the H.264 standard separates the Video Coding Layer (VCL), which represents the content of the video data, and the Network Adaptation Layer (NAL), which formats data and provides header information.
  • H.264 allows for a much increased choice of encoding parameters. For example, it allows for a more elaborate partitioning and manipulation of 16x16 macro-blocks, whereby e.g. the motion compensation process can be performed on partitions of a macro-block as small as 4x4 pixels.
  • the selection process for motion compensated prediction of a sample block may involve a number of stored previously-decoded pictures, instead of only the adjacent pictures. Even with intra coding within a single frame, it is possible to form a prediction of a block using previously-decoded samples from the same frame.
  • the resulting prediction error following motion compensation may be transformed and quantized based on a 4x4 block size, instead of the traditional 8x8 size.
  • the H.264 standard may be considered a superset of the MPEG-2 video encoding syntax in that it uses the same global structuring of video data, while extending the number of possible coding decisions and parameters.
  • a consequence of having a variety of coding decisions is that a good trade-off between the bit rate and picture quality may be achieved.
  • although the H.264 standard may significantly reduce typical artefacts of block-based coding, it can also accentuate other artefacts.
  • the fact that H.264 allows for an increased number of possible values for various coding parameters thus results in an increased potential for improving the encoding process, but also results in an increased sensitivity to the choice of video encoding parameters.
  • H.264 does not specify a normative procedure for selecting video encoding parameters, but describes, through a reference implementation, a number of criteria that may be used to select video encoding parameters so as to achieve a suitable trade-off between coding efficiency, video quality and practicality of implementation.
  • the described criteria may not always result in an optimal or suitable selection of coding parameters.
  • the criteria may not result in selection of video encoding parameters optimal or desirable for the characteristics of the video signal, or the criteria may be based on attaining characteristics of the encoded signal which are not appropriate for the current application.
  • although H.264 can significantly reduce some typical artefacts of MPEG-2 encoding, it can also cause other artefacts.
  • One such artefact is a partial removal of texture, resulting in a plastic-like or smeared appearance of some picture areas.
  • Another is coding artefacts creating coding noise in picture areas having a high degree of flatness. This is especially noticeable for larger picture formats, such as High Definition TV.
  • an improved system for video encoding would be advantageous, and in particular an improved video encoding system that exploits the possibilities of emerging standards, such as H.264, to improve video encoding would be advantageous.
  • a video encoder for encoding a video signal comprising: means for determining a picture region having a spatial frequency characteristic; means for setting an encoding block size for the picture region in response to the spatial frequency characteristic; and means for encoding the video signal using the encoding block size for the picture region.
  • the invention allows for improved video encoding performance and in particular an improved video quality and/or reduced encoded data rate may be achieved.
  • the inventors have realised that the preferred encoding block sizes depend on the spatial frequency characteristics.
  • the invention allows for an improved quality and/or data rate to be achieved for a picture based on local adaptation of block encoding sizes based on local spatial frequency characteristics.
  • a dynamic and local adaptation of block encoding sizes to suit local spatial frequency characteristics may be used.
  • Local content-dependent restriction of block encoding sizes may be used to improve performance of the video encoding.
  • the invention allows for an encoding block size to be set so as to result in high texture information being preserved for picture regions having a spatial frequency characteristic that indicates high levels of texture.
  • the invention enables a significant reduction in the loss of texture information and thus mitigates the plastification or texture smearing effect encountered in many video encoders, including for example H.264 video encoders.
  • the invention allows for an encoding block size to be set so as to result in reduced block based coding artefacts (e.g. blocking artefacts) for picture regions having a spatial frequency characteristic that indicates a high degree of flatness.
  • the invention enables a significant reduction in the coding imperfections encountered in many video encoders, including for example H.264 video encoders.
  • the encoding block size is a motion estimation block size.
  • the invention thus enables an optimisation of a motion estimation block size to suit the local spatial frequency characteristic of a picture region.
  • the means for determining the picture region is operable to determine the picture region as a group of pixels for which the spatial frequency characteristic meets a spatial frequency criterion.
  • a picture region may be determined such that it has the same or similar spatial frequency properties and thus be suited for the same encoding block size.
  • the spatial frequency criterion may be directly associated with a given encoding block size.
  • a picture region may be determined as one or more picture areas for which the spatial frequency characteristic meets a given characteristic corresponding to a predetermined encoding block size.
  • the spatial frequency criterion is that a spatial frequency distribution comprises an energy concentration above an energy threshold for spatial frequencies below a frequency threshold.
  • a high concentration of low frequency components is indicative of a high degree of flatness of the picture. It has been observed that coding artefacts related to block sizes, such as blocking artefacts, often occur in areas with high levels of flatness. This may be mitigated by appropriate selection of the encoding block size. Hence, the mitigation of the coding artefacts and imperfections may be facilitated and/or increased.
  • the frequency analysis underlying the spatial frequency characteristic may for example be performed by a transform, such as a Discrete Cosine Transform (DCT), or by determining a variance measure of surrounding pixels.
  • the means for setting the encoding block size is operable to set the encoding block size to a predetermined value.
  • a plurality of encoding block size values may be predetermined and associated with specific spatial frequency characteristics.
  • a look-up table may for example be used to correlate a spatial frequency characteristic with a predetermined encoding block size.
  • the means for determining the picture region comprises means for determining the spatial frequency characteristic in response to a variance of pixel values within the picture region. This provides a good indication of the spatial frequency characteristic of a picture region yet is easy to implement and does not require any transforms.
  • the means for setting the encoding block size comprises means for generating a set of allowable encoding block sizes in response to the spatial frequency characteristic; and the means for encoding comprises means for selecting the encoding block size from the set of allowable encoding block sizes.
  • the video encoding may use an encoding block size set in response to many parameters, of which the spatial frequency characteristic is one. Specifically, the spatial frequency characteristic may be used to restrict the possible encoding block sizes to a limited set from which an encoding block size can be selected in response to other parameters. This allows a flexible selection of encoding block size to suit the video encoding, yet allows the performance of the video encoder to be controlled in response to the spatial frequency characteristic.
  • the video encoder further comprises: means for determining a second picture region having a second spatial frequency characteristic; means for setting a second encoding block size for the second picture region in response to the second spatial frequency characteristic; and wherein the means for encoding the video signal is operable to encode the video signal using the second encoding block size for the second picture region.
  • the means for processing the second picture region may be the same as the means for processing the first picture region.
  • the picture regions may for example be processed in parallel in different functional modules or sequentially in the same functional module.
  • Preferably, a plurality of picture regions is determined and the encoding block size is set for each picture region to suit the spatial frequency characteristic of that region. This allows the encoding block size to be optimised for the local spatial frequency characteristics and thus allows for improved video encoding.
  • the spatial frequency characteristic comprises an indication of a degree of flatness in the picture region and the means for setting the encoding block size is operable to increase the encoding block size for increasing degrees of flatness.
  • Picture areas having high degrees of flatness have been observed to be sensitive to coding imperfections such as block based coding artefacts.
  • Block based artefacts may for example be blocking artefacts. The inventors of the present invention have realised that this effect may be mitigated by increasing the encoding block size. Accordingly, an improved video encoding quality may be obtained.
  • the spatial frequency characteristic comprises an indication of a degree of uniformity in the picture region and the means for setting the encoding block size is operable to increase the encoding block size for increasing degrees of uniformity.
  • Picture areas having high degrees of uniformity have been observed to be sensitive to coding imperfections such as texture loss or smearing.
  • the inventors of the present invention have realised that this effect may be mitigated by increasing the encoding block size. Accordingly, a reduced texture loss or smearing may be achieved, and thus an improved video encoding quality may be obtained.
  • the spatial frequency characteristic comprises an indication of a concentration of energy towards lower frequencies and the means for setting the encoding block size is operable to increase the encoding block size for an increasing concentration of energy towards lower frequencies.
  • a concentration of energy towards low frequencies may indicate a high degree of flatness and a susceptibility to coding imperfections in the video encoding, and this may be mitigated by selection of larger encoding block sizes.
  • the video encoder further comprises: means for setting a quantisation level for the picture region in response to the spatial frequency characteristic; and the means for encoding the video signal is operable to use the quantisation level for the picture region.
  • the performance of the video encoder may furthermore be improved by setting both a quantisation level and an encoding block size in response to the spatial frequency characteristic.
  • the combined effect of quantisation levels and encoding block sizes on video encoding artefacts such as texture loss or block based coding artefacts is significant and highly correlated. Therefore, performance may be improved by adjusting both parameters in response to the spatial frequency characteristic of a picture region.
  • the video encoder is a video encoder in accordance with the H.264 recommendation defined by the International Telecommunications Union.
  • the invention thus enables an improved video encoder which is operable to work within, and exploit the options and restrictions of, the H.264 standard.
  • H.264 is jointly developed by the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) and the ISO/IEC (the International Organization for Standardization/the International Electrotechnical Commission).
  • ITU-T Rec. H.264 is equivalent to ISO/IEC 14496-10 AVC.
  • the encoding block size is selected from a set of motion estimation block sizes of the inter prediction modes defined in the H.264 standard.
  • the invention enables an improved H.264 video encoder wherein the selection of standardised encoding block sizes is controlled so as to suit a local spatial frequency characteristic.
  • a method of video encoding comprising the steps of: determining a picture region having a spatial frequency characteristic; setting an encoding block size for the picture region in response to the spatial frequency characteristic; and encoding the video signal using the encoding block size for the picture region.
  • FIG. 1 illustrates the possible partitioning of macro-blocks into motion estimation blocks in accordance with the H.264 standard
  • FIG. 2 illustrates a block diagram of a video encoder in accordance with an embodiment of the invention
  • FIG. 3 illustrates a flow chart of a method of video encoding in accordance with an embodiment of the invention.
  • New video coding standards such as H.26L, H.264 or MPEG-4 AVC promise improved video encoding performance in terms of an improved quality to data rate ratio. Much of the data rate reduction offered by these standards can be attributed to improved methods of motion compensation. These methods mostly extend the basic principles of previous standards, such as MPEG-2.
  • One relevant extension is the use of multiple reference pictures for prediction, whereby a prediction block may originate in more distant (the distance is currently unrestricted) future- or past pictures.
  • Another and even more efficient extension is the possibility of using variable block sizes for prediction of a macro-block.
  • a macro-block (still 16x16 pixels) may be partitioned into smaller sub-blocks.
  • each of these sub-blocks can be predicted separately.
  • different sub-blocks can have different motion vectors and can be retrieved from different reference pictures.
  • the number, size and orientation of prediction blocks are uniquely determined by the definition of inter prediction modes, which describe possible partitioning of a macro-block into 8x8 blocks and further partitioning of each of the 8x8 sub-blocks.
  • FIG. 1 illustrates the possible partitioning of macro-blocks into motion estimation blocks in accordance with the H.264 standard.
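As an illustration of the partition structure in FIG. 1, the following minimal Python sketch (not part of the patent text) enumerates the prediction block sizes that result from the macro-block and sub-macro-block partitionings defined by the H.264 inter prediction modes; the mode labels used here are illustrative, not normative syntax elements.

```python
# Partitioning of a 16x16 macro-block into motion estimation (prediction)
# blocks, as illustrated in FIG. 1. Sizes are (width, height) in pixels.
MACROBLOCK_PARTITIONS = {
    "16x16": [(16, 16)],
    "16x8":  [(16, 8)] * 2,
    "8x16":  [(8, 16)] * 2,
    "8x8":   [(8, 8)] * 4,   # each 8x8 sub-block may be split further
}

# Further partitioning of each 8x8 sub-block when the 8x8 mode is chosen.
SUBBLOCK_PARTITIONS = {
    "8x8": [(8, 8)],
    "8x4": [(8, 4)] * 2,
    "4x8": [(4, 8)] * 2,
    "4x4": [(4, 4)] * 4,
}

def prediction_blocks(mb_mode, sub_modes=None):
    """Return the list of prediction block sizes for one macro-block.

    sub_modes: one sub-partition mode per 8x8 sub-block, only used when
    mb_mode == "8x8"; each resulting sub-block can carry its own motion
    vector and reference picture, as described above."""
    if mb_mode != "8x8":
        return MACROBLOCK_PARTITIONS[mb_mode]
    blocks = []
    for sub in (sub_modes or ["8x8"] * 4):
        blocks.extend(SUBBLOCK_PARTITIONS[sub])
    return blocks

# Example: one macro-block split into 8x8 sub-blocks, one of which uses 4x4.
print(prediction_blocks("8x8", ["8x8", "8x4", "4x8", "4x4"]))
```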
  • although H.264 can significantly reduce some typical artefacts of MPEG-2 video encoding, it can also cause other artefacts.
  • One such artefact is a partial removal of texture, resulting in texture smearing and a plastic-like appearance of some picture areas.
  • Another artefact is noise in static areas with little detail. The artefacts are most noticeable in large areas with little detail or variation, and are especially noticeable for larger picture formats, such as High Definition TV.
  • the inventors of the current invention have realised that these coding artefacts are affected by the encoding block size used, and that they may be mitigated by improved selection of encoding block sizes.
  • FIG. 2 illustrates a block diagram of a video encoder 201 in accordance with an embodiment of the invention.
  • the video encoder 201 is coupled to an external video source 203 from which a video signal to be encoded is received.
  • the video signal comprises a number of pictures or frames.
  • the video encoder 201 comprises a buffer 205 coupled to the external video source 203.
  • the buffer 205 receives the video signal from the external video source 203 and stores one or more pictures or frames until the video encoder 201 is ready to encode them.
  • the external video source 203 is furthermore coupled to a segmentation processor 207.
  • the segmentation processor 207 is operable to determine a picture region by dividing the picture into different picture regions. The picture may be divided into two or more picture regions in response to any suitable algorithm or criterion and specifically the picture may be divided into two picture regions by selecting a single picture region for which a given criterion is met.
  • the segmentation processor 207 is coupled to a characteristics processor 209.
  • the characteristics processor 209 is operable to determine a spatial frequency characteristic for the picture region determined by the segmentation processor 207.
  • the spatial frequency characteristic may for example indicate a spatial frequency domain energy distribution for the determined picture region.
  • the spatial frequency characteristic may indicate the concentration of energy below a given frequency threshold.
  • the video signal to be encoded is fed to the characteristics processor 209 in predetermined picture regions.
  • individual macro-blocks may be fed directly from the external video source 203 or the buffer 205 to the characteristics processor 209.
  • the picture region is directly generated by receiving or retrieving a single macro-block and processing this.
  • the spatial frequency characteristic comprises an indication of a degree of flatness and/or uniformity of the determined picture region.
  • a region in a picture is generally considered uniform if it lacks texture/detail or if it contains texture that is stationary, i.e. has uniform variation.
  • a flat region is generally considered a region that simply lacks texture and/or detail and thus has relatively low concentrations of high-frequency content.
  • a typical flat region thus appears flat to a viewer.
  • a typical example of flat regions is regions of uniform colour in cartoons. The term uniform is generally considered to be broader than flat, and thus typically a flat region is also considered uniform (but not necessarily vice versa).
  • H.264 compacts signal energy into a larger number of low frequency coefficients, leaving a smaller number of high frequency coefficients that are more susceptible to being suppressed during the subsequent video encoding (for example due to coefficient weighting or quantization).
  • as texture information is typically of a relatively high-frequency nature, a loss of texture results.
  • the spatial frequency characteristic may be a single binary parameter which indicates if a given criterion is met.
  • the spatial frequency characteristic may be set to zero if, say, more than 60% of the signal energy is contained within the lowest 20% of the relevant frequency spectrum, and to one otherwise.
  • a spatial frequency characteristic value of zero indicates a high concentration of energy towards the lower frequencies. This is an indication of the picture region having a high degree of flatness, and therefore indicates that the picture region has a high susceptibility to coding artefacts when being encoded.
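A minimal numpy sketch of one way such a binary characteristic could be computed for a single pixel block; the 60% and 20% figures follow the text above, while the orthonormal 2-D DCT, the ordering of coefficients by increasing frequency index sum, and the removal of the block mean are assumptions of this sketch.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0, :] *= 1.0 / np.sqrt(n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis

def flatness_flag(block, energy_share=0.60, low_band=0.20):
    """Binary spatial frequency characteristic for one pixel block.

    Returns 0 when more than `energy_share` of the signal energy (block
    mean removed, an assumption of this sketch) lies in the `low_band`
    lowest spatial frequencies, i.e. the region looks flat; 1 otherwise."""
    n = block.shape[0]
    d = dct_matrix(n)
    coeffs = d @ (block - block.mean()) @ d.T        # 2-D DCT, mean removed
    energy = coeffs ** 2
    # Order coefficients by a simple frequency measure (u + v), lowest first.
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    order = np.argsort((u + v).ravel())
    sorted_energy = energy.ravel()[order]
    n_low = max(1, int(low_band * sorted_energy.size))
    total = sorted_energy.sum()
    if total == 0:                                   # perfectly flat block
        return 0
    return 0 if sorted_energy[:n_low].sum() / total > energy_share else 1
```

With the association described above, a returned value of 0 (a flat region) would then lead the coding controller to select the larger predetermined block size.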
  • the characteristics processor 209 is coupled to a coding controller 211.
  • the coding controller 211 is operable to set an encoding block size for the picture region in response to the spatial frequency characteristic.
  • the encoding block size is a motion estimation block size and is specifically a prediction block size as allowed by the inter prediction modes defined in the H.264 video encoding standard.
  • the encoding block size may be set to a first block size if the spatial frequency characteristic is zero and to a second block size if the spatial frequency characteristic is one.
  • the coding controller 211 may simply set the encoding block size by selecting a predetermined block size in response to a predetermined association between values of the spatial frequency characteristic and the encoding block sizes.
  • the coding controller 211 is coupled to an encode processor 213 which is furthermore coupled to the buffer 205.
  • the encode processor 213 is operable to encode the picture stored in the buffer 205 using the encoding block size set by the coding controller 211 for the picture region determined by the segmentation processor 207.
  • the video encoding will be such that the encoding block size for the picture region is specifically adapted to suit the spatial frequency characteristic of that picture region. For example, in the simple embodiment described, a concentration of signal energy towards lower spatial frequencies will result in a larger block size being used; otherwise a smaller block size will be used, or at least permitted, thereby allowing for improved encoding efficiency.
  • when the spatial frequency characteristic comprises an indication of a high degree of flatness (and thus a sensitivity to coding artefacts), larger encoding block sizes are used, thereby mitigating or eliminating the coding imperfections.
  • the encoding processor 213 is operable to encode the video signal in accordance with the H.264 video encoding standard.
  • An embodiment particularly suited for easy implementation is where the picture regions correspond to one macro-block.
  • the macro-blocks are directly fed to the characteristics processor 209 which then determines the spatial frequency characteristics of that macro-block.
  • the coding controller 211 determines a suitable encoding block size based on the spatial frequency characteristic of that macro-block, and possibly of a number of neighbouring macro-blocks.
  • the encoding processor 213 receives the macro-block from the buffer 205 and encodes it using the encoding block size selected for the macro-block by the coding controller. This enables parallel, and therefore more efficient, execution in hardware.
  • the characteristics processor (209) may store the spatial frequency characteristics obtained for macro-blocks from subsequent pictures. This would enable an analysis of the time-consistency of spatial spectral characteristics, which can further be used to optimize the selection of encoding parameters. For example, it may facilitate discrimination between texture of the underlying picture and texture originating from noise of the video source (e.g. the so-called "film grain" in movies).
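A small sketch of how per-macro-block characteristics could be stored across pictures and checked for time-consistency, as suggested above; the history depth and the consistency rule are assumptions of this sketch.

```python
from collections import defaultdict, deque

class CharacteristicHistory:
    """Keeps the last few spatial frequency characteristics per macro-block
    position, so that their consistency over time can be checked, e.g. to
    help separate stable picture texture from frame-to-frame film grain."""

    def __init__(self, depth=5):
        self.depth = depth
        self.history = defaultdict(lambda: deque(maxlen=depth))

    def update(self, mb_index, characteristic):
        """Record the characteristic measured for this macro-block position
        in the current picture."""
        self.history[mb_index].append(characteristic)

    def is_time_consistent(self, mb_index):
        """True if the stored characteristic has not changed over the last
        `depth` pictures; inconsistent values may indicate noise-like
        texture rather than underlying picture texture."""
        values = self.history[mb_index]
        return len(values) == self.depth and len(set(values)) == 1
```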
  • FIG. 3 illustrates a flow chart of a method of video encoding in accordance with an embodiment of the invention. The method is applicable to the video encoder 201 of FIG. 2 and will be described with reference to this.
  • in step 301 the video encoder 201 receives the video signal to be encoded from the external video source.
  • step 301 is followed by step 303 wherein the segmentation processor 207 determines a picture region.
  • the picture region may be determined in accordance with any suitable criterion or algorithm. In a simple embodiment, a single picture region may be selected in accordance with a criterion, and the picture is divided into just two picture regions consisting of the selected picture region and a picture region comprising the remainder of the picture. However, in the preferred embodiment the picture is divided into several picture regions.
  • the picture is divided into picture regions by segmentation of the picture.
  • picture segmentation comprises the process of a spatial grouping of pixels based on a common property (e.g. colour).
  • a common property e.g. colour
  • Any known method or algorithm for segmentation of a picture may be used without detracting from the invention.
  • An introduction to picture or video segmentation may be found in, for example, E. Steinbach, P. Eisert, B. Girod, "Motion-based Analysis and Segmentation of Image Sequences using 3-D Scene Models," Signal Processing: Special Issue: Video Sequence Segmentation for Content-based Processing and Manipulation, vol. 66, no. 2, pp. 233-248, IEEE, 1998, or A. Bovik, Handbook of Image and Video Processing, Academic Press, 2000.
  • the segmentation includes detecting an object in response to a common characteristic, such as a colour or a level of uniformity, and consequently tracking this object from one picture to the next.
  • a common characteristic such as a colour or a level of uniformity
  • This provides for simplified segmentation and facilitates identification of suitable regions for being encoded with the same encoding block size.
  • an initial picture may be segmented and the obtained segments tracked across subsequent pictures, until a new picture is segmented independently, etc.
  • the segment tracking is preferably performed by employing known motion estimation techniques.
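As an example of the kind of known motion estimation technique that could be used to track a segment from one picture to the next, the following sketch performs a full-search block matching with a sum-of-absolute-differences cost; the block size, search range and cost measure are assumptions, not the patent's method.

```python
import numpy as np

def track_block(prev_picture, cur_picture, top, left, size=16, search=8):
    """Find where the `size` x `size` block at (top, left) in the previous
    picture best matches in the current picture, using a full search over
    +/- `search` pixels and a sum-of-absolute-differences (SAD) cost.
    Returns the motion vector (dy, dx) of the best match."""
    ref = prev_picture[top:top + size, left:left + size].astype(np.int32)
    best, best_mv = None, (0, 0)
    h, w = cur_picture.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > h or x + size > w:
                continue                      # candidate outside the picture
            cand = cur_picture[y:y + size, x:x + size].astype(np.int32)
            sad = np.abs(ref - cand).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv
```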
  • the picture regions may comprise a plurality of picture areas which are suitable for similar choices of video encoding parameters and in particular encoding block size.
  • a picture region may be formed by grouping of a plurality of segments. For example, if the video signal corresponds to a football match, all regions having a predominantly green colour may be grouped together as one picture region. As another example, all segments having a predominant colour corresponding to the colour of the shirts of one of the teams may be grouped together as one picture region.
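A minimal sketch of grouping segments into one picture region by a predominant colour, as in the football example above; the mean-RGB distance measure and its threshold are illustrative assumptions.

```python
import numpy as np

def group_segments_by_colour(segments, reference_rgb, max_distance=60.0):
    """Collect the segments whose mean colour is close to `reference_rgb`
    (e.g. the pitch green, or a team's shirt colour) into one picture
    region. `segments` is a list of (segment_id, pixels) pairs, where
    `pixels` is an (N, 3) array of RGB values belonging to that segment."""
    region = []
    ref = np.asarray(reference_rgb, dtype=np.float64)
    for segment_id, pixels in segments:
        mean_colour = np.asarray(pixels, dtype=np.float64).mean(axis=0)
        if np.linalg.norm(mean_colour - ref) <= max_distance:
            region.append(segment_id)
    return region
```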
  • the picture segments need not necessarily correspond to physical objects. For example, two neighbouring segments may represent different objects but may both be highly textured. In this case, both segments may be suited for the same encoding block size.
  • the picture region or regions may specifically be determined in response to properties or characteristics of the picture. Specifically, the picture regions may be determined in response to a spatial frequency characteristic.
  • the segmentation processor 207 may be operable to determine the picture region as a group of pixels for which the spatial frequency characteristic meets a spatial frequency criterion. For example, a picture region may be determined by grouping all e.g. 4x4 pixel blocks for which 50% of the energy is contained in the three DCT coefficients corresponding to the lowest spatial frequencies. A second picture region may be determined by grouping all remaining 4x4 pixel blocks for which 50% of the energy is contained in the six DCT coefficients corresponding to the lowest spatial frequencies. A third picture region may be formed by the remaining 4x4 pixel blocks.
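The following sketch illustrates that grouping rule for 4x4 pixel blocks; the 50% figure and the three- and six-coefficient groups follow the text, while the orthonormal 4x4 DCT, the ordering of coefficients by increasing frequency index sum, and the exclusion of the DC term are assumptions of this sketch.

```python
import numpy as np

def dct_matrix(n=4):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def classify_blocks(picture, share=0.5):
    """Label every 4x4 block of a greyscale picture as region 0, 1 or 2.

    Region 0: at least `share` of the AC energy lies in the 3 lowest
    frequency coefficients; region 1: in the 6 lowest; region 2: the rest."""
    d = dct_matrix(4)
    u, v = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
    order = np.argsort((u + v).ravel())[1:]          # drop DC, lowest first
    h, w = picture.shape
    labels = np.empty((h // 4, w // 4), dtype=np.int8)
    for by in range(h // 4):
        for bx in range(w // 4):
            block = picture[4 * by:4 * by + 4, 4 * bx:4 * bx + 4]
            coeffs = d @ (block - block.mean()) @ d.T
            e = (coeffs.ravel()[order]) ** 2
            total = e.sum() or 1.0                   # avoid division by zero
            if e[:3].sum() / total >= share:
                labels[by, bx] = 0
            elif e[:6].sum() / total >= share:
                labels[by, bx] = 1
            else:
                labels[by, bx] = 2
    return labels
```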
  • the picture may simply be divided into a number of picture regions without consideration of the properties of the picture.
  • a picture may simply be divided into a number of adjacent squares of a suitable size.
  • in some embodiments, the method does not comprise a separate segmentation step, or equivalently the segmentation step simply comprises retrieving or receiving a picture region, such as a block to be encoded; specifically, a macro-block may be received.
  • Step 303 is followed by step 305 wherein a spatial frequency characteristic of the picture region is determined by the characteristics processor 209.
  • a spatial frequency characteristic indicative of the uniformity or flatness of the picture region is determined.
  • One such measure is a spatial frequency distribution wherein a concentration of energy towards the lower frequencies indicates an increased flatness.
  • the spatial frequency characteristic may be determined by performing a Discrete Cosine Transform (DCT) on one or more blocks within the picture region.
  • a 4x4 DCT may be performed for all 4x4 pixel blocks in the picture region.
  • the DCT coefficient values may be averaged for all the blocks in the picture region, and the spatial frequency characteristic may comprise the averaged coefficient values or an indication of the relative magnitude of the different coefficient values.
  • Another method of determining a measure for flatness is by determining a variance of pixel values within the picture region.
  • This variance may not only be a statistical variance but may also be any other measure of the variation or spread of pixel values within the picture region.
  • the variance or spread may be calculated by taking the average of a pixel and the surrounding pixels and then measuring the difference between the pixels and the average value. This is particularly suitable for an embodiment wherein each picture region corresponds to one or more macro-blocks.
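A minimal sketch of such a spread measure for one macro-block, assuming a 3x3 neighbourhood and an absolute difference; neither is specified in the text.

```python
import numpy as np

def local_spread(macroblock):
    """Spread measure for one 16x16 macro-block: for every interior pixel,
    take the mean of the pixel and its 8 neighbours (a 3x3 window) and
    average the absolute difference between the pixel and that mean.
    A small result indicates a flat macro-block."""
    mb = macroblock.astype(np.float64)
    diffs = []
    for y in range(1, mb.shape[0] - 1):
        for x in range(1, mb.shape[1] - 1):
            window_mean = mb[y - 1:y + 2, x - 1:x + 2].mean()
            diffs.append(abs(mb[y, x] - window_mean))
    return float(np.mean(diffs))
```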
  • the purpose of steps 303 and 305 is to determine a picture region having a spatial frequency characteristic. This may for example be done by determining a picture region in accordance with a given criterion and subsequently determining a spatial frequency characteristic for that region. Alternatively or additionally, a picture region may be determined directly, e.g. by grouping picture areas or sections that have a given spatial frequency characteristic. In this case no specific analysis of the picture region is necessary to determine the spatial frequency characteristic, as it is inherently given by the determination of the picture region.
  • Step 305 is followed by step 307 wherein the coding controller 211 sets an encoding block size for the picture region in response to the spatial frequency characteristic.
  • the encoding block size is set to a predetermined value.
  • the spatial frequency characteristic may consist of a single measure of the concentration of energy below a given frequency threshold.
  • the coding controller 211 may comprise a look-up table wherein, if the energy concentration is below a first value of, say, 50%, a first predetermined encoding block size is set; if the energy concentration is below a second value of, say, 75%, a second predetermined encoding block size is set; and otherwise a third predetermined encoding block size is set.
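A sketch of such a look-up style mapping; the 50% and 75% break points follow the text, while the concrete block sizes returned are illustrative assumptions chosen to be consistent with the rule below that the block size grows with increasing flatness.

```python
def block_size_from_energy_concentration(low_freq_share):
    """Map the measured concentration of energy at low spatial frequencies
    to a predetermined prediction block size (width, height)."""
    if low_freq_share < 0.50:
        return (8, 8)     # detailed region: smallest of the three sizes
    if low_freq_share < 0.75:
        return (16, 8)    # intermediate region
    return (16, 16)       # very flat region: largest prediction blocks
```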
  • the spatial frequency characteristic comprises an indication of a degree of flatness or uniformity in the picture region and the coding controller 211 is operable to set the encoding block size such that the encoding block size increases for increasing degrees of flatness or uniformity.
  • the first predetermined encoding block size is smaller than the second predetermined encoding block size, which again is smaller than the third predetermined encoding block size. This may reduce texture removal or smearing for critical picture areas, as a larger encoding block size causes less texture loss than smaller encoding block sizes.
  • the encoding block size may comprise a group of allowable values for the encoding block size.
  • in some embodiments a specific parameter value may be selected for the encoding block size, whereas in other embodiments an encoding block size having a range of allowable values may be selected.
  • the encoding block size provides a constraint or restriction on the choice of encoding parameters for the subsequent video encoding.
  • the coding controller 211 controls or influences the operation of the encode processor 213.
  • a set of allowable encoding block sizes may be selected or set by the coding controller 211.
  • the encode processor 213 may then encode the video signal by selecting an encoding block size from the set determined by the coding controller 211.
  • the coding controller 211 is operable to generate a set of allowable encoding block sizes in response to the spatial frequency characteristic and the encode processor 213 is operable to select the encoding block size from the set of allowable encoding block sizes.
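A sketch of this division of labour, in which the coding controller restricts the set of allowable prediction block sizes from the spatial frequency characteristic and the encode processor then selects within that set according to some other criterion; the concrete sets and the rate-distortion style cost are assumptions of this sketch.

```python
def allowable_block_sizes(flatness_flag):
    """Set of prediction block sizes the encode processor may choose from,
    restricted by the spatial frequency characteristic; flag 0 indicates a
    flat region (see the binary characteristic above)."""
    if flatness_flag == 0:
        # Flat region: only the larger H.264 partitions are permitted.
        return {(16, 16), (16, 8), (8, 16)}
    # Otherwise all partitions, down to 4x4, remain available.
    return {(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)}

def choose_block_size(allowed, cost):
    """The encode processor picks the cheapest size within the allowed set
    according to some other criterion, e.g. a rate-distortion style cost
    function (here `cost` is any callable mapping a size to a number)."""
    return min(allowed, key=cost)
```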
  • the selection of encoding block size preferably comprises partitioning macro-blocks into motion estimation blocks in accordance with the H.264 standard.
  • Step 307 is followed by step 309 wherein the video signal is encoded in the encode processor 213 using the encoding block size determined by the coding controller 211.
  • the video encoding is in accordance with the H.264 video encoding standard.
  • the method of a preferred embodiment may thus reduce the blocking artefacts in pictures which are encoded with the use of H.26L-like techniques of motion compensation, i.e. with the use of variable block sizes during inter-frame prediction.
  • the method of the embodiment identifies flat areas in a picture and enforces a constraint on the encoding block size in those areas. Particularly, it is enforced that larger prediction blocks are used.
  • the required discrimination of regions based on their flatness can be performed during encoding, but it can also be available beforehand (e.g. if needed for other applications).
  • due to the complexity of such analysis in the case of performing picture segmentation, the method of the preferred embodiment is particularly, but not exclusively, suited for non-real-time applications, such as video streaming, broadcast or publishing.
  • the coding controller 211 is furthermore operable to set a quantisation level for the picture region in response to the spatial frequency characteristic, and the encode processor 213 is operable to use the quantisation level for the picture region.
  • a quantisation threshold may be set below which all coefficients following an encoding DCT are set to zero.
  • a lower threshold may result in reduced data rates but also reduced picture quality.
  • the texture loss is increased for increasing thresholds and accordingly, the quantisation level is preferably lowered in line with the encoding block size being increased in order to further mitigate the texture smearing effect.
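A sketch combining the two controls described above: a quantisation threshold that is lowered as the encoding block size is increased, and the zeroing of DCT coefficients below that threshold; the scaling rule and the base threshold value are assumptions of this sketch.

```python
import numpy as np

def quantisation_threshold_for(block_size, base_threshold=10.0):
    """Choose a quantisation threshold for a region: the larger the
    prediction block size chosen (e.g. for a flat region), the lower the
    threshold, so that less texture detail is discarded."""
    width, height = block_size
    return base_threshold * (64.0 / (width * height)) ** 0.5

def apply_threshold(dct_coefficients, threshold):
    """Set every DCT coefficient whose magnitude falls below the threshold
    to zero, as described above."""
    coeffs = np.asarray(dct_coefficients, dtype=np.float64).copy()
    coeffs[np.abs(coeffs) < threshold] = 0.0
    return coeffs

# Example: a 16x16 block gets threshold 5.0, an 8x8 block gets 10.0.
print(quantisation_threshold_for((16, 16)), quantisation_threshold_for((8, 8)))
```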
  • the encoding block size set is a motion estimation prediction block size.
  • other encoding block sizes may be set in response to the spatial frequency characteristic.
  • the transformation size used for transforming video data into spatial frequencies may be set in response to the spatial frequency characteristic.
  • more than one block size may be set in response to the spatial frequency characteristic.
  • the steps of the method may be iterated for different picture regions, or different regions may be processed in each of the steps.
  • the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. However, preferably, the invention is implemented as computer software running on one or more data processors and/or digital signal processors.
  • the elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed, the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a video encoder (201) for encoding a video signal. The video encoder comprises a segmentation processor (207) which divides the picture into picture regions. Preferably, picture regions having a high degree of flatness or uniformity are determined in this way. A characteristics processor (209) determines a spatial frequency characteristic for each picture region, and a coding controller (211) selects an encoding block size, such as a prediction block size for motion estimation, in response to the spatial frequency characteristic. An encode processor (213) encodes the picture using the selected encoding block size. Specifically, increasing block sizes are selected for increasing degrees of uniformity or flatness indicated by the spatial frequency characteristic. Thereby, an increasing proportion of high frequency components and a consistent choice of encoding block sizes are maintained, and thus the coding artefacts produced by many encoders having variable prediction block sizes are reduced. The invention is particularly suitable for H.264 and similar encoders.

Description

Video encoding
FIELD OF THE INVENTION
The invention relates to a video encoder and a method of video encoding therefor, and in particular, but not exclusively, to video encoding in accordance with the H.264 video encoding standard.
BACKGROUND OF THE INVENTION
In recent years, the use of digital storage and distribution of video signals has become increasingly prevalent. In order to reduce the bandwidth required to transmit digital video signals, it is well known to use efficient digital video encoding comprising video data compression, whereby the data rate of a digital video signal may be substantially reduced.
In order to ensure interoperability, video encoding standards have played a key role in facilitating the adoption of digital video in many professional and consumer applications. The most influential standards are traditionally developed by either the International Telecommunication Union (ITU-T) or the MPEG (Moving Picture Experts Group) committee of the ISO/IEC (the International Organization for Standardization/the International Electrotechnical Commission). The ITU-T standards, known as recommendations, are typically aimed at real-time communications (e.g. videoconferencing), while most MPEG standards are optimized for storage (e.g. for the Digital Versatile Disc (DVD)) and broadcast (e.g. for the Digital Video Broadcasting (DVB) standard). Currently, one of the most widely used video compression techniques is known as the MPEG-2 (Moving Picture Experts Group) standard. MPEG-2 is a block based compression scheme wherein a frame is divided into a plurality of blocks, each comprising eight vertical and eight horizontal pixels. For compression of luminance data, each block is individually compressed using a Discrete Cosine Transform (DCT) followed by quantization, which reduces a significant number of the transformed data values to zero. For compression of chrominance data, the amount of chrominance data is usually first reduced by down-sampling, such that for each four luminance blocks two chrominance blocks are obtained (4:2:0 format); these are similarly compressed using the DCT and quantization. Frames based only on intra-frame compression are known as Intra Frames (I-Frames). In addition to intra-frame compression, MPEG-2 uses inter-frame compression to further reduce the data rate. Inter-frame compression includes generation of predicted frames (P-frames) based on previous I-frames. In addition, I and P frames are typically interposed by Bidirectional predicted frames (B-frames), wherein compression is achieved by only transmitting the differences between the B-frame and surrounding I- and P-frames. In addition, MPEG-2 uses motion estimation, wherein macroblocks of one frame that are found at different positions in subsequent frames are communicated simply by use of a motion vector.
As a result of these compression techniques, video signals of standard TV studio broadcast quality level can be transmitted at data rates of around 2-4 Mbps.
Recently, a new ITU-T standard, known as H.26L, has emerged. H.26L is becoming broadly recognized for its superior coding efficiency in comparison with the existing standards such as MPEG-2. Although the gain of H.26L generally decreases in proportion to the picture size, the potential for its deployment in a broad range of applications is undoubted. This potential has been recognized through formation of the Joint Video Team (JVT) forum, which is responsible for finalizing H.26L as a new joint ITU-T/MPEG standard. The new standard is known as H.264 or MPEG-4 AVC (Advanced Video Coding). Furthermore, H.264-based solutions are being considered in other standardization bodies, such as the DVB and DVD Forums. The H.264 standard employs the same principles of block-based motion-compensated hybrid transform coding that are known from the established standards such as MPEG-2. The H.264 syntax is, therefore, organized as the usual hierarchy of headers, such as picture-, slice- and macro-block headers, and data, such as motion-vectors, block-transform coefficients, quantizer scale, etc. However, the H.264 standard separates the Video Coding Layer (VCL), which represents the content of the video data, and the Network Adaptation Layer (NAL), which formats data and provides header information.
Furthermore, H.264 allows for a much increased choice of encoding parameters. For example, it allows for a more elaborate partitioning and manipulation of 16x16 macro-blocks, whereby e.g. the motion compensation process can be performed on partitions of a macro-block as small as 4x4 pixels. Also, the selection process for motion compensated prediction of a sample block may involve a number of stored previously-decoded pictures, instead of only the adjacent pictures. Even with intra coding within a single frame, it is possible to form a prediction of a block using previously-decoded samples from the same frame. Also, the resulting prediction error following motion compensation may be transformed and quantized based on a 4x4 block size, instead of the traditional 8x8 size.
The H.264 standard may be considered a superset of the MPEG-2 video encoding syntax in that it uses the same global structuring of video data, while extending the number of possible coding decisions and parameters. A consequence of having a variety of coding decisions is that a good trade-off between the bit rate and picture quality may be achieved. However, it is commonly acknowledged that while the H.264 standard may significantly reduce typical artefacts of block-based coding, it can also accentuate other artefacts. The fact that H.264 allows for an increased number of possible values for various coding parameters thus results in an increased potential for improving the encoding process, but also results in an increased sensitivity to the choice of video encoding parameters. Similarly to other standards, H.264 does not specify a normative procedure for selecting video encoding parameters, but describes, through a reference implementation, a number of criteria that may be used to select video encoding parameters so as to achieve a suitable trade-off between coding efficiency, video quality and practicality of implementation.
However, the described criteria may not always result in an optimal or suitable selection of coding parameters. For example, the criteria may not result in selection of video encoding parameters optimal or desirable for the characteristics of the video signal, or the criteria may be based on attaining characteristics of the encoded signal which are not appropriate for the current application. For example, it is commonly acknowledged that while H.264 can significantly reduce some typical artefacts of MPEG-2 encoding, it can also cause other artefacts. One such artefact is a partial removal of texture, resulting in a plastic-like or smeared appearance of some picture areas. Another is coding artefacts creating coding noise in picture areas having a high degree of flatness. This is especially noticeable for larger picture formats, such as High Definition TV.
Accordingly, an improved system for video encoding would be advantageous, and in particular an improved video encoding system that exploits the possibilities of emerging standards, such as H.264, to improve video encoding would be advantageous.
SUMMARY OF THE INVENTION
Accordingly, the invention seeks to mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages singly or in any combination. According to a first aspect of the invention, there is provided a video encoder for encoding a video signal comprising: means for determining a picture region having a spatial frequency characteristic; means for setting an encoding block size for the picture region in response to the spatial frequency characteristic; and means for encoding the video signal using the encoding block size for the picture region.
The invention allows for improved video encoding performance and in particular an improved video quality and/or reduced encoded data rate may be achieved. The inventors have realised that the preferred encoding block sizes depend on the spatial frequency characteristics. The invention allows for an improved quality and/or data rate to be achieved for a picture based on local adaptation of block encoding sizes based on local spatial frequency characteristics. A dynamic and local adaptation of block encoding sizes to suit local spatial frequency characteristics may be used. Local content dependent restriction of block encoding sizes may be used to improve performance ofthe video encoding. Specifically, the invention allows for an encoding block size to be set so as to result in high texture information being preserved for picture regions having a spatial frequency characteristic that indicates high levels of texture. Thus, the invention enables a significant reduction in the loss of texture information and thus mitigates the plastification or texture smearing effect encountered in many video encoders, including for example H.264 video encoders. Alternatively and additionally, the invention allows for an encoding block size to be set so as to result in reduced block based coding artefacts (e.g. blocking artefacts) for picture regions having a spatial frequency characteristic that indicates a high degree of flatness. Thus, the invention enables a significant reduction in the coding imperfections encountered in many video encoders, including for example H.264 video encoders.
According to a feature ofthe invention, the encoding block size is a motion estimation block size. The invention thus enables an optimisation of a motion estimation block size to suit the local spatial frequency characteristic of a picture region.
According to another feature of the invention, the means for determining the picture region is operable to determine the picture region as a group of pixels for which the spatial frequency characteristic meets a spatial frequency criterion. A picture region may be determined such that it has the same or similar spatial frequency properties and thus be suited for the same encoding block size. The spatial frequency criterion may be directly associated with a given encoding block size. For example, a picture region may be determined as one or more picture areas for which the spatial frequency characteristic meets a given characteristic corresponding to a predetermined encoding block size. According to another feature of the invention, the spatial frequency criterion is that a spatial frequency distribution comprises an energy concentration above an energy threshold for spatial frequencies below a frequency threshold. A high concentration of low frequency components is indicative of a high degree of flatness of the picture. It has been observed that coding artefacts related to block sizes, such as blocking artefacts, often occur in areas with high levels of flatness. This may be mitigated by appropriate selection of the encoding block size. Hence, the mitigation of the coding artefacts and imperfections may be facilitated and/or increased. The frequency analysis underlying the spatial frequency characteristic may for example be performed by a transform, such as a Discrete Cosine Transform (DCT), or by determining a variance measure of surrounding pixels. According to another feature of the invention, the means for setting the encoding block size is operable to set the encoding block size to a predetermined value. This allows for a simple and easy to implement way of setting the encoding block size. A plurality of encoding block size values may be predetermined and associated with specific spatial frequency characteristics. A look-up table may for example be used to correlate a spatial frequency characteristic with a predetermined encoding block size.
According to another feature of the invention, the means for determining the picture region comprises means for determining the spatial frequency characteristic in response to a variance of pixel values within the picture region. This provides a good indication of the spatial frequency characteristic of a picture region, yet is easy to implement and does not require any transforms.
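By way of illustration, a minimal sketch of such a transform-free measure is given below; the use of the plain statistical variance of the luminance samples, and the example inputs, are assumptions made for the illustration and are not prescribed by the invention.

```python
import numpy as np

def region_pixel_variance(region):
    """Statistical variance of the pixel (luminance) values in a picture region.

    A low variance suggests a flat or uniform region; a high variance
    suggests texture or detail, i.e. more high spatial frequency content.
    """
    region = np.asarray(region, dtype=np.float64)
    return float(region.var())

# Example: a uniform grey block versus a noisy (textured) block.
flat_block = np.full((16, 16), 128.0)
textured_block = 128.0 + 20.0 * np.random.randn(16, 16)
print(region_pixel_variance(flat_block))      # ~0.0 -> flat
print(region_pixel_variance(textured_block))  # large -> textured
```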
According to another feature of the invention, the means for setting the encoding block size comprises means for generating a set of allowable encoding block sizes in response to the spatial frequency characteristic; and the means for encoding comprises means for selecting the encoding block size from the set of allowable encoding block sizes. The video encoding may use an encoding block size set in response to many parameters, of which the spatial frequency characteristic is one. Specifically, the spatial frequency characteristic may be used to restrict the possible encoding block sizes to a limited set from which an encoding block size can be selected in response to other parameters. This allows a flexible selection of the encoding block size to suit the video encoding, yet allows the performance of the video encoder to be controlled in response to the spatial frequency characteristic.
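A minimal sketch of this two-stage control is shown below; the flatness measure, the threshold and the toy selection cost are all assumptions introduced for illustration only.

```python
# Sketch: the spatial frequency characteristic restricts the allowable
# block sizes, and the encoder then selects one member of that set using
# its own criterion (here a toy cost; a real encoder might use a
# rate-distortion cost). Thresholds and sizes are illustrative assumptions.

ALL_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def allowable_block_sizes(flatness):
    """Flat regions (flatness close to 1.0) are restricted to larger blocks."""
    if flatness > 0.8:
        return [s for s in ALL_SIZES if s[0] * s[1] >= 128]  # 16x16, 16x8, 8x16
    return list(ALL_SIZES)

def select_block_size(flatness, cost):
    """Encoder-side choice: the cheapest size within the allowed set."""
    return min(allowable_block_sizes(flatness), key=cost)

# Toy cost: one motion vector per prediction block of the given size.
motion_vector_count = lambda s: 256 // (s[0] * s[1])
print(allowable_block_sizes(0.9))                   # [(16, 16), (16, 8), (8, 16)]
print(select_block_size(0.9, motion_vector_count))  # (16, 16)
```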
According to another feature of the invention, the video encoder further comprises: means for determining a second picture region having a second spatial frequency characteristic; means for setting a second encoding block size for the second picture region in response to the second spatial frequency characteristic; and wherein the means for encoding the video signal is operable to encode the video signal using the second encoding block size for the second picture region. The means for processing the second picture region may be the same as the means for processing the first picture region. The picture regions may for example be processed in parallel in different functional modules or sequentially in the same functional module. Preferably, a plurality of picture regions is determined and the encoding block size is set for each picture region to suit the spatial frequency characteristic of that region. This allows the encoding block size to be optimised for the local spatial frequency characteristics and thus allows for an improved video encoding.
According to another feature of the invention, the spatial frequency characteristic comprises an indication of a degree of flatness in the picture region and the means for setting the encoding block size is operable to increase the encoding block size for increasing degrees of flatness. Picture areas having high degrees of flatness have been observed to be sensitive to coding imperfections such as block based coding artefacts. Block based artefacts may for example be blocking artefacts. The inventors of the present invention have realised that this effect may be mitigated by increasing the encoding block size. Accordingly, an improved video encoding quality may be obtained.
According to another feature of the invention, the spatial frequency characteristic comprises an indication of a degree of uniformity in the picture region and the means for setting the encoding block size is operable to increase the encoding block size for increasing degrees of uniformity. Picture areas having high degrees of uniformity have been observed to be sensitive to coding imperfections such as texture loss or smearing. The inventors of the present invention have realised that this effect may be mitigated by increasing the encoding block size. Accordingly, a reduced texture loss or smearing may be achieved, and thus an improved video encoding quality may be obtained.
According to another feature of the invention, the spatial frequency characteristic comprises an indication of a concentration of energy towards lower frequencies and the means for setting the encoding block size is operable to increase the encoding block size for an increasing concentration of energy towards lower frequencies. A concentration of energy towards low frequencies may indicate a high degree of flatness and a susceptibility to coding imperfections in the video encoding, and this may be mitigated by selection of larger encoding block sizes. According to another feature of the invention, the video encoder further comprises: means for setting a quantisation level for the picture region in response to the spatial frequency characteristic; and the means for encoding the video signal is operable to use the quantisation level for the picture region. The performance of the video encoder may furthermore be improved by setting both a quantisation level and an encoding block size in response to the spatial frequency characteristic. The combined effect of quantisation levels and encoding block sizes on video encoding artefacts such as texture loss or block based coding artefacts is significant and highly correlated. Therefore, performance may be improved by adjusting both parameters in response to the spatial frequency characteristic of a picture region.
According to another feature of the invention, the video encoder is a video encoder in accordance with the H.264 recommendation defined by the International Telecommunications Union. The invention thus enables an improved video encoder which is operable to work within and exploit the options and restrictions of the H.264 standard. H.264 is jointly developed by ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) and ISO/IEC (the International Organization for Standardization / the International Electrotechnical Commission). ITU-T Rec. H.264 is equivalent to ISO/IEC 14496-10 AVC.
According to another feature of the invention, the encoding block size is selected from a set of motion estimation block sizes of the inter prediction modes defined in the H.264 standard. Thus, the invention enables an improved H.264 video encoder wherein the selection of standardised encoding block sizes is controlled so as to suit a local spatial frequency characteristic.
According to a second aspect of the invention, there is provided a method of video encoding comprising the steps of: determining a picture region having a spatial frequency characteristic; setting an encoding block size for the picture region in response to the spatial frequency characteristic; and encoding the video signal using the encoding block size for the picture region.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS
An embodiment of the invention will be described, by way of example only, with reference to the drawings, in which: FIG. 1 illustrates the possible partitioning of macro-blocks into motion estimation blocks in accordance with the H.264 standard;
FIG. 2 illustrates a block diagram of a video encoder in accordance with an embodiment of the invention; and FIG. 3 illustrates a flow chart of a method of video encoding in accordance with an embodiment of the invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
The following description focuses on an embodiment of the invention applicable to video encoding in accordance with the H.26L, H.264 or MPEG-4 AVC video encoding standards. However, it will be appreciated that the invention is not limited to this application but may be applied to many other video encoding algorithms, specifications or standards.
Most established video coding standards (e.g. MPEG-2) inherently use block-based motion compensation as a practical method of exploiting the correlation between subsequent pictures in video. This method attempts to predict each macro-block (16x16 pixels) in a certain picture by its "best match" in an adjacent reference picture. If the pixel-wise difference between a macro-block and its prediction is small enough, this difference is encoded rather than the macro-block itself. The relative displacement of the prediction block with respect to the coordinates of the actual macro-block is indicated by a motion vector, which is coded separately.
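A minimal sketch of such a "best match" search is given below; the exhaustive full search, the sum-of-absolute-differences cost and the search range are illustrative assumptions rather than a description of any particular standardised encoder.

```python
import numpy as np

def best_match(block, reference, top, left, search_range=8):
    """Exhaustive block matching: return the motion vector (dy, dx) that
    minimises the sum of absolute differences (SAD) between `block` and a
    co-sized area of `reference`, searched within +/- search_range pixels
    around the block's own position (top, left)."""
    h, w = block.shape
    best_vector, best_sad = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > reference.shape[0] or x + w > reference.shape[1]:
                continue  # candidate falls outside the reference picture
            sad = np.abs(block.astype(np.int64) - reference[y:y + h, x:x + w].astype(np.int64)).sum()
            if sad < best_sad:
                best_sad, best_vector = sad, (dy, dx)
    return best_vector, best_sad

# Toy usage: a 16x16 macro-block at (16, 16) searched in a previous frame.
previous = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
current_block = previous[18:34, 20:36]              # the content moved by (2, 4)
print(best_match(current_block, previous, 16, 16))  # -> ((2, 4), 0)
```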
New video coding standards such as H.26L, H.264 or MPEG-4 AVC promise improved video encoding performance in terms of an improved quality to data rate ratio. Much of the data rate reduction offered by these standards can be attributed to improved methods of motion compensation. These methods mostly extend the basic principles of previous standards, such as MPEG-2.
One relevant extension is the use of multiple reference pictures for prediction, whereby a prediction block may originate in more distant (the distance is currently unrestricted) future or past pictures. Another and even more efficient extension is the possibility of using variable block sizes for the prediction of a macro-block. Accordingly, a macro-block (still 16x16 pixels) may be partitioned into a number of smaller blocks and each of these sub-blocks can be predicted separately. Hence, different sub-blocks can have different motion vectors and can be retrieved from different reference pictures. The number, size and orientation of the prediction blocks are uniquely determined by the definition of inter prediction modes, which describe the possible partitioning of a macro-block into 8x8 blocks and the further partitioning of each of the 8x8 sub-blocks. FIG. 1 illustrates the possible partitioning of macro-blocks into motion estimation blocks in accordance with the H.264 standard. Various experiments with video encoding according to H.264 have demonstrated that the use of multiple reference pictures and especially of smaller prediction blocks can lead to significant bit-rate reductions for the same quality level. However, it has also been observed that while H.264 can significantly reduce some typical artefacts of MPEG-2 video encoding, it can also cause other artefacts. One such artefact is a partial removal of texture, resulting in texture smearing and a plastic-like appearance of some picture areas. Another artefact is noise in static areas with little detail. The artefacts are most noticeable in large areas with little detail or variation and are especially noticeable for larger picture formats, such as High Definition TV.
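To make the partitioning of FIG. 1 concrete, the sketch below enumerates the prediction-block layouts that result when a 16x16 macro-block is split into 16x16, 16x8, 8x16 or 8x8 blocks, and each 8x8 sub-block is optionally split further into 8x8, 8x4, 4x8 or 4x4 blocks; the enumeration itself (and the resulting count) is offered only as an illustration.

```python
from itertools import product

MACROBLOCK_MODES = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MODES = [(8, 8), (8, 4), (4, 8), (4, 4)]

def partition_layouts():
    """Yield each possible partitioning of a 16x16 macro-block as a list of
    prediction block sizes (height, width)."""
    for mode in MACROBLOCK_MODES:
        if mode != (8, 8):
            count = (16 // mode[0]) * (16 // mode[1])
            yield [mode] * count
        else:
            # Each of the four 8x8 sub-blocks may choose its own sub-mode.
            for choice in product(SUB_MODES, repeat=4):
                layout = []
                for sub in choice:
                    layout.extend([sub] * ((8 // sub[0]) * (8 // sub[1])))
                yield layout

layouts = list(partition_layouts())
print(len(layouts))   # 3 + 4**4 = 259 distinct layouts
print(layouts[0])     # [(16, 16)] - a single prediction block
```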
The inventors of the current invention have realised that these coding artefacts are affected by the encoding block size used, and that they may be mitigated by improved selection of encoding block sizes.
FIG. 2 illustrates a block diagram of a video encoder 201 in accordance with an embodiment of the invention.
The video encoder 201 is coupled to an external video source 203 from which a video signal to be encoded is received. The video signal comprises a number of pictures or frames.
The video encoder 201 comprises a buffer 205 coupled to the external video source 203. The buffer 205 receives the video signal from the external video source 203 and stores one or more pictures or frames until the video encoder 201 is ready to encode them. The external video source 203 is furthermore coupled to a segmentation processor 207. The segmentation processor 207 is operable to determine a picture region by dividing the picture into different picture regions. The picture may be divided into two or more picture regions in response to any suitable algorithm or criterion and specifically the picture may be divided into two picture regions by selecting a single picture region for which a given criterion is met. The segmentation processor 207 is coupled to a characteristics processor 209.
The characteristics processor 209 is operable to determine a spatial frequency characteristic for the picture region determined by the segmentation processor 207. The spatial frequency characteristic may for example indicate a spatial frequency domain energy distribution for the determined picture region. For example, the spatial frequency characteristic may indicate the concentration of energy below a given frequency threshold.
In other embodiments, no specific segmentation is performed in the segmentation processor 207. Rather, the video signal to be encoded is fed to the characteristics processor 209 in predetermined picture regions. Specifically, individual macro-blocks may be fed directly from the external video source 203 or the buffer 205 to the characteristics processor 209. In this embodiment the picture region is directly generated by receiving or retrieving a single macro-block and processing this.
In the preferred embodiment, the spatial frequency characteristic comprises an indication of a degree of flatness and/or uniformity of the determined picture region.
A region in a picture is generally considered uniform if it lacks texture/detail or if it contains texture that is stationary, i.e. has uniform variation. A flat region is generally considered a region that simply lacks texture and/or detail and thus has relatively low concentrations of high frequency content. A typical flat region thus appears flat to a viewer. A typical example of flat regions is regions of uniform colour in cartoons. The term uniform is generally considered to be broader than flat, and thus typically a flat region is also considered uniform (but not necessarily vice versa).
In regions that have low variation, such as uniform or flat regions, deviations are much more easily noticed. Hence, coding imperfections and artefacts may be particularly disadvantageous in these regions. For example, a significant problem with flat areas is that they are characterized by low frequency content, to which the human eye is more responsive and therefore also more sensitive to artefacts. Moreover, flat areas often correspond to more static objects or the background in a scene (e.g. walls, sky, etc.), where the human eye has more time to focus. To reduce the data rate, most video coders rely on the property of the human eye to be relatively less sensitive to high frequency content, and accordingly the video coders include mechanisms for suppressing higher frequencies in the spectrum of a video signal. With standard block-based coders, this is mostly achieved through block transforms and weighting and quantization of the transform coefficients, which are designed such that lower order coefficients are preserved at the cost of the higher order coefficients.
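A minimal sketch of such frequency-dependent weighting and quantization is shown below; the linear growth of the step size with the coefficient order is an assumption chosen for illustration and does not correspond to the weighting matrix of any particular standard.

```python
import numpy as np

def weighted_quantise(coeffs, base_step=8.0, slope=0.5):
    """Quantise a block of transform coefficients with a step size that
    grows with the coefficient order, so that higher (spatial) frequency
    coefficients are coarsened and suppressed before lower ones."""
    coeffs = np.asarray(coeffs, dtype=np.float64)
    n, m = coeffs.shape
    order = np.arange(n)[:, None] + np.arange(m)[None, :]  # 0 at DC, larger towards high frequencies
    step = base_step * (1.0 + slope * order)
    return np.round(coeffs / step) * step

# Equal-sized coefficients survive at low frequencies but are zeroed at high ones.
block = np.full((4, 4), 10.0)
print(weighted_quantise(block))
```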
The inventors have realised that in flat areas coding artefacts related to block based coding can be particularly disturbing. Such artefacts may occur in conventional coders due to an inconsistent selection of encoding block sizes and the corresponding quantization levels. The inventors have further realised that the partial texture loss or smearing typical of conventional encoders is affected by the selection of encoding block sizes. A possible explanation for the removal of texture, which is of a predominantly high frequency nature, is that in H.264, a 16x16 macro-block may be transformed using a 4x4 block transform. In contrast, MPEG-2 uses an 8x8 DCT transform for the same purpose.
Accordingly, by using smaller transform blocks, H.264 compacts the signal energy into a larger number of low frequency coefficients, leaving a smaller number of high frequency coefficients that are more susceptible to suppression during the subsequent video encoding (for example due to coefficient weighting or quantization). As texture information is typically of a relatively high frequency nature, a loss of texture results.
In a simple embodiment, the spatial frequency characteristic may be a single binary parameter which indicates whether a given criterion is met. For example, the spatial frequency characteristic may be set to zero if, say, more than 60% of the signal energy is contained within the lowest 20% of the relevant frequency spectrum, and to one otherwise. In this case, a spatial frequency characteristic value of zero indicates a high concentration of energy towards the lower frequencies. This is an indication of the picture region having a high degree of flatness, and therefore of the picture region having a high susceptibility to coding artefacts when being encoded.
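The binary parameter of this simple embodiment might be computed along the lines of the sketch below; the 60% and 20% thresholds are taken from the example above, while the ordering of the spectral energies and everything else is an illustrative assumption.

```python
import numpy as np

def binary_flatness_flag(spectral_energy, low_fraction=0.2, energy_share=0.6):
    """Return 0 (flat region) if more than `energy_share` of the total energy
    lies within the lowest `low_fraction` of the spectrum, and 1 otherwise.

    spectral_energy: 1-D array of energies ordered from the lowest to the
    highest spatial frequency (e.g. zig-zag ordered squared DCT coefficients).
    """
    spectral_energy = np.asarray(spectral_energy, dtype=np.float64)
    n_low = max(1, int(round(low_fraction * spectral_energy.size)))
    total = spectral_energy.sum()
    if total == 0.0:
        return 0  # no signal energy at all: treat as perfectly flat
    return 0 if spectral_energy[:n_low].sum() / total > energy_share else 1

print(binary_flatness_flag([100, 20, 5, 2, 1, 1, 0, 0, 0, 0]))          # 0 -> flat
print(binary_flatness_flag([10, 10, 10, 10, 10, 10, 10, 10, 10, 10]))   # 1 -> detailed
```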
The characteristics processor 209 is coupled to a coding controller 211. The coding controller 211 is operable to set an encoding block size for the picture region in response to the spatial frequency characteristic. In the preferred embodiment, the encoding block size is a motion estimation block size and is specifically a prediction block size as allowed by the inter prediction modes defined in the H.264 video encoding standard.
In the simple embodiment mentioned above, the encoding block size may be set to a first block size if the spatial frequency characteristic is zero and to a second block size if the spatial frequency characteristic is one. Thus, in some embodiments, the coding controller 211 may simply set the encoding block size by selecting a predetermined block size in response to a predetermined association between values of the spatial frequency characteristic and the encoding block sizes. The coding controller 211 is coupled to an encode processor 213 which is furthermore coupled to the buffer 205. The encode processor 213 is operable to encode the picture stored in the buffer 205 using the encoding block size set by the coding controller 211 for the picture region determined by the segmentation processor 207. Thus, the video encoding will be such that the encoding block size for the picture region is specifically adapted to suit the spatial frequency characteristic of that picture region. For example, in the simple embodiment described, a concentration of signal energy towards lower spatial frequencies will result in a first, larger block size being used. Otherwise a smaller block size will be used, or at least permitted, thereby allowing for improved encoding efficiency. Hence, if the spatial frequency characteristic comprises an indication of a high degree of flatness (and thus a sensitivity to coding artefacts), larger encoding block sizes are used, thereby mitigating or eliminating the coding imperfections. In the preferred embodiment, the encode processor 213 is operable to encode the video signal in accordance with the H.264 video encoding standard. An embodiment particularly suited for easy implementation is one where each picture region corresponds to one macro-block. In this embodiment, the macro-blocks are fed directly to the characteristics processor 209 which then determines the spatial frequency characteristics of that macro-block. In response, the coding controller 211 determines a suitable encoding block size for that macro-block, possibly also taking a number of neighbouring macro-blocks into account.
The encode processor 213 receives the macro-block from the buffer 205 and encodes it using the encoding block size selected for the macro-block by the coding controller 211. This enables parallel, and therefore more efficient, execution in hardware.
Furthermore, the characteristics processor 209 may store the spatial frequency characteristics obtained for macro-blocks from subsequent pictures. This would enable an analysis of the time-consistency of the spatial spectral characteristics that can further be used to optimise the selection of encoding parameters. For example, it may facilitate discrimination between texture of the underlying picture and texture originating from noise of the video source (e.g. the so-called "film grain" in movies). FIG. 3 illustrates a flow chart of a method of video encoding in accordance with an embodiment of the invention. The method is applicable to the video encoder 201 of FIG. 2 and will be described with reference to this.
In step 301, the video encoder 201 receives the video signal to be encoded from the external video source. Step 301 is followed by step 303 wherein the segmentation processor 207 determines a picture region. The picture region may be determined in accordance with any suitable criterion or algorithm. In a simple embodiment, a single picture region may be selected in accordance with a criterion, and the picture is divided into just two picture regions consisting of the selected picture region and a picture region comprising the remainder of the picture. However, in the preferred embodiment the picture is divided into several picture regions.
In the preferred embodiment, the picture is divided into picture regions by segmentation of the picture. In the preferred embodiment, picture segmentation comprises the process of a spatial grouping of pixels based on a common property (e.g. colour). There exist several approaches to picture and video segmentation, and the effectiveness of each will generally depend on the application. It will be appreciated that any known method or algorithm for segmentation of a picture may be used without detracting from the invention. An introduction to picture or video segmentation may be found in for example E. Steinbach, P. Eisert, B. Girod, "Motion-based Analysis and Segmentation of Image Sequences using 3-D Scene Models," Signal Processing: Special Issue: Video Sequence Segmentation for Content-based Processing and Manipulation, vol. 66, no. 2, pp. 233-248, IEEE 1998, or A. Bovik: Handbook of Image and Video Processing, Academic Press, 2000.
In the preferred embodiment, the segmentation includes detecting an object in response to a common characteristic, such as a colour or a level of uniformity, and consequently tracking this object from one picture to the next. This provides for simplified segmentation and facilitates identification of suitable regions for being encoded with the same encoding block size. As an example, an initial picture may be segmented and the obtained segments tracked across subsequent pictures, until a new picture is segmented independently, etc. The segment tracking is preferably performed by employing known motion estimation techniques.
In the preferred embodiment, the picture regions may comprise a plurality of picture areas which are suitable for similar choices of video encoding parameters and in particular encoding block size. Thus, a picture region may be formed by grouping of a plurality of segments. For example, if the video signal corresponds to a football match, all regions having a predominantly green colour may be grouped together as one picture region. As another example, all segments having a predominant colour corresponding to the colour of the shirts of one of the teams may be grouped together as one picture region. The picture segments need not necessarily correspond to physical objects. For example, two neighbouring segments may represent different objects but may both be highly textured. In this case, both segments may be suited for the same encoding block size.
In a specific embodiment, the picture region or regions may specifically be determined in response to properties or characteristics of the picture. Specifically, the picture regions may be determined in response to a spatial frequency characteristic. Thus, the segmentation processor 207 may be operable to determine the picture region as a group of pixels for which the spatial frequency characteristic meets a spatial frequency criterion. For example, a picture region may be determined by grouping all, e.g., 4x4 pixel blocks for which 50% of the energy is contained in the three DCT coefficients corresponding to the lowest spatial frequencies. A second picture region may be determined by grouping all remaining 4x4 pixel blocks for which 50% of the energy is contained in the six DCT coefficients corresponding to the lowest spatial frequencies. A third picture region may be formed by the remaining 4x4 pixel blocks.
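A minimal sketch of such a block-wise grouping is given below; the orthonormal 4x4 DCT, the zig-zag ordering of the first coefficients and the 50% energy shares follow the example above, while the rest of the structure is an illustrative assumption.

```python
import numpy as np

def dct_matrix(n=4):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def classify_4x4_blocks(picture, share=0.5):
    """Label each 4x4 block: 0 if `share` of its energy is in the 3 lowest-
    frequency DCT coefficients, 1 if it is in the 6 lowest, 2 otherwise."""
    d = dct_matrix(4)
    zigzag = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]  # first six, zig-zag order
    h, w = picture.shape
    labels = np.empty((h // 4, w // 4), dtype=int)
    for by in range(h // 4):
        for bx in range(w // 4):
            block = picture[4 * by:4 * by + 4, 4 * bx:4 * bx + 4].astype(np.float64)
            energy = (d @ block @ d.T) ** 2
            total = energy.sum() or 1.0
            low3 = sum(energy[y, x] for y, x in zigzag[:3]) / total
            low6 = sum(energy[y, x] for y, x in zigzag) / total
            labels[by, bx] = 0 if low3 >= share else (1 if low6 >= share else 2)
    return labels

picture = np.random.randint(0, 256, (16, 16))
print(classify_4x4_blocks(picture))
```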
In other embodiments, the picture may simply be divided into a number of picture regions without consideration of the properties of the picture. For example, a picture may simply be divided into a number of adjacent squares of a suitable size.
In yet other embodiments, the method does not comprise a step of segmenting 303, or equivalently the segmentation step simply comprises retrieving or receiving a picture region such as a block to be encoded, and specifically a macro-block may be received. Step 303 is followed by step 305 wherein a spatial frequency characteristic of the picture region is determined by the characteristics processor 209. In the preferred embodiment, a spatial frequency characteristic indicative of the uniformity or flatness of the picture region is determined. One such measure is a spatial frequency distribution wherein a concentration of energy towards the lower frequencies indicates an increased flatness. In one embodiment, the spatial frequency characteristic may be determined by performing a Discrete Cosine Transform (DCT) on one or more blocks within the picture region. For example, a 4x4 DCT may be performed for all 4x4 pixel blocks in the picture region. The DCT coefficient values may be averaged over all the blocks in the picture region, and the spatial frequency characteristic may comprise the averaged coefficient values or an indication of the relative magnitude of the different coefficient values.
Another method of determining a measure for flatness is by determining a variance of pixel values within the picture region. This variance may not only be a statistical variance but may also be any other measure of the variation or spread of pixel values within the picture region. The variance or spread may be calculated by taking the average of a pixel and the surrounding pixels and then measuring the difference between the pixels and the average value. This is particularly suitable for an embodiment wherein each picture region corresponds to one or more macro-blocks.
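One possible reading of this local-average spread is sketched below; the 3x3 neighbourhood and the mean-absolute-difference aggregation are assumptions made for the illustration.

```python
import numpy as np

def local_spread(region):
    """Spread measure: mean absolute difference between each interior pixel
    and the average of its 3x3 neighbourhood (border pixels are skipped)."""
    region = np.asarray(region, dtype=np.float64)
    h, w = region.shape
    diffs = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighbourhood_avg = region[y - 1:y + 2, x - 1:x + 2].mean()
            diffs.append(abs(region[y, x] - neighbourhood_avg))
    return float(np.mean(diffs)) if diffs else 0.0

macro_block = np.random.randint(0, 256, (16, 16))
print(local_spread(macro_block))           # large for noisy/textured content
print(local_spread(np.full((16, 16), 7)))  # 0.0 for a perfectly flat block
```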
It will be appreciated that the combined effect of steps 303 and 305 is to determine a picture region having a spatial frequency characteristic. This may for example be done by determining a picture region in accordance with a given criterion and subsequently determining a spatial frequency characteristic for that region. Alternatively or additionally, a picture region may directly be determined, e.g. by grouping picture areas or sections that have a given spatial frequency characteristic. In this case no specific analysis of the picture region is necessary to determine the spatial frequency characteristic, as it is inherently given by the determination of the picture region.
Step 305 is followed by step 307 wherein the coding controller 211 sets an encoding block size for the picture region in response to the spatial frequency characteristic. In some embodiments, the encoding block size is set to a predetermined value. For example, the spatial frequency characteristic may consist of a single measure of the concentration of energy below a given frequency threshold. The coding controller 211 may comprise a look-up table wherein, if the energy concentration is below a first value of say 50%, a first predetermined encoding block size is set, if the energy concentration is below a second value of say 75%, a second predetermined encoding block size is set, and otherwise a third predetermined encoding block size is set.
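Such a look-up may be sketched as follows, using the 50% and 75% break points of the example above together with the ordering of block sizes described in the next paragraph; the concrete block sizes chosen are illustrative assumptions.

```python
def block_size_from_energy_concentration(low_freq_energy_ratio):
    """Look-up table: map the fraction of energy below the frequency
    threshold to an encoding block size. Flatter regions (higher ratio)
    receive larger blocks, as motivated in the following paragraph."""
    if low_freq_energy_ratio < 0.50:
        return (8, 8)    # first (smallest) predetermined encoding block size
    if low_freq_energy_ratio < 0.75:
        return (16, 8)   # second predetermined encoding block size
    return (16, 16)      # third (largest) predetermined encoding block size

print(block_size_from_energy_concentration(0.40))  # (8, 8)
print(block_size_from_energy_concentration(0.90))  # (16, 16)
```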
In the preferred embodiment, the spatial frequency characteristic comprises an indication of a degree of flatness or uniformity in the picture region and the coding controller 211 is operable to set the encoding block size such that the encoding block size increases for increasing degrees of flatness or uniformity. In the previous example, the first predetermined encoding block size is smaller than the second predetermined encoding block size, which in turn is smaller than the third predetermined encoding block size. This may reduce texture removal or smearing for critical picture areas, as larger encoding block sizes cause less texture loss than smaller encoding block sizes.
In some embodiments, the encoding block size may comprise a group of allowable values for the encoding block size. Hence, in some cases a specific parameter value may be selected for the encoding block size, whereas in other embodiments an encoding block size having a range of allowable values may be selected. Accordingly, the encoding block size provides a constraint or restriction for the choice of encoding parameters for the subsequent video encoding. Thus, in the preferred embodiment, the coding controller 211 controls or influences the operation of the encode processor 213. Thus, rather than a single encoding block size value being selected by the coding controller 211, a set of allowable encoding block sizes may be selected or set by the coding controller 211. The encode processor 213 may then encode the video signal by selecting an encoding block size from the set determined by the coding controller 211. Hence, in some embodiments, the coding controller 211 is operable to generate a set of allowable encoding block sizes in response to the spatial frequency characteristic and the encode processor 213 is operable to select the encoding block size from the set of allowable encoding block sizes.
In some embodiments, where each picture region corresponds to one or more macro-blocks, the selection of the encoding block size preferably comprises partitioning macro-blocks into motion estimation blocks in accordance with the H.264 standard.
Step 307 is followed by step 309 wherein the video signal is encoded in the encode processor 213 using the encoding block size determined by the coding controller 211. In the preferred embodiment, the video encoding is in accordance with the H.264 video encoding standard.
Specifically, the method of a preferred embodiment may thus reduce the blocking artefacts in pictures which are encoded with the use of H.26L-like techniques of motion compensation, i.e. with the use of variable block sizes during inter-frame prediction. The method of the embodiment identifies flat areas in a picture and enforces a constraint on the encoding block size in those areas. Particularly, it is enforced that larger prediction blocks are used. The required discrimination of regions based on their flatness can be performed during encoding, but it can also be available beforehand (e.g. if needed for other applications). The complexity of such analysis (in the case of performing picture segmentation) may in some cases be a restrictive factor for real-time implementation. The method of the preferred embodiment is particularly but not exclusively suited for non-real-time applications, such as video streaming, broadcast or publishing.
In the preferred embodiment, the coding controller 211 is furthermore operable to set a quantisation level for the picture region in response to the spatial frequency characteristic, and the encode processor 213 is operable to use the quantisation level for the picture region. For example, a quantisation threshold may be set below which all coefficients following an encoding DCT are set to zero. A higher threshold may result in reduced data rates but also in reduced picture quality. The texture loss is increased for increasing thresholds and, accordingly, the quantisation level is preferably lowered in line with the encoding block size being increased in order to further mitigate the texture smearing effect. In the preferred embodiment, the encoding block size set is a motion estimation prediction block size. However, it will be appreciated that other encoding block sizes may be set in response to the spatial frequency characteristic. For example, the transformation size used for transforming video data into spatial frequencies may be set in response to the spatial frequency characteristic. Furthermore, more than one block size may be set in response to the spatial frequency characteristic. For example, in some embodiments it may be advantageous to set both a prediction block size and a transform block size in response to the spatial frequency characteristic, and in particular to set these to the same block size. The steps of the method may be iterated for different picture regions, or different regions may be processed in each of the steps.
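A minimal sketch of such threshold-based quantisation is given below; the coupling shown between a larger block size and a lower threshold mirrors the preference stated above, but the concrete numbers are illustrative assumptions.

```python
import numpy as np

def threshold_quantise(dct_coeffs, threshold):
    """Set every DCT coefficient whose magnitude is below `threshold` to
    zero; larger thresholds discard more (mostly high frequency) detail."""
    coeffs = np.asarray(dct_coeffs, dtype=np.float64)
    return np.where(np.abs(coeffs) < threshold, 0.0, coeffs)

def threshold_for_block_size(block_size):
    """Couple the quantisation threshold to the encoding block size: larger
    blocks (flat, artefact-sensitive regions) get a lower threshold so that
    more texture detail survives."""
    area = block_size[0] * block_size[1]
    return 12.0 if area <= 64 else 6.0 if area <= 128 else 3.0

coeffs = np.array([[120.0, 9.0, 4.0, 2.0],
                   [8.0,   5.0, 2.0, 1.0],
                   [3.0,   2.0, 1.0, 0.5],
                   [2.0,   1.0, 0.5, 0.2]])
print(threshold_quantise(coeffs, threshold_for_block_size((16, 16))))  # threshold 3.0
print(threshold_quantise(coeffs, threshold_for_block_size((8, 8))))    # threshold 12.0
```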
The invention can be implemented in any suitable form, including hardware, software, firmware or any combination of these. However, preferably, the invention is implemented as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed, the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors. Although the present invention has been described in connection with the preferred embodiment, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. In the claims, the term comprising does not exclude the presence of other elements or steps. Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality.

Claims

CLAIMS:
1. A video encoder (201) for encoding a video signal comprising: means (207, 209) for determining a picture region having a spatial frequency characteristic; means (211) for setting an encoding block size for the picture region in response to the spatial frequency characteristic; and means (213) for encoding the video signal using the encoding block size for the picture region.
2. A video encoder (201) as claimed in claim 1 wherein the encoding block size is a motion estimation block size.
3. A video encoder (201) as claimed in claim 1 wherein the means (207, 209) for determining the picture region is operable to determine the picture region as a group of pixels for which the spatial frequency characteristic meets a spatial frequency criterion.
4. A video encoder (201) as claimed in claim 3 wherein the spatial frequency criterion is that a spatial frequency distribution comprises an energy concentration above an energy threshold for spatial frequencies below a frequency threshold.
5. A video encoder (201) as claimed in claim 3 wherein the means (211) for setting the encoding block size is operable to set the encoding block size to a predetermined value.
6. A video encoder (201) as claimed in claim 1 wherein the means (207, 209) for determining the picture region comprises means for determining the spatial frequency characteristic in response to a variance of pixel values within the picture region.
7. A video encoder (201) as claimed in claim 1 wherein the means (211) for setting the encoding block size comprises means for generating a set of allowable encoding block sizes in response to the spatial frequency characteristic; and the means (213) for encoding comprises means for selecting the encoding block size from the set of allowable encoding block sizes.
8. A video encoder (201) as claimed in claim 1 further comprising: means for determining a second picture region having a second spatial frequency characteristic; means for setting a second encoding block size for the second picture region in response to the second spatial frequency characteristic; and wherein the means (213) for encoding the video signal is operable to encode the video signal using the second encoding block size for the second picture region.
9. A video encoder (201) as claimed in claim 1 wherein the spatial frequency characteristic comprises an indication of a degree of flatness in the picture region and the means (211) for setting the encoding block size is operable to increase the encoding block size for increasing degrees of flatness.
10. A video encoder (201) as claimed in claim 1 wherein the spatial frequency characteristic comprises an indication of a degree of uniformity in the picture region and the means (211) for setting the encoding block size is operable to increase the encoding block size for increasing degrees of uniformity.
11. A video encoder (201) as claimed in claim 1 wherein the spatial frequency characteristic comprises an indication of a concentration of energy towards lower frequencies and the means (211) for setting the encoding block size is operable to increase the encoding block size for an increasing concentration of energy towards lower frequencies.
12. A video encoder (201) as claimed in claim 1 further comprising: means for setting a quantisation level for the picture region in response to the spatial frequency characteristic; and wherein the means (213) for encoding the video signal is operable to use the quantisation level for the picture region.
13. A video encoder (201) as claimed in claim 1 wherein the video encoder (201) is a video encoder in accordance with the H.264 recommendation defined by the International Telecommunications Union.
14. A video encoder (201) as claimed in claim 13 wherein the encoding block size is selected from a set of motion estimate block sizes of inter prediction modes defined in the H.26L standard.
15. A method of video encoding (300) comprising the steps of: determining (303, 305) a picture region having a spatial frequency characteristic; setting (307) an encoding block size for the picture region in response to the spatial frequency characteristic; and encoding (309) the video signal using the encoding block size for the picture region.
16. A computer program enabling the carrying out of a method according to claim 15.
17. A record carrier comprising a computer program as claimed in claim 16.
PCT/IB2004/050145 2003-03-03 2004-02-25 Video encoding WO2004080081A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP04714399A EP1602239A1 (en) 2003-03-03 2004-02-25 Video encoding
JP2006506639A JP2006519565A (en) 2003-03-03 2004-02-25 Video encoding
US10/547,324 US20060165163A1 (en) 2003-03-03 2004-02-25 Video encoding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03100520 2003-03-03
EP03100520.0 2003-03-03

Publications (1)

Publication Number Publication Date
WO2004080081A1 true WO2004080081A1 (en) 2004-09-16

Family

ID=32946913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/050145 WO2004080081A1 (en) 2003-03-03 2004-02-25 Video encoding

Country Status (6)

Country Link
US (1) US20060165163A1 (en)
EP (1) EP1602239A1 (en)
JP (1) JP2006519565A (en)
KR (1) KR20050105268A (en)
CN (1) CN1757237A (en)
WO (1) WO2004080081A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3481063A4 (en) * 2016-07-04 2019-05-08 Sony Corporation Image processing device and method

Families Citing this family (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8472792B2 (en) 2003-12-08 2013-06-25 Divx, Llc Multimedia distribution system
US7519274B2 (en) 2003-12-08 2009-04-14 Divx, Inc. File format for multiple track digital data
US9647952B2 (en) * 2004-08-06 2017-05-09 LiveQoS Inc. Network quality as a service
US9189307B2 (en) 2004-08-06 2015-11-17 LiveQoS Inc. Method of improving the performance of an access network for coupling user devices to an application server
US8009696B2 (en) 2004-08-06 2011-08-30 Ipeak Networks Incorporated System and method for achieving accelerated throughput
US7933328B2 (en) * 2005-02-02 2011-04-26 Broadcom Corporation Rate control for digital video compression processing
US7515710B2 (en) 2006-03-14 2009-04-07 Divx, Inc. Federated digital rights management scheme including trusted systems
CN101636726B (en) 2007-01-05 2013-10-30 Divx有限责任公司 Video distribution system including progressive playback
US8737485B2 (en) * 2007-01-31 2014-05-27 Sony Corporation Video coding mode selection system
KR101385957B1 (en) * 2007-10-04 2014-04-17 삼성전자주식회사 Method and appartus for correcting the coefficients in the decoder
EP2048887A1 (en) * 2007-10-12 2009-04-15 Thomson Licensing Encoding method and device for cartoonizing natural video, corresponding video signal comprising cartoonized natural video and decoding method and device therefore
WO2009051690A1 (en) * 2007-10-16 2009-04-23 Thomson Licensing Methods and apparatus for artifact removal for bit depth scalability
US8233768B2 (en) 2007-11-16 2012-07-31 Divx, Llc Hierarchical and reduced index structures for multimedia files
KR20090099720A (en) * 2008-03-18 2009-09-23 삼성전자주식회사 Method and apparatus for video encoding and decoding
US8325796B2 (en) 2008-09-11 2012-12-04 Google Inc. System and method for video coding using adaptive segmentation
CN101686388B (en) * 2008-09-24 2013-06-05 国际商业机器公司 Video streaming encoding device and method thereof
WO2010090484A2 (en) * 2009-02-09 2010-08-12 삼성전자 주식회사 Video encoding method and apparatus using low-complexity frequency transformation, and video decoding method and apparatus
JP5133290B2 (en) * 2009-03-31 2013-01-30 株式会社Kddi研究所 Video encoding apparatus and decoding apparatus
JP5491073B2 (en) * 2009-05-22 2014-05-14 キヤノン株式会社 Image processing apparatus, image processing method, and program
US8902985B2 (en) * 2009-06-22 2014-12-02 Panasonic Intellectual Property Corporation Of America Image coding method and image coding apparatus for determining coding conditions based on spatial-activity value
US20110038416A1 (en) * 2009-08-14 2011-02-17 Apple Inc. Video coder providing improved visual quality during use of heterogeneous coding modes
JP5723888B2 (en) 2009-12-04 2015-05-27 ソニック アイピー, インコーポレイテッド Basic bitstream cryptographic material transmission system and method
JP2011239365A (en) * 2010-04-12 2011-11-24 Canon Inc Moving image encoding apparatus and method for controlling the same, and computer program
US8660174B2 (en) * 2010-06-15 2014-02-25 Mediatek Inc. Apparatus and method of adaptive offset for video coding
US8842184B2 (en) * 2010-11-18 2014-09-23 Thomson Licensing Method for determining a quality measure for a video image and apparatus for determining a quality measure for a video image
US8914534B2 (en) 2011-01-05 2014-12-16 Sonic Ip, Inc. Systems and methods for adaptive bitrate streaming of media stored in matroska container files using hypertext transfer protocol
US10951743B2 (en) 2011-02-04 2021-03-16 Adaptiv Networks Inc. Methods for achieving target loss ratio
US9590913B2 (en) 2011-02-07 2017-03-07 LiveQoS Inc. System and method for reducing bandwidth usage of a network
US8717900B2 (en) 2011-02-07 2014-05-06 LivQoS Inc. Mechanisms to improve the transmission control protocol performance in wireless networks
KR101898464B1 (en) * 2011-03-17 2018-09-13 삼성전자주식회사 Motion estimation apparatus and method for estimating motion thereof
US8812662B2 (en) 2011-06-29 2014-08-19 Sonic Ip, Inc. Systems and methods for estimating available bandwidth and performing initial stream selection when streaming content
US9467708B2 (en) 2011-08-30 2016-10-11 Sonic Ip, Inc. Selection of resolutions for seamless resolution switching of multimedia content
CN103875248B (en) 2011-08-30 2018-09-07 帝威视有限公司 For encoding the system and method with stream process by using the video of multiple Maximum Bit Rate grade encodings
US8787570B2 (en) 2011-08-31 2014-07-22 Sonic Ip, Inc. Systems and methods for automatically genenrating top level index files
US8799647B2 (en) 2011-08-31 2014-08-05 Sonic Ip, Inc. Systems and methods for application identification
US8909922B2 (en) 2011-09-01 2014-12-09 Sonic Ip, Inc. Systems and methods for playing back alternative streams of protected content protected using common cryptographic information
US8964977B2 (en) 2011-09-01 2015-02-24 Sonic Ip, Inc. Systems and methods for saving encoded media streamed using adaptive bitrate streaming
US9398300B2 (en) * 2011-10-07 2016-07-19 Texas Instruments Incorporated Method, system and apparatus for intra-prediction in video signal processing using combinable blocks
US20130179199A1 (en) 2012-01-06 2013-07-11 Rovi Corp. Systems and methods for granting access to digital content using electronic tickets and ticket tokens
US9936267B2 (en) 2012-08-31 2018-04-03 Divx Cf Holdings Llc System and method for decreasing an initial buffering period of an adaptive streaming system
US9313510B2 (en) 2012-12-31 2016-04-12 Sonic Ip, Inc. Use of objective quality measures of streamed content to reduce streaming bandwidth
US9191457B2 (en) 2012-12-31 2015-11-17 Sonic Ip, Inc. Systems, methods, and media for controlling delivery of content
US9906785B2 (en) 2013-03-15 2018-02-27 Sonic Ip, Inc. Systems, methods, and media for transcoding video data according to encoding parameters indicated by received metadata
US10397292B2 (en) 2013-03-15 2019-08-27 Divx, Llc Systems, methods, and media for delivery of content
CN104871544B (en) * 2013-03-25 2018-11-02 麦克赛尔株式会社 Coding method and code device
US9094737B2 (en) 2013-05-30 2015-07-28 Sonic Ip, Inc. Network video streaming with trick play based on separate trick play files
US9100687B2 (en) 2013-05-31 2015-08-04 Sonic Ip, Inc. Playback synchronization across playback devices
US9380099B2 (en) 2013-05-31 2016-06-28 Sonic Ip, Inc. Synchronizing multiple over the top streaming clients
CN104683801B (en) 2013-11-29 2018-06-05 华为技术有限公司 Method for compressing image and device
US9386067B2 (en) 2013-12-30 2016-07-05 Sonic Ip, Inc. Systems and methods for playing adaptive bitrate streaming content by multicast
US9866878B2 (en) 2014-04-05 2018-01-09 Sonic Ip, Inc. Systems and methods for encoding and playing back video at different frame rates using enhancement layers
US9392272B1 (en) 2014-06-02 2016-07-12 Google Inc. Video coding using adaptive source variance based partitioning
US9578324B1 (en) 2014-06-27 2017-02-21 Google Inc. Video coding using statistical-based spatially differentiated partitioning
EP3989477A1 (en) 2014-08-07 2022-04-27 DivX, LLC Systems and methods for protecting elementary bitstreams incorporating independently encoded tiles
CN113259731B (en) 2015-01-06 2023-07-04 帝威视有限公司 System and method for encoding content and sharing content between devices
KR101897959B1 (en) 2015-02-27 2018-09-12 쏘닉 아이피, 아이엔씨. System and method for frame replication and frame extension in live video encoding and streaming
CN115278232A (en) * 2015-11-11 2022-11-01 三星电子株式会社 Method for decoding video and method for encoding video
US10075292B2 (en) 2016-03-30 2018-09-11 Divx, Llc Systems and methods for quick start-up of playback
US10231001B2 (en) 2016-05-24 2019-03-12 Divx, Llc Systems and methods for providing audio content during trick-play playback
US10129574B2 (en) 2016-05-24 2018-11-13 Divx, Llc Systems and methods for providing variable speeds in a trick-play mode
US10148989B2 (en) 2016-06-15 2018-12-04 Divx, Llc Systems and methods for encoding video content
US10498795B2 (en) 2017-02-17 2019-12-03 Divx, Llc Systems and methods for adaptive switching between multiple content delivery networks during adaptive bitrate streaming
CN108416794A (en) * 2018-03-21 2018-08-17 湘潭大学 A kind of nickel foam surface defect image dividing method
EP3935581A4 (en) 2019-03-04 2022-11-30 Iocurrents, Inc. Data compression and communication using machine learning
BR112021018802A2 (en) 2019-03-21 2021-11-23 Divx Llc Systems and methods for multimedia swarms

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4319267A (en) * 1979-02-16 1982-03-09 Nippon Telegraph And Telephone Public Corporation Picture coding and/or decoding equipment
US5113256A (en) * 1991-02-08 1992-05-12 Zenith Electronics Corporation Method of perceptually modeling a video image signal
EP0541302A2 (en) * 1991-11-08 1993-05-12 AT&T Corp. Improved video signal quantization for an MPEG like coding environment
US6084908A (en) * 1995-10-25 2000-07-04 Sarnoff Corporation Apparatus and method for quadtree based variable block size motion estimation
US6078619A (en) * 1996-09-12 2000-06-20 University Of Bath Object-oriented video system
WO2001056298A1 (en) * 2000-01-28 2001-08-02 Qualcomm Incorporated Quality based image compression
EP1322121A2 (en) * 2001-12-19 2003-06-25 Matsushita Electric Industrial Co., Ltd. Video encoder and decoder with improved motion detection precision

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAUPE D ET AL: "Variance-based quadtrees in fractal image compression", ELECTRONICS LETTERS, IEE STEVENAGE, GB, vol. 33, no. 1, 2 January 1997 (1997-01-02), pages 46 - 48, XP006006923, ISSN: 0013-5194 *
WANG L ET AL: "Interlace Coding Tools for H.26L Video Coding", ITU STUDY GROUP 16 - VIDEO CODING EXPERTS GROUP, XX, XX, 4 December 2001 (2001-12-04), pages 1 - 20, XP002240263 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3481063A4 (en) * 2016-07-04 2019-05-08 Sony Corporation Image processing device and method
US11272180B2 (en) 2016-07-04 2022-03-08 Sony Corporation Image processing apparatus and method

Also Published As

Publication number Publication date
EP1602239A1 (en) 2005-12-07
US20060165163A1 (en) 2006-07-27
KR20050105268A (en) 2005-11-03
CN1757237A (en) 2006-04-05
JP2006519565A (en) 2006-08-24

Similar Documents

Publication Publication Date Title
US20060165163A1 (en) Video encoding
US20060204115A1 (en) Video encoding
TWI626842B (en) Motion picture coding device and its operation method
US8331449B2 (en) Fast encoding method and system using adaptive intra prediction
US20070140349A1 (en) Video encoding method and apparatus
US6122400A (en) Compression encoder bit allocation utilizing colormetric-adaptive weighting as in flesh-tone weighting
US20050265447A1 (en) Prediction encoder/decoder, prediction encoding/decoding method, and computer readable recording medium having recorded thereon program for implementing the prediction encoding/decoding method
US20060002466A1 (en) Prediction encoder/decoder and prediction encoding/decoding method
US20070036218A1 (en) Video transcoding
US20060239347A1 (en) Method and system for scene change detection in a video encoder
EP1461959A2 (en) Sharpness enhancement in post-processing of digital video signals using coding information and local spatial features
US7092442B2 (en) System and method for adaptive field and frame video encoding using motion activity
US20060256856A1 (en) Method and system for testing rate control in a video encoder
JP2006517362A (en) Video encoding
WO2004093462A1 (en) Content analysis of coded video data
WO2005094083A1 (en) A video encoder and method of video encoding
US8442113B2 (en) Effective rate control for video encoding and transcoding
US20070223578A1 (en) Motion Estimation and Segmentation for Video Data
KR20040110755A (en) Method of and apparatus for selecting prediction modes and method of compressing moving pictures by using the method and moving pictures encoder containing the apparatus and computer-readable medium in which a program for executing the methods is recorded
Hrarti et al. A macroblock-based perceptually adaptive bit allocation for H264 rate control
Tsang et al. H. 264 video coding with multiple weighted prediction models
US20060239344A1 (en) Method and system for rate control in a video encoder
Chen et al. An adaptive macroblock-mean difference based sorting scheme for fast normalized partial distortion search motion estimation
Wang et al. Quantization Parameter Decision of Initial and Scene Change Frame in Real-Time H. 264/AVC
Yin et al. An efficient mode decision algorithm for real-time high-definition H. 264/AVC transcoding

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2004714399

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2006165163

Country of ref document: US

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 10547324

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 20048056745

Country of ref document: CN

Ref document number: 2114/CHENP/2005

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 2006506639

Country of ref document: JP

Ref document number: 1020057016345

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 1020057016345

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2004714399

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 10547324

Country of ref document: US

WWW Wipo information: withdrawn in national office

Ref document number: 2004714399

Country of ref document: EP